How does this while loop exit?

Question:

So, how does this code exit the while statement when the thread is started? (Please do not consider indentation)

    class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue, out_queue):
            threading.Thread.__init__(self)
            self.queue = queue
            self.out_queue = out_queue

        def run(self):
            while True:
                #grabs host from queue
                host = self.queue.get()

                #grabs urls of hosts and then grabs chunk of webpage
                url = urllib2.urlopen(host)
                chunk = url.read()

                #place chunk into out queue
                self.out_queue.put(chunk)

                #signals to queue job is done
                self.queue.task_done()

EDIT

The code that starts the thread:

    def main():
        #spawn a pool of threads, and pass them queue instance
        for i in range(5):
            t = ThreadUrl(queue, out_queue)
            t.setDaemon(True)
            t.start()

        #wait on the queue until everything has been processed
        queue.join()

Answer1:

It doesn't have to exit the while statement for the program to terminate. All that happens here is that the threads consume everything in the queue, and once task_done() has been called for every item, queue.join() returns.

As soon as the call to queue.join() in the main code returns, the main code exits, and because you marked the threads as daemons, the entire application exits and the background threads are killed.
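A minimal sketch of that behaviour, written for Python 3 (where the Queue module is named queue). The `worker` and `run_pool` names are made up for illustration, and doubling a number stands in for the urlopen()/read() step so the example runs without a network:

```python
import queue
import threading


def worker(in_queue, results):
    # Same shape as ThreadUrl.run(): an endless loop around a blocking get().
    while True:
        item = in_queue.get()
        results.append(item * 2)  # stand-in for fetching a page
        in_queue.task_done()


def run_pool(items, num_threads=3):
    in_queue = queue.Queue()
    results = []
    threads = []
    for _ in range(num_threads):
        t = threading.Thread(target=worker, args=(in_queue, results), daemon=True)
        t.start()
        threads.append(t)
    for item in items:
        in_queue.put(item)
    # Returns once task_done() has been called for every item that was put.
    in_queue.join()
    return results, threads


results, threads = run_pool(range(10))
print(sorted(results))
print([t.is_alive() for t in threads])
```

After join() returns, every worker is still alive and blocked inside get(). None of them ever left the loop; they simply disappear when the main thread finishes, because they are daemon threads.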

Answer2:

The quick answer: it doesn't, unless an exception is raised anywhere, which depends on the functions/methods called in run.

Of course, there is also the possibility that your thread is suspended or stopped from another thread, which effectively terminates your while loop.
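To illustrate the exception case, here is a small Python 3 sketch. The `worker` function and the input strings are invented for the example, and int() stands in for whatever call might fail inside run():

```python
import queue
import threading

processed = []


def worker(in_queue):
    while True:
        item = in_queue.get()
        processed.append(int(item))  # int("oops") raises ValueError
        in_queue.task_done()


q = queue.Queue()
for item in ["1", "2", "oops", "4"]:
    q.put(item)

t = threading.Thread(target=worker, args=(q,), daemon=True)
t.start()
t.join(timeout=5)  # returns as soon as the thread has died

print(processed, t.is_alive(), q.qsize())
```

The uncaught ValueError ends the loop and the thread (Python prints the traceback to stderr). Note the side effect: task_done() is never called for the failing item and "4" is never picked up, so a q.join() at this point would block forever.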

Answer3:

Your code will only break out of the loop if an exception occurs while executing the body of the while True loop. That is not the best way to exit from a thread, but it could work.

If you want to exit properly from your thread, try replacing while True with something like while self.continue_loop, and give self.queue.get() a timeout so that the loop wakes up regularly to re-check the flag:

    import Queue
    import threading
    import urllib2

    class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue, out_queue):
            threading.Thread.__init__(self)
            self.queue = queue
            self.out_queue = out_queue
            self.continue_loop = True

        def run(self):
            while self.continue_loop:
                #grabs host from queue; on timeout, go back and re-check the flag
                try:
                    host = self.queue.get(timeout=1)
                except Queue.Empty:
                    continue

                #grabs urls of hosts and then grabs chunk of webpage
                url = urllib2.urlopen(host)
                chunk = url.read()

                #place chunk into out queue
                self.out_queue.put(chunk)

                #signals to queue job is done
                self.queue.task_done()

And to start and stop the threads:

    def main():
        #spawn a pool of threads, and pass them queue instance
        threads = []
        for i in range(5):
            t = ThreadUrl(queue, out_queue)
            t.setDaemon(True)
            t.start()
            threads.append(t)

        #wait until every queued host has been processed
        queue.join()

        #then ask the threads to stop and wait for them to finish
        for t in threads:
            t.continue_loop = False
        for t in threads:
            t.join()

Answer4:

You could pass block=False or timeout=5 to self.queue.get(). It then raises a Queue.Empty exception if no item is available: immediately with block=False, or after waiting 5 seconds with timeout=5. Otherwise, AFAIK, self.queue.get() blocks indefinitely, so any break check further down in the loop would never be reached.

    def run(self):
        while True:
            #grabs host from queue
            try:
                host = self.queue.get(block=False)
            except Queue.Empty:
                break

            #grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            #place chunk into out queue
            self.out_queue.put(chunk)

            #signals to queue job is done
            self.queue.task_done()

Another approach would be to put a "Stop" flag in the queue after all your other items have been added. Then in the thread put a check for this stop flag and break if found.

E.g.:

    host = self.queue.get()
    if host == 'STOP':
        #Still need to signal that the task is done, else your queue join() will wait forever
        self.queue.task_done()
        break
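For completeness, a self-contained Python 3 sketch of the stop-flag approach. The `worker` and `process` names are hypothetical, and str.upper() stands in for the page fetch. The one detail worth calling out is that each worker consumes exactly one 'STOP', so you need to enqueue one flag per thread:

```python
import queue
import threading

STOP = 'STOP'  # same flag value as in the snippet above


def worker(in_queue, out_queue):
    while True:
        item = in_queue.get()
        if item == STOP:
            # Still signal completion, otherwise in_queue.join() waits forever.
            in_queue.task_done()
            break
        out_queue.put(item.upper())  # stand-in for urlopen(host).read()
        in_queue.task_done()


def process(items, num_threads=3):
    in_queue = queue.Queue()
    out_queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(in_queue, out_queue))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for item in items:
        in_queue.put(item)
    # One stop flag per worker, added after all real items.
    for _ in threads:
        in_queue.put(STOP)
    in_queue.join()
    for t in threads:
        t.join()  # returns because every worker has left its loop
    results = []
    while not out_queue.empty():
        results.append(out_queue.get())
    return results, threads


results, threads = process(["a", "b", "c", "d"])
print(sorted(results), [t.is_alive() for t in threads])
```

Because the workers really exit here, they do not need to be daemon threads, and t.join() can be used to wait for a clean shutdown.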
