
python-rq worker closes automatically

Question:

I am using python-rq to pass domains through a queue and scrape them with Beautiful Soup, so I am running multiple workers to get the job done. I have started 22 workers so far, and all 22 are registered in the rq dashboard. But after some time a worker stops by itself and is no longer displayed in the dashboard, although webmin still shows all workers as running. The crawling speed has also decreased, which suggests the workers are not actually running. I tried running the workers under supervisor and with nohup. In both cases the workers stop by themselves.

What is the reason for this? Why do the workers stop by themselves? And how many workers can we start on a single server?

Also, whenever a worker is unregistered from the rq dashboard, the failed count increases. I don't understand why.

Please help me with this. Thank You

Answer1:

Okay, I figured out the problem. It was caused by the worker timeout.

    try:
        pass  # --my code goes here--
    except Exception as ex:
        # Every exception ends up here and is only logged
        self.error += 1
        with open("error.txt", "a") as myfile:
            myfile.write('\n%s' % sys.exc_info()[0] + "{}".format(self.url))

In my code, the next domain is dequeued only once 200 URLs have been fetched from the current domain. But some domains had too few URLs (only 1 or 2) for that termination condition to be met.
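One way to avoid depending on the 200-URL limit alone is to also stop when a domain runs out of URLs. The following is only a sketch of that idea, not the original crawler: `crawl_domain`, `get_links` and `max_urls` are hypothetical names, and `get_links` stands in for whatever Beautiful Soup code extracts links from a page.

```python
from collections import deque


def crawl_domain(start_url, get_links, max_urls=200):
    """Visit up to max_urls pages, stopping early when no URLs remain."""
    pending = deque([start_url])  # URLs discovered but not yet visited
    seen = {start_url}            # avoids queueing the same URL twice
    fetched = []

    # Two exit conditions: the limit is reached OR the domain is exhausted.
    while pending and len(fetched) < max_urls:
        url = pending.popleft()
        fetched.append(url)
        for link in get_links(url):  # hypothetical link extractor
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return fetched
```

With a check like this, a domain that only has 1 or 2 URLs finishes as soon as its queue is empty instead of waiting for a count it can never reach.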

Because the code catches every exception and appends it to error.txt, it also caught rq's own timeout exception, rq.timeouts.JobTimeoutException, and simply logged it. So instead of stopping at the timeout, the job carried on, the worker kept waiting for x amount of time, and that eventually led to the worker being terminated.
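A possible way to keep the error logging without swallowing the timeout is to re-raise rq.timeouts.JobTimeoutException before the generic handler. This is a minimal sketch under assumptions: `run_step`, `step` and `log_path` are hypothetical names, and the fallback class exists only so the snippet runs where rq is not installed.

```python
try:
    from rq.timeouts import JobTimeoutException
except ImportError:
    # Assumption: stand-in so this sketch runs without rq installed.
    class JobTimeoutException(Exception):
        pass


def run_step(step, url, log_path="error.txt"):
    """Run one scraping step; log ordinary errors, let timeouts propagate."""
    try:
        step(url)
        return True
    except JobTimeoutException:
        # Do not swallow the timeout: re-raise so rq can end the job.
        raise
    except Exception as ex:
        # Ordinary scraping errors are still appended to the log file.
        with open(log_path, "a") as myfile:
            myfile.write("\n%s %s" % (type(ex).__name__, url))
        return False
```

The order of the `except` clauses matters here: the specific timeout handler has to come before `except Exception`, otherwise the generic handler would catch the timeout first.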
