StormCrawler: Timeout waiting for connection from pool

Question:

We consistently get the following error when we increase either the number of threads or the number of executors for the Fetcher bolt.

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286) ~[stormjar.jar:?]
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263) ~[stormjar.jar:?]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190) ~[stormjar.jar:?]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[stormjar.jar:?]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[stormjar.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71) ~[stormjar.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:220) ~[stormjar.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:164) ~[stormjar.jar:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:139) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol.getProtocolOutput(HttpProtocol.java:206) ~[stormjar.jar:?]

Is this due to a resource leak or some hard limit on the size of the http thread pool? If it is about the thread pool, is there any way to increase the pool size?

Answer 1:

The connection pool in <a href="https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/java/com/digitalpebble/stormcrawler/protocol/httpclient/HttpProtocol.java#L93" rel="nofollow">HttpProtocol</a> has a maximum number of connections, which is set to the number of fetcher threads (fetcher.threads.number). Since the pool is static, it is shared by all the executors on the same worker, so several FetcherBolt executors on one worker run more threads in total than the pool has connections. I'd recommend that you use one FetcherBolt instance per worker: the total number of threads will then match the pool size (fetcher.threads.number), and you won't have this problem.
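To make the arithmetic concrete, here is a toy model of the situation, not StormCrawler's actual code. It assumes a shared pool whose size equals fetcher.threads.number and uses a java.util.concurrent.Semaphore as a stand-in for the HTTP connection pool. The class name, method name, and the 50 ms lease timeout are all made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a Semaphore plays the role of the static connection pool.
class SharedPoolSketch {

    /**
     * Simulates `executors` fetcher executors on one worker, each running
     * `threadsPerExecutor` threads that all lease from one shared pool of
     * `poolSize` connections. Returns how many lease attempts timed out.
     */
    static int countLeaseTimeouts(int poolSize, int executors, int threadsPerExecutor) {
        int totalThreads = executors * threadsPerExecutor;
        Semaphore pool = new Semaphore(poolSize);
        CountDownLatch allAttempted = new CountDownLatch(totalThreads);
        AtomicInteger timeouts = new AtomicInteger();
        ExecutorService fetchThreads = Executors.newFixedThreadPool(totalThreads);

        for (int i = 0; i < totalThreads; i++) {
            fetchThreads.execute(() -> {
                boolean leased = false;
                try {
                    // Wait briefly for a free connection, like a pool lease with a timeout.
                    leased = pool.tryAcquire(50, TimeUnit.MILLISECONDS);
                    if (!leased) timeouts.incrementAndGet();
                    allAttempted.countDown();
                    // Holders keep their connection until every thread has tried to lease one.
                    if (leased) allAttempted.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    if (leased) pool.release();
                }
            });
        }

        fetchThreads.shutdown();
        try {
            if (!fetchThreads.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulation did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timeouts.get();
    }

    public static void main(String[] args) {
        // One executor with 4 threads, pool of 4: every thread gets a connection.
        System.out.println(countLeaseTimeouts(4, 1, 4)); // prints 0
        // Three executors with 4 threads each, same pool of 4: 8 threads time out.
        System.out.println(countLeaseTimeouts(4, 3, 4)); // prints 8
    }
}
```

In this model the number of timeouts is simply the total thread count minus the pool size, which is why one FetcherBolt instance per worker avoids the error.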

Alternatively, you could give the <a href="https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/java/com/digitalpebble/stormcrawler/protocol/okhttp/HttpProtocol.java" rel="nofollow">okhttp protocol</a> a try. It is more robust for open and large-scale crawls. See the <a href="https://github.com/DigitalPebble/storm-crawler/wiki/Protocols" rel="nofollow">wiki page on protocols</a> for a feature comparison.
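As a rough sketch of what switching protocols could look like: the snippet below assumes the implementation is selected through the configuration keys http.protocol.implementation and https.protocol.implementation, so check those key names against the wiki page for your StormCrawler version. These settings normally live in the crawler's YAML configuration. The plain Java map here is only a stand-in for the topology configuration, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a plain Map stands in for the topology configuration.
class OkHttpProtocolConfigSketch {

    // Class name taken from the okhttp protocol link above.
    static final String OKHTTP_PROTOCOL =
            "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol";

    /** Returns a copy of `conf` with the okhttp protocol selected for HTTP and HTTPS. */
    static Map<String, Object> withOkHttp(Map<String, Object> conf) {
        Map<String, Object> updated = new HashMap<>(conf);
        // Assumed key names; verify them against your StormCrawler version.
        updated.put("http.protocol.implementation", OKHTTP_PROTOCOL);
        updated.put("https.protocol.implementation", OKHTTP_PROTOCOL);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("fetcher.threads.number", 50); // example value only
        System.out.println(withOkHttp(conf).get("http.protocol.implementation"));
    }
}
```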
