Repeated 503 responses when querying DBpedia

I'm running a series of queries against the DBpedia SPARQL endpoint from inside a loop. The code looks more or less like this:

for (String citySplit : citiesSplit) {
  RepositoryConnection conn = dbpediaEndpoint.getConnection();
  String sparqlQueryLat = " SELECT ?lat ?lon WHERE { "
                        + "<http://dbpedia.org/resource/" + citySplit.trim().replaceAll(" ", "_") + "> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat . "
                        + "<http://dbpedia.org/resource/" + citySplit.trim().replaceAll(" ", "_") + "> <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon ."
                        + "}";
  TupleQuery queryLat = conn.prepareTupleQuery(QueryLanguage.SPARQL, sparqlQueryLat);
  TupleQueryResult resultLat = queryLat.evaluate();
}    


The problem is that, after a few iterations, I get a 503 message:

httpclient.wire.header - << "HTTP/1.1 503 Service Temporarily Unavailable[\r][\n]"
(...)
org.openrdf.query.QueryInterruptedException
    at org.openrdf.http.client.HTTPClient.getTupleQueryResult(HTTPClient.java:1041)
    at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:438)
    at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:413)
    at org.openrdf.repository.http.HTTPTupleQuery.evaluate(HTTPTupleQuery.java:41)


If I understand correctly, this 503 message is from DBpedia. Am I right? The number of consecutive queries that manage to succeed is variable. Sometimes it runs for 13 seconds before getting the message, sometimes 15 minutes. In any case, I don't think this is normal. What could be happening?

Answer1:

The "Accessing the DBpedia Data Set over the Web" page of the DBpedia wiki says, in section 1.1, "Public SPARQL Endpoint":

Fair Use Policy: Please read this post for information about restrictions on the public DBpedia endpoint. These might also be usefull [sic]: 1, 2.

The linked post says that the public DBpedia SPARQL endpoint implements rate limiting.

The http://dbpedia.org/sparql endpoint has both rate limiting on the number of connections/sec you can make, as well as restrictions on resultset and query time, as per the following settings:

[SPARQL]
ResultSetMaxRows = 2000
MaxQueryExecutionTime = 120
MaxQueryCostEstimationTime = 1500

These are in place to make sure that everyone has a equal chance to de-reference data from dbpedia.org, as well as to guard against badly written queries/robots.

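It is likely that you are hitting the connection rate limit, which would also explain why the number of queries that succeed before the 503 varies from run to run.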
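If rate limiting is the cause, one common workaround is to slow the loop down and retry a failed query after a pause. The following is a minimal sketch, not part of Sesame/OpenRDF: the class name ThrottledRetry, the method callWithBackoff, and the attempt count and delay values are all hypothetical, and the quoted post does not state the actual connections/sec limit, so the delays would need tuning. For simplicity it retries on any exception, which is broader than retrying only on a 503.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (an assumption for illustration, not a Sesame/OpenRDF API).
final class ThrottledRetry {
    private ThrottledRetry() {}

    /**
     * Runs the task. If it throws, sleeps and tries again, doubling the
     * delay after each failure. Once maxAttempts is exhausted, the last
     * exception is rethrown to the caller.
     */
    static <T> T callWithBackoff(Callable<T> task, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Simplification: any exception is treated as retryable.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay); // back off before the next attempt
                delay *= 2;          // exponential backoff
            }
        }
    }
}
```

Inside the loop from the question, the call to queryLat.evaluate() could then be wrapped as ThrottledRetry.callWithBackoff(() -> queryLat.evaluate(), 5, 1000L), assuming Java 8 or later for the lambda. Closing resultLat and conn at the end of each iteration should also keep the number of open connections down.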
