Massive differences in speed with virtually identical XQuery (eXist-db) queries

Question:

I've been porting an SQL database over to eXist over the last few weeks, and while I have so far managed to get past every previous obstacle, I've now run into something that neither the official documentation, searching online, nor common sense could help with.

In short, I have a pretty big collection (about 90k entries, spread over 20 files), with most individual entries looking something like this (this is a massive simplification so I can get the point across):

<document>
    <document_id>Numerical Value</document_id>
    <page_id>Some other numerical value</page_id>
</document>

I then pass a value through PHP; let's call it $val. The strange part is that when I run the standard query

$p in collection("/db/folder_location")//documentset/document[xs:integer(document_id) eq $val]

No matter what value I pass, it returns all the results in a matter of seconds. If I slightly modify it, however, making it:

$p in collection("/db/folder_location")//documentset/document[xs:integer(page_id) eq $val]

It either takes over 30 seconds to return the values or simply stays locked in a running query and never returns anything. Of the 30 queries I have already converted, this is the only one where I ran into this problem and could not find a workaround.

Answer1:

To address the query performance problem, I would suggest some changes to your query and/or the addition of a range index on the document_id and page_id elements.

Your query casts every document_id and page_id element to xs:integer. This is inefficient on a large dataset, because the cast has to be evaluated for each candidate element. Consider (a) removing the cast, (b) reversing it (cast $val to xs:string instead), or (c) adding a range index on these two elements, with type="xs:integer". The last option lets you remove the cast from your predicate (changing it to document[document_id eq $val] and document[page_id eq $val]), and the index should greatly speed up the lookup.
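As a rough sketch, option (b) might look like this as a complete query. It assumes that $val reaches the query as an external variable and that the FLWOR expression simply returns the matching document elements; neither detail is stated in the question.

(: Assumption: $val is supplied by the caller (for example from PHP) as an external variable. :)
declare variable $val external;

(: Option (b): cast the single input value to xs:string once, instead of casting every page_id to xs:integer. :)
for $p in collection("/db/folder_location")//documentset/document[page_id eq xs:string($val)]
return $p

Note that this compares the values as strings, so a stored page_id of 007 would not match an input of 7. If the range index from option (c) is used instead, it is probably safest to make sure $val is an xs:integer (for example with xs:integer($val)) so that the compared value has the same type as the index.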

To add a range index for your query, create a collection configuration document like this:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <range>
            <create qname="document_id" type="xs:integer"/>
            <create qname="page_id" type="xs:integer"/>
        </range>
    </index>
</collection>

If your data is in /db/folder_location, then store this document as collection.xconf in /db/system/config/db/folder_location. Then reindex your data collection with xmldb:reindex("/db/folder_location"). As the documentation on range indexes states, with this index definition in place:

<blockquote>

indexes will be used automatically for general or value comparisons as well as string functions like fn:contains, fn:starts-with, fn:ends-with.

</blockquote>
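The store-and-reindex steps can also be scripted. The following is a minimal sketch, assuming it is run by a user with DBA rights (for example from eXide) and that /db/system/config/db already exists; the variable names are illustrative, and $config simply repeats the index definition given earlier. The reindex call is pointed at the data collection /db/folder_location, on the assumption that this collection, rather than the configuration collection, is the one that must be re-indexed.

(: The xmldb module is normally available in eXist without an import; the explicit import is kept for clarity. :)
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

let $config :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <range>
                <create qname="document_id" type="xs:integer"/>
                <create qname="page_id" type="xs:integer"/>
            </range>
        </index>
    </collection>
(: Create the configuration collection that mirrors the data collection path. :)
let $config-collection := xmldb:create-collection("/db/system/config/db", "folder_location")
(: Store the index definition under the name eXist looks for. :)
let $stored := xmldb:store($config-collection, "collection.xconf", $config)
(: Rebuild the indexes of the data collection so the new range index is populated. :)
return ($stored, xmldb:reindex("/db/folder_location"))

On a collection of about 90k entries the reindex may take a while, so it is worth running it once after the configuration is stored rather than as part of every query.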

For more on range indexes in eXist, see https://exist-db.org/exist/apps/doc/newrangeindex.xml. For query optimization techniques, see https://exist-db.org/exist/apps/doc/tuning.xml. For indexes in general in eXist, see https://exist-db.org/exist/apps/doc/indexing.xml.
