Creating a pagination index in CouchDB?

<strong>I'm trying to create a pagination index view in CouchDB that lists the doc._id for every Nth document found.</strong>

I wrote the following map function, but the <strong>pageIndex</strong> variable doesn't reliably start at 1. In fact, it seems to change arbitrarily depending on the emitted value or the page length (with 50, 55, 10, and 25, each run starts with a different document), although I do seem to get the correct number of documents emitted.

```javascript
function(doc) {
  if (doc.type == 'log') {
    if (!pageIndex || pageIndex > 50) {
      pageIndex = 1;
      emit(doc.timestamp, null);
    }
    pageIndex++;
  }
}
```

What am I doing wrong here? How would a CouchDB expert build this view?

Note that I don't want to use the "startkey + count + 1" method that's been mentioned elsewhere, since I'd like to be able to jump to a particular page or the last page (user expectations and all), I'd like to have a friendly "?page=5" URI instead of "?startkey=348ca1829328edefe3c5b38b3a1f36d1e988084b", and I'd rather CouchDB did this work instead of bulking up my application, if I can help it.

Thanks!

Answer 1:

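View functions (map and reduce) are purely functional. Side effects such as setting a global variable are not supported. Even on a single server, CouchDB does not feed documents to the map function in key order, and it updates the index incrementally, re-running the map function only for documents added or changed since the last query, so a counter carried between calls cannot track a document's position in the sorted view. (And when you move your application to BigCouch, how could multiple independent servers, each holding an arbitrary subset of the data, know what pageIndex is?)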
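To see why a counter cannot work, here is a hypothetical simulation in plain JavaScript (not CouchDB's actual view server). It runs the question's counter logic, with a page size of 3 instead of 50 for brevity, over the same six documents in two different processing orders. The sorted view should have pages starting at timestamps 1 and 4, but the emitted "page starts" depend entirely on the order in which documents happen to reach the function.

```javascript
// Hypothetical simulation: the question's counter-based map logic, with the
// counter kept in a variable that survives between calls, as the question intends.
function runCounterMap(docs) {
  let pageIndex; // stands in for the global the question's map function relies on
  const emitted = [];
  for (const doc of docs) {
    if (doc.type === "log") {
      if (!pageIndex || pageIndex > 3) { // page size of 3 for brevity
        pageIndex = 1;
        emitted.push(doc.timestamp);
      }
      pageIndex++;
    }
  }
  return emitted.sort((a, b) => a - b); // sorted, as a view would present keys
}

const docs = [1, 2, 3, 4, 5, 6].map((t) => ({ type: "log", timestamp: t }));
console.log(runCounterMap(docs));                // [ 1, 4 ]  (timestamp order)
console.log(runCounterMap([...docs].reverse())); // [ 3, 6 ]  (a different order)
```

Incremental indexing makes this worse: a document added later is mapped on its own, with whatever counter state is left over.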

Therefore the answer will have to involve a traditional map function, perhaps keyed by timestamp.

```javascript
function(doc) {
  if (doc.type == 'log') {
    emit(doc.timestamp, null);
  }
}
```
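To make the shape of such a view concrete, here is a hypothetical local simulation in plain JavaScript, not CouchDB's implementation. It runs a map function once per document with a stubbed `emit` that records each row's document id, key and value, then sorts the rows by key. The map function takes `emit` as a parameter only so it can run outside CouchDB, and the sample documents (with unique numeric timestamps) are made up.

```javascript
// Hypothetical simulation of building a view: map every document, then sort rows by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Stub for CouchDB's emit(): each row also records the source document's _id.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // Numeric keys only in this sketch; CouchDB has its own collation rules for mixed keys.
  return rows.sort((a, b) => a.key - b.key);
}

// The side-effect-free map function, with emit passed in for this simulation.
function byTimestamp(doc, emit) {
  if (doc.type == "log") {
    emit(doc.timestamp, null);
  }
}

const docs = [
  { _id: "c", type: "log", timestamp: 30 },
  { _id: "a", type: "log", timestamp: 10 },
  { _id: "x", type: "user", timestamp: 5 }, // not a log, so it is not emitted
  { _id: "b", type: "log", timestamp: 20 },
];
console.log(buildView(docs, byTimestamp).map((row) => row.id)); // [ 'a', 'b', 'c' ]
```

Paging, whether by skip or by a _list function, operates on this sorted list of rows, never on the raw documents.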

How can you get every 50th document? The simplest way is to query the view with limit=50 plus skip=0, skip=50, skip=100, and so on. However, that is not ideal (see below).
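A minimal sketch of mapping a friendly page number onto `limit` and `skip`, which are real CouchDB view query parameters. The database, design document and view names in the path (`logs`, `by_timestamp`) and the helper name are hypothetical placeholders.

```javascript
// Hypothetical helper: 1-based page number -> view query using limit and skip.
function pageQueryUrl(page, pageSize = 50) {
  const params = new URLSearchParams({
    limit: String(pageSize),
    skip: String((page - 1) * pageSize), // page 1 -> skip=0, page 3 -> skip=100
  });
  return `/logs/_design/logs/_view/by_timestamp?${params}`;
}

console.log(pageQueryUrl(1)); // /logs/_design/logs/_view/by_timestamp?limit=50&skip=0
console.log(pageQueryUrl(3)); // /logs/_design/logs/_view/by_timestamp?limit=50&skip=100
```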

A way to pre-fetch the exact IDs of every 50th document is a _list function that outputs only every 50th row. (In practice you could use mustache.js or another template library to build HTML.)

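```javascript
function(head, req) {
  var pageIndex = 0, first = true, row;
  send("[");
  while ((row = getRow())) {
    // keep only the first row of each 50-row page
    if (pageIndex % 50 === 0) {
      send((first ? "" : ",") + JSON.stringify(row)); // commas keep the array valid JSON
      first = false;
    }
    pageIndex += 1;
  }
  send("]");
}
```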
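Because CouchDB supplies `getRow()` and `send()` to list functions, the every-50th-row logic is easy to check outside CouchDB with stubs. This is a hypothetical local harness, not CouchDB's view server: `runList` and `everyNthRow` are illustrative names, `getRow`/`send` are passed in as parameters rather than provided by CouchDB, and the 120 rows are fake.

```javascript
// Hypothetical harness: stub getRow()/send() so list logic can run in Node.
function runList(listFn, rows) {
  let i = 0;
  let output = "";
  const getRow = () => (i < rows.length ? rows[i++] : null); // null ends the loop
  const send = (chunk) => { output += chunk; };
  listFn(getRow, send);
  return output;
}

// Every-Nth-row list logic, with commas between elements so the output parses as JSON.
function everyNthRow(getRow, send, pageSize = 50) {
  let pageIndex = 0, first = true, row;
  send("[");
  while ((row = getRow())) {
    if (pageIndex % pageSize === 0) {
      send((first ? "" : ",") + JSON.stringify(row));
      first = false;
    }
    pageIndex += 1;
  }
  send("]");
}

// 120 fake view rows -> page starts at rows 0, 50 and 100.
const rows = Array.from({ length: 120 }, (_, n) => ({ id: `doc${n}`, key: n, value: null }));
const pageStarts = JSON.parse(runList(everyNthRow, rows));
console.log(pageStarts.map((r) => r.id)); // [ 'doc0', 'doc50', 'doc100' ]
```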

This will work in many situations; however, it is not perfect. Here are some considerations I have in mind. They are not necessarily showstoppers; it depends on your specific situation.

There is a reason pretty page-number URLs are discouraged. What does it mean if I load page 1, then a bunch of documents are inserted within the first 50, and then I click to page 2? The rows pushed off the end of page 1 show up again at the top of page 2. If the data is changing a lot, there is no perfect user experience; the user will inevitably notice the data changing.
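A tiny hypothetical illustration of that page 1 / page 2 problem, using an in-memory sorted array and a page size of 3 instead of a real view:

```javascript
// Hypothetical illustration: offset-based pages over a list that changes between requests.
const pageSize = 3; // small page size for brevity
const page = (rows, n) => rows.slice((n - 1) * pageSize, n * pageSize);

const rows = ["a", "b", "c", "d", "e", "f"]; // already sorted, like view rows
const firstPage = page(rows, 1);             // user views page 1: a, b, c
rows.splice(1, 0, "a2");                     // a new row sorts into page 1's range
const secondPage = page(rows, 2);            // user clicks to page 2
console.log(firstPage, secondPage);          // [ 'a', 'b', 'c' ] [ 'c', 'd', 'e' ]
```

Row "c" is shown twice; a deletion inside page 1's range would instead make a row silently skip past the user.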

The skip parameter and the example _list function have the same problem: they do not scale. With skip you are still touching <strong>every</strong> row in the view from the beginning: finding it in the database file, reading it from disk, and then ignoring it, row by row, until you reach the skip value. For small values that is quite convenient, but since you are grouping pages into sets of 50, I imagine you will have thousands of rows or more. That could make page views slow, because the database spends most of its time reading rows only to throw them away.
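A rough cost model of what that means, assuming (as described above) that a skip query reads every skipped row before returning the page: page N with `limit=50` touches N * 50 rows, and a user reading pages 1 to N in order touches 50 * N(N+1)/2 rows in total. The function names and the 100,000-row figure are hypothetical.

```javascript
// Rough cost model (an assumption): skip=(page-1)*pageSize walks past every skipped
// row, then reads the page itself.
function rowsTouchedForPage(page, pageSize) {
  return (page - 1) * pageSize + pageSize; // = page * pageSize
}

// Total rows touched if a user pages through pages 1..pages in order.
function rowsTouchedForAllPages(pages, pageSize) {
  let total = 0;
  for (let page = 1; page <= pages; page++) total += rowsTouchedForPage(page, pageSize);
  return total;
}

// 100,000 rows = 2,000 pages of 50.
console.log(rowsTouchedForPage(2000, 50));     // 100000
console.log(rowsTouchedForAllPages(2000, 50)); // 100050000
```

So the last page alone reads the whole view, and browsing every page reads about a thousand times as many rows as the view contains.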

The _list example has a similar problem, except that you front-load all the work: it runs through the entire view from start to finish and (presumably) sends the relevant document IDs to the client so it can quickly jump around the pages. But with hundreds of thousands of documents (you call them "log", so I assume you will have a ton), that will be an extremely slow query, and its result is not cached: CouchDB stores the view index, but the _list function is re-run over every row on each request.
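A minimal sketch of the client side of that approach, assuming the client keeps the page-start rows returned by the _list (each carrying the row's `key` and document `id`) and maps "?page=N" onto a view query that starts exactly at that row. `startkey`, `startkey_docid` and `limit` are real CouchDB view query parameters; the helper name, the database/design/view names in the path, and the sample rows are hypothetical.

```javascript
// Hypothetical helper: page number -> view query starting at that page's first row.
function pageQueryFromIndex(pageStarts, page, pageSize = 50) {
  const start = pageStarts[page - 1];
  if (!start) throw new RangeError(`no such page: ${page}`);
  const params = new URLSearchParams({
    startkey: JSON.stringify(start.key), // view keys are JSON-encoded in the query string
    startkey_docid: start.id,            // pins the exact row when several share a key
    limit: String(pageSize),
  });
  return `/logs/_design/logs/_view/by_timestamp?${params}`;
}

const pageStarts = [
  { id: "doc0", key: 1300000000, value: null },
  { id: "doc50", key: 1300000500, value: null },
];
const lastPage = pageStarts.length; // "jump to the last page" is just the last entry
console.log(pageQueryFromIndex(pageStarts, lastPage));
// /logs/_design/logs/_view/by_timestamp?startkey=1300000500&startkey_docid=doc50&limit=50
```

Each page request is then cheap; the expensive part is producing `pageStarts` in the first place.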

In summary, for small data sets you can get away with the page=1, page=2 form, but you will run into problems as your data set grows. With the release of BigCouch, CouchDB is even better suited to log storage and analysis, so if that is what you are doing, you will definitely want to consider how far you need to scale.
