I'm working on a POC to showcase how Cassandra works. I took Digg as an example. I wanted to create a data model that'll let me:
1) Add links. 2) Add a link to a user's favorites list. 3) Attach predetermined tags to links.
I came up with two Column Families:
<ol>
<li>Links, where:
<ul>
<li>url is the key</li>
<li>id (a generated uuid)</li>
<li>user (who added it)</li>
<li>favCount (number of users who favorited the link)</li>
<li>upCount (number of users who liked it)</li>
<li>downCount (number of users who disliked it)</li>
</ul>
</li>
<li>A second column family, where:
<ul>
<li>user is the key</li>
<li>id (as many ids as the user has favorited)</li>
</ul>
</li>
</ol>
This works fine for requirements #1 and #2 above, but #3 is trickier. I can add tags like 'java', 'languages', 'architecture' as column names with empty values in the Links column family. But querying will be slow if, say, I want to find all the links tagged 'java'.
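To make the cost concrete, here is a minimal sketch that models the Links column family as a plain Python dict (row key to columns), with tags stored as empty-valued columns. The sample URLs, users, and the `links_tagged` helper are hypothetical, and this is an in-memory stand-in rather than Cassandra client code. Finding every link tagged 'java' has to look at every row:

```python
# In-memory stand-in for the Links column family: row key (url) -> columns.
# Tags are stored as column names with empty values, as described above.
links = {
    "http://example.com/a": {"id": "1", "user": "alice", "java": "", "languages": ""},
    "http://example.com/b": {"id": "2", "user": "bob", "architecture": ""},
    "http://example.com/c": {"id": "3", "user": "alice", "java": ""},
}

def links_tagged(links, tag):
    """Return the urls whose row has a column named `tag`.

    Every row is inspected, so the work grows with the total number of
    links, not with the number of links carrying the tag.
    """
    return sorted(url for url, columns in links.items() if tag in columns)

print(links_tagged(links, "java"))
```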
Can anyone suggest how this could be implemented?
If I'm not clear with the question, please let me know.
You could create a secondary index, i.e. a column family keyed on tag, where each row contains all the links for that particular tag. Note that this may result in very wide rows (i.e. rows with many columns), each of which will be stored on a single Cassandra node. If they get very large, you might want a scheme to split them up.
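As a rough illustration, the tag-keyed column family and one possible splitting scheme can be sketched in plain Python. Everything named here is an assumption for illustration: the `tag_index` dict stands in for the column family, and `NUM_BUCKETS`, `bucket_for`, `add_tag`, and `links_for_tag` are hypothetical helpers, not part of any Cassandra client. Each tag is spread over a fixed number of rows keyed as `tag:bucket`, so no single row holds every link for a popular tag:

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed value; choose based on how wide rows are expected to get

# Stand-in for a column family keyed on "tag:bucket".
# Each column name is a url; the column value is left empty.
tag_index = defaultdict(dict)

def bucket_for(url):
    """Pick a stable bucket for a url (crc32 gives the same result on every run)."""
    return zlib.crc32(url.encode("utf-8")) % NUM_BUCKETS

def add_tag(url, tag):
    """Write the url as an empty-valued column in the tag's bucketed row."""
    tag_index[f"{tag}:{bucket_for(url)}"][url] = ""

def links_for_tag(tag):
    """Read every bucket row for the tag and merge the column names."""
    urls = []
    for bucket in range(NUM_BUCKETS):
        urls.extend(tag_index.get(f"{tag}:{bucket}", {}))
    return sorted(urls)

add_tag("http://example.com/a", "java")
add_tag("http://example.com/c", "java")
add_tag("http://example.com/b", "architecture")

print(links_for_tag("java"))
```

With this layout a lookup by tag reads at most `NUM_BUCKETS` rows instead of scanning all links. The trade-off is that the application has to write to the index whenever it tags a link, and has to keep the index consistent with the Links column family itself.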