
Kafka as a data store for future events

Question:

I have a Kafka cluster which receives messages from a source based on data changes in that source. In some cases the messages are meant to be processed in the future. So I have 2 options:

1. Consume all messages and post those meant for the future back to Kafka under a different topic (with the date in the topic name), and have a Storm topology that looks for topics with that date in the name. This ensures that messages are processed only on the day they are meant for.
2. Store them in a separate DB and build a scheduler that reads the messages and posts them to Kafka only on that future date.
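To make option 1 concrete, here is a minimal sketch of the routing decision in Python. The topic prefix `deferred-events`, the immediate topic `events-now`, and the `process_on` field are hypothetical names chosen for illustration; nothing in my setup fixes them. The sketch only computes the target topic name and leaves the actual Kafka producer call out.

```python
from datetime import date

# Hypothetical names, used only for this illustration.
TOPIC_PREFIX = "deferred-events"
IMMEDIATE_TOPIC = "events-now"


def topic_for(process_on: date) -> str:
    """Build a date-suffixed topic name such as 'deferred-events-2024-03-05'."""
    return f"{TOPIC_PREFIX}-{process_on.isoformat()}"


def route(message: dict, today: date) -> str:
    """Return the topic a message should be posted to.

    Assumes the message carries an ISO date string in 'process_on'.
    Messages due today or earlier go to the immediate topic;
    everything else goes to the topic named after its due date.
    """
    due = date.fromisoformat(message["process_on"])
    if due <= today:
        return IMMEDIATE_TOPIC
    return topic_for(due)
```

On a given day the consumer side would then subscribe to `topic_for(today)`, which is the name the router produced for messages due on that day.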

Option 1 is easier to execute but my question is: Is Kafka a durable data store? And has anyone done this sort of eventing with Kafka? Are there any gaping holes in the design?

Answer1:

You can configure the amount of time your messages stay in Kafka (log.retention.hours).
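The practical consequence is that a deferred message must be consumed before its retention window ends. A rough sketch of that check, assuming time-based retention only and the broker default of 168 hours for `log.retention.hours` (the function names are made up for illustration, and size-based retention or segment-level deletion timing is ignored):

```python
from datetime import datetime, timedelta

# Broker default for log.retention.hours (7 days); your cluster may differ.
DEFAULT_RETENTION_HOURS = 168


def retained_until(produced_at: datetime,
                   retention_hours: int = DEFAULT_RETENTION_HOURS) -> datetime:
    """Earliest time at which the message becomes eligible for deletion."""
    return produced_at + timedelta(hours=retention_hours)


def safe_to_defer(produced_at: datetime, process_at: datetime,
                  retention_hours: int = DEFAULT_RETENTION_HOURS) -> bool:
    """True if the message is still within retention when it is due."""
    return process_at <= retained_until(produced_at, retention_hours)
```

With the default, a message produced on March 1 and due on March 10 would already be eligible for deletion by the time it is needed, so the retention setting has to cover the longest delay you expect.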

But keep in mind that Kafka is meant to be used as a "real-time buffer" between your producers and your consumers, not as a durable data store. I don't think Kafka + Storm is the appropriate tool for your use case. Why not just write your messages to a distributed file system and schedule a job (MapReduce, Spark, ...) to process those events?
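As a rough illustration of the file-based alternative, the sketch below writes each event into a directory partitioned by its due date and lets a daily job read only the partition for the current day. It uses a local directory as a stand-in for a distributed file system; the `dt=YYYY-MM-DD` layout, the `events.jsonl` file name, and the `process_on` field are assumptions made for the example.

```python
import json
from datetime import date
from pathlib import Path


def partition_dir(root: Path, process_on: date) -> Path:
    """Directory holding all events due on a given day, e.g. root/dt=2024-03-05."""
    return root / f"dt={process_on.isoformat()}"


def store_event(root: Path, event: dict) -> None:
    """Append an event to the partition named after its 'process_on' date."""
    due = date.fromisoformat(event["process_on"])
    target = partition_dir(root, due)
    target.mkdir(parents=True, exist_ok=True)
    with open(target / "events.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def load_due_events(root: Path, today: date) -> list:
    """What a scheduled daily job would read: only today's partition."""
    path = partition_dir(root, today) / "events.jsonl"
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

The scheduled job would call `load_due_events(root, date.today())` once a day and process whatever it returns; events for other days stay untouched in their own partitions.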
