Kafka as a data store for future events


I have a Kafka cluster which receives messages from a source based on data changes in that source. In some cases the messages are meant to be processed in the future. So I have 2 options:

<ol><li>Consume all messages and post those meant for the future back to Kafka under a different topic (with the target date in the topic name), and have a Storm topology that looks for topics with the current date in their name. This ensures that messages are processed only on the day they are meant for.</li> <li>Store them in a separate DB and build a scheduler that reads the messages and posts them to Kafka only on that future date.</li> </ol>
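To make option 1 concrete, here is a minimal sketch of the routing decision only. The function name `topic_for`, the `events` prefix and the `-live` suffix are hypothetical, chosen for illustration; the actual Kafka producer and Storm topology are left out.

```python
from datetime import date


def topic_for(due: date, today: date, prefix: str = "events") -> str:
    """Pick the topic a message should be published to.

    Messages that are already due go to a "live" topic; messages for a
    future day go to a topic whose name carries that day (ISO format).
    All names here are illustrative assumptions, not a Kafka convention.
    """
    if due <= today:
        return f"{prefix}-live"
    return f"{prefix}-{due.isoformat()}"
```

A consumer started on a given day would then subscribe to `topic_for(today, today - timedelta(days=1))`-style names, i.e. the topic carrying today's date. Note that this design creates one topic per future date, so the number of topics grows with the scheduling horizon.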

Option 1 is easier to execute but my question is: Is Kafka a durable data store? And has anyone done this sort of eventing with Kafka? Are there any gaping holes in the design?


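You can configure how long your messages stay in Kafka with the broker setting log.retention.hours. Messages older than that become eligible for deletion, so the retention period has to cover the longest delay you expect.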
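As a rough illustration of how retention relates to the delay, the sketch below computes the smallest whole number of hours a message would need to be retained so that it still exists on its due date. The function name and the 24-hour safety margin are assumptions made for this example.

```python
import math
from datetime import datetime, timedelta


def min_retention_hours(produced_at: datetime, due_at: datetime,
                        margin: timedelta = timedelta(hours=24)) -> int:
    """Smallest whole-hour retention that keeps a message until it is due.

    The margin (an assumed safety buffer) covers consumer downtime or
    reprocessing after the due time.
    """
    wait = max(due_at - produced_at, timedelta(0)) + margin
    return math.ceil(wait / timedelta(hours=1))
```

The largest value of this function over all expected messages is a lower bound for log.retention.hours (or for a per-topic retention override) if Kafka itself is holding the delayed messages.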

But keep in mind that Kafka is meant to be used as a "real-time buffer" between your producers and your consumers, not as a durable data store. I don't think Kafka+Storm is the appropriate tool for your use case. Why not just write your messages to a distributed file system and schedule a job (MapReduce, Spark...) to process those events?
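The "store now, process when due" idea can be sketched as follows. This is only an in-memory stand-in, so it is not durable; a real setup would keep the messages in a database or file system as suggested above. The class name `DelayedStore` and its methods are hypothetical.

```python
import heapq
from datetime import datetime


class DelayedStore:
    """Holds messages until their due time (in-memory illustration only)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so messages themselves are never compared

    def add(self, due_at: datetime, message) -> None:
        heapq.heappush(self._heap, (due_at, self._seq, message))
        self._seq += 1

    def pop_due(self, now: datetime) -> list:
        """Remove and return every message whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

A scheduled job would call `pop_due(datetime.now())` periodically and hand the returned messages to whatever processes them, for example by publishing them to Kafka at that point.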

