Restore Cassandra cluster data after accidentally dropping a table

Question:

As you know, a Cassandra cluster uses replication to prevent data loss even if some nodes in the cluster go down. But suppose an admin accidentally drops a table holding a large amount of data, and the command has already been executed by every replica in the cluster. Does this mean the table is lost and cannot be restored? Is there any suggestion for coping with this kind of disaster with minimal server downtime?

Answer1:

From the Cassandra <a href="http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__automatic-backup-properties" rel="nofollow">docs</a>:

<blockquote>

auto_snapshot (Default: true ) Enable or disable whether a snapshot is taken of the data before keyspace truncation or dropping of tables. To prevent data loss, using the default setting is strongly advised. If you set to false, you will lose data on truncation or drop.

</blockquote>
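
If `auto_snapshot` was left at its default, each node should still hold a snapshot of the dropped table under its data directory. Below is a minimal Python sketch for locating those snapshot directories on one node. It assumes the usual on-disk layout `<data_dir>/<keyspace>/<table>[-<id>]/snapshots/<snapshot_name>/`; the function name `find_snapshots` is hypothetical and not part of any Cassandra tooling.

```python
from pathlib import Path


def find_snapshots(data_dir, keyspace, table):
    """Return snapshot directories for `table` on this node, newest first.

    Assumption: table directories are named "<table>" or "<table>-<id>",
    and snapshots live in a "snapshots" subdirectory of each one.
    """
    keyspace_dir = Path(data_dir) / keyspace
    found = []
    for table_dir in keyspace_dir.glob(f"{table}*"):
        name = table_dir.name
        # Skip other tables that merely share the prefix (e.g. "users_old").
        if name != table and not name.startswith(table + "-"):
            continue
        snapshots_dir = table_dir / "snapshots"
        if snapshots_dir.is_dir():
            found.extend(p for p in snapshots_dir.iterdir() if p.is_dir())
    # Sort by modification time so the most recent snapshot comes first.
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `find_snapshots("/var/lib/cassandra/data", "my_keyspace", "my_table")` would list candidate snapshots, where the path and names are placeholders for your own installation. A typical restore then recreates the table with its original schema, copies the snapshot's SSTable files into the new table directory, and runs `nodetool refresh` for that keyspace and table on each node. Treat this as an outline to check against the documentation for your Cassandra version.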

Answer2:

If the administrator has deleted the data and the deletion has been replicated to all the nodes, it is difficult to recover the data without a consistent backup.

However, since deletes in Cassandra are not executed instantly, you may be able to recover the data. When you delete data, Cassandra replaces it with a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.

See <a href="http://wiki.apache.org/cassandra/DistributedDeletes" rel="nofollow">http://wiki.apache.org/cassandra/DistributedDeletes</a>

Columns marked with a tombstone exist for a configured time period (defined by the gc_grace_seconds value set on the column family), and then are permanently deleted by the compaction process after that time has expired. The default value is 10 days.
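
To make the timing concrete, here is a small Python sketch that computes the earliest moment a tombstone becomes eligible for removal by compaction. It assumes the default `gc_grace_seconds` of 864000 (10 days); the function name `purge_eligible_at` is hypothetical, and actual removal only happens when a compaction later processes the SSTable holding the tombstone.

```python
from datetime import datetime, timedelta, timezone

# Default gc_grace_seconds: 864000 seconds = 10 days.
DEFAULT_GC_GRACE_SECONDS = 864000


def purge_eligible_at(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest time at which compaction may drop a tombstone written at `deleted_at`."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


deleted = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(purge_eligible_at(deleted))  # 2020-01-11 00:00:00+00:00
```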

Following the explanation in <a href="http://www.datastax.com/docs/1.0/dml/about_writes" rel="nofollow">About Deletes</a>, maybe if you shut down some of the nodes, wait until compaction succeeds and the data is completely deleted from the SSTables, and then turn those nodes back on, the data could appear again. But this will only happen if you don't run periodic repair operations on the nodes.

I have never tried this; it is only an idea that came to me while reading the Cassandra documentation.
