
Cassandra Database overwhelmed?

Question:

I created a table in a Cassandra database with the following query:

CREATE TABLE table( num int, part_key int, val1 int, val2 float, val3 text, ..., PRIMARY KEY((part_key),num) );

The table stores data from a technical device. The partition key part_key is 1 for every record, because I want to execute range queries on only one server. I know this is a bad use case for Cassandra, but I need to do this for a comparison.

The clustering column num is the record number (from 1 to 8,000,000).

Each record has about 400 other values of type float, int and text. I inserted 8,000,000 records into this table (43 GB) and wanted to run queries like:

SELECT num, val1, val45, val90 FROM ks.table WHERE part_key=1 AND num>9999 AND num<20001;

I executed the query in cqlsh and got "operation timed out". So I changed read_request_timeout_in_ms and range_request_timeout_in_ms in the cassandra.yaml file to 60000 (1 minute).

When executing the query again I got "Error 10054: the existing connection was closed by the remote host" after 5 minutes. The DataStax Cassandra Community Server 2.0.11 service was no longer running on the server.

I restarted the service, tried again, and the service crashed again. After that I could not even restart the service and had to reboot the server. I also tried the Cassandra cpp-driver and could not execute the query with it either.

Small queries like

... AND num<1000;

are still possible.

My question is: did I do something wrong? I know Cassandra works better with more nodes, but I thought it would simply need more time. Is it possible that Cassandra is unable to execute a query like that?

Thank you!

The server:

Intel(R) Xeon(R) CPU E5504 @ 2.00GHz 2.00GHz (2 processors) / 16GB RAM

CPU utilization: 50-60%, dropping to around 30% after 15 seconds / RAM usage: 2.9 GB the whole time

EDIT:

My Cassandra keyspace is now 60 GB, and small queries like

... AND num<10;

and even the inserts time out. Sometimes the service crashes. Can someone who has an idea please explain this? One answer said that a node holding 43 GB in a cluster with more nodes is not the same as a node holding 43 GB in my single-node cluster. Can somebody explain this?

Thank you!

Answer1:

One of the key issues here is that the cqlsh shipped with the version of C* you are running does not page through results. This means the entire result set has to be serialized at query time, and given your data model it will be quite large (as pointed out by kha). I would try performing similar queries using a paging-enabled driver, and of course make sure that you have sufficient network bandwidth for returning the data.
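
To put a rough number on "quite large", here is a back-of-the-envelope estimate in Python using only the figures from the question. It assumes rows are about the same size and that 43 GB means 43 * 10^9 bytes of row data; on-disk size and serialized size are not the same thing, so treat the result as an order of magnitude, not a measurement.

```python
# Rough estimate from the numbers in the question (assumptions, not measurements).
TOTAL_BYTES = 43 * 10**9      # "43 GB", taken here as decimal gigabytes
TOTAL_ROWS = 8_000_000        # records inserted

bytes_per_row = TOTAL_BYTES / TOTAL_ROWS

# WHERE num > 9999 AND num < 20001 covers num = 10000..20000 inclusive.
first_num, last_num = 10_000, 20_000
rows_in_range = last_num - first_num + 1

# Upper bound: what the range would weigh if every column were returned.
full_rows_bytes = rows_in_range * bytes_per_row

print(f"about {bytes_per_row:.0f} bytes per row")
print(f"{rows_in_range} rows in the queried range")
print(f"about {full_rows_bytes / 10**6:.1f} MB if whole rows were returned")
```

The query in the question selects only four of the roughly 400 columns, so the actual response should be well below this upper bound.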

43 GB should be easily handled by a single C* node, although operating a C* cluster with only a single node sacrifices almost all of the benefits that C* offers.
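
If a paging-enabled driver is not at hand, a similar effect can be approximated by hand, since the question notes that small range queries still succeed. The sketch below is manual range chunking on num, not the driver's built-in paging. The names chunk_ranges and fetch_in_chunks are made up for illustration, and execute is a hypothetical callable taking a query string and a parameter tuple; something like the DataStax Python driver's session.execute is assumed to fit that shape.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

# Same columns and table as in the question; the bounds are inclusive here.
QUERY = (
    "SELECT num, val1, val45, val90 FROM ks.table "
    "WHERE part_key = 1 AND num >= %s AND num <= %s"
)


def chunk_ranges(low: int, high: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) sub-ranges that cover low..high."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = low
    while start <= high:
        end = min(start + chunk_size - 1, high)
        yield start, end
        start = end + 1


def fetch_in_chunks(
    execute: Callable[[str, Tuple[int, int]], Iterable[Any]],
    low: int,
    high: int,
    chunk_size: int = 1000,
) -> Iterator[Any]:
    """Run one small range query per chunk and yield rows as they arrive.

    `execute` is an assumption: any callable that runs a CQL string with
    positional parameters and returns an iterable of rows.
    """
    for start, end in chunk_ranges(low, high, chunk_size):
        for row in execute(QUERY, (start, end)):
            yield row
```

For the query in the question, fetch_in_chunks(execute, 10000, 20000) would issue eleven small queries instead of one large one, so no single response has to carry the whole range.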
