How to handle AllServersUnavailable Exception

I wanted to do a simple write operation to a Cassandra instance (v1.1.10) on a single node. I just wanted to see how it handles constant writes and if it can keep up with the write speed.

    import random
    import string
    import sys
    import uuid

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('testdb')
    test_cf = ColumnFamily(pool, 'test')
    test2_cf = ColumnFamily(pool, 'test2')
    test3_cf = ColumnFamily(pool, 'test3')
    test_batch = test_cf.batch(queue_size=1000)
    test2_batch = test2_cf.batch(queue_size=1000)
    test3_batch = test3_cf.batch(queue_size=1000)
    chars = string.ascii_uppercase
    counter = 0
    while True:
        counter += 1
        uid = uuid.uuid1()
        junk = ''.join(random.choice(chars) for x in range(50))
        test_batch.insert(uid, {'junk': junk})
        test2_batch.insert(uid, {'junk': junk})
        test3_batch.insert(uid, {'junk': junk})
        sys.stdout.write(str(counter) + '\n')
    pool.dispose()

The code keeps crashing after a long run of writes (when the counter is around 10M+) with the following message:

pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out

I set queue_size=100, which didn't help. I also fired up the cqlsh -3 console to truncate the table after the script crashed and got the following error:

Unable to complete request: one or more nodes were unavailable.

Tailing /var/log/cassandra/system.log shows no errors, only INFO messages about Compaction, FlushWriter and so on. What am I doing wrong?


I've had this problem too. As @tyler-hobbs suggested in his comment, the node is likely overloaded (it was for me). A simple fix that I've used is to back off and let the node catch up. I've rewritten your loop above to catch the error, sleep for a while and try again. I've run this against a single-node cluster and it works a treat: it pauses for a minute whenever the error occurs and retries up to 5 times in a row. No data is missed using this script unless the error is thrown five times in a row (in which case you probably want to fail hard rather than return to the loop).

    import time
    import pycassa

    while True:
        counter += 1
        uid = uuid.uuid1()
        junk = ''.join(random.choice(chars) for x in range(50))
        tryCount = 5  # 5 is probably unnecessarily high
        while tryCount > 0:
            try:
                test_batch.insert(uid, {'junk': junk})
                test2_batch.insert(uid, {'junk': junk})
                test3_batch.insert(uid, {'junk': junk})
                tryCount = -1  # success: leave the retry loop
            except pycassa.pool.AllServersUnavailable as e:
                print("Trying to insert [" + str(uid) + "] but got error " + str(e) +
                      " (attempts left: " + str(tryCount) + "). Backing off for a minute to let Cassandra settle down")
                time.sleep(60)  # A delay of 60s is probably unnecessarily high
                tryCount = tryCount - 1
        sys.stdout.write(str(counter) + '\n')
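If you would rather fail hard once the retries are used up, the same back-off idea can be pulled out into a small helper. This is only a minimal sketch, not part of pycassa: the name `retry_with_backoff` and its parameters (`operation`, `retry_on`, `max_attempts`, `delay`, `sleep`) are made up for illustration, and the exception class is passed in so the helper itself has no dependency on pycassa.

```python
import time


def retry_with_backoff(operation, retry_on, max_attempts=5, delay=60.0,
                       sleep=time.sleep):
    """Call operation(); on a retry_on exception, sleep and try again.

    Hypothetical helper: re-raises the last exception once max_attempts
    calls have all failed, so a row is never skipped silently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # fail hard instead of returning to the caller's loop
            print("Attempt %d of %d failed (%s); backing off for %ss"
                  % (attempt, max_attempts, exc, delay))
            sleep(delay)


# Assumed usage inside the write loop (names taken from the code above):
#
#     def insert_row():
#         test_batch.insert(uid, {'junk': junk})
#         test2_batch.insert(uid, {'junk': junk})
#         test3_batch.insert(uid, {'junk': junk})
#
#     retry_with_backoff(insert_row, pycassa.pool.AllServersUnavailable)
```

Taking `sleep` as a parameter is just a convenience so the helper can be exercised without actually waiting a minute per retry.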

I've added a complete gist here


