Filtering / iterating through very large lists in Python

Question:

If I have a list with, say, 10 million objects, how do I filter the list quickly? It takes about 4-5 seconds for a complete iteration through a list comprehension. Are there any efficient data structures or libraries for this in Python? Or is Python not suited for large sets of data?

Answer1:

<a href="http://docs.python.org/library/itertools.html" rel="nofollow">Itertools</a> is designed for efficient looping. In particular, you might find that ifilter suits your purpose (in Python 3 the built-in filter is already lazy and ifilter no longer exists). Iterating through a large data structure is always expensive, but if you only need some of the data at a time, lazy evaluation can help a lot.

You can also try generator expressions, which are written almost identically to their list comprehension counterparts (though they are used differently), or a generator function. Both give you the benefits of lazy evaluation.
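As a rough sketch of the lazy approach described above (assuming Python 3; `is_wanted` and `data` are hypothetical stand-ins, not part of the original question), a generator expression combined with `itertools.islice` only scans as much of the input as is needed to produce the requested items:

```python
from itertools import islice

def is_wanted(x):
    # Hypothetical predicate: keep multiples of 3.
    return x % 3 == 0

# Stand-in for the large list from the question.
data = range(10**7)

# Generator expression: no element is tested until it is requested.
matches = (x for x in data if is_wanted(x))

# Take only the first five matches; the rest of data is never scanned.
first_five = list(islice(matches, 5))
print(first_five)

# The built-in filter is also lazy in Python 3.
lazy = filter(is_wanted, data)
first_lazy = next(lazy)
print(first_lazy)
```

This only helps when a partial result is enough. If every match is needed, the whole input still has to be visited once, and laziness mainly saves the memory of building a second large list.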

Answer2:

If your data is numeric and of a uniform type, and speed is your primary goal (and you want to stay in Python), use a NumPy array.
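A minimal sketch of what that could look like, assuming NumPy is installed and the data fits a single numeric dtype (the array contents and the "multiple of 3" condition are made up for illustration). The filter is expressed as a boolean mask, so the per-element test runs in compiled code instead of a Python-level loop:

```python
import numpy as np

# Stand-in data: one million 64-bit integers (scale up as needed).
values = np.arange(10**6, dtype=np.int64)

# Vectorised condition: produces a boolean array of the same length.
mask = values % 3 == 0

# Boolean indexing returns a new array holding only the matching elements.
selected = values[mask]

print(selected[:5], selected.size)
```

Whether this beats a list comprehension in practice depends on the data. Converting an existing list of arbitrary Python objects into an array has its own cost, so the gain is clearest when the data lives in NumPy from the start.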

Answer3:

Even applying a built-in function to a very simple list of integers takes several seconds on my computer:

>>> l = [1] * 10000000
>>> s = filter(lambda x: True, l)

I'd suggest a different approach, such as <a href="http://numpy.scipy.org/" rel="nofollow">NumPy</a>, lazy evaluation with <a href="http://docs.python.org/tutorial/classes.html#generators" rel="nofollow">generators</a>, and/or the iteration module <a href="http://docs.python.org/library/itertools.html" rel="nofollow">itertools</a>.
