If I have a list with, say, 10 million objects, how do I filter it quickly? A complete pass with a list comprehension takes about 4-5 seconds. Are there any efficient data structures or libraries for this in Python? Or is Python not suited to large data sets?

Answer1:
<a href="http://docs.python.org/library/itertools.html" rel="nofollow">Itertools</a> is designed for efficient looping. In particular, ifilter (Python 2; in Python 3 the built-in filter is already lazy) may suit your purpose. Iterating through a large data structure is always expensive, but if you only need some of the data at a time, lazy evaluation can help a lot.
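As a minimal sketch of the lazy approach (shown here with Python 3's built-in lazy filter; in Python 2 the equivalent is itertools.ifilter), note that no work happens until items are actually consumed:

```python
from itertools import islice

data = range(10000000)  # stand-in for the 10-million-element data set

# filter() in Python 3 is lazy: it builds no list, just an iterator.
evens = filter(lambda x: x % 2 == 0, data)

# A generator expression is equivalent and often reads better.
evens_genexp = (x for x in data if x % 2 == 0)

# Work is only done as items are consumed, e.g. take the first five:
first_five = list(islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]
```

If you end up needing every matching element anyway, laziness saves memory but not total iteration time; it shines when you only consume part of the result.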
You can also try generator expressions, which usually behave the same as their list comprehension counterparts (though they are used differently), or write a generator function; both give you the benefits of lazy evaluation.

Answer2:
If you have uniform numeric types and speed is your primary goal (and you want to stay in Python), use a NumPy array.

Answer3:
Even the built-in functions on a very primitive integer list take several seconds to evaluate on my computer:

>>> l = [0] * 10000000
>>> s = filter(lambda x: True, l)
I'd suggest a different approach, such as <a href="http://numpy.scipy.org/" rel="nofollow">Numpy</a>, lazy evaluation with <a href="http://docs.python.org/tutorial/classes.html#generators" rel="nofollow">generators</a>, and/or the iteration module <a href="http://docs.python.org/library/itertools.html" rel="nofollow">itertools</a>.
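To illustrate the NumPy route, here is a sketch of filtering ten million integers with a boolean mask (assumes numpy is installed; the even-number predicate is arbitrary):

```python
import numpy as np

a = np.arange(10000000)   # uniform integer array, 10 million elements

# The mask and the selection are both computed in C, far faster
# than a Python-level loop or list comprehension.
mask = a % 2 == 0
evens = a[mask]           # boolean indexing returns the matching elements

print(evens[:5])          # first few matches: [0 2 4 6 8]
```

On typical hardware this runs in a small fraction of a second, versus several seconds for the pure-Python filter above, because the per-element work never touches the Python interpreter.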