Pandas data frame filter by list values - most efficient


I have the following pandas data frame that I have built:

      dark  Mystery   adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000   0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0004  0.89    0.678  -0.423   0.12    0.00   0.000     0.00    0.00  0.00   0.000
0005  0.00    0.000   0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

I also have a list that has some of the row index values of the data frame. After filtering I want to have my new data frame with indexes matching the values in the list.

l = [001,005]

This is a large data frame, so I am trying to figure out how to do this without iterating in a loop.

[df.index[idx] for idx in l]

This is wrong, but I feel I am close to the answer, or maybe not.

Result should be:

      dark  Mystery   adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000   0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0005  0.00    0.000   0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000


How about using <a href="http://pandas.pydata.org/pandas-docs/dev/indexing.html#different-choices-for-indexing" rel="nofollow">.loc</a>:
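For instance, here is a minimal sketch of a label-based selection with .loc. The small frame and the name `wanted` are hypothetical stand-ins for your data, and the index is assumed to hold strings such as "0001":

```python
import pandas as pd

# Hypothetical stand-in for the question's frame (only three columns kept).
# Assumption: the index labels are strings, not integers.
df = pd.DataFrame(
    {
        "dark": [0.00, 0.89, 0.00],
        "winter": [0.56, 0.00, 0.00],
        "skiing": [0.789, 0.000, 0.000],
    },
    index=["0001", "0004", "0005"],
)

wanted = ["0001", "0005"]

# .loc selects rows by label; rows come back in the order given in the list.
newdf = df.loc[wanted]
print(newdf)
```

Keep in mind that in recent pandas versions .loc raises a KeyError if any label in the list is missing from the index.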


Note that in your actual data the index labels are probably strings rather than integers. In Python 2, l = [0001, 0005] is evaluated as [1, 5], because the leading zeros make these octal literals (Python 3 rejects such literals as a syntax error). So you might want to use l = ["0001", "0005"], or use string formatting to convert the integers (as Jonathan Eunice shows in his answer).
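If you are unsure which case applies, you can inspect the index directly. A small sketch, using a hypothetical one-column frame in place of yours and assuming string labels:

```python
import pandas as pd

# Hypothetical frame; the index is assumed to contain strings.
df = pd.DataFrame({"dark": [0.00, 0.89, 0.00]}, index=["0001", "0004", "0005"])

# Look at the type of a single label to see what the index really holds.
first_label = df.index[0]
print(type(first_label))

# Membership tests show why integer keys find nothing in a string index.
int_found = 1 in df.index
str_found = "0001" in df.index
print(int_found, str_found)
```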

As an aside, <a href="https://www.python.org/dev/peps/pep-0008/#names-to-avoid" rel="nofollow">you should also avoid using lowercase l as a variable name</a>, since it looks similar to 1 in many monospace typefaces.


If your DataFrame is in df:

newdf = df[df.index.isin(l)]

Of course, you have to be careful here. None of your items in l are truly in the index. l = [001,005] is the same as l = [1,5], whereas your index is really strings a la ['0001', '0002', ...]. Given that, you may want to "upgrade" your selection list l to be parallel to your index first:

l = ["{:04d}".format(i) for i in l]
newdf = df[df.index.isin(l)]
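Put together, a self-contained sketch of that approach might look like this. The frame is a trimmed, hypothetical version of yours, and the names `ids`, `labels` and `mask` are only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the question's frame, with string index labels.
df = pd.DataFrame(
    {"dark": [0.00, 0.89, 0.00], "comedy": [0.000, 0.000, 0.678]},
    index=["0001", "0004", "0005"],
)

# Integer ids to look up; 42 is deliberately absent from the index.
ids = [1, 5, 42]

# Zero-pad each integer to four digits so it matches the string labels.
labels = ["{:04d}".format(i) for i in ids]

# Boolean mask: True where the row's label appears in the list.
mask = df.index.isin(labels)
newdf = df[mask]
print(newdf)
```

The mask simply ignores labels that are not in the index, and the selected rows keep the order they have in df rather than the order of the list.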


