Pandas data frame filter by list values - most efficient

Question:

I have the following pandas data frame that I have built:

      dark  Mystery   adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000   0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0004  0.89    0.678  -0.423   0.12    0.00   0.000     0.00    0.00  0.00   0.000
0005  0.00    0.000   0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

I also have a list that contains some of the row index values of the data frame. I want to filter the data frame so that the result contains only the rows whose index matches a value in the list.

l = [001,005]

This is a large data frame, so I am trying to do this without iterating in a loop.

[df.index[idx] for idx in l]

This is wrong, but I feel I am close to the answer, or maybe not.

Result should be:

      dark  Mystery   adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000   0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0005  0.00    0.000   0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

Answer1:

How about using <a href="http://pandas.pydata.org/pandas-docs/dev/indexing.html#different-choices-for-indexing" rel="nofollow">.loc</a>:

df.loc[l]

Note that in your actual example, your indices are probably strings rather than integers. When you declare l = [0001, 0005], it is evaluated as [1, 5] (in Python 2, where a leading zero marks an octal literal; Python 3 rejects such literals with a SyntaxError). So you might want to use l = ["0001", "0005"] or use string formatting to convert the integers (as Jonathan Eunice shows in his answer).
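A minimal sketch of the label-based lookup, assuming the index really does hold zero-padded strings as suspected above. The frame is a cut-down, hypothetical version of the one in the question, and `wanted` is just an illustrative name:

```python
import pandas as pd

# Hypothetical, reduced copy of the question's frame; the index holds strings.
df = pd.DataFrame(
    {
        "dark": [0.00, 0.89, 0.00],
        "action": [0.00, 0.00, 0.12],
        "winter": [0.56, 0.00, 0.00],
    },
    index=["0001", "0004", "0005"],
)

wanted = ["0001", "0005"]  # string labels, same type as the index

# Label-based selection: rows come back in the order given in `wanted`.
subset = df.loc[wanted]
print(subset)
```

One thing to keep in mind with this approach: in pandas 1.0 and later, `df.loc` raises a `KeyError` if any label in the list is missing from the index, so it suits cases where every requested label is expected to exist.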

As an aside, <a href="https://www.python.org/dev/peps/pep-0008/#names-to-avoid" rel="nofollow">you should also avoid using lowercase l as a variable name</a>, since it looks similar to 1 in many monospace typefaces.

Answer2:

If your DataFrame is in df:

newdf = df[df.index.isin(l)]

Of course, you have to be careful here. None of your items in l are truly in the index. l = [001,005] is the same as l = [1,5], whereas your index is really strings a la ['0001', '0002', ...]. Given that, you may want to "upgrade" your selection list l to be parallel to your index first:

l = ["{:04d}".format(i) for i in l]
newdf = df[df.index.isin(l)]
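Put together, the two steps might look like the sketch below. It assumes a string index and integer IDs to look up; the frame is a reduced, hypothetical stand-in for the one in the question, and `ids` and `labels` are illustrative names. The ID 9 is included only to show what happens when a label has no matching row:

```python
import pandas as pd

# Hypothetical, reduced copy of the question's frame; the index holds strings.
df = pd.DataFrame(
    {"dark": [0.00, 0.89, 0.00], "snow": [0.65, 0.00, 0.00]},
    index=["0001", "0004", "0005"],
)

ids = [1, 5, 9]  # integer IDs; 9 has no matching row

# Zero-pad to four characters so the labels match the string index.
labels = ["{:04d}".format(i) for i in ids]

# Boolean mask: True where the index label appears in `labels`.
# A label with no match ("0009") is simply ignored.
newdf = df[df.index.isin(labels)]
print(newdf)
```

Because `isin` builds a boolean mask, the result keeps the row order of the original frame rather than the order of the list, and labels that are absent from the index are skipped without raising an error.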
