selecting random values from dataframe

Question:

I have a pandas DataFrame df that looks like this:

    Month  Day  mnthShape
    1      1    1.016754224
    1      1    1.099451003
    1      1    0.963911929
    1      2    1.016754224
    1      1    1.099451003
    1      2    0.963911929
    1      3    1.016754224
    1      3    1.099451003
    1      3    1.783775568

I want to get the following from df:

    Month  Day  mnthShape
    1      1    1.016754224
    1      2    1.016754224
    1      3    1.099451003

where each mnthShape value is selected at random from the rows that share the same (Month, Day) key. For example, if the query is df.loc[(1, 1)], it should collect all values for (1, 1) and randomly pick one of them to display, as in the output above.

Answer1:

Use groupby with apply to select a row at random per group.

    import numpy as np
    import pandas as pd

    np.random.seed(0)  # fix the seed so the draw is repeatable
    df.groupby(['Month', 'Day'])['mnthShape'].apply(np.random.choice).reset_index()

       Month  Day  mnthShape
    0      1    1   1.016754
    1      1    2   0.963912
    2      1    3   1.099451

If you also want to know which index each sampled row came from, use pd.Series.sample with n=1:

    np.random.seed(0)
    (df.groupby(['Month', 'Day'])['mnthShape']
       .apply(pd.Series.sample, n=1)
       .reset_index(level=[0, 1]))

       Month  Day  mnthShape
    2      1    1   0.963912
    3      1    2   1.016754
    6      1    3   1.016754
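On recent pandas versions (1.1 or later, as far as I know), the groupby object has its own sample method, which may be a shorter way to get the same kind of result. The following is only a sketch, not part of the original answer: it rebuilds the frame from the question, and the name picked and the choice of random_state=0 are illustrative assumptions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# Draw one row per (Month, Day) group.
# random_state is an arbitrary seed, used only to make the draw repeatable.
picked = df.groupby(["Month", "Day"]).sample(n=1, random_state=0)

# The original row labels survive in picked.index,
# so you can see which row each value came from.
print(picked)
```

Because the result keeps the original index, no reset_index call is needed to trace the sampled rows back to df.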

Answer2:

One way is to use Series.sample() to draw a random row from each group:

    import numpy as np  # pd.np is deprecated and was removed in pandas 2.0

    np.random.seed(1)
    res = (df.groupby(['Month', 'Day'])['mnthShape']
             .apply(lambda x: x.sample())
             .reset_index(level=[0, 1]))
    res

       Month  Day  mnthShape
    0      1    1   1.099451
    1      1    2   1.016754
    2      1    3   1.016754
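The question also describes a single-key lookup ("if the query is df.loc[(1, 1)]"). If only one (Month, Day) pair is needed at a time, a small helper may be enough. This is a rough sketch under assumptions: random_shape is a hypothetical name that does not come from pandas or from the answers above, and raising KeyError for a missing key is just one possible design choice.

```python
import numpy as np
import pandas as pd

def random_shape(frame, month, day, rng=None):
    """Return one mnthShape value drawn uniformly from the rows matching (month, day).

    Hypothetical helper for illustration; not a pandas API.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = (frame["Month"] == month) & (frame["Day"] == day)
    candidates = frame.loc[mask, "mnthShape"].to_numpy()
    if candidates.size == 0:
        # Assumed behaviour: treat an unknown key like a failed lookup.
        raise KeyError((month, day))
    return float(rng.choice(candidates))

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

rng = np.random.default_rng(0)  # seeded only so the example is repeatable
value = random_shape(df, 1, 1, rng)
print(value)
```

Passing a seeded Generator keeps the draw repeatable without touching NumPy's global random state.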
