
Question:
I have a pandas DataFrame df, which looks like this:
Month Day mnthShape
1 1 1.016754224
1 1 1.099451003
1 1 0.963911929
1 2 1.016754224
1 1 1.099451003
1 2 0.963911929
1 3 1.016754224
1 3 1.099451003
1 3 1.783775568
I want to get the following from df:
Month Day mnthShape
1 1 1.016754224
1 2 1.016754224
1 3 1.099451003
where the mnthShape values are selected at random within each (Month, Day) pair. That is, if the query is df.loc[(1, 1)], it should look up all values for (1, 1) and randomly select one of them to be displayed above.
Answer1: Use groupby with apply to select a row at random per group.
import numpy as np

np.random.seed(0)  # fix the seed so the random picks are repeatable
df.groupby(['Month', 'Day'])['mnthShape'].apply(np.random.choice).reset_index()
Month Day mnthShape
0 1 1 1.016754
1 1 2 0.963912
2 1 3 1.099451
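For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the frame from the question and applies the same groupby/apply idea. The local generator `rng` and the name `picked` are assumptions added for illustration (the answer above seeds the global NumPy state instead), so the exact values drawn will differ from the output shown above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# Assumption: a local generator rather than np.random.seed, so the
# randomness does not depend on global state.
rng = np.random.default_rng(0)

# Pick one mnthShape value at random from each (Month, Day) group.
picked = (
    df.groupby(["Month", "Day"])["mnthShape"]
      .apply(lambda s: rng.choice(s.to_numpy()))
      .reset_index()
)
print(picked)
```

The result has one row per (Month, Day) pair, and each mnthShape is one of the values that pair had in df.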
If you want to know what index the sampled rows come from, use pd.Series.sample with n=1:
np.random.seed(0)
(df.groupby(['Month', 'Day'])['mnthShape']
.apply(pd.Series.sample, n=1)
.reset_index(level=[0, 1]))
Month Day mnthShape
2 1 1 0.963912
3 1 2 1.016754
6 1 3 1.016754
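As a possible alternative not mentioned in the answer: pandas 1.1 and later provide a sample method directly on the groupby object, which also keeps the original row index. A minimal sketch, assuming pandas >= 1.1 and the frame from the question (the name `sampled` is illustrative only):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# Draw one row per (Month, Day) group; random_state makes the draw repeatable.
# Assumption: pandas >= 1.1, where GroupBy.sample was introduced.
sampled = df.groupby(["Month", "Day"]).sample(n=1, random_state=0)
print(sampled)
```

Because whole rows are returned, the index of `sampled` tells you which rows of df were picked, with no reset_index step needed.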
Answer2: One way is to use Series.sample() to draw a random row from each group:
import numpy as np

np.random.seed(1)  # pd.np was removed in pandas 2.0; call NumPy directly
res = df.groupby(['Month', 'Day'])['mnthShape'].apply(lambda x: x.sample()).reset_index(level=[0, 1])
res
Month Day mnthShape
0 1 1 1.099451
1 1 2 1.016754
2 1 3 1.016754
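The question also describes a single lookup in the style of df.loc[(1, 1)]. A rough sketch of that idea, assuming the frame is re-indexed by (Month, Day); the helper `random_shape` and the names `indexed` and `rng` are hypothetical and not part of either answer:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 0.963911929,
                  1.016754224, 1.099451003, 1.783775568],
})

# Index by (Month, Day) so a key such as (1, 1) selects every matching row.
indexed = df.set_index(["Month", "Day"]).sort_index()

def random_shape(month, day, rng):
    """Return one mnthShape value chosen at random for the given (month, day)."""
    values = indexed.loc[(month, day), "mnthShape"]
    # atleast_1d covers the case where the lookup yields a single scalar.
    return rng.choice(np.atleast_1d(values))

rng = np.random.default_rng(1)
print(random_shape(1, 1, rng))
```

Each call draws again, so repeated lookups for the same key can return different values.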