I have a CSV file with the following categories: `item1`, `item2`, `item3`, `item4`, whose values are exactly one of the following: `0`, `1`, `2`, `3`, `4`.
I would like to count, for each item, how many rows there are for each value.
My code is the following, `df` being the corresponding DataFrame:

```
import pandas

outputDf = pandas.DataFrame()
cat_list = list(df.columns.values)
for col in cat_list:
    s = df.groupby(col).size()
    outputDf[col] = s
```

I would like to do exactly the same using the `chunksize` parameter when I read my CSV with `read_csv`, because my CSV is very big.
My problem is: I can't find a way to build the `cat_list`, nor to build the `outputDf`.

Can someone give me a hint?

### Answer1:

I'd apply `value_counts` columnwise rather than doing a `groupby`:

```
>>> df = pd.read_csv("basic.csv", usecols=["item1", "item2", "item3", "item4"])
>>> df.apply(pd.value_counts)
   item1  item2  item3  item4
0     17     26     17     20
1     21     21     22     19
2     17     18     22     23
3     24     14     20     24
4     21     21     19     14
```

And for the chunked version, we just need to assemble the parts (making sure to `fillna(0)` so that if a part doesn't have a 3, for example, we get 0 and not `NaN`).

```
>>> df_iter = pd.read_csv("basic.csv", usecols=["item1", "item2", "item3", "item4"], chunksize=10)
>>> sum(c.apply(pd.value_counts).fillna(0) for c in df_iter)
   item1  item2  item3  item4
0     17     26     17     20
1     21     21     22     19
2     17     18     22     23
3     24     14     20     24
4     21     21     19     14
```

(Of course, in practice you'd probably want to use as large a `chunksize` as you can get away with.)
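If you prefer an explicit loop over the generator-and-sum one-liner, the same accumulation can be sketched as below. The inline CSV is toy data standing in for the real file (which the answer reads as `"basic.csv"`), and the sketch uses `pd.Series.value_counts` since the top-level `pd.value_counts` is deprecated in recent pandas releases:

```python
import io

import pandas as pd

# Toy stand-in for the big CSV file (hypothetical data; a real file on
# disk would be read the same way via its path).
csv_text = "item1,item2,item3,item4\n" + "\n".join(
    f"{i % 5},{(i + 1) % 5},{(i + 2) % 5},{(i + 3) % 5}" for i in range(20)
)

counts = None
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=7):
    # Per-chunk counts; fillna(0) covers values absent from this chunk.
    part = chunk.apply(pd.Series.value_counts).fillna(0)
    # add() with fill_value=0 treats an index value missing from one
    # side as a count of 0, so partial chunks combine correctly.
    counts = part if counts is None else counts.add(part, fill_value=0)

counts = counts.astype(int)  # counts are whole numbers
print(counts)
```

The `add(..., fill_value=0)` step is what makes the accumulation robust when a late chunk introduces a value no earlier chunk contained.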

