Convert 2-column counter-like csv file to Python collections.Counter?


I have a comma-separated (,), tab-delimited (\t) file.

68,"phrase"\t 485,"another phrase"\t 43, "phrase 3"\t

Is there a simple approach to throw it into a Python Counter?


You could use a dictionary comprehension, which is considered more <em>pythonic</em> and <a href="https://stackoverflow.com/questions/52542742/why-is-this-loop-faster-than-a-dictionary-comprehension-for-creating-a-dictionar" rel="nofollow">can be marginally faster</a>:

import csv
from collections import Counter


def convert_counter_like_csv_to_counter(file_to_convert):
    with file_to_convert.open(encoding="utf-8") as f:
        csv_reader = csv.DictReader(f, delimiter="\t", fieldnames=["count", "title"])
        the_counter = Counter({row["title"]: int(float(row["count"])) for row in csv_reader})
    return the_counter


I couldn't let this go and stumbled on what I think is the winner.

In testing, it was clear that looping through the rows of the csv.DictReader was the slowest part, taking about 30 of the 40 seconds.

I switched to a plain csv.reader to see what I would get. This yielded each row as a list. I wrapped the reader in dict() to see if it would convert directly. It did!

Then I could loop through a native dictionary instead of a csv.DictReader.

The result... <strong>done with 4 million rows in 3 seconds</strong>!
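A minimal sketch of the csv.reader approach described above, assuming each row is a count and a title separated by a tab. The function name and the sample rows are hypothetical, not taken from the original code. One caveat: dict() applied to two-column rows uses the first column as the key, so with the count first, rows that share a count would overwrite each other. This sketch unpacks each row instead and sums the counts of repeated titles.

```python
import csv
from collections import Counter


def counter_from_tsv_lines(lines):
    """Build a Counter from an iterable of 'count<TAB>title' lines.

    `lines` may be an open text file or any iterable of strings.
    """
    reader = csv.reader(lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        count, title = row
        # int(float(...)) mirrors the conversion used in the answer above
        counts[title] += int(float(count))
    return counts


# Hypothetical in-memory sample standing in for the real file
sample = ["68\tphrase", "485\tanother phrase", "43\tphrase 3"]
result = counter_from_tsv_lines(sample)
```

To run it on a real file, pass the open file object, for example `with path.open(encoding="utf-8", newline="") as f: counter_from_tsv_lines(f)`, where `path` is assumed to be a pathlib.Path.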


