
Convert 2-column counter-like csv file to Python collections.Counter?

Question:

I have a file that is both comma-separated (,) and tab-delimited (\t):

68,"phrase"\t 485,"another phrase"\t 43, "phrase 3"\t

Is there a simple way to load it into a Python collections.Counter?
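If the file is laid out exactly as in the sample line (records separated by tabs, and each record holding a count and a quoted phrase separated by a comma), a minimal sketch could split on tabs first and let the csv module handle the quoting inside each record. The function name `counter_from_tab_records` is hypothetical, and the sketch assumes phrases never contain tab characters. Unlike a plain dict, it adds up counts when the same phrase appears more than once.

```python
import csv
from collections import Counter


def counter_from_tab_records(text: str) -> Counter:
    """Parse tab-separated records of the form count,"phrase" into a Counter.

    Assumption: records are separated by tabs, each record is
    count,"phrase", and phrases contain no tab characters.
    """
    # Split into records and drop the empty piece left by the trailing tab.
    records = [r for r in text.split("\t") if r.strip()]
    counts = Counter()
    # csv.reader accepts any iterable of strings; skipinitialspace lets it
    # recognise the opening quote in records such as ' 43, "phrase 3"'.
    for count, phrase in csv.reader(records, skipinitialspace=True):
        counts[phrase] += int(count)  # int() tolerates surrounding spaces
    return counts


sample = '68,"phrase"\t 485,"another phrase"\t 43, "phrase 3"\t'
print(counter_from_tab_records(sample))
```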

Answer1:

You could use a dictionary comprehension, which is considered more pythonic, and it can be marginally faster (see https://stackoverflow.com/questions/52542742/why-is-this-loop-faster-than-a-dictionary-comprehension-for-creating-a-dictionar):

```python
import csv
from collections import Counter


def convert_counter_like_csv_to_counter(file_to_convert):
    # file_to_convert is a pathlib.Path; each line is count<TAB>title.
    with file_to_convert.open(encoding="utf-8", newline="") as f:
        csv_reader = csv.DictReader(f, delimiter="\t", fieldnames=["count", "title"])
        # float() first so counts written as e.g. "68.0" still parse.
        the_counter = Counter({row["title"]: int(float(row["count"])) for row in csv_reader})
    return the_counter
```
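To check the "marginally faster" claim on your own machine, a rough sketch like the following compares the comprehension with an equivalent explicit loop on in-memory data. The function names and the generated `DATA` are hypothetical stand-ins for the real file. Timings will vary with hardware and Python version, and the difference is likely to be small next to the cost of iterating the `csv.DictReader` itself.

```python
import csv
import io
import timeit
from collections import Counter

# Hypothetical in-memory data in the count<TAB>title layout assumed above.
DATA = "".join(f"{i}\ttitle {i}\n" for i in range(20_000))
FIELDS = ["count", "title"]


def with_comprehension(text: str) -> Counter:
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=FIELDS)
    return Counter({row["title"]: int(float(row["count"])) for row in reader})


def with_loop(text: str) -> Counter:
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=FIELDS)
    counts = {}
    for row in reader:
        counts[row["title"]] = int(float(row["count"]))
    return Counter(counts)


# Both versions must produce the same Counter before timing means anything.
assert with_comprehension(DATA) == with_loop(DATA)
for fn in (with_comprehension, with_loop):
    print(fn.__name__, timeit.timeit(lambda: fn(DATA), number=3))
```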

Answer2:

I couldn't let this go and stumbled on what I think is the winner.

In testing, it was clear that looping through the rows of the csv.DictReader was the slowest part, taking about 30 of the 40 seconds.
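One way to isolate that cost is to time nothing but the row iteration for each reader type. This is a minimal sketch using `time.perf_counter` on hypothetical generated data (`DATA` and `time_iteration` are illustrative names, not from the original answer); absolute numbers will differ from the 40 seconds reported above.

```python
import csv
import io
import time

# Hypothetical benchmark data: 200,000 rows in count<TAB>title layout.
DATA = "".join(f"{i}\ttitle {i}\n" for i in range(200_000))


def time_iteration(make_reader) -> float:
    """Return the seconds spent just iterating over every row of a reader."""
    reader = make_reader(io.StringIO(DATA))
    start = time.perf_counter()
    for _ in reader:
        pass
    return time.perf_counter() - start


dict_reader_s = time_iteration(
    lambda f: csv.DictReader(f, delimiter="\t", fieldnames=["count", "title"])
)
plain_reader_s = time_iteration(lambda f: csv.reader(f, delimiter="\t"))
print(f"csv.DictReader: {dict_reader_s:.3f}s  csv.reader: {plain_reader_s:.3f}s")
```

csv.DictReader builds a new dict for every row on top of what csv.reader already does, which is why it is expected to be the slower of the two.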

I switched to a plain csv.reader to see what I would get. This produced each row as a list. I wrapped the reader in a dict to see if it would convert directly. It did!

Then I could loop through a native dictionary instead of a csv.DictReader.
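The answer's code is not shown, so here is a hedged sketch of the same idea: use csv.reader so each row is a plain list, then build the Counter from it. One caution, assuming the count is the first column as in the question: `dict(csv.reader(...))` keys on the count, so two titles with the same count collapse into one entry. The sketch therefore unpacks the rows directly. The name `counter_from_plain_reader` is hypothetical, and it sums counts for repeated titles rather than overwriting them.

```python
import csv
import io
from collections import Counter


def counter_from_plain_reader(f) -> Counter:
    """Build a Counter from count<TAB>title rows using csv.reader.

    Each row is a plain list [count, title], so no per-row dict is built.
    Rows are unpacked directly instead of going through dict(csv.reader(f)),
    which would key on the count column and keep one title per count.
    """
    counts = Counter()
    for count, title in csv.reader(f, delimiter="\t"):
        counts[title] += int(float(count))
    return counts


sample = io.StringIO("68\tphrase\n485\tanother phrase\n68\tphrase 3\n")
print(counter_from_plain_reader(sample))

# The pitfall: the second row overwrites the first because both share key "68".
print(dict(csv.reader(io.StringIO("68\tphrase\n68\tphrase 3\n"), delimiter="\t")))
```

For a real file, pass the handle from `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample.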

The result: done with 4 million rows in 3 seconds!
