
Parsing a data matrix containing HH:MM:SS.mmm times using numpy.loadtxt

Question:

I know I can do something like

numpy.loadtxt('data.txt', dtype={'names': ('time', 'magnitude'), 'formats': ('S12', 'f8')})

but this gives me the times as strings. How can I convert them into floats?

Answer1:

You could use the converters parameter (http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) to apply a function to each string in the first column. Calling a Python function once per row may slow np.loadtxt down considerably, but this may still be workable for moderate-sized files:

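import numpy as np

def parse_date(datestr):
    # Depending on the NumPy version and encoding, loadtxt may pass bytes or str
    if isinstance(datestr, bytes):
        datestr = datestr.decode()
    # 'HH:MM:SS.mmm' -> seconds since midnight
    return sum(multiplier * val
               for multiplier, val in zip((3600, 60, 1), map(float, datestr.split(':'))))

x = np.loadtxt('data',
               dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')},
               converters={0: parse_date})
print(x)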
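A minimal self-contained check of the converter approach, assuming two made-up sample rows read from an in-memory io.StringIO buffer instead of the file 'data'. Each time becomes seconds since midnight, 3600*h + 60*m + s, so 12:34:56.789 maps to 45296.789.

```python
import io

import numpy as np


def parse_date(datestr):
    # Depending on the NumPy version, loadtxt may pass bytes or str to converters
    if isinstance(datestr, bytes):
        datestr = datestr.decode()
    hours, minutes, seconds = (float(part) for part in datestr.split(':'))
    return 3600 * hours + 60 * minutes + seconds


# Hypothetical sample data standing in for 'data.txt'
sample = io.StringIO("12:34:56.789 1.5\n01:02:03.500 2.25\n")

x = np.loadtxt(sample,
               dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')},
               converters={0: parse_date})
print(x['time'])       # seconds since midnight
print(x['magnitude'])
```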

Alternatively, you could parse the strings into floats after using loadtxt like this:

# 'U12' (str) rather than 'S12' (bytes), so that splitting on ':' works in Python 3
x = np.loadtxt('data', dtype={'names': ('time', 'magnitude'), 'formats': ('U12', 'f8')})
arr = np.char.split(x['time'], ':')
# http://stackoverflow.com/a/19459439/190597 (Jaime)
# np.float was removed from NumPy; the builtin float means float64 here
newarr = np.fromiter((tuple(row) for row in arr),
                     dtype=[('', float)] * 3,
                     count=len(arr)).view(float).reshape(-1, 3)
times = (newarr * [3600, 60, 1]).sum(axis=1)
y = np.empty_like(x, dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')})
y['time'] = times
y['magnitude'] = x['magnitude']
print(y)
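A simpler variant of the post-processing step, shown on a small in-memory array of made-up times. It swaps the fromiter/view trick for np.array(..., dtype=float) on the split lists, which lets NumPy parse the strings; it assumes every time has exactly three ':'-separated fields and has not been benchmarked against the version above.

```python
import numpy as np

# Hypothetical times, as loadtxt would return them with a 'U12' format
time_strings = np.array(['12:34:56.789', '01:02:03.500', '00:00:00.000'])

# np.char.split gives an object array of lists such as ['12', '34', '56.789'];
# np.array(..., dtype=float) turns those lists into an (N, 3) float array
parts = np.array(np.char.split(time_strings, ':').tolist(), dtype=float)

# Weight hours, minutes and seconds, then sum along each row
seconds = parts @ np.array([3600.0, 60.0, 1.0])
print(seconds)
```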

Edit: I created a test file of 10**6 lines to see which method is faster. The second method (parsing the strings after loadtxt) is a bit faster, about 5.6 s versus 6.9 s per loop:

In [329]: %timeit using_fromiter()
1 loops, best of 3: 5.59 s per loop

In [328]: %timeit using_converter()
1 loops, best of 3: 6.88 s per loop

import os
import numpy as np

def create_data(N):
    data = np.random.random(size=N) * 86400
    hours, remainder = np.divmod(data, 3600)
    minutes, seconds = np.divmod(remainder, 60)
    mag = np.arange(N)
    # Written to ~/tmp/data; the loaders below read 'data' from the current directory
    filename = os.path.expanduser('~/tmp/data')
    with open(filename, 'w') as f:
        for h, m, s, a in np.column_stack([hours, minutes, seconds, mag]):
            f.write('{h:d}:{m:d}:{s:.6f} {a}\n'.format(h=int(h), m=int(m), s=s, a=a))

def parse_date(datestr):
    # loadtxt may pass bytes or str, depending on the NumPy version
    if isinstance(datestr, bytes):
        datestr = datestr.decode()
    return sum(multiplier * val
               for multiplier, val in zip((3600, 60, 1), map(float, datestr.split(':'))))

def using_converter():
    x = np.loadtxt('data',
                   dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')},
                   converters={0: parse_date})
    return x

def using_fromiter():
    # 'U12' so the times are str and np.char.split can split on ':'
    x = np.loadtxt('data', dtype={'names': ('time', 'magnitude'), 'formats': ('U12', 'f8')})
    arr = np.char.split(x['time'], ':')
    newarr = np.fromiter((tuple(row) for row in arr),
                         dtype=[('', float)] * 3,
                         count=len(arr)).view(float).reshape(-1, 3)
    times = (newarr * [3600, 60, 1]).sum(axis=1)
    y = np.empty_like(x, dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')})
    y['time'] = times
    y['magnitude'] = x['magnitude']
    return y

create_data(10**6)
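A sanity check that is not part of the original benchmark: before comparing timings, it is worth confirming that the two methods return the same values. This sketch makes both functions take a filename (a hypothetical change), writes 1000 made-up rows with zero-padded HH:MM:SS.mmm times into a temporary directory instead of ~/tmp, and compares the outputs. It uses 'U12' so the times load as str, which np.char.split can split on ':' in Python 3.

```python
import os
import tempfile

import numpy as np

DTYPE_OUT = {'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')}


def parse_date(datestr):
    # loadtxt may pass bytes or str, depending on the NumPy version
    if isinstance(datestr, bytes):
        datestr = datestr.decode()
    return sum(m * v for m, v in zip((3600, 60, 1), map(float, datestr.split(':'))))


def using_converter(filename):
    return np.loadtxt(filename, dtype=DTYPE_OUT, converters={0: parse_date})


def using_fromiter(filename):
    x = np.loadtxt(filename, dtype={'names': ('time', 'magnitude'), 'formats': ('U12', 'f8')})
    arr = np.char.split(x['time'], ':')
    newarr = np.fromiter((tuple(row) for row in arr), dtype=[('', float)] * 3,
                         count=len(arr)).view(float).reshape(-1, 3)
    y = np.empty_like(x, dtype=DTYPE_OUT)
    y['time'] = (newarr * [3600, 60, 1]).sum(axis=1)
    y['magnitude'] = x['magnitude']
    return y


# Hypothetical data: millisecond resolution, so each time fits in 12 characters
rng = np.random.default_rng(0)
seconds_of_day = np.round(rng.random(1000) * 86400, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'data')
    with open(filename, 'w') as f:
        for i, t in enumerate(seconds_of_day):
            h, rem = divmod(t, 3600)
            m, s = divmod(rem, 60)
            f.write('{:02d}:{:02d}:{:06.3f} {}\n'.format(int(h), int(m), s, i))
    a = using_converter(filename)
    b = using_fromiter(filename)

print(np.allclose(a['time'], b['time']), np.array_equal(a['magnitude'], b['magnitude']))
```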
