
Getting date ranges for multiple datetime pairs

Given a datetime array of the shape (n, 2):

x = np.array([['2017-10-02T00:00:00.000000000', '2017-10-12T00:00:00.000000000']], dtype='datetime64[ns]')

x has shape (1, 2), but in reality it could be (n, 2), n >= 1. In each pair, the first date is always smaller than (or equal to) the second. I want to get a list of all date ranges between each pair of dates in x. This is what I'm doing basically:

np.concatenate([pd.date_range(*y, closed='right') for y in x])

And it works, giving

array(['2017-10-03T00:00:00.000000000', '2017-10-04T00:00:00.000000000', '2017-10-05T00:00:00.000000000', '2017-10-06T00:00:00.000000000', '2017-10-07T00:00:00.000000000', '2017-10-08T00:00:00.000000000', '2017-10-09T00:00:00.000000000', '2017-10-10T00:00:00.000000000', '2017-10-11T00:00:00.000000000', '2017-10-12T00:00:00.000000000'], dtype='datetime64[ns]')

But this is pretty slow because of the list comp - it isn't exactly vectorised as I'd like. I'm wondering if there's a better way to obtain date ranges for multiple pairs of dates?

I'll provide as much clarification as needed. Thanks.

Answer1:

It's a tad convoluted... But

d = np.array(1, dtype='timedelta64[D]')
x = x.astype('datetime64[D]')
deltas = np.diff(x, axis=1) / d
np.concatenate([
    i + np.arange(j + 1)
    for i, j in zip(x[:, 0], deltas[:, 0].astype(int))
]).astype('datetime64[ns]')

array(['2017-10-02T00:00:00.000000000', '2017-10-03T00:00:00.000000000',
       '2017-10-04T00:00:00.000000000', '2017-10-05T00:00:00.000000000',
       '2017-10-06T00:00:00.000000000', '2017-10-07T00:00:00.000000000',
       '2017-10-08T00:00:00.000000000', '2017-10-09T00:00:00.000000000',
       '2017-10-10T00:00:00.000000000', '2017-10-11T00:00:00.000000000',
       '2017-10-12T00:00:00.000000000'], dtype='datetime64[ns]')

How it works

  • d represents one day.
  • x is converted to datetime64[D], i.e. dates with no time component.
  • np.diff gives me the difference in days, but as timedelta64 values.
  • I divide by d, which is also a timedelta64, so the units cancel and I'm left with floats, which I cast to int.
  • When I add an element of the first column x[:, 0] to an array of integers, each integer is treated as that many units of x's dtype, which is datetime64[D]. So I'm adding whole days.
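The steps above can be checked one at a time. This is a minimal sketch using a single pair at day resolution; the intermediate names (ratio, n_days, offsets) are only illustrative and do not appear in the answer itself.

```python
import numpy as np

# One (start, end) pair, already at day resolution.
x = np.array([['2017-10-02', '2017-10-12']], dtype='datetime64[D]')
d = np.array(1, dtype='timedelta64[D]')   # one day

deltas = np.diff(x, axis=1)        # still timedelta64[D], shape (1, 1)
ratio = deltas / d                 # timedelta / timedelta -> plain float64
n_days = ratio[:, 0].astype(int)   # whole days between start and end

# Adding integers to a datetime64[D] value steps in whole days.
offsets = np.arange(n_days[0] + 1)
dates = x[0, 0] + offsets

print(deltas.dtype, ratio.dtype, dates.dtype)
print(dates[0], dates[-1], len(dates))
```

The `+ 1` in `np.arange(n_days[0] + 1)` is what makes the end date part of the result.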

Derived from / inspired by @hpaulj (will remove if they post an answer)

d = np.array(1, dtype='timedelta64[D]')
np.concatenate([np.arange(row[0], row[1] + 1, d) for row in x])

array(['2017-10-02T00:00:00.000000000', '2017-10-03T00:00:00.000000000',
       '2017-10-04T00:00:00.000000000', '2017-10-05T00:00:00.000000000',
       '2017-10-06T00:00:00.000000000', '2017-10-07T00:00:00.000000000',
       '2017-10-08T00:00:00.000000000', '2017-10-09T00:00:00.000000000',
       '2017-10-10T00:00:00.000000000', '2017-10-11T00:00:00.000000000',
       '2017-10-12T00:00:00.000000000'], dtype='datetime64[ns]')
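Both versions above still build one array per row inside a Python loop. If that loop is the bottleneck, something along these lines might avoid it. It is only a sketch, assuming day resolution and that each end date is on or after its start date, as the question states. The function name expand_ranges is hypothetical, and whether it is actually faster for your data is worth timing.

```python
import numpy as np

def expand_ranges(pairs):
    """Hypothetical helper: expand inclusive (start, end) day pairs with no Python loop."""
    pairs = np.asarray(pairs).astype('datetime64[D]')
    starts = pairs[:, 0]
    # Number of days in each range, counting both endpoints.
    lengths = (pairs[:, 1] - starts).astype(np.int64) + 1
    # Index in the flat output where each range begins.
    first = np.cumsum(lengths) - lengths
    # Offset of every output element within its own range: 0, 1, ..., length - 1.
    within = np.arange(lengths.sum()) - np.repeat(first, lengths)
    days = np.repeat(starts, lengths) + within.astype('timedelta64[D]')
    return days.astype('datetime64[ns]')

x = np.array([['2017-10-02', '2017-10-04'],
              ['2017-11-01', '2017-11-01']], dtype='datetime64[ns]')
out = expand_ranges(x)
print(out)
```

Like the answers above, this includes both endpoints. To drop the start date, as `closed='right'` does in the question, one option is to add one day to `starts` and skip the `+ 1` on `lengths`.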
