
Group DataFrame in 5-minute intervals

Question:

How do I get just the 5-minute data out of this CSV using Python/pandas? For every 5-minute interval I'm trying to get the DATE, TIME, OPEN, HIGH, LOW, CLOSE and VOLUME for that interval.

DATE        TIME      OPEN     HIGH     LOW      CLOSE    VOLUME
02/03/1997  09:04:00  3046.00  3048.50  3046.00  3047.50  505
02/03/1997  09:05:00  3047.00  3048.00  3046.00  3047.00  162
02/03/1997  09:06:00  3047.50  3048.00  3047.00  3047.50  98
02/03/1997  09:07:00  3047.50  3047.50  3047.00  3047.50  228
02/03/1997  09:08:00  3048.00  3048.00  3047.50  3048.00  136
02/03/1997  09:09:00  3048.00  3048.00  3046.50  3046.50  174
02/03/1997  09:10:00  3046.50  3046.50  3045.00  3045.00  134
02/03/1997  09:11:00  3045.50  3046.00  3044.00  3045.00  43
02/03/1997  09:12:00  3045.00  3045.50  3045.00  3045.00  214
02/03/1997  09:13:00  3045.50  3045.50  3045.50  3045.50  8
02/03/1997  09:14:00  3045.50  3046.00  3044.50  3044.50  152

Answer1:

You can use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html" rel="nofollow">df.resample</a> to do aggregation based on a date/time variable. You'll need a datetime index and you can specify that while reading the csv file:

import pandas as pd

df = pd.read_csv("filename.csv", parse_dates=[["DATE", "TIME"]], index_col=0)

This will result in a dataframe with an index where date and time are combined (<a href="https://stackoverflow.com/a/17978188/2285236" rel="nofollow">source</a>):

df.head()
Out[7]:
                       OPEN    HIGH     LOW   CLOSE  VOLUME
DATE_TIME
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5     505
1997-02-03 09:05:00  3047.0  3048.0  3046.0  3047.0     162
1997-02-03 09:06:00  3047.5  3048.0  3047.0  3047.5      98
1997-02-03 09:07:00  3047.5  3047.5  3047.0  3047.5     228
1997-02-03 09:08:00  3048.0  3048.0  3047.5  3048.0     136
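As far as I can tell, newer pandas releases (2.2 onward) emit a deprecation warning when columns are combined through `parse_dates`. A minimal sketch of building the same index explicitly with `pd.to_datetime`, assuming the file is comma-separated and the dates are month/day/year (the `csv_text` sample below just stands in for the real file):

```python
import io

import pandas as pd

# Assumption: comma-separated input; a few rows from the question stand in for the file.
csv_text = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162
02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98
"""

df = pd.read_csv(io.StringIO(csv_text))

# Join the two text columns and parse them as month/day/year timestamps.
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S")
df.index.name = "DATE_TIME"

# The original text columns are no longer needed once the index is in place.
df = df.drop(columns=["DATE", "TIME"])
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path.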

After that, you can call resample and aggregate each five-minute interval with sum, mean, etc.:

df.resample("5T").mean()
Out[8]:
                       OPEN    HIGH     LOW   CLOSE  VOLUME
DATE_TIME
1997-02-03 09:00:00  3046.0  3048.5  3046.0  3047.5   505.0
1997-02-03 09:05:00  3047.6  3047.9  3046.8  3047.3   159.6
1997-02-03 09:10:00  3045.6  3045.9  3044.8  3045.0   110.2
1997-02-03 09:15:00  3043.6  3044.0  3042.8  3043.2    69.2
1997-02-03 09:20:00  3044.7  3045.2  3044.5  3045.0    65.8
1997-02-03 09:25:00  3043.8  3044.0  3043.5  3043.7    59.0
1997-02-03 09:30:00  3044.6  3045.0  3044.3  3044.6    56.0
1997-02-03 09:35:00  3044.5  3044.5  3043.5  3044.5    44.0

(<em>T</em> is the alias for minute frequency; "min" is an equivalent alias, so "5min" means the same thing as "5T". <a href="https://stackoverflow.com/a/17001474/2285236" rel="nofollow">Here</a> is a list of other units.)
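Since the question asks for OPEN, HIGH, LOW, CLOSE and VOLUME per interval, a plain mean of every column is probably not what's wanted. A rough sketch with a per-column aggregation, assuming the usual bar convention (first open, highest high, lowest low, last close, summed volume) and assuming the question's rows as comma-separated text; `csv_text` and `bars` are illustrative names only:

```python
import io

import pandas as pd

# Assumption: comma-separated input; these are the sample rows from the question.
csv_text = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162
02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98
02/03/1997,09:07:00,3047.50,3047.50,3047.00,3047.50,228
02/03/1997,09:08:00,3048.00,3048.00,3047.50,3048.00,136
02/03/1997,09:09:00,3048.00,3048.00,3046.50,3046.50,174
02/03/1997,09:10:00,3046.50,3046.50,3045.00,3045.00,134
02/03/1997,09:11:00,3045.50,3046.00,3044.00,3045.00,43
02/03/1997,09:12:00,3045.00,3045.50,3045.00,3045.00,214
02/03/1997,09:13:00,3045.50,3045.50,3045.50,3045.50,8
02/03/1997,09:14:00,3045.50,3046.00,3044.50,3044.50,152
"""

df = pd.read_csv(io.StringIO(csv_text))

# Build a datetime index from the DATE and TIME text columns (month/day/year).
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S")
df.index.name = "DATE_TIME"
df = df.drop(columns=["DATE", "TIME"])

# One aggregation per column: each 5-minute bar keeps its first open,
# highest high, lowest low, last close, and total volume.
bars = df.resample("5min").agg(
    {
        "OPEN": "first",
        "HIGH": "max",
        "LOW": "min",
        "CLOSE": "last",
        "VOLUME": "sum",
    }
)

print(bars)
```

Each bar is labelled by the start of its interval, so the 09:04 row lands in the bar labelled 09:00. If the data has gaps, intervals with no rows still appear in the result (with missing prices), and they can be dropped with `dropna()` if they are unwanted.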
