
Question:
How do I get just the 5-minute data out of this csv using Python/pandas? For every 5-minute interval, I'm trying to get the DATE, TIME, OPEN, HIGH, LOW, CLOSE, and VOLUME for that interval.
DATE TIME OPEN HIGH LOW CLOSE VOLUME
02/03/1997 09:04:00 3046.00 3048.50 3046.00 3047.50 505
02/03/1997 09:05:00 3047.00 3048.00 3046.00 3047.00 162
02/03/1997 09:06:00 3047.50 3048.00 3047.00 3047.50 98
02/03/1997 09:07:00 3047.50 3047.50 3047.00 3047.50 228
02/03/1997 09:08:00 3048.00 3048.00 3047.50 3048.00 136
02/03/1997 09:09:00 3048.00 3048.00 3046.50 3046.50 174
02/03/1997 09:10:00 3046.50 3046.50 3045.00 3045.00 134
02/03/1997 09:11:00 3045.50 3046.00 3044.00 3045.00 43
02/03/1997 09:12:00 3045.00 3045.50 3045.00 3045.00 214
02/03/1997 09:13:00 3045.50 3045.50 3045.50 3045.50 8
02/03/1997 09:14:00 3045.50 3046.00 3044.50 3044.50 152
Answer1: You can use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html" rel="nofollow">df.resample</a> to aggregate based on a date/time variable. You'll need a datetime index, which you can set while reading the csv file:
import pandas as pd
df = pd.read_csv("filename.csv", parse_dates=[["DATE", "TIME"]], index_col=0)  # combine DATE and TIME into one datetime index
This produces a DataFrame whose index combines the date and time (<a href="https://stackoverflow.com/a/17978188/2285236" rel="nofollow">source</a>):
df.head()
Out[7]:
OPEN HIGH LOW CLOSE VOLUME
DATE_TIME
1997-02-03 09:04:00 3046.0 3048.5 3046.0 3047.5 505
1997-02-03 09:05:00 3047.0 3048.0 3046.0 3047.0 162
1997-02-03 09:06:00 3047.5 3048.0 3047.0 3047.5 98
1997-02-03 09:07:00 3047.5 3047.5 3047.0 3047.5 228
1997-02-03 09:08:00 3048.0 3048.0 3047.5 3048.0 136
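If your pandas version warns about the nested list passed to parse_dates (combining columns that way is deprecated as of pandas 2.2), a similar index can be built after reading the file. This is a sketch that assumes a comma-separated file with month/day/year dates, as the output above suggests; for whitespace-separated data like the sample in the question, pass sep=r"\s+" to read_csv. The sample rows are embedded with io.StringIO only to keep the example self-contained.

```python
import io

import pandas as pd

# A few rows from the question; the real file is assumed to be comma-separated.
raw = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162
02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98
"""

df = pd.read_csv(io.StringIO(raw))

# Join the two text columns and parse them as one datetime.
# The format string assumes month/day/year, matching the 1997-02-03 output above.
df["DATE_TIME"] = pd.to_datetime(
    df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S"
)

# Drop the original text columns and use the parsed datetime as the index.
df = df.drop(columns=["DATE", "TIME"]).set_index("DATE_TIME")
print(df.head())
```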
After that, you can use resample to get the sum, mean, etc. of each five-minute interval:
df.resample("5T").mean()
Out[8]:
OPEN HIGH LOW CLOSE VOLUME
DATE_TIME
1997-02-03 09:00:00 3046.0 3048.5 3046.0 3047.5 505.0
1997-02-03 09:05:00 3047.6 3047.9 3046.8 3047.3 159.6
1997-02-03 09:10:00 3045.6 3045.9 3044.8 3045.0 110.2
1997-02-03 09:15:00 3043.6 3044.0 3042.8 3043.2 69.2
1997-02-03 09:20:00 3044.7 3045.2 3044.5 3045.0 65.8
1997-02-03 09:25:00 3043.8 3044.0 3043.5 3043.7 59.0
1997-02-03 09:30:00 3044.6 3045.0 3044.3 3044.6 56.0
1997-02-03 09:35:00 3044.5 3044.5 3043.5 3044.5 44.0
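(<em>T</em> is used for minute frequency; pandas 2.2 and later deprecate it in favor of <em>min</em>, and "5min" also works in older versions. <a href="https://stackoverflow.com/a/17001474/2285236" rel="nofollow">Here</a> is a list of other units.)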
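Note that mean() averages every column, so the result above is not a set of 5-minute price bars. To get what the question asks for (the first OPEN, highest HIGH, lowest LOW, last CLOSE and total VOLUME in each interval), one option is to pass a per-column aggregation dict to agg. A minimal sketch, again assuming a comma-separated file and using the question's eleven sample rows embedded via io.StringIO; the name bars is only illustrative:

```python
import io

import pandas as pd

# The question's sample rows, assumed comma-separated.
raw = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162
02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98
02/03/1997,09:07:00,3047.50,3047.50,3047.00,3047.50,228
02/03/1997,09:08:00,3048.00,3048.00,3047.50,3048.00,136
02/03/1997,09:09:00,3048.00,3048.00,3046.50,3046.50,174
02/03/1997,09:10:00,3046.50,3046.50,3045.00,3045.00,134
02/03/1997,09:11:00,3045.50,3046.00,3044.00,3045.00,43
02/03/1997,09:12:00,3045.00,3045.50,3045.00,3045.00,214
02/03/1997,09:13:00,3045.50,3045.50,3045.50,3045.50,8
02/03/1997,09:14:00,3045.50,3046.00,3044.50,3044.50,152
"""

df = pd.read_csv(io.StringIO(raw))
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S")
df = df.drop(columns=["DATE", "TIME"])

# Each column gets its own rule so every row is a proper 5-minute OHLCV bar.
bars = df.resample("5min").agg(
    {"OPEN": "first", "HIGH": "max", "LOW": "min", "CLOSE": "last", "VOLUME": "sum"}
)

# Split each bin's start time back into DATE and TIME columns, as in the question.
bars.insert(0, "DATE", bars.index.strftime("%m/%d/%Y"))
bars.insert(1, "TIME", bars.index.strftime("%H:%M:%S"))
print(bars.to_string(index=False))
```

Bins are labeled by their left edge, so the 09:05:00 row covers 09:05:00 up to but not including 09:10:00. Intervals with no trades come out with NaN prices and a VOLUME of 0; bars.dropna(subset=["OPEN"]) removes them.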