Time Series
Date and Time Data Types and Tools
Getting the current date and time
datetime.datetime(2017, 3, 8, 14, 47, 50, 32019)
Its year, month, and day
(2017, 3, 8)
The difference of two datetimes is a timedelta (days, seconds)
datetime.timedelta(926, 56700)
The days component
926
The seconds component
56700
Adding a timedelta given in days
datetime.datetime(2011, 1, 19, 0, 0)
Subtracting a multiple of a timedelta
datetime.datetime(2010, 12, 14, 0, 0)
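The datetime and timedelta outputs above are consistent with the following standard-library sketch. The specific dates are assumptions, chosen so that the difference reproduces the timedelta(926, 56700) shown.

```python
from datetime import datetime, timedelta

# Assumed dates: picked so the difference matches timedelta(926, 56700)
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
print(delta.days, delta.seconds)

# Adding or subtracting a timedelta yields a new, shifted datetime
start = datetime(2011, 1, 7)
later = start + timedelta(12)        # 12 days later
earlier = start - 2 * timedelta(12)  # 24 days earlier
print(later, earlier)
```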
Converting Between Strings and datetime
Direct conversion with str
'2011-01-03 00:00:00'
Conversion with a format string
'2011-01-03'
The reverse conversion (parsing a string)
datetime.datetime(2011, 1, 3, 0, 0)
Batch conversion
[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
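A minimal sketch of the string conversions above, using only the standard library. The variable names are illustrative; the format codes are the standard strftime/strptime ones.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3)

as_text = str(stamp)                     # direct conversion
formatted = stamp.strftime('%Y-%m-%d')   # conversion with a format string
parsed = datetime.strptime(formatted, '%Y-%m-%d')  # back to a datetime

# Batch conversion of month/day/year strings
datestrs = ['7/6/2011', '8/6/2011']
batch = [datetime.strptime(s, '%m/%d/%Y') for s in datestrs]
```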
Writing out a format every time is tedious; the parser can be called directly
datetime.datetime(2011, 1, 3, 0, 0)
It handles most common date formats
datetime.datetime(1997, 1, 31, 22, 45)
Specifying the field order (day before month)
datetime.datetime(2011, 12, 6, 0, 0)
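The three parser results above match what dateutil's parser produces. This sketch assumes the parser in question is dateutil.parser.parse (dateutil is installed alongside pandas); the input strings are assumptions chosen to give the outputs shown.

```python
from datetime import datetime
from dateutil.parser import parse

iso = parse('2011-01-03')                       # ISO-style date
verbose = parse('Jan 31, 1997 10:45 PM')        # month name and 12-hour clock
day_first = parse('6/12/2011', dayfirst=True)   # read as 6 December, not June 12
```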
The date strings to convert
['7/6/2011', '8/6/2011']
The pandas API
DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)
None can be converted too; it simply becomes a missing value (NaT)
DatetimeIndex(['2011-07-06', '2011-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)
The missing element
NaT
Testing for missing values
array([False, False, True], dtype=bool)
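A short sketch of the pandas conversion above, assuming the strings are the same month/day/year pair used earlier. pd.to_datetime and pd.isnull are the calls involved; the variable names are illustrative.

```python
import pandas as pd

datestrs = ['7/6/2011', '8/6/2011']

# None is accepted and becomes NaT (Not a Time) in the resulting index
idx = pd.to_datetime(datestrs + [None])
missing = pd.isnull(idx)
print(idx)
print(missing)
```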
Time Series Basics
The row index holds datetime values, that is, timestamps
2011-01-02 -0.296854
2011-01-05 -1.968663
2011-01-07 -0.484492
2011-01-08 -0.517927
2011-01-10 -0.348697
2011-01-12 0.102276
dtype: float64
The type of the object
pandas.core.series.Series
The index has a dedicated type
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
'2011-01-10', '2011-01-12'],
dtype='datetime64[ns]', freq=None)
Series can be added directly; values are aligned on matching timestamps
2011-01-02 -0.593708
2011-01-05 NaN
2011-01-07 -0.968984
2011-01-08 NaN
2011-01-10 -0.697394
2011-01-12 NaN
dtype: float64
Timestamps are stored at nanosecond resolution
dtype('<M8[ns]')
Each element of the index is a Timestamp
Timestamp('2011-01-02 00:00:00')
Indexing, Selection, and Subsetting
A timestamp index behaves like any other index
-0.4844920247591406
A timestamp that matches an index entry can be passed directly
-0.34869693931763396
Other formats work too: the string is converted to a datetime automatically, so any format that resolves to the same timestamp is accepted
-0.34869693931763396
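The lookups above can be sketched as follows. The values are placeholders (an assumption), since the original random data cannot be reproduced; the dates are the six shown in the series.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6.0), index=dates)  # placeholder values 0.0 .. 5.0

by_position = ts.iloc[2]          # positional lookup
by_stamp = ts[ts.index[4]]        # lookup with a Timestamp from the index
by_string = ts['1/10/2011']       # a date string is parsed automatically
by_compact = ts['20110110']       # a different spelling of the same date
```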
The periods parameter specifies how many dates to generate from the start
2000-01-01 0.871808
2000-01-02 -0.025158
2000-01-03 0.132813
2000-01-04 -2.006494
2000-01-05 -0.988423
2000-01-06 0.775930
...
2002-09-21 -0.186519
2002-09-22 0.881745
2002-09-23 -1.335826
2002-09-24 0.418774
2002-09-25 0.970405
2002-09-26 0.636320
Freq: D, dtype: float64
A timestamp index also supports selecting by year or by month, much like a hierarchical index
2001-01-01 -1.799866
2001-01-02 0.499890
2001-01-03 -0.409970
2001-01-04 -0.808111
2001-01-05 -1.220433
2001-01-06 0.581235
...
2001-12-26 -0.312186
2001-12-27 -0.804940
2001-12-28 -0.572741
2001-12-29 -0.175605
2001-12-30 0.693675
2001-12-31 -0.196274
Freq: D, dtype: float64
Selecting a single month
2001-05-01 -2.783535
2001-05-02 1.386292
2001-05-03 0.153705
2001-05-04 -0.571590
2001-05-05 -0.933012
2001-05-06 0.579244
...
2001-05-26 0.080809
2001-05-27 0.652650
2001-05-28 0.862616
2001-05-29 -0.967580
2001-05-30 0.907069
2001-05-31 0.551137
Freq: D, dtype: float64
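A sketch of the year and month selections above, assuming a daily series of 1000 values starting on 2000-01-01 (which ends on 2002-09-26, as in the output). The values are placeholders rather than the original random numbers.

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.arange(1000.0),
                      index=pd.date_range('1/1/2000', periods=1000))

year_2001 = longer_ts.loc['2001']      # every day of 2001
may_2001 = longer_ts.loc['2001-05']    # every day of May 2001
print(len(year_2001), len(may_2001))
```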
Slicing works too, ordered by time
2011-01-07 -0.484492
2011-01-08 -0.517927
2011-01-10 -0.348697
2011-01-12 0.102276
dtype: float64
The full series, for comparison
2011-01-02 -0.296854
2011-01-05 -1.968663
2011-01-07 -0.484492
2011-01-08 -0.517927
2011-01-10 -0.348697
2011-01-12 0.102276
dtype: float64
The slice endpoints need not appear in the index; specifying a time range is enough
2011-01-07 -0.484492
2011-01-08 -0.517927
2011-01-10 -0.348697
dtype: float64
A built-in method that provides the same functionality
2011-01-02 -0.296854
2011-01-05 -1.968663
2011-01-07 -0.484492
2011-01-08 -0.517927
dtype: float64
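A sketch of range slicing on the six-date series, with placeholder values (an assumption). It assumes the built-in method mentioned above is Series.truncate, which is consistent with the output shown.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(np.arange(6.0), index=dates)  # placeholder values

# Neither endpoint is in the index; the slice still selects by time range
window = ts.loc['1/6/2011':'1/11/2011']

# truncate drops everything after the given date
head = ts.truncate(after='1/9/2011')
```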
The freq parameter sets the frequency of the generated dates, here every Wednesday
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2001-05-02 | 0.506207 | -1.116218 | 0.656575 | 0.212606 |
2001-05-09 | -1.306963 | -0.054373 | -1.165053 | -1.319361 |
2001-05-16 | 0.891692 | -0.463900 | 1.642267 | 0.644972 |
2001-05-23 | -0.025283 | 2.363886 | -0.367988 | 0.827882 |
2001-05-30 | -1.501301 | -2.534553 | 0.256369 | 0.268207 |
Time Series with Duplicate Indices
Creating the timestamp index directly
2000-01-01 0
2000-01-02 1
2000-01-02 2
2000-01-02 3
2000-01-03 4
dtype: int32
Checking whether the index is unique
False
Indexing a timestamp that is not duplicated returns a scalar
4
Indexing a duplicated timestamp returns every matching entry
2000-01-02 1
2000-01-02 2
2000-01-02 3
dtype: int32
The data can therefore be grouped directly on the timestamps; here, the mean of each group
2000-01-01 0
2000-01-02 2
2000-01-03 4
dtype: int32
And the count of each group
2000-01-01 1
2000-01-02 3
2000-01-03 1
dtype: int64
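A sketch of the duplicate-index behaviour above. Grouping with level=0 is one way to aggregate entries that share a timestamp; it is assumed here because it reproduces the mean and count outputs shown.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

is_unique = dup_ts.index.is_unique   # False: 2000-01-02 appears three times
single = dup_ts['1/3/2000']          # scalar, the date is not duplicated
several = dup_ts['1/2/2000']         # Series with all three matches

grouped = dup_ts.groupby(level=0)    # group on the index itself
means = grouped.mean()
counts = grouped.count()
```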
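Date Ranges, Frequencies, and Shifting
Time series in pandas are generally assumed to be irregular, that is, to have no fixed frequency. Sometimes, though, an analysis calls for a relatively fixed frequency such as daily, monthly, or every 15 minutes, even though this introduces missing values into the series. pandas has a full suite of standard time series frequencies, along with tools for resampling, inferring frequencies, and generating fixed-frequency date ranges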
2011-01-02 -0.296854
2011-01-05 -1.968663
2011-01-07 -0.484492
2011-01-08 -0.517927
2011-01-10 -0.348697
2011-01-12 0.102276
dtype: float64
For example, the earlier series can be converted to a fixed daily frequency simply by calling resample
2011-01-02 -0.296854
2011-01-03 NaN
2011-01-04 NaN
2011-01-05 -1.968663
2011-01-06 NaN
2011-01-07 -0.484492
2011-01-08 -0.517927
2011-01-09 NaN
2011-01-10 -0.348697
2011-01-11 NaN
2011-01-12 0.102276
Freq: D, dtype: float64
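A sketch of that conversion with the current resample syntax, where resample returns an object and a follow-up method such as asfreq produces the result. The values are placeholders (an assumption).

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(np.arange(6.0), index=dates)  # placeholder values

# One row per calendar day; days absent from ts become NaN
daily = ts.resample('D').asfreq()
print(daily)
```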
Generating Date Ranges
The date_range function, given a start and an end
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
'2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
'2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
'2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
'2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
'2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
'2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
'2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
'2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
'2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
'2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
'2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
'2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
'2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
'2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
'2012-05-31', '2012-06-01'],
dtype='datetime64[ns]', freq='D')
Start and length only
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
'2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
'2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
'2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
'2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
dtype='datetime64[ns]', freq='D')
End and length only
DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
'2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
'2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
'2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
'2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
dtype='datetime64[ns]', freq='D')
Start, end, and frequency; BM = business month end
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
'2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
'2000-09-29', '2000-10-31', '2000-11-30'],
dtype='datetime64[ns]', freq='BM')
By default, periods counts days
DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
'2012-05-04 12:56:31', '2012-05-05 12:56:31',
'2012-05-06 12:56:31'],
dtype='datetime64[ns]', freq='D')
The time of day can be dropped (normalized to midnight)
DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
'2012-05-06'],
dtype='datetime64[ns]', freq='D')
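The date_range variants above can be sketched as follows. The business-month-end range uses the offset object pd.offsets.BMonthEnd() instead of the 'BM' string, an assumption made because the string alias has changed across pandas versions.

```python
import pandas as pd

full = pd.date_range('4/1/2012', '6/1/2012')              # start and end
from_start = pd.date_range(start='4/1/2012', periods=20)  # start and length
to_end = pd.date_range(end='6/1/2012', periods=20)        # end and length

# Last business day of each month within the range
bm = pd.date_range('1/1/2000', '12/1/2000', freq=pd.offsets.BMonthEnd())

# The time of day is kept by default and reset to midnight with normalize=True
timed = pd.date_range('5/2/2012 12:56:31', periods=5)
normalized = pd.date_range('5/2/2012 12:56:31', periods=5, normalize=True)
```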
Frequencies and Date Offsets
An offset can be expressed as an object for a specific unit of time
<Hour>
Four hours, plain and simple
<4 * Hours>
Sampling every four hours
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
'2000-01-01 08:00:00', '2000-01-01 12:00:00',
'2000-01-01 16:00:00', '2000-01-01 20:00:00',
'2000-01-02 00:00:00', '2000-01-02 04:00:00',
'2000-01-02 08:00:00', '2000-01-02 12:00:00',
'2000-01-02 16:00:00', '2000-01-02 20:00:00',
'2000-01-03 00:00:00', '2000-01-03 04:00:00',
'2000-01-03 08:00:00', '2000-01-03 12:00:00',
'2000-01-03 16:00:00', '2000-01-03 20:00:00'],
dtype='datetime64[ns]', freq='4H')
Two and a half hours
<150 * Minutes>
A string that reads almost like natural language works as well
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
'2000-01-01 03:00:00', '2000-01-01 04:30:00',
'2000-01-01 06:00:00', '2000-01-01 07:30:00',
'2000-01-01 09:00:00', '2000-01-01 10:30:00',
'2000-01-01 12:00:00', '2000-01-01 13:30:00'],
dtype='datetime64[ns]', freq='90T')
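A sketch of the offset objects and frequency strings above. The end timestamp of the four-hour range and the '1h30min' string are assumptions chosen to reproduce the 18-entry and 90-minute outputs shown.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

four_hours = Hour(4)                 # a multiple of a base offset
every_4h = pd.date_range('1/1/2000', '1/3/2000 23:59', freq=four_hours)

combined = Hour(2) + Minute(30)      # offsets add up: two and a half hours

# The same idea spelled as a string: one hour and thirty minutes
every_90min = pd.date_range('1/1/2000', periods=10, freq='1h30min')
```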
Week of month dates (WOM)
The third Friday of each month
[Timestamp('2012-01-20 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-02-17 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-03-16 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-04-20 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-05-18 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-06-15 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-07-20 00:00:00', offset='WOM-3FRI'),
Timestamp('2012-08-17 00:00:00', offset='WOM-3FRI')]
Shifting (Leading and Lagging) Data
2000-01-31 1.294798
2000-02-29 -1.907732
2000-03-31 -1.407750
2000-04-30 0.544825
Freq: M, dtype: float64
Shifting all the data forward by two periods
2000-01-31 NaN
2000-02-29 NaN
2000-03-31 1.294798
2000-04-30 -1.907732
Freq: M, dtype: float64
Shifting all the data backward, somewhat like a bit shift
2000-01-31 -1.407750
2000-02-29 0.544825
2000-03-31 NaN
2000-04-30 NaN
Freq: M, dtype: float64
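After shifting, the data stay aligned on the index, so the original and shifted series can be combined (here, as a period-over-period change)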
2000-01-31 NaN
2000-02-29 -2.473382
2000-03-31 -0.262082
2000-04-30 -1.387018
Freq: M, dtype: float64
With freq, the timestamps in the index are shifted instead of the data
2000-03-31 1.294798
2000-04-30 -1.907732
2000-05-31 -1.407750
2000-06-30 0.544825
Freq: M, dtype: float64
Shifting forward by days
2000-02-03 1.294798
2000-03-03 -1.907732
2000-04-03 -1.407750
2000-05-03 0.544825
dtype: float64
Another way to do the same thing
2000-02-03 1.294798
2000-03-03 -1.907732
2000-04-03 -1.407750
2000-05-03 0.544825
dtype: float64
A different frequency
2000-01-31 01:30:00 1.294798
2000-02-29 01:30:00 -1.907732
2000-03-31 01:30:00 -1.407750
2000-04-30 01:30:00 0.544825
Freq: M, dtype: float64
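The shifting variants in this section can be sketched as follows. The values are placeholders (an assumption), and the month-end frequency is given as the offset object MonthEnd() because the string alias differs between pandas versions.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series([1.0, 2.0, 4.0, 8.0],
               index=pd.date_range('1/1/2000', periods=4, freq=MonthEnd()))

lagged = ts.shift(2)            # data move two rows later, NaN fills the start
led = ts.shift(-2)              # data move two rows earlier, NaN fills the end
change = ts / ts.shift(1) - 1   # period-over-period change

# With freq, the index moves and no data are lost
moved_index = ts.shift(2, freq=MonthEnd())
by_days = ts.shift(3, freq='D')
by_minutes = ts.shift(1, freq='90min')
```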
Shifting Dates with Offsets
Timestamp('2011-11-20 00:00:00')
Moving directly to the month end; the shift is relative to the starting date
Timestamp('2011-11-30 00:00:00')
The argument says how many month ends to move forward
Timestamp('2011-12-31 00:00:00')
The same result another way, with the offset as the caller
Timestamp('2011-11-30 00:00:00')
Going backward, to the previous month end
Timestamp('2011-10-31 00:00:00')
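The offset arithmetic above is consistent with this sketch. The starting date 2011-11-17 is an assumption, chosen because adding three days gives the 2011-11-20 shown.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import Day, MonthEnd

now = datetime(2011, 11, 17)       # assumed starting date

plus_three_days = now + 3 * Day()
this_month_end = now + MonthEnd()      # next month end on or after... moves forward
next_month_end = now + MonthEnd(2)     # two month ends forward

offset = MonthEnd()
rolled_forward = offset.rollforward(now)   # same as now + MonthEnd() here
rolled_back = offset.rollback(now)         # previous month end
```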
Grouping after rolling the dates forward
2000-01-31 -0.610639
2000-02-29 0.029121
2000-03-31 -0.089587
dtype: float64
Another way to get the same result
Note: the how argument of .resample() is deprecated; the new syntax is .resample(...).mean()
2000-01-31 -0.610639
2000-02-29 0.029121
2000-03-31 -0.089587
Freq: M, dtype: float64
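A sketch comparing the two approaches, using the current .resample(...).mean() syntax. The series (20 values spaced four days apart, starting 2000-01-15) is an assumption with placeholder values.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range('1/15/2000', periods=20, freq='4D'))

# Roll every date forward to its month end, then group on the result
offset = MonthEnd()
by_group = ts.groupby(offset.rollforward).mean()

# Resampling to month-end frequency gives the same monthly means
by_resample = ts.resample(MonthEnd()).mean()
```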
Time Zone Handling
Listing a few time zones
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
Details of one particular time zone
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
Localization and Conversion
2012-03-09 09:30:00 0.065144
2012-03-10 09:30:00 -0.391505
2012-03-11 09:30:00 1.207495
2012-03-12 09:30:00 1.516354
2012-03-13 09:30:00 -0.253149
2012-03-14 09:30:00 -0.768138
Freq: D, dtype: float64
When no time zone is specified, it defaults to None
None
Specifying a time zone
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
'2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
'2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
'2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
Localizing the naive series to UTC
2012-03-09 09:30:00+00:00 0.065144
2012-03-10 09:30:00+00:00 -0.391505
2012-03-11 09:30:00+00:00 1.207495
2012-03-12 09:30:00+00:00 1.516354
2012-03-13 09:30:00+00:00 -0.253149
2012-03-14 09:30:00+00:00 -0.768138
Freq: D, dtype: float64
The index of the localized series
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
'2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
Converting to another time zone
2012-03-09 04:30:00-05:00 0.065144
2012-03-10 04:30:00-05:00 -0.391505
2012-03-11 05:30:00-04:00 1.207495
2012-03-12 05:30:00-04:00 1.516354
2012-03-13 05:30:00-04:00 -0.253149
2012-03-14 05:30:00-04:00 -0.768138
Freq: D, dtype: float64
Localizing to US/Eastern, then converting to UTC
2012-03-09 14:30:00+00:00 0.065144
2012-03-10 14:30:00+00:00 -0.391505
2012-03-11 13:30:00+00:00 1.207495
2012-03-12 13:30:00+00:00 1.516354
2012-03-13 13:30:00+00:00 -0.253149
2012-03-14 13:30:00+00:00 -0.768138
Freq: D, dtype: float64
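The localization and conversion steps above can be sketched as follows, with placeholder values (an assumption). The hour changes between the second and third rows because US daylight saving time began on 2012-03-11.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
ts = pd.Series(np.arange(6.0), index=rng)   # placeholder values

naive_tz = ts.index.tz                      # None: the index is time zone naive

ts_utc = ts.tz_localize('UTC')              # attach a zone, clock times unchanged
utc_to_eastern = ts_utc.tz_convert('US/Eastern')   # same instants, new wall clock

ts_eastern = ts.tz_localize('US/Eastern')
eastern_to_utc = ts_eastern.tz_convert('UTC')

# Converting a naive series is an error; it must be localized first
try:
    ts.tz_convert('UTC')
    naive_convert_failed = False
except TypeError:
    naive_convert_failed = True
```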
And converting once more:
ts_eastern.tz_convert('Europe/Berlin')
A series must be localized before it can be converted
Operations with Time Zone-Aware Timestamp Objects
Creating a timestamp, localizing it, and converting its time zone
Timestamp('2011-03-11 23:00:00-0500', tz='US/Eastern')
Creating it with an explicit time zone
Timestamp('2011-03-12 04:00:00+0300', tz='Europe/Moscow')
Nanoseconds elapsed since January 1, 1970 (UTC)
1299902400000000000
This value is absolute: it does not change under time zone conversion
1299902400000000000
A timestamp in US/Eastern
Timestamp('2012-03-12 01:30:00-0400', tz='US/Eastern')
Shifting the time by an offset
Timestamp('2012-03-12 02:30:00-0400', tz='US/Eastern')
A timestamp 90 minutes before daylight saving time ends
Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')
Two hours later, across the transition
Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')
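A sketch of the time zone-aware Timestamp operations above. The naive starting value 2011-03-12 04:00 is an assumption consistent with the outputs shown.

```python
import pandas as pd
from pandas.tseries.offsets import Hour

stamp = pd.Timestamp('2011-03-12 04:00')
stamp_utc = stamp.tz_localize('UTC')
stamp_eastern = stamp_utc.tz_convert('US/Eastern')

# The underlying nanosecond count is the same in every zone
same_instant = stamp_utc.value == stamp_eastern.value

# Offsets are applied in absolute time, so the UTC offset changes
# when the shift crosses the end of daylight saving time
before = pd.Timestamp('2012-11-04 00:30', tz='US/Eastern')
after = before + 2 * Hour()
```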
Operations Between Different Time Zones
2012-03-07 09:30:00 -0.461750
2012-03-08 09:30:00 0.947394
2012-03-09 09:30:00 0.703239
2012-03-12 09:30:00 0.266519
2012-03-13 09:30:00 0.302334
2012-03-14 09:30:00 -0.000725
2012-03-15 09:30:00 0.305446
2012-03-16 09:30:00 -1.605358
2012-03-19 09:30:00 1.306474
2012-03-20 09:30:00 0.865511
Freq: B, dtype: float64
The result ends up in UTC
DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
'2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
'2012-03-15 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='B')
Periods and Period Arithmetic
Period('2007', 'A-DEC')
Adding an integer shifts the period forward
Period('2012', 'A-DEC')
Subtracting shifts it backward
Period('2005', 'A-DEC')
The difference between two periods of the same frequency is the number of units between them
7
A range of periods
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='int64', freq='M')
A Series indexed by periods
2000-01 0.061389
2000-02 0.059265
2000-03 0.779627
2000-04 -0.068995
2000-05 -0.451276
2000-06 -1.531821
Freq: M, dtype: float64
A PeriodIndex of quarters
PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='int64', freq='Q-DEC')
Period Frequency Conversion
An annual period ending in December, converted to its first month
Period('2007-01', 'M')
And to its last month
Period('2007-12', 'M')
An annual period ending in June
Period('2006-07', 'M')
Its last month
Period('2007-06', 'M')
August 2007 belongs to the 2008 period when years end in June
Period('2008', 'A-JUN')
Converting a whole series is the batch version of the same operation
2006 0.634252
2007 -0.738716
2008 0.398145
2009 -1.226529
Freq: A-DEC, dtype: float64
Converted to monthly frequency at the start of each year
2006-01 0.634252
2007-01 -0.738716
2008-01 0.398145
2009-01 -1.226529
Freq: M, dtype: float64
Converted to business-day frequency at the end of each year
2006-12-29 0.634252
2007-12-31 -0.738716
2008-12-31 0.398145
2009-12-31 -1.226529
Freq: B, dtype: float64
Quarterly Period Frequencies
The fourth quarter of a fiscal year ending in January
Period('2012Q4', 'Q-JAN')
The first day of that quarter
Period('2011-11-01', 'D')
The last day
Period('2012-01-31', 'D')
4 PM on the day before the quarter ends
Period('2012-01-30 16:00', 'T')
Converting to a Timestamp object
Timestamp('2012-01-30 16:00:00')
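A sketch of the quarterly period steps above. As a labeled simplification, it steps back one calendar day ('D') rather than one business day, which gives the same date here because 2012-01-30 and 2012-01-31 are both weekdays; it also spells the minute frequency as 'min'.

```python
import pandas as pd

p = pd.Period('2012Q4', freq='Q-JAN')     # fiscal year ends in January

first_day = p.asfreq('D', how='start')
last_day = p.asfreq('D', how='end')

# Day before the quarter's last day, converted to minutes, plus 16 hours
p4pm = (p.asfreq('D', how='end') - 1).asfreq('min', how='start') + 16 * 60
stamp = p4pm.to_timestamp()
```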
A series with a quarterly period index
2011Q3 0
2011Q4 1
2012Q1 2
2012Q2 3
2012Q3 4
2012Q4 5
Freq: Q-JAN, dtype: int32
Batch conversion to timestamps
2010-10-28 16:00:00 0
2011-01-28 16:00:00 1
2011-04-28 16:00:00 2
2011-07-28 16:00:00 3
2011-10-28 16:00:00 4
2012-01-30 16:00:00 5
dtype: int32
Converting Timestamps to Periods (and Back)
2000-01-31 0.239752
2000-02-29 -0.469201
2000-03-31 2.835243
Freq: M, dtype: float64
By default the period frequency is inferred from the timestamps (monthly here)
2000-01 0.239752
2000-02 -0.469201
2000-03 2.835243
Freq: M, dtype: float64
Converting daily timestamps to monthly periods; several rows can share one period
2000-01 1.126773
2000-01 -0.979309
2000-01 -0.784376
2000-02 -1.490820
2000-02 1.125043
2000-02 0.421830
Freq: M, dtype: float64
The monthly period series again
2000-01 0.239752
2000-02 -0.469201
2000-03 2.835243
Freq: M, dtype: float64
Converting back to timestamps
2000-01-31 0.239752
2000-02-29 -0.469201
2000-03-31 2.835243
Freq: M, dtype: float64
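A sketch of the round trip between timestamps and periods, with placeholder values (an assumption). In recent pandas versions to_timestamp(how='end') returns the last instant of each period, so the example normalizes to midnight before comparing dates.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

rng = pd.date_range('1/1/2000', periods=3, freq=MonthEnd())
ts = pd.Series([0.1, 0.2, 0.3], index=rng)
pts = ts.to_period()                 # frequency inferred from the index

rng2 = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(np.arange(6.0), index=rng2)
monthly = ts2.to_period('M')         # duplicate periods are allowed

back = pts.to_timestamp(how='end')   # periods back to timestamps
back_dates = back.index.normalize()
```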
Creating a PeriodIndex from Arrays
The year column
0 1959.0
1 1959.0
2 1959.0
3 1959.0
4 1960.0
5 1960.0
...
197 2008.0
198 2008.0
199 2008.0
200 2009.0
201 2009.0
202 2009.0
Name: year, dtype: float64
The quarter column
0 1.0
1 2.0
2 3.0
3 4.0
4 1.0
5 2.0
...
197 2.0
198 3.0
199 4.0
200 1.0
201 2.0
202 3.0
Name: quarter, dtype: float64
Combining the year and quarter data into a PeriodIndex
PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
'1960Q3', '1960Q4', '1961Q1', '1961Q2',
...
'2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
'2008Q4', '2009Q1', '2009Q2', '2009Q3'],
dtype='int64', length=203, freq='Q-DEC')
Using it as the index of the data
1959Q1 0.00
1959Q2 2.34
1959Q3 2.74
1959Q4 0.27
1960Q1 2.31
1960Q2 0.14
...
2008Q2 8.53
2008Q3 -3.16
2008Q4 -8.79
2009Q1 0.94
2009Q2 3.37
2009Q3 3.56
Freq: Q-DEC, Name: infl, dtype: float64
Resampling and Frequency Conversion
Resampling amounts to a grouping operation
2000-01-31 -0.055153
2000-02-29 0.189412
2000-03-31 -0.075940
2000-04-30 -0.239036
Freq: M, dtype: float64
The same result with a period index
2000-01 -0.055153
2000-02 0.189412
2000-03 -0.075940
2000-04 -0.239036
Freq: M, dtype: float64
Downsampling
A series sampled every minute
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
2000-01-01 00:09:00 9
2000-01-01 00:10:00 10
2000-01-01 00:11:00 11
Freq: T, dtype: int32
Downsampling to five-minute bins (sum of each bin)
2000-01-01 00:00:00 10
2000-01-01 00:05:00 35
2000-01-01 00:10:00 21
Freq: 5T, dtype: int32
|
|
2000-01-01 00:00:00 10
2000-01-01 00:05:00 35
2000-01-01 00:10:00 21
Freq: 5T, dtype: int32
|
|
2000-01-01 00:00:00 10
2000-01-01 00:05:00 35
2000-01-01 00:10:00 21
Freq: 5T, dtype: int32
With a time offset applied to the bin labels
1999-12-31 23:59:59 10
2000-01-01 00:04:59 35
2000-01-01 00:09:59 21
Freq: 5T, dtype: int32
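A sketch of five-minute downsampling with the current syntax. The closed and label arguments control which bin edge is inclusive and which edge names the bin. The loffset argument used by older code is no longer available in pandas 2.x, so the label shift is done here by adjusting the index directly, which is an assumption about how to reproduce the last output.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2000', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

left = ts.resample('5min').sum()    # bins [00:00, 00:05), labeled by left edge
right = ts.resample('5min', closed='right', label='right').sum()

# Shift the labels back by one second
shifted = left.copy()
shifted.index = left.index - pd.Timedelta(seconds=1)

# Open, high, low, close of each five-minute bin
bars = ts.resample('5min').ohlc()
```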
Open-High-Low-Close (OHLC) Downsampling
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
2000-01-01 00:09:00 9
2000-01-01 00:10:00 10
2000-01-01 00:11:00 11
Freq: T, dtype: int32
In five-minute bins
| | open | high | low | close |
---|---|---|---|---|
2000-01-01 00:00:00 | 0 | 4 | 0 | 4 |
2000-01-01 00:05:00 | 5 | 9 | 5 | 9 |
2000-01-01 00:10:00 | 10 | 11 | 10 | 11 |
Resampling with GroupBy
Grouping by month
1 15
2 45
3 75
4 95
dtype: int32
Grouping by weekday
0 47.5
1 48.5
2 49.5
3 50.5
4 51.5
5 49.0
6 50.0
dtype: float64
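The two groupings above are reproduced by this sketch, which assumes a daily series of the integers 0 to 99 starting on 2000-01-01. The key function receives each Timestamp in the index; dayofweek is used for the weekday (Monday is 0).

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2000', periods=100, freq='D')
ts = pd.Series(np.arange(100), index=rng)

by_month = ts.groupby(lambda x: x.month).mean()
by_weekday = ts.groupby(lambda x: x.dayofweek).mean()
print(by_month)
print(by_weekday)
```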
Upsampling and Interpolation
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000-01-05 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-12 | -0.587124 | 0.612993 | -0.796000 | -0.341138 |
Upsampling to daily frequency leaves gaps
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000-01-05 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-06 | NaN | NaN | NaN | NaN |
2000-01-07 | NaN | NaN | NaN | NaN |
2000-01-08 | NaN | NaN | NaN | NaN |
2000-01-09 | NaN | NaN | NaN | NaN |
2000-01-10 | NaN | NaN | NaN | NaN |
2000-01-11 | NaN | NaN | NaN | NaN |
2000-01-12 | -0.587124 | 0.612993 | -0.796000 | -0.341138 |
Forward-filling the gaps
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000-01-05 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-06 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-07 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-08 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-09 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-10 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-11 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-12 | -0.587124 | 0.612993 | -0.796000 | -0.341138 |
Forward-filling at most two rows
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000-01-05 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-06 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-07 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-08 | NaN | NaN | NaN | NaN |
2000-01-09 | NaN | NaN | NaN | NaN |
2000-01-10 | NaN | NaN | NaN | NaN |
2000-01-11 | NaN | NaN | NaN | NaN |
2000-01-12 | -0.587124 | 0.612993 | -0.796000 | -0.341138 |
Resampling to a different weekly anchor (Thursday)
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000-01-06 | 0.360773 | 0.506429 | 1.166424 | 1.402336 |
2000-01-13 | -0.587124 | 0.612993 | -0.796000 | -0.341138 |
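A sketch of the upsampling steps above using the current syntax, in which the fill is a method on the resampler. The frame holds placeholder values (an assumption) on two consecutive Wednesdays.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8.0).reshape(2, 4),
                     index=pd.date_range('1/1/2000', periods=2, freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])

daily = frame.resample('D').asfreq()          # new rows are NaN
filled = frame.resample('D').ffill()          # carry the last value forward
limited = frame.resample('D').ffill(limit=2)  # fill at most two rows per gap
thursdays = frame.resample('W-THU').ffill()   # weekly, anchored on Thursday
```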
Resampling with Periods
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000-01 | -0.254340 | 0.401110 | -0.931350 | -0.872552 |
2000-02 | 0.390968 | -0.815357 | -1.656213 | -2.251621 |
2000-03 | 0.206297 | 0.197394 | 0.927518 | -0.657257 |
2000-04 | -0.451709 | 0.908598 | -0.187902 | -0.498082 |
2000-05 | -0.215150 | -0.042141 | -0.738733 | 2.499246 |
Annual frequency
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
Quarterly frequency
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000Q1 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2000Q2 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2000Q3 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2000Q4 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001Q1 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2001Q2 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2001Q3 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2001Q4 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
Note: the fill_method argument of .resample() is deprecated; the new syntax is .resample(...).ffill()
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000Q1 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2000Q2 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2000Q3 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2000Q4 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001Q1 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2001Q2 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2001Q3 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2001Q4 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
With a different fiscal year end, the quarter labels shift
| | Colorado | Texas | New York | Ohio |
---|---|---|---|---|
2000Q4 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001Q1 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001Q2 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001Q3 | -0.049383 | 0.037021 | -0.272851 | -0.140984 |
2001Q4 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2002Q1 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2002Q2 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
2002Q3 | -0.183766 | -0.291993 | 0.340941 | 0.209276 |
Time Series Plotting
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2292 entries, 2003-01-02 to 2011-10-14
Freq: B
Data columns (total 3 columns):
AAPL 2292 non-null float64
MSFT 2292 non-null float64
XOM 2292 non-null float64
dtypes: float64(3)
memory usage: 71.6 KB
Plotting by year
[Figure: yearly plot]
Plotting by month
[Figure: monthly plot]
Plotting by day
[Figure: daily plot]
Plotting by quarter
[Figure: quarterly plot]
Moving Window Functions
Note: pd.rolling_mean is deprecated for Series; use Series.rolling(window=250, center=False).mean()
[Figure: series with its 250-period rolling mean]
Note: pd.rolling_std is deprecated for Series; use Series.rolling(window=250, min_periods=10, center=False).std()
2003-01-09 NaN
2003-01-10 NaN
2003-01-13 NaN
2003-01-14 NaN
2003-01-15 0.077496
2003-01-16 0.074760
2003-01-17 0.112368
Freq: B, Name: AAPL, dtype: float64
Note: pd.rolling_mean is deprecated for DataFrame; use DataFrame.rolling(window=60, center=False).mean()
[Figure: 60-period rolling mean of each column]
Exponentially Weighted Functions
A closer fit to the data
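A sketch of rolling and exponentially weighted statistics with the method-based syntax (.rolling(...) and .ewm(...)). The price series is a synthetic placeholder (an assumption), since the original stock data is not included here.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder "prices" on business days: 1.0, 2.0, ..., 300.0
prices = pd.Series(np.arange(1.0, 301.0),
                   index=pd.date_range('1/1/2003', periods=300, freq='B'))

ma250 = prices.rolling(250).mean()                   # NaN until 250 values exist
std250 = prices.rolling(250, min_periods=10).std()   # starts after 10 values
ma60 = prices.rolling(60, min_periods=50).mean()     # simple moving average
ewma60 = prices.ewm(span=60).mean()                  # weights recent values more
```

On a steadily rising series like this one, the exponentially weighted mean sits closer to the latest value than the simple moving average with the same span, which illustrates the closer fit.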
Note: pd.rolling_mean and pd.ewm_mean are deprecated for Series; use Series.rolling(window=60, min_periods=50, center=False).mean() and Series.ewm(span=60, ignore_na=False, min_periods=0, adjust=True).mean()
[Figure: simple moving average and exponentially weighted moving average]
Binary Moving Window Functions
| | AAPL | MSFT | XOM |
---|---|---|---|
2003-01-02 | 7.40 | 21.11 | 29.22 |
2003-01-03 | 7.45 | 21.14 | 29.24 |
2003-01-06 | 7.45 | 21.52 | 29.96 |
2003-01-07 | 7.43 | 21.93 | 28.95 |
2003-01-08 | 7.28 | 21.31 | 28.83 |
2003-01-09 | 7.34 | 21.93 | 29.44 |
… | … | … | … |
2011-10-07 | 369.80 | 26.25 | 73.56 |
2011-10-10 | 388.81 | 26.94 | 76.28 |
2011-10-11 | 400.29 | 27.00 | 76.27 |
2011-10-12 | 402.19 | 26.96 | 77.16 |
2011-10-13 | 408.43 | 27.18 | 76.37 |
2011-10-14 | 422.00 | 27.27 | 78.11 |
2292 rows × 3 columns
Note: pd.rolling_corr is deprecated for Series; use Series.rolling(window=125, min_periods=100).corr(other=<Series>)
[Figure: rolling correlation of one series]
Note: pd.rolling_corr is deprecated for DataFrame; use DataFrame.rolling(window=125, min_periods=100).corr(other=<Series>)
[Figure: rolling correlation of each column]
User-Defined Moving Window Functions
Note: pd.rolling_apply is deprecated for Series; use Series.rolling(window=250, center=False).apply(kwargs=<dict>, args=<tuple>, func=<function>)
[Figure: result of the user-defined rolling function]
Performance and Memory Usage Notes
2000-01-01 00:00:00.000 -0.428577
2000-01-01 00:00:00.010 1.650203
2000-01-01 00:00:00.020 -0.064777
2000-01-01 00:00:00.030 -0.219433
2000-01-01 00:00:00.040 1.907433
2000-01-01 00:00:00.050 0.103347
...
2000-01-02 03:46:39.940 0.989446
2000-01-02 03:46:39.950 2.333137
2000-01-02 03:46:39.960 0.354455
2000-01-02 03:46:39.970 0.353224
2000-01-02 03:46:39.980 -0.862868
2000-01-02 03:46:39.990 2.007468
Freq: 10L, dtype: float64
Resampled to 15-minute OHLC bars
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 11112 entries, 2000-01-01 00:00:00 to 2000-04-25 17:45:00
Freq: 15T
Data columns (total 4 columns):
open 11112 non-null float64
high 11112 non-null float64
low 11112 non-null float64
close 11112 non-null float64
dtypes: float64(4)
memory usage: 434.1 KB
Timing results
10 loops, best of 3: 123 ms per loop
1 loop, best of 3: 192 ms per loop