Advanced NumPy
ndarray Object Internals
NumPy dtype Hierarchy
Check whether a dtype belongs to a given type category
True
True
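The code cell for this check did not survive the export. A minimal sketch that yields two `True` results, assuming `np.issubdtype` was the function used (the array names are illustrative):

```python
import numpy as np

ints = np.ones(10, dtype=np.uint16)
floats = np.ones(10, dtype=np.float32)

# issubdtype tests membership in an abstract type category
is_int = np.issubdtype(ints.dtype, np.integer)
is_float = np.issubdtype(floats.dtype, np.floating)
print(is_int, is_float)
```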
List all parent classes of a type
[numpy.float64,
numpy.floating,
numpy.inexact,
numpy.number,
numpy.generic,
float,
object]
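A list like the one above is what the `mro()` method of a NumPy scalar type returns. A short sketch, assuming `np.float64` was the type inspected:

```python
import numpy as np

# mro() walks the class hierarchy from the type itself up to object
parents = np.float64.mro()
for cls in parents:
    print(cls)
```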
Advanced Array Manipulation
Reshaping Arrays
array([0, 1, 2, 3, 4, 5, 6, 7])
array([[0, 1],
[2, 3],
[4, 5],
[6, 7]])
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
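The three arrays above are consistent with reshaping `np.arange(8)` twice. A sketch under that assumption (variable names are illustrative):

```python
import numpy as np

arr = np.arange(8)
tall = arr.reshape((4, 2))   # 4 rows, 2 columns
wide = tall.reshape((2, 4))  # reshape the 2-D result again
print(tall)
print(wide)
```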
Passing -1 for one dimension lets NumPy infer its size from the data
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
Reshaping with another array's shape
(3, 5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
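Both reshaping variants in this part (the -1 placeholder and reusing another array's shape) can be sketched as follows. `other_arr` is an assumed name for the array that supplies the shape:

```python
import numpy as np

arr = np.arange(15)

# -1 asks NumPy to work out the missing dimension: 15 / 5 = 3
inferred = arr.reshape((5, -1))

# A shape tuple taken from another array can be passed directly
other_arr = np.ones((3, 5))
borrowed = arr.reshape(other_arr.shape)
print(inferred.shape, borrowed.shape)
```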
Flattening (raveling)
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
flatten always produces a copy
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
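The difference between the two methods is easy to verify. A small sketch, assuming the first output came from `ravel` and the second from `flatten`:

```python
import numpy as np

arr = np.arange(15).reshape((5, 3))

raveled = arr.ravel()      # a view when the data is already contiguous
flattened = arr.flatten()  # always a fresh copy

print(np.shares_memory(arr, raveled))
print(np.shares_memory(arr, flattened))
```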
C vs. Fortran Order
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
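The two flattened outputs above correspond to row-major (C) and column-major (Fortran) traversal. A sketch that reproduces them, assuming `ravel` with an order argument was used:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

c_order = arr.ravel()     # row by row (default, 'C')
f_order = arr.ravel('F')  # column by column
print(c_order)
print(f_order)
```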
Concatenating and Splitting Arrays
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
array([[ 1, 2, 3, 7, 8, 9],
[ 4, 5, 6, 10, 11, 12]])
More convenient helper functions
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
array([[ 1, 2, 3, 7, 8, 9],
[ 4, 5, 6, 10, 11, 12]])
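Both pairs of outputs above can be produced in two ways. The sketch below assumes `np.concatenate` for the first pair and `np.vstack` / `np.hstack` for the "more convenient" pair:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate([arr1, arr2], axis=0)  # stack along rows
cols = np.concatenate([arr1, arr2], axis=1)  # stack along columns

# vstack and hstack are shorthands for the two calls above
rows_short = np.vstack((arr1, arr2))
cols_short = np.hstack((arr1, arr2))
```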
array([[ 0.9659, 1.3079],
[-1.7632, 0.0904],
[-0.6033, 0.2266],
[-0.4417, -1.8609],
[-1.2463, -0.6249]])
array([[ 0.9659, 1.3079]])
array([[-1.7632, 0.0904],
[-0.6033, 0.2266]])
array([[-0.4417, -1.8609],
[-1.2463, -0.6249]])
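The 5x2 array above is cut into pieces of 1, 2 and 2 rows, which matches `np.split` with split points `[1, 3]`. A sketch with freshly drawn random values (so the numbers will differ from the output above):

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((5, 2))

# Split points 1 and 3 give rows [0:1], [1:3] and [3:5]
first, second, third = np.split(arr, [1, 3])
print(first.shape, second.shape, third.shape)
```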
Stacking helpers
More… concise…
array([[ 0. , 1. ],
[ 2. , 3. ],
[ 4. , 5. ],
[ 0.0376, 1.8236],
[ 0.9025, -0.053 ],
[-0.6849, 1.6728]])
array([[ 0. , 1. , 0. ],
[ 2. , 3. , 1. ],
[ 4. , 5. , 2. ],
[ 0.0376, 1.8236, 3. ],
[ 0.9025, -0.053 , 4. ],
[-0.6849, 1.6728, 5. ]])
array([[ 1, -10],
[ 2, -9],
[ 3, -8],
[ 4, -7],
[ 5, -6]])
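The outputs in this part look like results of the `np.r_` and `np.c_` index helpers. A sketch under that assumption, with random values that will not match the ones printed above:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = np.arange(6)
arr1 = arr.reshape((3, 2))
arr2 = rng.standard_normal((3, 2))

stacked_rows = np.r_[arr1, arr2]        # row-wise stack, shape (6, 2)
with_column = np.c_[stacked_rows, arr]  # append arr as a third column

# Slices are expanded into ranges before stacking as columns
from_slices = np.c_[1:6, -10:-5]
print(from_slices)
```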
Repeating Elements: tile and repeat
Element-wise repetition
array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Specify a separate repeat count for each element
array([0, 0, 1, 1, 1, 2, 2, 2, 2])
Multidimensional arrays need an axis argument (without one, the array is flattened first)
array([[-0.4628, 1.1142],
[ 0.3637, 0.4341]])
array([[-0.4628, 1.1142],
[-0.4628, 1.1142],
[ 0.3637, 0.4341],
[ 0.3637, 0.4341]])
array([[-0.4628, 1.1142],
[-0.4628, 1.1142],
[ 0.3637, 0.4341],
[ 0.3637, 0.4341],
[ 0.3637, 0.4341]])
array([[-0.4628, -0.4628, 1.1142, 1.1142, 1.1142],
[ 0.3637, 0.3637, 0.4341, 0.4341, 0.4341]])
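The `repeat` variants shown in this part can be sketched with small integer arrays, which keep the results deterministic. The input values here are illustrative, not the random ones printed above:

```python
import numpy as np

arr = np.arange(3)
uniform = arr.repeat(3)             # every element three times
per_element = arr.repeat([2, 3, 4]) # one count per element

arr2d = np.array([[1, 2], [3, 4]])
rows_twice = arr2d.repeat(2, axis=0)       # repeat each row twice
rows_varied = arr2d.repeat([2, 3], axis=0) # row 0 twice, row 1 three times
cols_varied = arr2d.repeat([2, 3], axis=1) # same counts along columns
```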
Block-wise repetition
array([[-0.4628, 1.1142],
[ 0.3637, 0.4341]])
array([[-0.4628, 1.1142, -0.4628, 1.1142],
[ 0.3637, 0.4341, 0.3637, 0.4341]])
array([[-0.4628, 1.1142],
[ 0.3637, 0.4341]])
array([[-0.4628, 1.1142],
[ 0.3637, 0.4341],
[-0.4628, 1.1142],
[ 0.3637, 0.4341]])
array([[-0.4628, 1.1142, -0.4628, 1.1142],
[ 0.3637, 0.4341, 0.3637, 0.4341],
[-0.4628, 1.1142, -0.4628, 1.1142],
[ 0.3637, 0.4341, 0.3637, 0.4341],
[-0.4628, 1.1142, -0.4628, 1.1142],
[ 0.3637, 0.4341, 0.3637, 0.4341]])
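Block-wise repetition like the layouts above is what `np.tile` does. A sketch assuming `tile` was called with a scalar and then with tuples (the input values are illustrative):

```python
import numpy as np

arr2d = np.array([[1, 2], [3, 4]])

side_by_side = np.tile(arr2d, 2)  # two copies laid out along the columns
stacked = np.tile(arr2d, (2, 1))  # two copies down, one across
grid = np.tile(arr2d, (3, 2))     # 3 x 2 grid of copies
print(grid.shape)
```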
Fancy Indexing Equivalents: take and put
array([700, 100, 200, 600])
array([700, 100, 200, 600])
array([ 0, 42, 42, 300, 400, 500, 42, 42, 800, 900])
array([ 0, 41, 42, 300, 400, 500, 43, 40, 800, 900])
array([[ 0.2772, -1.3059, -1.4607, -0.4856],
[ 1.5585, -0.4521, -1.6259, -1.6644]])
array([[-1.4607, 0.2772, -1.4607, -1.3059],
[-1.6259, 1.5585, -1.6259, -0.4521]])
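The integer outputs in this part are consistent with the sketch below. The index lists and the replacement values are inferred from those outputs, and the 2-D example uses illustrative integers instead of the random values shown:

```python
import numpy as np

arr = np.arange(10) * 100
inds = [7, 1, 2, 6]

picked = arr.take(inds)  # same result as arr[inds]

arr.put(inds, 42)        # in-place assignment of one value
after_scalar = arr.copy()
arr.put(inds, [40, 41, 42, 43])  # one value per index
after_list = arr.copy()

# take also works along a chosen axis
arr2d = np.arange(8).reshape((2, 4))
cols = arr2d.take([2, 0, 2, 1], axis=1)
```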
Broadcasting
Multiply every element by 4
array([0, 1, 2, 3, 4])
array([ 0, 4, 8, 12, 16])
Subtract the mean along each axis
array([-0.1556, 0.3494, -0.2545])
array([[-0.3753, 0.5353, 1.3534],
[-0.4282, 0.5606, 0.8935],
[-0.0956, -0.9767, -1.2444],
[ 0.899 , -0.1192, -1.0024]])
array([ -5.5511e-17, -1.3878e-17, 0.0000e+00])
array([[-0.5308, 0.8848, 1.0989],
[-0.5837, 0.91 , 0.639 ],
[-0.2511, -0.6273, -1.4989],
[ 0.7434, 0.2302, -1.2569]])
array([[ 0.4843],
[ 0.3218],
[-0.7924],
[-0.0944]])
array([ 7.4015e-17, 0.0000e+00, 0.0000e+00, 0.0000e+00])
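A sketch of demeaning by broadcasting, first over columns and then over rows. It assumes a 4x3 array of random values, so the printed numbers will differ from those above, but the resulting means are zero up to floating-point error in the same way:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 3))

# Column means have shape (3,), which broadcasts across the 4 rows
col_demeaned = arr - arr.mean(0)

# Row means must be reshaped to (4, 1) to broadcast across the columns
row_means = arr.mean(1)
row_demeaned = arr - row_means.reshape((4, 1))

print(col_demeaned.mean(0))
print(row_demeaned.mean(1))
```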
Broadcasting Over Other Axes
Shapes that do not line up
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-7b87b85a20b2> in <module>()
----> 1 arr - arr.mean(1)
ValueError: operands could not be broadcast together with shapes (4,3) (4,)
array([[-1.0151, 0.4005, 0.6146],
[-0.9055, 0.5882, 0.3173],
[ 0.5413, 0.1652, -0.7065],
[ 0.8378, 0.3246, -1.1625]])
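The traceback and the corrected result above can be reproduced as follows. This is a sketch with illustrative data; `keepdims=True` is shown as an alternative to the explicit reshape and is not necessarily what the original cell used:

```python
import numpy as np

arr = np.arange(12.0).reshape((4, 3))

error_message = None
try:
    arr - arr.mean(1)  # shapes (4, 3) and (4,) do not broadcast
except ValueError as exc:
    error_message = str(exc)

fixed = arr - arr.mean(1).reshape((4, 1))
also_fixed = arr - arr.mean(1, keepdims=True)
print(error_message)
```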
array([[[ 0., 0., 0., 0.]],
[[ 0., 0., 0., 0.]],
[[ 0., 0., 0., 0.]],
[[ 0., 0., 0., 0.]]])
(4, 1, 4)
array([-1.1083, 0.5576, 1.2277])
array([[-1.1083],
[ 0.5576],
[ 1.2277]])
array([[-1.1083, 0.5576, 1.2277]])
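The shapes above suggest that `np.newaxis` was used to insert length-1 axes. A sketch under that assumption (array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

arr = np.zeros((4, 4))
arr_3d = arr[:, np.newaxis, :]  # insert a new middle axis

arr_1d = rng.standard_normal(3)
column = arr_1d[:, np.newaxis]  # shape (3, 1)
row = arr_1d[np.newaxis, :]     # shape (1, 3)
print(arr_3d.shape, column.shape, row.shape)
```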
array([[[-1.9966, -0.2431, -0.992 , 0.8283, -0.5073],
[-0.3938, -0.1332, -0.7427, 0.3094, -0.9241],
[ 1.1069, -0.5383, -0.9288, 0.0233, -0.4678],
[-1.2015, 0.6905, 1.6706, -0.1703, -1.3975]],
[[-0.3048, -1.7181, -0.189 , 0.6263, 1.1194],
[ 0.0823, -0.7132, -0.5162, 1.5305, -1.199 ],
[ 0.5777, 1.2935, 0.1547, -1.3637, 0.4251],
[ 0.4923, 1.4004, 0.3646, 0.1594, -0.7334]],
[[ 1.3836, -0.5313, 0.2826, 0.4739, -1.3435],
[-1.141 , -0.3084, 1.1364, 1.1326, 0.3064],
[-0.9692, 1.0229, -0.0246, 1.4484, -1.137 ],
[ 1.7033, -1.8358, 1.2087, -0.5463, 0.5904]]])
array([[-0.5822, -0.3769, -0.1609, -0.0816],
[-0.0932, -0.1631, 0.2174, 0.3367],
[ 0.0531, 0.2252, 0.0681, 0.2241]])
array([[ 8.8818e-17, 0.0000e+00, -4.4409e-17, -8.8818e-17],
[ 0.0000e+00, 0.0000e+00, 2.7756e-17, 8.8818e-17],
[ 4.4409e-17, 5.5511e-17, 4.4409e-17, 0.0000e+00]])
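Demeaning a three-dimensional array over its last axis follows the same pattern. The sketch below also includes `demean_axis`, a hypothetical helper (not recovered from the source) that generalizes the idea to any axis:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((3, 4, 5))

depth_means = arr.mean(2)  # shape (3, 4)
demeaned = arr - depth_means[:, :, np.newaxis]

def demean_axis(arr, means, axis=0):
    """Subtract means after re-inserting the reduced axis (hypothetical helper)."""
    indexer = [slice(None)] * arr.ndim
    indexer[axis] = np.newaxis
    return arr - means[tuple(indexer)]

demeaned_mid = demean_axis(arr, arr.mean(1), axis=1)
```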
Setting Array Values by Broadcasting
array([[ 5., 5., 5.],
[ 5., 5., 5.],
[ 5., 5., 5.],
[ 5., 5., 5.]])
array([[ 1.28, 1.28, 1.28],
[-0.42, -0.42, -0.42],
[ 0.44, 0.44, 0.44],
[ 1.6 , 1.6 , 1.6 ]])
array([[-1.37 , -1.37 , -1.37 ],
[ 0.509, 0.509, 0.509],
[ 0.44 , 0.44 , 0.44 ],
[ 1.6 , 1.6 , 1.6 ]])
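Assignment follows the same broadcasting rules as arithmetic. The sketch below reproduces the three outputs above, with the column values read off those outputs:

```python
import numpy as np

arr = np.zeros((4, 3))
arr[:] = 5  # a scalar broadcasts to every element
all_fives = arr.copy()

col = np.array([1.28, -0.42, 0.44, 1.6])
arr[:] = col[:, np.newaxis]  # a (4, 1) column fills each row
by_column = arr.copy()

arr[:2] = [[-1.37], [0.509]]  # a (2, 1) block fills the first two rows
print(arr)
```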
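Advanced ufunc Usage
ufunc Instance Methods
reduce
Aggregates an array's values by applying the binary operation successively (optionally along a given axis)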
45
45
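The value 45 is the sum of 0 through 9, so the two results above are consistent with this sketch:

```python
import numpy as np

arr = np.arange(10)

via_reduce = np.add.reduce(arr)  # repeated binary addition
via_sum = arr.sum()              # the equivalent array method
print(via_reduce, via_sum)
```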
Here the aggregation uses logical AND
array([[-0.7066, 0.4268, -0.2776, -0.8283, -2.7628],
[ 0.9835, 0.4378, -0.8496, 0.7188, 0.7329],
[ 0.5047, -0.7893, 0.5392, 1.2907, 0.8676],
[ 0.4113, 0.4459, -0.3172, -1.0493, 1.3459],
[ 0.356 , -0.0915, -0.535 , -0.036 , -0.2591]])
array([[-2.7628, -0.8283, -0.7066, -0.2776, 0.4268],
[ 0.9835, 0.4378, -0.8496, 0.7188, 0.7329],
[-0.7893, 0.5047, 0.5392, 0.8676, 1.2907],
[ 0.4113, 0.4459, -0.3172, -1.0493, 1.3459],
[-0.535 , -0.2591, -0.0915, -0.036 , 0.356 ]])
array([[ True, True, True, True],
[False, False, True, True],
[ True, True, True, True],
[ True, False, False, True],
[ True, True, True, True]], dtype=bool)
array([ True, False, True, False, True], dtype=bool)
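One way to obtain the pattern above is to sort every other row and then test whether each row is in ascending order. A sketch under that assumption, with fresh random values:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((5, 5))
arr[::2].sort(1)  # sort rows 0, 2 and 4 in place

# Compare each element with its right-hand neighbour
ascending_pairs = arr[:, :-1] < arr[:, 1:]

# A row is sorted only if every adjacent pair is in order
sorted_rows = np.logical_and.reduce(ascending_pairs, axis=1)
print(sorted_rows)
```

`np.logical_and.reduce` is equivalent to the `all` method here.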
Whereas reduce returns only the final result, accumulate keeps the intermediate results
array([[ 0, 1, 3, 6, 10],
[ 5, 11, 18, 26, 35],
[10, 21, 33, 46, 60]], dtype=int32)
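The running totals above match a cumulative sum along the rows of a 3x5 range. A sketch assuming `np.add.accumulate` with `axis=1`:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# accumulate keeps every partial result of the reduction
running = np.add.accumulate(arr, axis=1)
print(running)
```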
outer
Computes the pairwise (outer) product of two arrays
array([0, 1, 1, 2, 2])
array([[0, 0, 0, 0, 0],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 2, 4, 6, 8],
[0, 2, 4, 6, 8]])
outer
The number of dimensions of the result is the sum of the numbers of dimensions of the two inputs
(3, 4, 5)
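Both `outer` results can be sketched as follows. The first part reproduces the integer table shown earlier in this part; the second uses random inputs, where only the shape matters:

```python
import numpy as np

rng = np.random.default_rng(0)

arr = np.arange(3).repeat([1, 2, 2])  # [0, 1, 1, 2, 2]
products = np.multiply.outer(arr, np.arange(5))

# A 2-D input combined with a 1-D input yields a 3-D result
x = rng.standard_normal((3, 4))
y = rng.standard_normal(5)
differences = np.subtract.outer(x, y)
print(differences.shape)
```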
array([10, 18, 17], dtype=int32)
array([[ 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4],
[ 0, 2, 4, 6, 8],
[ 0, 3, 6, 9, 12]])
array([[ 0, 0, 0],
[ 1, 5, 4],
[ 2, 10, 8],
[ 3, 15, 12]], dtype=int32)
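The last three outputs match `reduceat`, which reduces over slices delimited by bin edges. The sketch below assumes the edges `[0, 5, 8]` and `[0, 2, 4]`, inferred from the printed sums:

```python
import numpy as np

arr = np.arange(10)
# Sums over arr[0:5], arr[5:8] and arr[8:]
sums = np.add.reduceat(arr, [0, 5, 8])

table = np.multiply.outer(np.arange(4), np.arange(5))
# Column groups 0:2, 2:4 and 4: are summed within every row
grouped = np.add.reduceat(table, [0, 2, 4], axis=1)
print(sums)
print(grouped)
```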
Custom ufuncs
Two different ways to create one
array([0, 2, 4, 6, 8, 10, 12, 14], dtype=object)
array([ 0., 2., 4., 6., 8., 10., 12., 14.])
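The object-dtype and float results above are what `np.frompyfunc` and `np.vectorize` return, respectively. A sketch assuming a simple addition function (`add_elements` is an illustrative name):

```python
import numpy as np

def add_elements(x, y):
    return x + y

# frompyfunc: 2 inputs, 1 output; results are always object arrays
add_them = np.frompyfunc(add_elements, 2, 1)
as_objects = add_them(np.arange(8), np.arange(8))

# vectorize lets the output dtype be specified
add_vec = np.vectorize(add_elements, otypes=[np.float64])
as_floats = add_vec(np.arange(8), np.arange(8))
print(as_objects.dtype, as_floats.dtype)
```

Both wrappers call the Python function once per element, which is why they are far slower than a built-in ufunc such as `np.add`.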
A hand-written ufunc is still much slower than the optimized built-in functions
100 loops, best of 3: 1.81 ms per loop
The slowest run took 16.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.65 µs per loop
Structured and Record Arrays
array([(1.5, 6), (3.141592653589793, -2)],
dtype=[('x', '<f8'), ('y', '<i4')])
(1.5, 6)
6
array([ 1.5 , 3.1416])
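The outputs above can be reproduced with a two-field structured dtype. A sketch assuming the fields are a float64 `x` and an int32 `y`, as the printed dtype indicates:

```python
import numpy as np

dtype = [('x', np.float64), ('y', np.int32)]
sarr = np.array([(1.5, 6), (np.pi, -2)], dtype=dtype)

first_record = sarr[0]      # one record, indexable by field name
first_y = sarr[0]['y']
x_values = sarr['x']        # one field across all records
print(first_record, first_y, x_values)
```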
Nested dtypes and Multidimensional Fields
array([([0, 0, 0], 0), ([0, 0, 0], 0), ([0, 0, 0], 0), ([0, 0, 0], 0)],
dtype=[('x', '<i8', (3,)), ('y', '<i4')])
array([0, 0, 0], dtype=int64)
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=int64)
array([(1.0, 2.0), (3.0, 4.0)],
dtype=[('a', '<f8'), ('b', '<f4')])
array([5, 6])
array([ 1., 3.])
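A sketch of both features, with dtypes chosen to match the ones printed above. The nested field names `a` and `b` come from that output; the record values are inferred from it:

```python
import numpy as np

# A field can carry its own shape: 'x' holds three int64 values per record
multi = [('x', np.int64, (3,)), ('y', np.int32)]
arr = np.zeros(4, dtype=multi)
one_x = arr[0]['x']   # shape (3,)
all_x = arr['x']      # shape (4, 3)

# A field can itself be a structured dtype
nested = [('x', [('a', 'f8'), ('b', 'f4')]), ('y', np.int32)]
data = np.array([((1, 2), 5), ((3, 4), 6)], dtype=nested)
inner_a = data['x']['a']
print(all_x.shape, data['y'], inner_a)
```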
More About Sorting
array([-1.3918, -0.2089, 0.2316, 0.728 , 0.8356, 1.9956])
array([[ -2.9812e-01, 1.2037e+00, -1.5768e-02, 7.4395e-01,
8.6880e-01],
[ -4.2865e-01, 7.1886e-01, -1.4510e+00, 1.0510e-01,
-1.7942e+00],
[ -2.8792e-04, 6.1168e-01, -9.1210e-02, -1.2799e+00,
-4.0230e-02]])
array([[ -4.2865e-01, 1.2037e+00, -1.5768e-02, 7.4395e-01,
8.6880e-01],
[ -2.9812e-01, 7.1886e-01, -1.4510e+00, 1.0510e-01,
-1.7942e+00],
[ -2.8792e-04, 6.1168e-01, -9.1210e-02, -1.2799e+00,
-4.0230e-02]])
array([-0.9699, -0.5626, 1.1172, 0.2791, -1.1148])
array([-1.1148, -0.9699, -0.5626, 0.2791, 1.1172])
array([-0.9699, -0.5626, 1.1172, 0.2791, -1.1148])
array([[ 0.2266, 0.3405, 2.6439, -1.6262, -0.3976],
[-1.4821, 1.068 , -0.252 , -0.9331, 2.2639],
[-0.2311, 1.1472, 0.9287, -0.9023, 1.1761]])
array([[-1.6262, -0.3976, 0.2266, 0.3405, 2.6439],
[-1.4821, -0.9331, -0.252 , 1.068 , 2.2639],
[-0.9023, -0.2311, 0.9287, 1.1472, 1.1761]])
array([[ 2.6439, 0.3405, 0.2266, -0.3976, -1.6262],
[ 2.2639, 1.068 , -0.252 , -0.9331, -1.4821],
[ 1.1761, 1.1472, 0.9287, -0.2311, -0.9023]])
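The outputs in this part illustrate three points: `np.sort` returns a sorted copy, the `sort` method works in place (and accepts an axis), and a descending order can be obtained by reversing a sorted result. A sketch with fresh random data:

```python
import numpy as np

rng = np.random.default_rng(0)

arr = rng.standard_normal(5)
original = arr.copy()
sorted_copy = np.sort(arr)  # arr itself is left untouched
unchanged = np.array_equal(arr, original)

arr_2d = rng.standard_normal((3, 5))
arr_2d.sort(axis=1)            # sort each row in place
descending = arr_2d[:, ::-1]   # reversed view of the sorted rows
```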
Indirect Sorts: argsort and lexsort
array([1, 2, 4, 3, 0], dtype=int64)
array([0, 1, 2, 3, 5])
array([[ 5. , 0. , 1. , 3. , 2. ],
[ 0.422 , 0.1187, 1.1352, 1.4363, -1.2487],
[ 0.1909, -1.0984, 0.7886, -0.5827, 1.1592]])
array([[ 0. , 1. , 2. , 3. , 5. ],
[ 0.1187, 1.1352, -1.2487, 1.4363, 0.422 ],
[-1.0984, 0.7886, 1.1592, -0.5827, 0.1909]])
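A sketch of both indirect sorts. The `values` array is read off the first row printed above; the name arrays in the `lexsort` part are purely illustrative and not taken from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

values = np.array([5, 0, 1, 3, 2])
indexer = values.argsort()  # indices that would sort values

# Reorder all columns of a 2-D array by its first row
arr = rng.standard_normal((3, 5))
arr[0] = values
reordered = arr[:, arr[0].argsort()]

# lexsort sorts by several keys; the LAST key is the primary one
first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])
sorter = np.lexsort((first_name, last_name))
pairs = list(zip(last_name[sorter], first_name[sorter]))
```

In Python 3, `zip` returns a lazy iterator, so wrapping it in `list` is needed to see the pairs.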
<zip at 0x1d1284f87c8>
Alternative Sort Algorithms
array([2, 3, 4, 0, 1], dtype=int64)
array(['1:first', '1:second', '1:third', '2:first', '2:second'],
dtype='<U8')
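The output above shows a stable sort: items with equal keys keep their original relative order. A sketch that reproduces it, assuming `kind='mergesort'` was the stable algorithm requested:

```python
import numpy as np

values = np.array(['2:first', '2:second', '1:first', '1:second', '1:third'])
key = np.array([2, 2, 1, 1, 1])

# mergesort is stable, so ties keep their input order
indexer = key.argsort(kind='mergesort')
ordered = values.take(indexer)
print(indexer)
print(ordered)
```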
numpy.searchsorted: Finding Elements in a Sorted Array
3
array([0, 3, 3, 5], dtype=int64)
array([0, 3], dtype=int64)
array([3, 7], dtype=int64)
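`searchsorted` returns the position at which a value would have to be inserted to keep the array sorted. The input arrays below are assumptions chosen to be consistent with the four results above:

```python
import numpy as np

arr = np.array([0, 1, 7, 12, 15])
single = arr.searchsorted(9)
several = arr.searchsorted([0, 8, 11, 16])

# With repeated values, side controls which end of the run is returned
dup = np.array([0, 0, 0, 1, 1, 1, 1])
left = dup.searchsorted([0, 1])                 # default side='left'
right = dup.searchsorted([0, 1], side='right')
print(single, several, left, right)
```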
array([ 143., 8957., 309., 2349., 5503., 2754., 4408., 4259.,
3313., 3364., 2492., 9977., 4704., 5538., 6089., 5864.,
6926., 3677., 8698., 1832., 8931., 6631., 5322., 3712.,
9350., 3945., 9514., 3683., 8568., 8247., 7087., 7630.,
3392., 8320., 1973., 982., 1672., 7052., 6230., 3894.,
1832., 9488., 755., 8522., 1858., 5417., 6162., 7517.,
9827., 4458.])
array([2, 4, 2, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4,
3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 2, 3, 4, 4, 3, 3, 4, 2, 4, 3, 4,
4, 4, 4, 3], dtype=int64)
2 547.250000
3 3178.550000
4 7591.038462
dtype: float64
array([2, 4, 2, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4,
3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 2, 3, 4, 4, 3, 3, 4, 2, 4, 3, 4,
4, 4, 4, 3], dtype=int64)
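The labels above look like bin assignments. A sketch using a few of the data values printed above and assumed bin edges `[0, 100, 1000, 5000, 10000]`; the per-bin mean is computed with plain NumPy rather than pandas to keep the example self-contained:

```python
import numpy as np

data = np.array([143., 8957., 309., 2349., 5503., 982., 755.])
bins = np.array([0, 100, 1000, 5000, 10000])

labels = bins.searchsorted(data)                # bin index for each value
same_labels = np.digitize(data, bins, right=True)

# Mean of the values that fall into each bin
bin_means = {int(label): float(data[labels == label].mean())
             for label in np.unique(labels)}
print(labels)
print(bin_means)
```

`np.digitize` with its default `right=False` differs from this only for values that sit exactly on a bin edge.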
NumPy matrix class
array([ 8.8277, 3.8222, -1.1428, 2.0441])
array([[ 8.8277, 3.8222, -1.1428, 2.0441],
[ 3.8222, 6.7527, 0.8391, 2.0829],
[-1.1428, 0.8391, 5.0169, 0.7957],
[ 2.0441, 2.0829, 0.7957, 6.241 ]])
array([[ 8.8277],
[ 3.8222],
[-1.1428],
[ 2.0441]])
array([[ 1195.468]])
matrix([[ 8.8277, 3.8222, -1.1428, 2.0441],
[ 3.8222, 6.7527, 0.8391, 2.0829],
[-1.1428, 0.8391, 5.0169, 0.7957],
[ 2.0441, 2.0829, 0.7957, 6.241 ]])
matrix([[ 8.8277],
[ 3.8222],
[-1.1428],
[ 2.0441]])
matrix([[ 1195.468]])
matrix([[ 1.0000e+00, 6.9616e-17, -4.0136e-17, 8.1258e-17],
[ -2.3716e-17, 1.0000e+00, 2.2230e-17, -2.5721e-17],
[ 1.0957e-16, 5.0783e-18, 1.0000e+00, 7.8658e-18],
[ -5.7092e-17, -3.7777e-18, 6.2391e-18, 1.0000e+00]])
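The outputs above compare a quadratic form computed with plain arrays and with `np.matrix`. The sketch below uses the matrix values printed above; current NumPy documentation discourages `np.matrix` for new code, so the `@` operator is shown as well:

```python
import numpy as np

X = np.array([[ 8.8277, 3.8222, -1.1428, 2.0441],
              [ 3.8222, 6.7527,  0.8391, 2.0829],
              [-1.1428, 0.8391,  5.0169, 0.7957],
              [ 2.0441, 2.0829,  0.7957, 6.241 ]])
y = X[:, :1]  # first column, kept two-dimensional

quad_dot = np.dot(y.T, np.dot(X, y))
quad_at = y.T @ X @ y  # same computation with the @ operator

Xm = np.matrix(X)
ym = Xm[:, 0]               # matrix indexing stays two-dimensional
quad_matrix = ym.T * Xm * ym  # * means matrix multiplication here
identity = Xm.I * X         # .I is the matrix inverse
```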
Advanced Array Input and Output
Memory-Mapped Files
memmap([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
memmap([[-1.273 , -0.1547, 0.7817, ..., 0.3421, 1.0272, -1.8742],
[-0.3544, -3.1195, 0.1256, ..., -0.4476, 0.4863, -0.8311],
[-1.1117, 0.8186, 2.3934, ..., 0.1061, 1.4123, 0.6489],
...,
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ]])
memmap([[-1.273 , -0.1547, 0.7817, ..., 0.3421, 1.0272, -1.8742],
[-0.3544, -3.1195, 0.1256, ..., -0.4476, 0.4863, -0.8311],
[-1.1117, 0.8186, 2.3934, ..., 0.1061, 1.4123, 0.6489],
...,
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ]])
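The sequence above (an all-zero map, a written slice, the same data read back) can be sketched with `np.memmap`. The small shape and the temporary directory are choices made for this example; the file name `mymmap` is the one that appears in this document:

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
shape = (100, 50)  # kept small for the example

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mymmap")

    # mode='w+' creates the file, initially filled with zeros
    mmap = np.memmap(path, dtype="float64", mode="w+", shape=shape)
    values = rng.standard_normal((5, shape[1]))
    mmap[:5] = values  # writes are buffered in memory
    mmap.flush()       # push changes to disk
    del mmap           # release the file handle

    # Reopening requires the dtype and shape to be given again
    reopened = np.memmap(path, dtype="float64", mode="r", shape=shape)
    head_matches = bool(np.array_equal(reopened[:5], values))
    tail_is_zero = bool((reopened[5:] == 0).all())
    del reopened
```

Deleting every reference to the map before removing the file matters on Windows, where an open mapping keeps the file locked.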
NameError: name 'mmap' is not defined
C:\Users\Ewan\Downloads\pydata-book-master\mymmap
The process cannot access the file because it is being used by another process.
Performance Tips
The Importance of Contiguous Memory
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
True
1000 loops, best of 3: 848 µs per loop
1000 loops, best of 3: 582 µs per loop
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
True
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
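The flag listings above can be explored with the `flags` attribute. A sketch assuming C-ordered and Fortran-ordered arrays of ones; the array size is illustrative, and the exact set of flags printed depends on the NumPy version:

```python
import numpy as np

arr_c = np.ones((100, 100), order='C')
arr_f = np.ones((100, 100), order='F')

c_is_c = arr_c.flags.c_contiguous  # rows are adjacent in memory
f_is_f = arr_f.flags.f_contiguous  # columns are adjacent in memory

# Copying with an explicit order converts the layout
converted = arr_f.copy('C')

# Leading rows of a C array stay contiguous; a column slice does not
row_view = arr_c[:50]
col_view = arr_c[:, :50]
print(col_view.flags)
```

Operations that walk memory in its stored order, such as summing the rows of a C-ordered array, tend to be faster, which is what the timings above suggest.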
Other Speed Options: Cython, f2py, C