Advanced NumPy

ndarray Object Internals

The NumPy dtype Hierarchy

True

True


[numpy.float64,
numpy.floating,
numpy.inexact,
numpy.number,
numpy.generic,
float,
object]
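The two True values and the class list above come from checking dtypes against NumPy's abstract type classes and from listing the parent classes of np.float64. The following is a minimal sketch of such checks. The array names are illustrative.

```python
import numpy as np

ints = np.ones(10, dtype=np.uint16)
floats = np.ones(10, dtype=np.float32)

# issubdtype tests a concrete dtype against an abstract class in the hierarchy
print(np.issubdtype(ints.dtype, np.integer))     # True
print(np.issubdtype(floats.dtype, np.floating))  # True

# mro() lists every parent class of a concrete scalar type, ending in object
print(np.float64.mro())
```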


Advanced Array Manipulation

Reshaping Arrays

array([0, 1, 2, 3, 4, 5, 6, 7])

array([[0, 1],
[2, 3],
[4, 5],
[6, 7]])

array([[0, 1, 2, 3],
[4, 5, 6, 7]])


Passing -1 for one dimension tells reshape to infer that dimension from the total number of elements:

array([[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8],
[ 9, 10, 11],
[12, 13, 14]])
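The reshaping outputs above can be reproduced with a short sketch like the following. It also shows that -1 only works when the remaining dimensions divide the element count evenly.

```python
import numpy as np

arr = np.arange(8)
print(arr.reshape((4, 2)))

# -1 lets NumPy infer that dimension: 15 elements / 5 rows = 3 columns
arr15 = np.arange(15).reshape((5, -1))
print(arr15.shape)  # (5, 3)

# A shape tuple taken from another array works as well
other_arr = np.ones((3, 5))
print(np.arange(15).reshape(other_arr.shape))

# 15 elements cannot fill 4 rows evenly, so the inferred size is undefined
try:
    np.arange(15).reshape((4, -1))
except ValueError as exc:
    print("cannot infer:", exc)
```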


(3, 5)

array([[ 0,  1,  2,  3,  4],
[ 5,  6,  7,  8,  9],
[10, 11, 12, 13, 14]])


array([[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8],
[ 9, 10, 11],
[12, 13, 14]])

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
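ravel and flatten produced the same one-dimensional output above. According to the NumPy documentation, the difference is that ravel returns a view when the memory layout allows it, while flatten always returns a copy. np.shares_memory makes this visible:

```python
import numpy as np

arr = np.arange(15).reshape((5, 3))
raveled = arr.ravel()      # a view of arr's data when the layout allows it
flattened = arr.flatten()  # always a fresh copy

print(np.shares_memory(arr, raveled))    # True for this C-contiguous array
print(np.shares_memory(arr, flattened))  # False

# Writing through the view changes the original array; the copy is unaffected
raveled[0] = 100
print(arr[0, 0], flattened[0])  # 100 0
```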


C vs. Fortran Order

array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
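The two flattened results differ only in traversal order. C (row-major) order varies the last axis fastest. Fortran (column-major) order varies the first axis fastest. A sketch showing that reshape accepts the same order argument:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))
c_flat = arr.ravel()     # row-major: 0, 1, 2, 3, 4, ...
f_flat = arr.ravel('F')  # column-major: 0, 4, 8, 1, 5, ...

# Reshaping with the same order recovers the original array
assert (f_flat.reshape((3, 4), order='F') == arr).all()

# Reshaping the Fortran-ordered data in C order scrambles it
print(f_flat.reshape((3, 4)))
```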


Concatenating and Splitting Arrays

array([[ 1,  2,  3],
[ 4,  5,  6],
[ 7,  8,  9],
[10, 11, 12]])

array([[ 1,  2,  3,  7,  8,  9],
[ 4,  5,  6, 10, 11, 12]])


array([[ 1,  2,  3],
[ 4,  5,  6],
[ 7,  8,  9],
[10, 11, 12]])

array([[ 1,  2,  3,  7,  8,  9],
[ 4,  5,  6, 10, 11, 12]])
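The concatenated outputs above can be produced as follows. vstack and hstack are convenience wrappers that give the same results as concatenate along axis 0 and axis 1 for these 2D inputs.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate([arr1, arr2], axis=0)  # stack vertically -> (4, 3)
cols = np.concatenate([arr1, arr2], axis=1)  # stack horizontally -> (2, 6)

# vstack / hstack cover the common 2D cases
assert (np.vstack((arr1, arr2)) == rows).all()
assert (np.hstack((arr1, arr2)) == cols).all()
```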

array([[ 0.9659,  1.3079],
[-1.7632,  0.0904],
[-0.6033,  0.2266],
[-0.4417, -1.8609],
[-1.2463, -0.6249]])

array([[ 0.9659,  1.3079]])

array([[-1.7632,  0.0904],
[-0.6033,  0.2266]])

array([[-0.4417, -1.8609],
[-1.2463, -0.6249]])
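In the split example above, the list [1, 3] gives split points, not piece sizes. Rows 0, 1 to 2, and 3 to 4 each become a separate array. This sketch uses deterministic data in place of the random values:

```python
import numpy as np

arr = np.arange(10).reshape((5, 2))

# [1, 3] are the row indices where new pieces start
first, second, third = np.split(arr, [1, 3])
print(first.shape, second.shape, third.shape)  # (1, 2) (2, 2) (2, 2)

# An integer instead asks for that many equal pieces along the axis
left, right = np.split(arr, 2, axis=1)
```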


Stacking Helpers: r_ and c_

array([[ 0.    ,  1.    ],
[ 2.    ,  3.    ],
[ 4.    ,  5.    ],
[ 0.0376,  1.8236],
[ 0.9025, -0.053 ],
[-0.6849,  1.6728]])

array([[ 0.    ,  1.    ,  0.    ],
[ 2.    ,  3.    ,  1.    ],
[ 4.    ,  5.    ,  2.    ],
[ 0.0376,  1.8236,  3.    ],
[ 0.9025, -0.053 ,  4.    ],
[-0.6849,  1.6728,  5.    ]])

array([[  1, -10],
[  2,  -9],
[  3,  -8],
[  4,  -7],
[  5,  -6]])
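The three outputs above come from the index-style helpers np.r_ (row stacking) and np.c_ (column stacking). In the sketch below, ones stand in for the random block used originally. It also shows that slices inside the brackets expand into ranges.

```python
import numpy as np

arr = np.arange(6)
arr1 = arr.reshape((3, 2))
arr2 = np.ones((3, 2))  # stand-in for the random (3, 2) block

stacked_rows = np.r_[arr1, arr2]     # like vstack -> (6, 2)
with_col = np.c_[stacked_rows, arr]  # 1D arr becomes a new column -> (6, 3)

# Slices are expanded into ranges before stacking
pairs = np.c_[1:6, -10:-5]
print(pairs)
```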


Repeating Elements: tile and repeat

array([0, 0, 0, 1, 1, 1, 2, 2, 2])


array([0, 0, 1, 1, 1, 2, 2, 2, 2])


array([[-0.4628,  1.1142],
[ 0.3637,  0.4341]])

array([[-0.4628,  1.1142],
[-0.4628,  1.1142],
[ 0.3637,  0.4341],
[ 0.3637,  0.4341]])

array([[-0.4628,  1.1142],
[-0.4628,  1.1142],
[ 0.3637,  0.4341],
[ 0.3637,  0.4341],
[ 0.3637,  0.4341]])

array([[-0.4628, -0.4628,  1.1142,  1.1142,  1.1142],
[ 0.3637,  0.3637,  0.4341,  0.4341,  0.4341]])
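The repeat outputs above follow two rules. An integer repeats every element the same number of times, while a list gives one count per element. Without an axis argument, a multidimensional array is flattened first. This sketch uses small deterministic arrays:

```python
import numpy as np

arr = np.arange(3)
print(arr.repeat(3))          # each element 3 times
print(arr.repeat([2, 3, 4]))  # per-element counts

arr2d = np.array([[1, 2], [3, 4]])
# No axis: the array is flattened before repeating
print(arr2d.repeat(2))
# With axis=0, whole rows are repeated (row 0 twice, row 1 three times)
print(arr2d.repeat([2, 3], axis=0))
```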


array([[-0.4628,  1.1142],
[ 0.3637,  0.4341]])

array([[-0.4628,  1.1142, -0.4628,  1.1142],
[ 0.3637,  0.4341,  0.3637,  0.4341]])

array([[-0.4628,  1.1142],
[ 0.3637,  0.4341]])

array([[-0.4628,  1.1142],
[ 0.3637,  0.4341],
[-0.4628,  1.1142],
[ 0.3637,  0.4341]])

array([[-0.4628,  1.1142, -0.4628,  1.1142],
[ 0.3637,  0.4341,  0.3637,  0.4341],
[-0.4628,  1.1142, -0.4628,  1.1142],
[ 0.3637,  0.4341,  0.3637,  0.4341],
[-0.4628,  1.1142, -0.4628,  1.1142],
[ 0.3637,  0.4341,  0.3637,  0.4341]])
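tile works differently from repeat. It copies the whole array as a block, like laying tiles. A scalar count tiles along the last axis, and a tuple gives the number of copies per axis. The sketch below contrasts tile with repeat:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

print(np.tile(arr, 2))       # tiled along the last axis -> (2, 4)
print(np.tile(arr, (2, 1)))  # 2 copies down, 1 across -> (4, 2)
print(np.tile(arr, (3, 2)))  # -> (6, 4)

# repeat duplicates each element in place instead of the whole block
print(arr.repeat(2, axis=1))
```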


Fancy Indexing Equivalents: take and put

array([700, 100, 200, 600])

array([700, 100, 200, 600])

array([  0,  42,  42, 300, 400, 500,  42,  42, 800, 900])

array([  0,  41,  42, 300, 400, 500,  43,  40, 800, 900])

array([[ 0.2772, -1.3059, -1.4607, -0.4856],
[ 1.5585, -0.4521, -1.6259, -1.6644]])

array([[-1.4607,  0.2772, -1.4607, -1.3059],
[-1.6259,  1.5585, -1.6259, -0.4521]])
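The take and put outputs above can be reproduced as follows. put writes in place using indices into the flattened array and has no axis argument. take accepts an axis, as in the column reordering above, which uses a deterministic matrix here.

```python
import numpy as np

arr = np.arange(10) * 100
inds = [7, 1, 2, 6]

# take is equivalent to fancy indexing arr[inds]
assert (arr.take(inds) == arr[inds]).all()

# put assigns in place: a scalar goes to every index, a sequence is matched by position
arr.put(inds, 42)
arr.put(inds, [40, 41, 42, 43])
print(arr)

# take along axis=1 reorders (and may repeat) columns
mat = np.arange(8).reshape((2, 4))
print(mat.take([2, 0, 2, 1], axis=1))
```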


Broadcasting

array([0, 1, 2, 3, 4])

array([ 0,  4,  8, 12, 16])


array([-0.1556,  0.3494, -0.2545])

array([[-0.3753,  0.5353,  1.3534],
[-0.4282,  0.5606,  0.8935],
[-0.0956, -0.9767, -1.2444],
[ 0.899 , -0.1192, -1.0024]])

array([ -5.5511e-17,  -1.3878e-17,   0.0000e+00])

array([[-0.5308,  0.8848,  1.0989],
[-0.5837,  0.91  ,  0.639 ],
[-0.2511, -0.6273, -1.4989],
[ 0.7434,  0.2302, -1.2569]])

array([[ 0.4843],
[ 0.3218],
[-0.7924],
[-0.0944]])

array([  7.4015e-17,   0.0000e+00,   0.0000e+00,   0.0000e+00])
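The near-zero means above follow from the broadcasting rule. Shapes are compared from the trailing axis backwards, and each pair of sizes must either match or include a 1. Column means of shape (3,) therefore align with a (4, 3) array directly. Row means of shape (4,) first need a length-1 axis. This sketch uses a seeded generator in place of the original random data:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 3))

# (4, 3) - (3,): trailing sizes match, so the means are subtracted from every row
col_demeaned = arr - arr.mean(0)

# (4, 3) - (4, 1): the length-1 axis is stretched across the columns
row_demeaned = arr - arr.mean(1).reshape((4, 1))

# keepdims=True keeps the reduced axis with length 1, giving the same result
assert np.allclose(row_demeaned, arr - arr.mean(1, keepdims=True))

print(np.broadcast(arr, arr.mean(0)).shape)  # (4, 3)
```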


Broadcasting Over Other Axes

Subtracting the row means directly fails, because the trailing dimensions of shapes (4, 3) and (4,) (3 and 4) do not match:

arr - arr.mean(1)
ValueError: operands could not be broadcast together with shapes (4,3) (4,)

array([[-1.0151,  0.4005,  0.6146],
[-0.9055,  0.5882,  0.3173],
[ 0.5413,  0.1652, -0.7065],
[ 0.8378,  0.3246, -1.1625]])

array([[[ 0.,  0.,  0.,  0.]],

[[ 0.,  0.,  0.,  0.]],

[[ 0.,  0.,  0.,  0.]],

[[ 0.,  0.,  0.,  0.]]])

(4, 1, 4)

array([-1.1083,  0.5576,  1.2277])

array([[-1.1083],
[ 0.5576],
[ 1.2277]])

array([[-1.1083,  0.5576,  1.2277]])
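np.newaxis (an alias for None) inserts a length-1 axis at the indexed position. This is how the (4, 1, 4) array and the column and row views above were built. The following sketch uses fixed values:

```python
import numpy as np

arr_1d = np.array([1.0, 2.0, 3.0])
col = arr_1d[:, np.newaxis]  # shape (3, 1)
row = arr_1d[np.newaxis, :]  # shape (1, 3)

arr_3d = np.zeros((4, 4))[:, np.newaxis, :]
print(col.shape, row.shape, arr_3d.shape)  # (3, 1) (1, 3) (4, 1, 4)

# Broadcasting a column against a row produces a full (3, 3) grid
print(col + row)
```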

array([[[-1.9966, -0.2431, -0.992 ,  0.8283, -0.5073],
[-0.3938, -0.1332, -0.7427,  0.3094, -0.9241],
[ 1.1069, -0.5383, -0.9288,  0.0233, -0.4678],
[-1.2015,  0.6905,  1.6706, -0.1703, -1.3975]],

[[-0.3048, -1.7181, -0.189 ,  0.6263,  1.1194],
[ 0.0823, -0.7132, -0.5162,  1.5305, -1.199 ],
[ 0.5777,  1.2935,  0.1547, -1.3637,  0.4251],
[ 0.4923,  1.4004,  0.3646,  0.1594, -0.7334]],

[[ 1.3836, -0.5313,  0.2826,  0.4739, -1.3435],
[-1.141 , -0.3084,  1.1364,  1.1326,  0.3064],
[-0.9692,  1.0229, -0.0246,  1.4484, -1.137 ],
[ 1.7033, -1.8358,  1.2087, -0.5463,  0.5904]]])

array([[-0.5822, -0.3769, -0.1609, -0.0816],
[-0.0932, -0.1631,  0.2174,  0.3367],
[ 0.0531,  0.2252,  0.0681,  0.2241]])

array([[  8.8818e-17,   0.0000e+00,  -4.4409e-17,  -8.8818e-17],
[  0.0000e+00,   0.0000e+00,   2.7756e-17,   8.8818e-17],
[  4.4409e-17,   5.5511e-17,   4.4409e-17,   0.0000e+00]])
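The 3D example subtracts depth means by putting the reduced axis back as length 1. That pattern generalizes to any axis. The helper demean_axis below is hypothetical and not a NumPy function. It is a minimal sketch that relies on np.expand_dims:

```python
import numpy as np

def demean_axis(arr, axis=0):
    """Hypothetical helper: subtract the mean along `axis` via broadcasting."""
    means = arr.mean(axis)
    # Re-insert the reduced axis with length 1 so the shapes line up
    return arr - np.expand_dims(means, axis)

rng = np.random.default_rng(1)
arr = rng.standard_normal((3, 4, 5))
demeaned = demean_axis(arr, axis=2)

# Remaining means are zero up to floating-point rounding
print(np.abs(demeaned.mean(2)).max())
```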


Setting Array Values by Broadcasting

array([[ 5.,  5.,  5.],
[ 5.,  5.,  5.],
[ 5.,  5.,  5.],
[ 5.,  5.,  5.]])

array([[ 1.28,  1.28,  1.28],
[-0.42, -0.42, -0.42],
[ 0.44,  0.44,  0.44],
[ 1.6 ,  1.6 ,  1.6 ]])

array([[-1.37 , -1.37 , -1.37 ],
[ 0.509,  0.509,  0.509],
[ 0.44 ,  0.44 ,  0.44 ],
[ 1.6  ,  1.6  ,  1.6  ]])
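The three assignments above apply the same broadcasting rules on the left-hand side of =. A scalar fills everything. A (4, 1) column fills each row with its own value. A nested list behaves like a (2, 1) array for the first two rows.

```python
import numpy as np

arr = np.zeros((4, 3))
arr[:] = 5  # scalar broadcast to every element

col = np.array([1.28, -0.42, 0.44, 1.6])
arr[:] = col[:, np.newaxis]  # (4, 1) stretched across the 3 columns

arr[:2] = [[-1.37], [0.509]]  # acts like a (2, 1) array on rows 0-1
print(arr)
```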


Advanced ufunc Usage

ufunc Instance Methods

reduce aggregates an array's values by applying a binary operation repeatedly (optionally along a given axis).

45

45


array([[-0.7066,  0.4268, -0.2776, -0.8283, -2.7628],
[ 0.9835,  0.4378, -0.8496,  0.7188,  0.7329],
[ 0.5047, -0.7893,  0.5392,  1.2907,  0.8676],
[ 0.4113,  0.4459, -0.3172, -1.0493,  1.3459],
[ 0.356 , -0.0915, -0.535 , -0.036 , -0.2591]])

array([[-2.7628, -0.8283, -0.7066, -0.2776,  0.4268],
[ 0.9835,  0.4378, -0.8496,  0.7188,  0.7329],
[-0.7893,  0.5047,  0.5392,  0.8676,  1.2907],
[ 0.4113,  0.4459, -0.3172, -1.0493,  1.3459],
[-0.535 , -0.2591, -0.0915, -0.036 ,  0.356 ]])

array([[ True,  True,  True,  True],
[False, False,  True,  True],
[ True,  True,  True,  True],
[ True, False, False,  True],
[ True,  True,  True,  True]], dtype=bool)

array([ True, False,  True, False,  True], dtype=bool)
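The boolean outputs above check whether each row is sorted: a row is sorted when every adjacent pair is in order. The sketch below uses a seeded generator in place of the original random data. It also confirms that np.logical_and.reduce along an axis matches .all() on that axis.

```python
import numpy as np

arr = np.arange(10)
assert np.add.reduce(arr) == arr.sum() == 45

rng = np.random.default_rng(2)
mat = rng.standard_normal((5, 5))
mat[::2].sort(1)  # sort rows 0, 2 and 4 in place through the view

pairs_in_order = mat[:, :-1] < mat[:, 1:]
is_sorted = np.logical_and.reduce(pairs_in_order, axis=1)
assert (is_sorted == pairs_in_order.all(axis=1)).all()
print(is_sorted)
```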


array([[ 0,  1,  3,  6, 10],
[ 5, 11, 18, 26, 35],
[10, 21, 33, 46, 60]], dtype=int32)
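accumulate relates to reduce as cumsum relates to sum: it keeps every intermediate result. The last column of the accumulated output equals the plain reduction. The same method works with any binary ufunc, as this sketch shows:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
running = np.add.accumulate(arr, axis=1)

assert (running == arr.cumsum(axis=1)).all()
assert (running[:, -1] == np.add.reduce(arr, axis=1)).all()

# A running maximum with np.maximum
print(np.maximum.accumulate(np.array([3, 1, 4, 1, 5, 9, 2, 6])))
```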


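outer applies a binary operation to every pair of elements drawn from two arrays (with np.multiply, this is the outer product)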

array([0, 1, 1, 2, 2])

array([[0, 0, 0, 0, 0],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 2, 4, 6, 8],
[0, 2, 4, 6, 8]])


The result of outer has as many dimensions as the two inputs combined; its shape is the concatenation of the input shapes

(3, 4, 5)
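The (3, 4, 5) shape above is the (3, 4) input shape followed by the (5,) input shape. The sketch below checks this and checks one entry against its definition, using seeded random inputs. It also shows that adding a trailing axis and broadcasting gives the same result.

```python
import numpy as np

arr = np.arange(3).repeat([1, 2, 2])  # [0, 1, 1, 2, 2]
table = np.multiply.outer(arr, np.arange(5))
print(table.shape)  # (5, 5)

rng = np.random.default_rng(3)
x, y = rng.standard_normal((3, 4)), rng.standard_normal(5)
result = np.subtract.outer(x, y)

# Shape is x.shape + y.shape; each entry is x[i, j] - y[k]
assert result.shape == x.shape + y.shape == (3, 4, 5)
assert np.isclose(result[1, 2, 3], x[1, 2] - y[3])
assert np.allclose(result, x[..., np.newaxis] - y)
```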

array([10, 18, 17], dtype=int32)

array([[ 0,  0,  0,  0,  0],
[ 0,  1,  2,  3,  4],
[ 0,  2,  4,  6,  8],
[ 0,  3,  6,  9, 12]])

array([[ 0,  0,  0],
[ 1,  5,  4],
[ 2, 10,  8],
[ 3, 15, 12]], dtype=int32)
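The reduceat outputs above are local reductions over contiguous slices. With indices [0, 5, 8], the pieces are arr[0:5], arr[5:8] and arr[8:]. This sketch checks the 1D result against manual slicing and repeats the column-group example:

```python
import numpy as np

arr = np.arange(10)
sums = np.add.reduceat(arr, [0, 5, 8])  # sums of arr[0:5], arr[5:8], arr[8:]
manual = [arr[0:5].sum(), arr[5:8].sum(), arr[8:].sum()]
assert sums.tolist() == manual == [10, 18, 17]

# Along axis=1: column groups 0-1, 2-3 and 4
mat = np.multiply.outer(np.arange(4), np.arange(5))
print(np.add.reduceat(mat, [0, 2, 4], axis=1))
```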


Custom ufuncs

array([0, 2, 4, 6, 8, 10, 12, 14], dtype=object)

array([  0.,   2.,   4.,   6.,   8.,  10.,  12.,  14.])


100 loops, best of 3: 1.81 ms per loop
The slowest run took 16.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.65 µs per loop
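The two outputs above differ in dtype. np.frompyfunc always returns object arrays, while np.vectorize accepts an otypes list to set the output type. Both still call the Python function once per element, which is consistent with the gap in these timings (1.81 ms versus 3.65 µs for the built-in ufunc in this run). A minimal sketch:

```python
import numpy as np

def add_elements(x, y):
    return x + y

# frompyfunc(func, nin, nout): 2 inputs, 1 output, object dtype result
add_them = np.frompyfunc(add_elements, 2, 1)
obj_result = add_them(np.arange(8), np.arange(8))

# vectorize lets you declare the output dtype
add_them_f = np.vectorize(add_elements, otypes=[np.float64])
float_result = add_them_f(np.arange(8), np.arange(8))

print(obj_result.dtype, float_result.dtype)  # object float64
```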


Structured and Record Arrays

array([(1.5, 6), (3.141592653589793, -2)],
dtype=[('x', '<f8'), ('y', '<i4')])

(1.5, 6)

6

array([ 1.5   ,  3.1416])
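The structured array above combines a float64 field x and an int32 field y in each record. Indexing by position returns a record, and indexing by field name returns an ordinary array. That array is a view, so writes through it reach the records, as the following sketch shows:

```python
import numpy as np

dtype = [('x', np.float64), ('y', np.int32)]
sarr = np.array([(1.5, 6), (np.pi, -2)], dtype=dtype)

print(sarr[0])       # one record, accessed like a tuple
print(sarr[0]['y'])  # a single field of one record

xs = sarr['x']       # a whole field as a float64 view
xs[0] = 2.5
print(sarr[0])       # the change is visible in the record
```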


Nested dtypes and Multidimensional Fields

array([([0, 0, 0], 0), ([0, 0, 0], 0), ([0, 0, 0], 0), ([0, 0, 0], 0)],
dtype=[('x', '<i8', (3,)), ('y', '<i4')])

array([0, 0, 0], dtype=int64)

array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=int64)

array([(1.0, 2.0), (3.0, 4.0)],
dtype=[('a', '<f8'), ('b', '<f4')])

array([5, 6])

array([ 1.,  3.])
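The outputs above show two variations. The first is a field holding a fixed-length subarray, where x has shape (3,) per record. The second is a field that is itself a structured type. A sketch reproducing both:

```python
import numpy as np

# Each record's 'x' is a length-3 int64 array
dtype = [('x', np.int64, (3,)), ('y', np.int32)]
arr = np.zeros(4, dtype=dtype)
print(arr['x'].shape)  # (4, 3)

# 'x' is a nested record with its own fields 'a' and 'b'
nested = [('x', [('a', 'f8'), ('b', 'f4')]), ('y', np.int32)]
data = np.array([((1, 2), 5), ((3, 4), 6)], dtype=nested)
print(data['x']['a'])  # [1. 3.]
print(data['y'])       # [5 6]
```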


More About Sorting

array([-1.3918, -0.2089,  0.2316,  0.728 ,  0.8356,  1.9956])

array([[ -2.9812e-01,   1.2037e+00,  -1.5768e-02,   7.4395e-01,
8.6880e-01],
[ -4.2865e-01,   7.1886e-01,  -1.4510e+00,   1.0510e-01,
-1.7942e+00],
[ -2.8792e-04,   6.1168e-01,  -9.1210e-02,  -1.2799e+00,
-4.0230e-02]])

array([[ -4.2865e-01,   1.2037e+00,  -1.5768e-02,   7.4395e-01,
8.6880e-01],
[ -2.9812e-01,   7.1886e-01,  -1.4510e+00,   1.0510e-01,
-1.7942e+00],
[ -2.8792e-04,   6.1168e-01,  -9.1210e-02,  -1.2799e+00,
-4.0230e-02]])

array([-0.9699, -0.5626,  1.1172,  0.2791, -1.1148])

array([-1.1148, -0.9699, -0.5626,  0.2791,  1.1172])

array([-0.9699, -0.5626,  1.1172,  0.2791, -1.1148])

array([[ 0.2266,  0.3405,  2.6439, -1.6262, -0.3976],
[-1.4821,  1.068 , -0.252 , -0.9331,  2.2639],
[-0.2311,  1.1472,  0.9287, -0.9023,  1.1761]])

array([[-1.6262, -0.3976,  0.2266,  0.3405,  2.6439],
[-1.4821, -0.9331, -0.252 ,  1.068 ,  2.2639],
[-0.9023, -0.2311,  0.9287,  1.1472,  1.1761]])

array([[ 2.6439,  0.3405,  0.2266, -0.3976, -1.6262],
[ 2.2639,  1.068 , -0.252 , -0.9331, -1.4821],
[ 1.1761,  1.1472,  0.9287, -0.2311, -0.9023]])
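The sorting outputs above reflect three behaviors. ndarray.sort works in place and returns None, while np.sort returns a sorted copy. Sorting a view, such as one column, sorts that part of the parent array. Neither function has a descending option, so reversing a sorted result with [::-1] is the usual approach. A sketch with fixed values:

```python
import numpy as np

arr = np.array([3.0, -1.0, 2.0, 0.5])
sorted_copy = np.sort(arr)  # arr is unchanged here
arr.sort()                  # now arr itself is sorted
assert (arr == sorted_copy).all()

mat2 = np.array([[5, 0], [1, 9], [3, 4]])
mat2[:, 0].sort()           # sorts only the first column of mat2, in place

mat = np.array([[3, 1, 2], [9, 7, 8]])
mat.sort(axis=1)            # sort each row independently
descending = mat[:, ::-1]   # reversed view gives descending order
print(descending)
```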


Indirect Sorts: argsort and lexsort

array([1, 2, 4, 3, 0], dtype=int64)

array([0, 1, 2, 3, 5])

array([[ 5.    ,  0.    ,  1.    ,  3.    ,  2.    ],
[ 0.422 ,  0.1187,  1.1352,  1.4363, -1.2487],
[ 0.1909, -1.0984,  0.7886, -0.5827,  1.1592]])

array([[ 0.    ,  1.    ,  2.    ,  3.    ,  5.    ],
[ 0.1187,  1.1352, -1.2487,  1.4363,  0.422 ],
[-1.0984,  0.7886,  1.1592, -0.5827,  0.1909]])

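np.lexsort sorts by several keys at once, and the last key in the tuple is the primary one. In Python 3, zip returns a lazy iterator, so displaying it shows only an object repr. Wrapping it in list() shows the pairs. A sketch on hypothetical name data:

```python
import numpy as np

# Hypothetical data: sort by last name, breaking ties by first name
first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])

# The LAST key (last_name) is the primary sort key
sorter = np.lexsort((first_name, last_name))
print(sorter)  # [1 2 3 0 4]

print(list(zip(last_name[sorter], first_name[sorter])))
```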


Alternative Sort Algorithms

array([2, 3, 4, 0, 1], dtype=int64)

array(['1:first', '1:second', '1:third', '2:first', '2:second'],
dtype='<U8')
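The labels above stay in their original relative order within each key because mergesort is a stable sort. kind='stable' requests a stable algorithm without naming a specific one. For a stable sort the resulting permutation is unique, so both calls agree:

```python
import numpy as np

values = np.array(['2:first', '2:second', '1:first', '1:second', '1:third'])
key = np.array([2, 2, 1, 1, 1])

indexer = key.argsort(kind='mergesort')
print(values.take(indexer))

assert (key.argsort(kind='stable') == indexer).all()
```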


numpy.searchsorted: Finding Elements in a Sorted Array

3

array([0, 3, 3, 5], dtype=int64)

array([0, 3], dtype=int64)

array([3, 7], dtype=int64)
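searchsorted returns the index at which each value would be inserted to keep the array sorted. side='left' (the default) gives the first valid slot and side='right' the last. For repeated values, the difference between the two is the count of each value:

```python
import numpy as np

arr = np.array([0, 1, 7, 12, 15])
print(arr.searchsorted(9))               # 3: 9 goes before 12
print(arr.searchsorted([0, 8, 11, 16]))  # [0 3 3 5]

dups = np.array([0, 0, 0, 1, 1, 1, 1])
left = dups.searchsorted([0, 1])
right = dups.searchsorted([0, 1], side='right')
print(right - left)  # occurrences of 0 and 1: [3 4]
```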

array([  143.,  8957.,   309.,  2349.,  5503.,  2754.,  4408.,  4259.,
3313.,  3364.,  2492.,  9977.,  4704.,  5538.,  6089.,  5864.,
6926.,  3677.,  8698.,  1832.,  8931.,  6631.,  5322.,  3712.,
9350.,  3945.,  9514.,  3683.,  8568.,  8247.,  7087.,  7630.,
3392.,  8320.,  1973.,   982.,  1672.,  7052.,  6230.,  3894.,
1832.,  9488.,   755.,  8522.,  1858.,  5417.,  6162.,  7517.,
9827.,  4458.])

array([2, 4, 2, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4,
3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 2, 3, 4, 4, 3, 3, 4, 2, 4, 3, 4,
4, 4, 4, 3], dtype=int64)

2     547.250000
3    3178.550000
4    7591.038462
dtype: float64

array([2, 4, 2, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4,
3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 4, 3, 2, 3, 4, 4, 3, 3, 4, 2, 4, 3, 4,
4, 4, 4, 3], dtype=int64)
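The binning above labels each value with bin_edges.searchsorted(data), then computes group means. That the np.digitize output matches here depends on the edge convention. According to the NumPy docs, np.digitize(x, bins, right=True) is equivalent to np.searchsorted(bins, x, side='left'). With the default right=False, the two differ only for values that fall exactly on an edge. This sketch computes group means with np.bincount instead of pandas, on seeded data:

```python
import numpy as np

rng = np.random.default_rng(4)
data = np.floor(rng.uniform(0, 10000, size=50))
bins = np.array([0, 100, 1000, 5000, 10000])

labels = bins.searchsorted(data)
assert (labels == np.digitize(data, bins, right=True)).all()

# Group means: per-label sum of data divided by per-label count
sums = np.bincount(labels, weights=data)
counts = np.bincount(labels)
present = counts > 0
print(dict(zip(np.nonzero(present)[0], sums[present] / counts[present])))
```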


NumPy matrix class

array([ 8.8277,  3.8222, -1.1428,  2.0441])

array([[ 8.8277,  3.8222, -1.1428,  2.0441],
[ 3.8222,  6.7527,  0.8391,  2.0829],
[-1.1428,  0.8391,  5.0169,  0.7957],
[ 2.0441,  2.0829,  0.7957,  6.241 ]])

array([[ 8.8277],
[ 3.8222],
[-1.1428],
[ 2.0441]])

array([[ 1195.468]])

matrix([[ 8.8277,  3.8222, -1.1428,  2.0441],
[ 3.8222,  6.7527,  0.8391,  2.0829],
[-1.1428,  0.8391,  5.0169,  0.7957],
[ 2.0441,  2.0829,  0.7957,  6.241 ]])

matrix([[ 8.8277],
[ 3.8222],
[-1.1428],
[ 2.0441]])

matrix([[ 1195.468]])

matrix([[  1.0000e+00,   6.9616e-17,  -4.0136e-17,   8.1258e-17],
[ -2.3716e-17,   1.0000e+00,   2.2230e-17,  -2.5721e-17],
[  1.0957e-16,   5.0783e-18,   1.0000e+00,   7.8658e-18],
[ -5.7092e-17,  -3.7777e-18,   6.2391e-18,   1.0000e+00]])
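The NumPy documentation no longer recommends the matrix class, even for linear algebra. The same computations work on regular arrays with the @ operator and np.linalg.inv. This sketch builds a hypothetical symmetric positive-definite X, because the original data is not reproducible:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)  # hypothetical symmetric positive-definite matrix

y = X[:, :1]                 # slicing with :1 keeps a (4, 1) column
quad = y.T @ X @ y           # (1, 4) @ (4, 4) @ (4, 1) -> (1, 1)
print(quad.shape)

# inverse @ original is the identity up to rounding, like Xm.I * X
print(np.allclose(np.linalg.inv(X) @ X, np.eye(4)))
```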


Advanced Array Input and Output

Memory-Mapped Files

memmap([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
...,
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.]])

memmap([[-1.273 , -0.1547,  0.7817, ...,  0.3421,  1.0272, -1.8742],
[-0.3544, -3.1195,  0.1256, ..., -0.4476,  0.4863, -0.8311],
[-1.1117,  0.8186,  2.3934, ...,  0.1061,  1.4123,  0.6489],
...,
[ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ],
[ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ],
[ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ]])

memmap([[-1.273 , -0.1547,  0.7817, ...,  0.3421,  1.0272, -1.8742],
[-0.3544, -3.1195,  0.1256, ..., -0.4476,  0.4863, -0.8311],
[-1.1117,  0.8186,  2.3934, ...,  0.1061,  1.4123,  0.6489],
...,
[ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ],
[ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ],
[ 0.    ,  0.    ,  0.    , ...,  0.    ,  0.    ,  0.    ]])
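The memmap outputs above follow a create, write, flush and reopen cycle. The sketch below repeats it on a small file in a temporary directory. The file name and shape are hypothetical. A raw memmap file stores no metadata, so dtype and shape must be given again when reopening. The memmap objects are deleted before the directory is cleaned up, because Windows refuses to delete a file that is still mapped.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mymmap')  # hypothetical file name

    # mode='w+' creates (or overwrites) the file with the given shape
    mmap = np.memmap(path, dtype='float64', mode='w+', shape=(100, 50))
    section = mmap[:5]                     # slices are views onto the file
    section[:] = np.arange(250).reshape((5, 50))
    mmap.flush()                           # write pending changes to disk
    del section, mmap                      # release the mapping

    # Reopen read-only; dtype and shape must match what was written
    reopened = np.memmap(path, dtype='float64', mode='r', shape=(100, 50))
    first_rows = np.array(reopened[:5])    # copy out before the file is removed
    del reopened
```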



Performance Tips

The Importance of Contiguous Memory

C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

True

1000 loops, best of 3: 848 µs per loop
1000 loops, best of 3: 582 µs per loop

C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

True

C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
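The flag listings above show which slices keep contiguous memory. Slicing whole rows of a C-ordered array stays C-contiguous. Slicing a subset of columns does not. np.ascontiguousarray (or .copy('C')) produces a C-ordered copy. Whether that speeds things up depends on the access pattern, so this sketch only inspects the flags:

```python
import numpy as np

arr_c = np.ones((1000, 1000), order='C')
arr_f = np.ones((1000, 1000), order='F')
print(arr_c.flags.c_contiguous, arr_f.flags.f_contiguous)  # True True

# Row slices stay contiguous; column slices of a C array do not
print(arr_c[:50].flags.c_contiguous)     # True
print(arr_c[:, :50].flags.c_contiguous)  # False

# Make a C-ordered copy that owns its data
fixed = np.ascontiguousarray(arr_c[:, :50])
print(fixed.flags.c_contiguous, fixed.flags.owndata)  # True True
```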
