Data Wrangling: Clean, Transform, Merge, Reshape

Combining and Merging Data Sets
Database-style DataFrame Merges
| | data1 | key |
---|---|---|
0 | 0 | b |
1 | 1 | b |
2 | 2 | a |
3 | 3 | c |
4 | 4 | a |
5 | 5 | a |
6 | 6 | b |

| | data2 | key |
---|---|---|
0 | 0 | a |
1 | 1 | b |
2 | 2 | d |

By default, merge joins on the column names the two objects have in common (here key):
| | data1 | key | data2 |
---|---|---|---|
0 | 0 | b | 1 |
1 | 1 | b | 1 |
2 | 6 | b | 1 |
3 | 2 | a | 0 |
4 | 4 | a | 0 |
5 | 5 | a | 0 |

It is better to name the join key explicitly (on='key'); the result is the same:
| | data1 | key | data2 |
---|---|---|---|
0 | 0 | b | 1 |
1 | 1 | b | 1 |
2 | 6 | b | 1 |
3 | 2 | a | 0 |
4 | 4 | a | 0 |
5 | 5 | a | 0 |

| | data1 | lkey |
---|---|---|
0 | 0 | b |
1 | 1 | b |
2 | 2 | a |
3 | 3 | c |
4 | 4 | a |
5 | 5 | a |
6 | 6 | b |

| | data2 | rkey |
---|---|---|
0 | 0 | a |
1 | 1 | b |
2 | 2 | d |

If the key columns have different names in the two objects, specify each one separately (left_on and right_on):
| | data1 | lkey | data2 | rkey |
---|---|---|---|---|
0 | 0 | b | 1 | b |
1 | 1 | b | 1 | b |
2 | 6 | b | 1 | b |
3 | 2 | a | 0 | a |
4 | 4 | a | 0 | a |
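5 | 5 | a | 0 | a |

The default is an inner join (the intersection of the keys); how='outer' takes the union of the keys and fills in NaN where one side has no match: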
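The input cells of this notebook were not preserved. As a minimal sketch, assuming pandas is installed, the toy frames below (the names df1 through df4 are illustrative, not the original code) reproduce the merges in this subsection; recent pandas keeps columns in insertion order instead of sorting them alphabetically as the older output does.

```python
import pandas as pd

# Hypothetical toy frames rebuilt from the tables in this subsection
df1 = pd.DataFrame({"key": list("bbacaab"), "data1": range(7)})
df2 = pd.DataFrame({"key": list("abd"), "data2": range(3)})

# Inner join (the default) on the shared column "key"
inner = pd.merge(df1, df2, on="key")

# Key columns with different names: name each side explicitly
df3 = df1.rename(columns={"key": "lkey"})
df4 = df2.rename(columns={"key": "rkey"})
renamed = pd.merge(df3, df4, left_on="lkey", right_on="rkey")

# Outer join: union of the keys, NaN where one side has no match
outer = pd.merge(df1, df2, on="key", how="outer")
print(outer)
```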
| | data1 | key | data2 |
---|---|---|---|
0 | 0.0 | b | 1.0 |
1 | 1.0 | b | 1.0 |
2 | 6.0 | b | 1.0 |
3 | 2.0 | a | 0.0 |
4 | 4.0 | a | 0.0 |
5 | 5.0 | a | 0.0 |
6 | 3.0 | c | NaN |
7 | NaN | d | 2.0 |

| | data1 | key |
---|---|---|
0 | 0 | b |
1 | 1 | b |
2 | 2 | a |
3 | 3 | c |
4 | 4 | a |
5 | 5 | b |

| | data2 | key |
---|---|---|
0 | 0 | a |
1 | 1 | b |
2 | 2 | a |
3 | 3 | b |
4 | 4 | d |

| | data1 | key | data2 |
---|---|---|---|
0 | 0 | b | 1.0 |
1 | 0 | b | 3.0 |
2 | 1 | b | 1.0 |
3 | 1 | b | 3.0 |
4 | 2 | a | 0.0 |
5 | 2 | a | 2.0 |
6 | 3 | c | NaN |
7 | 4 | a | 0.0 |
8 | 4 | a | 2.0 |
9 | 5 | b | 1.0 |
10 | 5 | b | 3.0 |

| | data1 | key | data2 |
---|---|---|---|
0 | 0 | b | 1 |
1 | 0 | b | 3 |
2 | 1 | b | 1 |
3 | 1 | b | 3 |
4 | 5 | b | 1 |
5 | 5 | b | 3 |
6 | 2 | a | 0 |
7 | 2 | a | 2 |
8 | 4 | a | 0 |
9 | 4 | a | 2 |

| | key1 | key2 | lval |
---|---|---|---|
0 | foo | one | 1 |
1 | foo | two | 2 |
2 | bar | one | 3 |

| | key1 | key2 | rval |
---|---|---|---|
0 | foo | one | 4 |
1 | foo | one | 5 |
2 | bar | one | 6 |
3 | bar | two | 7 |
|
|
key1 | key2 | lval | rval | |
---|---|---|---|---|
0 | foo | one | 1.0 | 4.0 |
1 | foo | one | 1.0 | 5.0 |
2 | foo | two | 2.0 | NaN |
3 | bar | one | 3.0 | 6.0 |
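4 | bar | two | NaN | 7.0 |

Overlapping column names: when merging on key1 alone, both inputs still have a key2 column, so pandas appends the suffixes _x and _y by default; the suffixes option sets custom ones (_left and _right below).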
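A sketch of the multiple-key and suffix merges, assuming the small left and right frames below (hypothetical names; their contents are copied from the tables above):

```python
import pandas as pd

left = pd.DataFrame({"key1": ["foo", "foo", "bar"],
                     "key2": ["one", "two", "one"],
                     "lval": [1, 2, 3]})
right = pd.DataFrame({"key1": ["foo", "foo", "bar", "bar"],
                      "key2": ["one", "one", "one", "two"],
                      "rval": [4, 5, 6, 7]})

# A list of keys: rows match only when key1 AND key2 agree
both = pd.merge(left, right, on=["key1", "key2"], how="outer")

# Merging on key1 only leaves two clashing key2 columns
default_sfx = pd.merge(left, right, on="key1")  # key2_x / key2_y
custom_sfx = pd.merge(left, right, on="key1", suffixes=("_left", "_right"))
print(custom_sfx)
```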
| | key1 | key2_x | lval | key2_y | rval |
---|---|---|---|---|---|
0 | foo | one | 1 | one | 4 |
1 | foo | one | 1 | one | 5 |
2 | foo | two | 2 | one | 4 |
3 | foo | two | 2 | one | 5 |
4 | bar | one | 3 | one | 6 |
5 | bar | one | 3 | two | 7 |

| | key1 | key2_left | lval | key2_right | rval |
---|---|---|---|---|---|
0 | foo | one | 1 | one | 4 |
1 | foo | one | 1 | one | 5 |
2 | foo | two | 2 | one | 4 |
3 | foo | two | 2 | one | 5 |
4 | bar | one | 3 | one | 6 |
5 | bar | one | 3 | two | 7 |

Merging on Index
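A minimal sketch of index-based merging, assuming pandas; the frame names left1, right1, left2 and right2 are illustrative only. right_index=True makes merge use the right frame's index as its join key, and join is a shortcut for merging on indexes:

```python
import pandas as pd

left1 = pd.DataFrame({"key": list("abaabc"), "value": range(6)})
right1 = pd.DataFrame({"group_val": [3.5, 7.0]}, index=["a", "b"])

# Column 'key' on the left is matched against the index on the right
inner = pd.merge(left1, right1, left_on="key", right_index=True)
outer = pd.merge(left1, right1, left_on="key", right_index=True, how="outer")

# Both sides indexed: join merges index on index
left2 = pd.DataFrame({"Ohio": [1.0, 3.0, 5.0], "Nevada": [2.0, 4.0, 6.0]},
                     index=list("ace"))
right2 = pd.DataFrame({"Missouri": [7.0, 9.0, 11.0, 13.0],
                       "Alabama": [8.0, 10.0, 12.0, 14.0]},
                      index=list("bcde"))
joined = left2.join(right2, how="outer")
```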
| | key | value |
---|---|---|
0 | a | 0 |
1 | b | 1 |
2 | a | 2 |
3 | a | 3 |
4 | b | 4 |
5 | c | 5 |

| | group_val |
---|---|
a | 3.5 |
b | 7.0 |

| | key | value | group_val |
---|---|---|---|
0 | a | 0 | 3.5 |
2 | a | 2 | 3.5 |
3 | a | 3 | 3.5 |
1 | b | 1 | 7.0 |
4 | b | 4 | 7.0 |

| | key | value | group_val |
---|---|---|---|
0 | a | 0 | 3.5 |
2 | a | 2 | 3.5 |
3 | a | 3 | 3.5 |
1 | b | 1 | 7.0 |
4 | b | 4 | 7.0 |
5 | c | 5 | NaN |

| | data | key1 | key2 |
---|---|---|---|
0 | 0.0 | Ohio | 2000 |
1 | 1.0 | Ohio | 2001 |
2 | 2.0 | Ohio | 2002 |
3 | 3.0 | Nevada | 2001 |
4 | 4.0 | Nevada | 2002 |

| | | event1 | event2 |
---|---|---|---|
Nevada | 2001 | 0 | 1 |
Nevada | 2000 | 2 | 3 |
Ohio | 2000 | 4 | 5 |
Ohio | 2000 | 6 | 7 |
Ohio | 2001 | 8 | 9 |
Ohio | 2002 | 10 | 11 |

| | data | key1 | key2 | event1 | event2 |
---|---|---|---|---|---|
0 | 0.0 | Ohio | 2000 | 4 | 5 |
0 | 0.0 | Ohio | 2000 | 6 | 7 |
1 | 1.0 | Ohio | 2001 | 8 | 9 |
2 | 2.0 | Ohio | 2002 | 10 | 11 |
3 | 3.0 | Nevada | 2001 | 0 | 1 |

| | data | key1 | key2 | event1 | event2 |
---|---|---|---|---|---|
0 | 0.0 | Ohio | 2000.0 | 4.0 | 5.0 |
0 | 0.0 | Ohio | 2000.0 | 6.0 | 7.0 |
1 | 1.0 | Ohio | 2001.0 | 8.0 | 9.0 |
2 | 2.0 | Ohio | 2002.0 | 10.0 | 11.0 |
3 | 3.0 | Nevada | 2001.0 | 0.0 | 1.0 |
4 | 4.0 | Nevada | 2002.0 | NaN | NaN |
4 | NaN | Nevada | 2000.0 | 2.0 | 3.0 |

| | Ohio | Nevada |
---|---|---|
a | 1.0 | 2.0 |
c | 3.0 | 4.0 |
e | 5.0 | 6.0 |

| | Missouri | Alabama |
---|---|---|
b | 7.0 | 8.0 |
c | 9.0 | 10.0 |
d | 11.0 | 12.0 |
e | 13.0 | 14.0 |

| | Ohio | Nevada | Missouri | Alabama |
---|---|---|---|---|
a | 1.0 | 2.0 | NaN | NaN |
b | NaN | NaN | 7.0 | 8.0 |
c | 3.0 | 4.0 | 9.0 | 10.0 |
d | NaN | NaN | 11.0 | 12.0 |
e | 5.0 | 6.0 | 13.0 | 14.0 |

| | Ohio | Nevada | Missouri | Alabama |
---|---|---|---|---|
a | 1.0 | 2.0 | NaN | NaN |
b | NaN | NaN | 7.0 | 8.0 |
c | 3.0 | 4.0 | 9.0 | 10.0 |
d | NaN | NaN | 11.0 | 12.0 |
e | 5.0 | 6.0 | 13.0 | 14.0 |

| | key | value | group_val |
---|---|---|---|
0 | a | 0 | 3.5 |
1 | b | 1 | 7.0 |
2 | a | 2 | 3.5 |
3 | a | 3 | 3.5 |
4 | b | 4 | 7.0 |
5 | c | 5 | NaN |

| | New York | Oregon |
---|---|---|
a | 7.0 | 8.0 |
c | 9.0 | 10.0 |
e | 11.0 | 12.0 |
f | 16.0 | 17.0 |

Joining all three tables at once (a list of DataFrames passed to join):
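As a sketch (frame names hypothetical, values copied from the tables above), DataFrame.join accepts a list of frames and joins them all on their indexes; the default how='left' keeps only the caller's labels:

```python
import pandas as pd

left2 = pd.DataFrame({"Ohio": [1.0, 3.0, 5.0], "Nevada": [2.0, 4.0, 6.0]},
                     index=list("ace"))
right2 = pd.DataFrame({"Missouri": [7.0, 9.0, 11.0, 13.0],
                       "Alabama": [8.0, 10.0, 12.0, 14.0]},
                      index=list("bcde"))
another = pd.DataFrame({"New York": [7.0, 9.0, 11.0, 16.0],
                        "Oregon": [8.0, 10.0, 12.0, 17.0]},
                       index=list("acef"))

# One call joins every frame in the list on its index (rows a, c, e survive)
result = left2.join([right2, another])
print(result)
```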
| | Ohio | Nevada |
---|---|---|
a | 1.0 | 2.0 |
c | 3.0 | 4.0 |
e | 5.0 | 6.0 |

| | Missouri | Alabama |
---|---|---|
b | 7.0 | 8.0 |
c | 9.0 | 10.0 |
d | 11.0 | 12.0 |
e | 13.0 | 14.0 |

| | New York | Oregon |
---|---|---|
a | 7.0 | 8.0 |
c | 9.0 | 10.0 |
e | 11.0 | 12.0 |
f | 16.0 | 17.0 |

| | Ohio | Nevada | Missouri | Alabama | New York | Oregon |
---|---|---|---|---|---|---|
a | 1.0 | 2.0 | NaN | NaN | 7.0 | 8.0 |
c | 3.0 | 4.0 | 9.0 | 10.0 | 9.0 | 10.0 |
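e | 5.0 | 6.0 | 13.0 | 14.0 | 11.0 | 12.0 |

Concatenating Along an Axis
The merges above all combine rows by matching keys. Concatenation instead glues objects together along an axis, starting with NumPy's concatenate on plain arrays: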
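A sketch of NumPy and pandas concatenation, assuming the small array and Series below (names are illustrative; values match the outputs in this subsection). pd.concat works along axis=0 by default, axis=1 produces a DataFrame, and join='inner' keeps only the labels shared by every input:

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape((3, 4))
wide = np.concatenate([arr, arr], axis=1)  # shape (3, 8)

s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3, 4], index=["c", "d", "e"])
s3 = pd.Series([5, 6], index=["f", "g"])

stacked = pd.concat([s1, s2, s3])                # one longer Series
side_by_side = pd.concat([s1, s2, s3], axis=1)   # DataFrame, NaN where a label is missing

s4 = pd.concat([s1 * 5, s3])                     # a: 0, b: 5, f: 5, g: 6
inner = pd.concat([s1, s4], axis=1, join="inner")  # only labels in both: a, b
```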
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
array([[ 0, 1, 2, 3, 0, 1, 2, 3],
[ 4, 5, 6, 7, 4, 5, 6, 7],
[ 8, 9, 10, 11, 8, 9, 10, 11]])
a 0
b 1
c 2
d 3
e 4
f 5
g 6
dtype: int64
| | 0 | 1 | 2 |
---|---|---|---|
a | 0.0 | NaN | NaN |
b | 1.0 | NaN | NaN |
c | NaN | 2.0 | NaN |
d | NaN | 3.0 | NaN |
e | NaN | 4.0 | NaN |
f | NaN | NaN | 5.0 |
g | NaN | NaN | 6.0 |

a 0
b 5
f 5
g 6
dtype: int64
| | 0 | 1 |
---|---|---|
a | 0.0 | 0 |
b | 1.0 | 5 |
f | NaN | 5 |
g | NaN | 6 |

| | 0 | 1 |
---|---|---|
a | 0 | 0 |
b | 1 | 5 |

| | 0 | 1 |
---|---|---|
a | 0.0 | 0.0 |
c | NaN | NaN |
b | 1.0 | 5.0 |
e | NaN | NaN |

Creating a hierarchical index on the concatenation axis (the keys argument):
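A minimal sketch of the keys and names arguments of pd.concat, assuming the Series and DataFrames below (illustrative names, values taken from the outputs in this section):

```python
import pandas as pd

s1 = pd.Series([0, 1], index=["a", "b"])
s3 = pd.Series([5, 6], index=["f", "g"])

# keys label each piece, giving a MultiIndex on the concatenation axis
result = pd.concat([s1, s1, s3], keys=["one", "two", "three"])
wide = result.unstack()  # keys as rows, original labels as columns

# Along axis=1 the keys become the column headers
cols = pd.concat([s1, s1, s3], axis=1, keys=["one", "two", "three"])

df1 = pd.DataFrame([[0, 1], [2, 3], [4, 5]], index=list("abc"), columns=["one", "two"])
df2 = pd.DataFrame([[5, 6], [7, 8]], index=list("ac"), columns=["three", "four"])

# A dict works too: its keys play the role of the keys argument
by_dict = pd.concat({"level1": df1, "level2": df2}, axis=1)
# names labels the levels of the resulting column MultiIndex
named = pd.concat([df1, df2], axis=1, keys=["level1", "level2"],
                  names=["upper", "lower"])
```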
a 0
b 1
dtype: int64
f 5
g 6
dtype: int64
one a 0
b 1
two a 0
b 1
three f 5
g 6
dtype: int64
| | a | b | f | g |
---|---|---|---|---|
one | 0.0 | 1.0 | NaN | NaN |
two | 0.0 | 1.0 | NaN | NaN |
three | NaN | NaN | 5.0 | 6.0 |

| | one | two | three |
---|---|---|---|
a | 0.0 | NaN | NaN |
b | 1.0 | NaN | NaN |
c | NaN | 2.0 | NaN |
d | NaN | 3.0 | NaN |
e | NaN | 4.0 | NaN |
f | NaN | NaN | 5.0 |
g | NaN | NaN | 6.0 |

| | one | two |
---|---|---|
a | 0 | 1 |
b | 2 | 3 |
c | 4 | 5 |

| | three | four |
---|---|---|
a | 5 | 6 |
c | 7 | 8 |

| | | four | one | three | two |
---|---|---|---|---|---|
level1 | a | NaN | 0.0 | NaN | 1.0 |
level1 | b | NaN | 2.0 | NaN | 3.0 |
level1 | c | NaN | 4.0 | NaN | 5.0 |
level2 | a | 6.0 | NaN | 5.0 | NaN |
level2 | c | 8.0 | NaN | 7.0 | NaN |

| | level1 | level1 | level2 | level2 |
---|---|---|---|---|
| | one | two | three | four |
a | 0 | 1 | 5.0 | 6.0 |
b | 2 | 3 | NaN | NaN |
c | 4 | 5 | 7.0 | 8.0 |

| | | four | one | three | two |
---|---|---|---|---|---|
level1 | a | NaN | 0.0 | NaN | 1.0 |
level1 | b | NaN | 2.0 | NaN | 3.0 |
level1 | c | NaN | 4.0 | NaN | 5.0 |
level2 | a | 6.0 | NaN | 5.0 | NaN |
level2 | c | 8.0 | NaN | 7.0 | NaN |

| | level1 | level1 | level2 | level2 |
---|---|---|---|---|
| | one | two | three | four |
a | 0 | 1 | 5.0 | 6.0 |
b | 2 | 3 | NaN | NaN |
c | 4 | 5 | 7.0 | 8.0 |

upper | level1 | level1 | level2 | level2 |
---|---|---|---|---|
lower | one | two | three | four |
a | 0 | 1 | 5.0 | 6.0 |
b | 2 | 3 | NaN | NaN |
c | 4 | 5 | 7.0 | 8.0 |

| | a | b | c | d |
---|---|---|---|---|
0 | -0.204708 | 0.478943 | -0.519439 | -0.555730 |
1 | 1.965781 | 1.393406 | 0.092908 | 0.281746 |
2 | 0.769023 | 1.246435 | 1.007189 | -1.296221 |

| | b | d | a |
---|---|---|---|
0 | 0.274992 | 0.228913 | 1.352917 |
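1 | 0.886429 | -2.001637 | -0.371843 |

Discarding the irrelevant row index: the first result keeps the original (now duplicated) labels, while ignore_index=True renumbers the rows: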
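A sketch of ignore_index, assuming two frames of random numbers; the values will differ from the notebook because the generator and seed here are arbitrary choices:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: any random data will do
df1 = pd.DataFrame(rng.standard_normal((3, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.standard_normal((2, 3)), columns=list("bda"))

kept = pd.concat([df1, df2])                           # index 0, 1, 2, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # index 0..4; df2 rows get NaN in 'c'
```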
| | a | b | c | d |
---|---|---|---|---|
0 | -0.204708 | 0.478943 | -0.519439 | -0.555730 |
1 | 1.965781 | 1.393406 | 0.092908 | 0.281746 |
2 | 0.769023 | 1.246435 | 1.007189 | -1.296221 |
0 | 1.352917 | 0.274992 | NaN | 0.228913 |
1 | -0.371843 | 0.886429 | NaN | -2.001637 |

| | a | b | c | d |
---|---|---|---|---|
0 | -0.204708 | 0.478943 | -0.519439 | -0.555730 |
1 | 1.965781 | 1.393406 | 0.092908 | 0.281746 |
2 | 0.769023 | 1.246435 | 1.007189 | -1.296221 |
3 | 1.352917 | 0.274992 | NaN | 0.228913 |
4 | -0.371843 | 0.886429 | NaN | -2.001637 |

Combining Data with Overlap
f NaN
e 2.5
d NaN
c 3.5
b 4.5
a NaN
dtype: float64
f 0.0
e 1.0
d 2.0
c 3.0
b 4.0
a NaN
dtype: float64
array([ 0. , 2.5, 2. , 3.5, 4.5, nan])
combine_first patches the missing values in one object with values from another, aligning the data on the index first:
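A sketch comparing NumPy's positional np.where with combine_first, assuming the two Series shown above (rebuilt from their printed values; the names a and b are illustrative):

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=list("fedcba"))
b = pd.Series(np.arange(len(a), dtype=np.float64), index=list("fedcba"))
b.iloc[-1] = np.nan

# Positional patching: take b's value wherever a is missing
patched = np.where(pd.isna(a), b, a)

# combine_first aligns on labels first, then fills b's gaps from a
combined = b.iloc[:-2].combine_first(a.iloc[2:])
print(combined)
```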
a NaN
b 4.5
c 3.0
d 2.0
e 1.0
f 0.0
dtype: float64
| | a | b | c |
---|---|---|---|
0 | 1.0 | NaN | 2 |
1 | NaN | 2.0 | 6 |
2 | 5.0 | NaN | 10 |
3 | NaN | 6.0 | 14 |

| | a | b |
---|---|---|
0 | 5.0 | NaN |
1 | 4.0 | 3.0 |
2 | NaN | 4.0 |
3 | 3.0 | 6.0 |
4 | 7.0 | 8.0 |

| | a | b | c |
---|---|---|---|
0 | 1.0 | NaN | 2.0 |
1 | 4.0 | 2.0 | 6.0 |
2 | 5.0 | 4.0 | 10.0 |
3 | 3.0 | 6.0 | 14.0 |
4 | 7.0 | 8.0 | NaN |

Reshaping and Pivoting
Reshaping with Hierarchical Indexing
number | one | two | three |
---|---|---|---|
state | |||
Ohio | 0 | 1 | 2 |
Colorado | 3 | 4 | 5 |

stack pivots the columns into the rows, producing a Series:
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
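dtype: int32

unstack pivots the rows back into columns, operating on the innermost level by default; pass a level number or name, as in unstack(0) or unstack('state'), to unstack a different level: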
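A minimal sketch of stack and unstack on the small state/number frame, assuming pandas is installed (recent versions may warn that the legacy stack implementation is deprecated; that does not change the result here):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(["Ohio", "Colorado"], name="state"),
                    columns=pd.Index(["one", "two", "three"], name="number"))

result = data.stack()               # Series indexed by (state, number)
back = result.unstack()             # innermost level 'number' returns to the columns
by_state = result.unstack("state")  # same as unstack(0): 'state' becomes the columns
```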
number | one | two | three |
---|---|---|---|
state | |||
Ohio | 0 | 1 | 2 |
Colorado | 3 | 4 | 5 |

state | Ohio | Colorado |
---|---|---|
number | ||
one | 0 | 3 |
two | 1 | 4 |
three | 2 | 5 |

state | Ohio | Colorado |
---|---|---|
number | ||
one | 0 | 3 |
two | 1 | 4 |
three | 2 | 5 |

a 0
b 1
c 2
d 3
dtype: int64
c 4
d 5
e 6
dtype: int64
one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64
| | a | b | c | d | e |
---|---|---|---|---|---|
one | 0.0 | 1.0 | 2.0 | 3.0 | NaN |
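two | NaN | NaN | 4.0 | 5.0 | 6.0 |

stack filters out missing values by default, so unstacking and then stacking again recovers the original data; the second output below keeps the missing values instead:
one a 0.0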
b 1.0
c 2.0
d 3.0
two c 4.0
d 5.0
e 6.0
dtype: float64
one a 0.0
b 1.0
c 2.0
d 3.0
e NaN
two a NaN
b NaN
c 4.0
d 5.0
e 6.0
dtype: float64
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
dtype: int32

| | side | left | right |
---|---|---|---|
state | number | | |
Ohio | one | 0 | 5 |
Ohio | two | 1 | 6 |
Ohio | three | 2 | 7 |
Colorado | one | 3 | 8 |
Colorado | two | 4 | 9 |
Colorado | three | 5 | 10 |

When a DataFrame is unstacked, the level that is unstacked becomes the lowest (innermost) level of the columns in the result:

side | left | left | right | right |
---|---|---|---|---|
state | Ohio | Colorado | Ohio | Colorado |
number | | | | |
one | 0 | 3 | 5 | 8 |
two | 1 | 4 | 6 | 9 |
three | 2 | 5 | 7 | 10 |

stack can likewise be given the name of the column level to move into the rows; stacking side makes it the innermost row level:

| | state | Ohio | Colorado |
---|---|---|---|
number | side | | |
one | left | 0 | 3 |
one | right | 5 | 8 |
two | left | 1 | 4 |
two | right | 6 | 9 |
three | left | 2 | 5 |
three | right | 7 | 10 |

side | left | left | right | right |
---|---|---|---|---|
state | Ohio | Colorado | Ohio | Colorado |
number | | | | |
one | 0 | 3 | 5 | 8 |
two | 1 | 4 | 6 | 9 |
three | 2 | 5 | 7 | 10 |

| | side | left | right |
---|---|---|---|
number | state | | |
one | Ohio | 0 | 5 |
one | Colorado | 3 | 8 |
two | Ohio | 1 | 6 |
two | Colorado | 4 | 9 |
three | Ohio | 2 | 7 |
three | Colorado | 5 | 10 |

Pivoting "Long" to "Wide" Format
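The macrodata file used below is not included, so this sketch uses a tiny hypothetical long-format frame with a few values copied from the tables. pivot takes the row index, the column labels and the values; the same reshape can be written as set_index followed by unstack:

```python
import pandas as pd

# Hypothetical long-format records: one row per (date, item) pair
ldata = pd.DataFrame({
    "date": ["1959-03-31"] * 3 + ["1959-06-30"] * 3,
    "item": ["realgdp", "infl", "unemp"] * 2,
    "value": [2710.349, 0.00, 5.8, 2778.801, 2.34, 5.1],
})

# One row per date, one column per item
wide = ldata.pivot(index="date", columns="item", values="value")

# Equivalent: build a hierarchical index, then unstack the item level
same = ldata.set_index(["date", "item"])["value"].unstack("item")
```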
| | year | quarter | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1959.0 | 1.0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.00 | 0.00 |
1 | 1959.0 | 2.0 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.830 | 2.34 | 0.74 |
2 | 1959.0 | 3.0 | 2775.488 | 1751.8 | 289.226 | 491.260 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 |
3 | 1959.0 | 4.0 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 |
4 | 1960.0 | 1.0 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.50 | 5.2 | 180.007 | 2.31 | 1.19 |
5 | 1960.0 | 2.0 | 2834.390 | 1792.9 | 298.152 | 460.400 | 1966.1 | 29.55 | 140.2 | 2.68 | 5.2 | 180.671 | 0.14 | 2.55 |
6 | 1960.0 | 3.0 | 2839.022 | 1785.8 | 296.375 | 474.676 | 1967.8 | 29.75 | 140.9 | 2.36 | 5.6 | 181.528 | 2.70 | -0.34 |
7 | 1960.0 | 4.0 | 2802.616 | 1788.2 | 259.764 | 476.434 | 1966.6 | 29.84 | 141.1 | 2.29 | 6.3 | 182.287 | 1.21 | 1.08 |
8 | 1961.0 | 1.0 | 2819.264 | 1787.7 | 266.405 | 475.854 | 1984.5 | 29.81 | 142.1 | 2.37 | 6.8 | 182.992 | -0.40 | 2.77 |
9 | 1961.0 | 2.0 | 2872.005 | 1814.3 | 286.246 | 480.328 | 2014.4 | 29.92 | 142.9 | 2.29 | 7.0 | 183.691 | 1.47 | 0.81 |

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
'1960Q3', '1960Q4', '1961Q1', '1961Q2'],
dtype='int64', name='date', freq='Q-DEC')
item | realgdp | infl | unemp |
---|---|---|---|
date | |||
1959-03-31 | 2710.349 | 0.00 | 5.8 |
1959-06-30 | 2778.801 | 2.34 | 5.1 |
1959-09-30 | 2775.488 | 2.74 | 5.3 |
1959-12-31 | 2785.204 | 0.27 | 5.6 |
1960-03-31 | 2847.699 | 2.31 | 5.2 |
1960-06-30 | 2834.390 | 0.14 | 5.2 |
1960-09-30 | 2839.022 | 2.70 | 5.6 |
1960-12-31 | 2802.616 | 1.21 | 6.3 |
1961-03-31 | 2819.264 | -0.40 | 6.8 |
1961-06-30 | 2872.005 | 1.47 | 7.0 |

rec.array([(datetime.datetime(1959, 3, 31, 0, 0), 2710.349, 0.0, 5.8),
(datetime.datetime(1959, 6, 30, 0, 0), 2778.801, 2.34, 5.1),
(datetime.datetime(1959, 9, 30, 0, 0), 2775.488, 2.74, 5.3),
(datetime.datetime(1959, 12, 31, 0, 0), 2785.204, 0.27, 5.6),
(datetime.datetime(1960, 3, 31, 0, 0), 2847.699, 2.31, 5.2),
(datetime.datetime(1960, 6, 30, 0, 0), 2834.39, 0.14, 5.2),
(datetime.datetime(1960, 9, 30, 0, 0), 2839.022, 2.7, 5.6),
(datetime.datetime(1960, 12, 31, 0, 0), 2802.616, 1.21, 6.3),
(datetime.datetime(1961, 3, 31, 0, 0), 2819.264, -0.4, 6.8),
(datetime.datetime(1961, 6, 30, 0, 0), 2872.005, 1.47, 7.0)],
dtype=[('date', 'O'), ('realgdp', '<f8'), ('infl', '<f8'), ('unemp', '<f8')])
date item
1959-03-31 realgdp 2710.349
infl 0.000
unemp 5.800
1959-06-30 realgdp 2778.801
infl 2.340
unemp 5.100
1959-09-30 realgdp 2775.488
infl 2.740
unemp 5.300
1959-12-31 realgdp 2785.204
dtype: float64
| | date | item | value |
---|---|---|---|
0 | 1959-03-31 | realgdp | 2710.349 |
1 | 1959-03-31 | infl | 0.000 |
2 | 1959-03-31 | unemp | 5.800 |
3 | 1959-06-30 | realgdp | 2778.801 |
4 | 1959-06-30 | infl | 2.340 |
5 | 1959-06-30 | unemp | 5.100 |
6 | 1959-09-30 | realgdp | 2775.488 |
7 | 1959-09-30 | infl | 2.740 |
8 | 1959-09-30 | unemp | 5.300 |
9 | 1959-12-31 | realgdp | 2785.204 |

item | infl | realgdp | unemp |
---|---|---|---|
date | |||
1959-03-31 | 0.00 | 2710.349 | 5.8 |
1959-06-30 | 2.34 | 2778.801 | 5.1 |
1959-09-30 | 2.74 | 2775.488 | 5.3 |
1959-12-31 | 0.27 | 2785.204 | 5.6 |
1960-03-31 | 2.31 | 2847.699 | 5.2 |

| | date | item | value | value2 |
---|---|---|---|---|
0 | 1959-03-31 | realgdp | 2710.349 | -0.204708 |
1 | 1959-03-31 | infl | 0.000 | 0.478943 |
2 | 1959-03-31 | unemp | 5.800 | -0.519439 |
3 | 1959-06-30 | realgdp | 2778.801 | -0.555730 |
4 | 1959-06-30 | infl | 2.340 | 1.965781 |
5 | 1959-06-30 | unemp | 5.100 | 1.393406 |
6 | 1959-09-30 | realgdp | 2775.488 | 0.092908 |
7 | 1959-09-30 | infl | 2.740 | 0.281746 |
8 | 1959-09-30 | unemp | 5.300 | 0.769023 |
9 | 1959-12-31 | realgdp | 2785.204 | 1.246435 |

| | value | value | value | value2 | value2 | value2 |
---|---|---|---|---|---|---|
item | infl | realgdp | unemp | infl | realgdp | unemp |
date | ||||||
1959-03-31 | 0.00 | 2710.349 | 5.8 | 0.478943 | -0.204708 | -0.519439 |
1959-06-30 | 2.34 | 2778.801 | 5.1 | 1.965781 | -0.555730 | 1.393406 |
1959-09-30 | 2.74 | 2775.488 | 5.3 | 0.281746 | 0.092908 | 0.769023 |
1959-12-31 | 0.27 | 2785.204 | 5.6 | 1.007189 | 1.246435 | -1.296221 |
1960-03-31 | 2.31 | 2847.699 | 5.2 | 0.228913 | 0.274992 | 1.352917 |

item | infl | realgdp | unemp |
---|---|---|---|
date | |||
1959-03-31 | 0.00 | 2710.349 | 5.8 |
1959-06-30 | 2.34 | 2778.801 | 5.1 |
1959-09-30 | 2.74 | 2775.488 | 5.3 |
1959-12-31 | 0.27 | 2785.204 | 5.6 |
1960-03-31 | 2.31 | 2847.699 | 5.2 |

| | | value | value2 |
---|---|---|---|
date | item | | |
1959-03-31 | realgdp | 2710.349 | -0.204708 |
1959-03-31 | infl | 0.000 | 0.478943 |
1959-03-31 | unemp | 5.800 | -0.519439 |
1959-06-30 | realgdp | 2778.801 | -0.555730 |
1959-06-30 | infl | 2.340 | 1.965781 |
1959-06-30 | unemp | 5.100 | 1.393406 |
1959-09-30 | realgdp | 2775.488 | 0.092908 |
1959-09-30 | infl | 2.740 | 0.281746 |
1959-09-30 | unemp | 5.300 | 0.769023 |

| | value | value | value | value2 | value2 | value2 |
---|---|---|---|---|---|---|
item | infl | realgdp | unemp | infl | realgdp | unemp |
date | ||||||
1959-03-31 | 0.00 | 2710.349 | 5.8 | 0.478943 | -0.204708 | -0.519439 |
1959-06-30 | 2.34 | 2778.801 | 5.1 | 1.965781 | -0.555730 | 1.393406 |
1959-09-30 | 2.74 | 2775.488 | 5.3 | 0.281746 | 0.092908 | 0.769023 |
1959-12-31 | 0.27 | 2785.204 | 5.6 | 1.007189 | 1.246435 | -1.296221 |
1960-03-31 | 2.31 | 2847.699 | 5.2 | 0.228913 | 0.274992 | 1.352917 |
1960-06-30 | 0.14 | 2834.390 | 5.2 | -2.001637 | 0.886429 | -0.371843 |
1960-09-30 | 2.70 | 2839.022 | 5.6 | -0.438570 | 1.669025 | -0.539741 |

Data Transformation
Removing Duplicates
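A minimal sketch of duplicate handling, assuming the k1/k2 frame shown below (rebuilt from its printed values). duplicated marks rows already seen; drop_duplicates keeps the first occurrence unless keep='last' is given, and a list of columns restricts which columns are compared:

```python
import pandas as pd

data = pd.DataFrame({"k1": ["one"] * 3 + ["two"] * 4,
                     "k2": [1, 1, 2, 3, 3, 4, 4]})

mask = data.duplicated()          # True for rows identical to an earlier row
deduped = data.drop_duplicates()  # keeps the first occurrence

data["v1"] = range(7)
first_per_k1 = data.drop_duplicates(["k1"])                    # compare k1 only
last_kept = data.drop_duplicates(["k1", "k2"], keep="last")   # keep last occurrence
```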
| | k1 | k2 |
---|---|---|
0 | one | 1 |
1 | one | 1 |
2 | one | 2 |
3 | two | 3 |
4 | two | 3 |
5 | two | 4 |
6 | two | 4 |

0 False
1 True
2 False
3 False
4 True
5 False
6 True
dtype: bool
| | k1 | k2 |
---|---|---|
0 | one | 1 |
2 | one | 2 |
3 | two | 3 |
5 | two | 4 |

| | k1 | k2 | v1 |
---|---|---|---|
0 | one | 1 | 0 |
1 | one | 1 | 1 |
2 | one | 2 | 2 |
3 | two | 3 | 3 |
4 | two | 3 | 4 |
5 | two | 4 | 5 |
6 | two | 4 | 6 |

| | k1 | k2 | v1 |
---|---|---|---|
0 | one | 1 | 0 |
3 | two | 3 | 3 |

| | k1 | k2 | v1 |
---|---|---|---|
1 | one | 1 | 1 |
2 | one | 2 | 2 |
4 | two | 3 | 4 |
6 | two | 4 | 6 |

| | k1 | k2 | v1 |
---|---|---|---|
0 | one | 1 | 0 |
2 | one | 2 | 2 |
3 | two | 3 | 3 |
5 | two | 4 | 5 |

Transforming Data Using a Function or Mapping
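A sketch of value mapping, assuming the food/ounces frame shown below and a hypothetical meat_to_animal dict inferred from the animal column. Because some names are capitalized, the values are lower-cased before the lookup:

```python
import pandas as pd

data = pd.DataFrame({
    "food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
             "Bacon", "pastrami", "honey ham", "nova lox"],
    "ounces": [4, 3, 12, 6, 7.5, 8, 3, 5, 6],
})
meat_to_animal = {"bacon": "pig", "pulled pork": "pig", "pastrami": "cow",
                  "corned beef": "cow", "honey ham": "pig", "nova lox": "salmon"}

# Normalize case, then look each value up in the dict
data["animal"] = data["food"].str.lower().map(meat_to_animal)

# Equivalent: pass a function that does both steps
animals = data["food"].map(lambda x: meat_to_animal[x.lower()])
```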
| | food | ounces |
---|---|---|
0 | bacon | 4.0 |
1 | pulled pork | 3.0 |
2 | bacon | 12.0 |
3 | Pastrami | 6.0 |
4 | corned beef | 7.5 |
5 | Bacon | 8.0 |
6 | pastrami | 3.0 |
7 | honey ham | 5.0 |
8 | nova lox | 6.0 |

| | food | ounces | animal |
---|---|---|---|
0 | bacon | 4.0 | pig |
1 | pulled pork | 3.0 | pig |
2 | bacon | 12.0 | pig |
3 | Pastrami | 6.0 | cow |
4 | corned beef | 7.5 | cow |
5 | Bacon | 8.0 | pig |
6 | pastrami | 3.0 | cow |
7 | honey ham | 5.0 | pig |
8 | nova lox | 6.0 | salmon |

0 pig
1 pig
2 pig
3 cow
4 cow
5 pig
6 cow
7 pig
8 salmon
Name: food, dtype: object
Replacing Values
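A minimal sketch of Series.replace, assuming -999 and -1000 are sentinel values standing in for missing data, as in the outputs below:

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, -999.0, 2.0, -999.0, -1000.0, 3.0])

one = data.replace(-999, np.nan)                      # a single sentinel
many = data.replace([-999, -1000], np.nan)            # several sentinels -> one value
per_value = data.replace([-999, -1000], [np.nan, 0])  # pairwise lists
as_dict = data.replace({-999: np.nan, -1000: 0})      # the same mapping as a dict
```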
0 1.0
1 -999.0
2 2.0
3 -999.0
4 -1000.0
5 3.0
dtype: float64
0 1.0
1 NaN
2 2.0
3 NaN
4 -1000.0
5 3.0
dtype: float64
0 1.0
1 NaN
2 2.0
3 NaN
4 NaN
5 3.0
dtype: float64
0 1.0
1 NaN
2 2.0
3 NaN
4 0.0
5 3.0
dtype: float64
0 1.0
1 NaN
2 2.0
3 NaN
4 0.0
5 3.0
dtype: float64
Renaming Axis Indexes
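A sketch of relabeling axes, assuming the 3x4 frame shown below. Index.map transforms the labels and can be assigned back; rename returns a new object and accepts either functions or dicts:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=["Ohio", "Colorado", "New York"],
                    columns=["one", "two", "three", "four"])

# Transform the labels and assign them back (modifies data)
data.index = data.index.map(str.upper)

# rename with functions applies them to every label
titled = data.rename(index=str.title, columns=str.upper)

# rename with dicts changes only the listed labels
partial = data.rename(index={"OHIO": "INDIANA"}, columns={"three": "peekaboo"})
```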
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 1 | 2 | 3 |
Colorado | 4 | 5 | 6 | 7 |
New York | 8 | 9 | 10 | 11 |

array(['OHIO', 'COLORADO', 'NEW YORK'], dtype=object)
| | one | two | three | four |
---|---|---|---|---|
OHIO | 0 | 1 | 2 | 3 |
COLORADO | 4 | 5 | 6 | 7 |
NEW YORK | 8 | 9 | 10 | 11 |

| | ONE | TWO | THREE | FOUR |
---|---|---|---|---|
Ohio | 0 | 1 | 2 | 3 |
Colorado | 4 | 5 | 6 | 7 |
New York | 8 | 9 | 10 | 11 |

| | one | two | peekaboo | four |
---|---|---|---|---|
INDIANA | 0 | 1 | 2 | 3 |
COLORADO | 4 | 5 | 6 | 7 |
NEW YORK | 8 | 9 | 10 | 11 |

| | one | two | three | four |
---|---|---|---|---|
INDIANA | 0 | 1 | 2 | 3 |
COLORADO | 4 | 5 | 6 | 7 |
NEW YORK | 8 | 9 | 10 | 11 |

Discretization and Binning
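A sketch of pd.cut and pd.qcut. The ages list is an assumption, chosen because it reproduces the bin codes and counts printed below; newer pandas prints the categories as Interval objects rather than the older object-dtype display:

```python
import numpy as np
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

cats = pd.cut(ages, bins)                # right-closed intervals such as (18, 25]
codes = cats.codes                       # integer bin number per age
counts = pd.Series(cats).value_counts()  # ages per bin

left_closed = pd.cut(ages, [18, 26, 36, 61, 100], right=False)  # [18, 26) etc.
named = pd.cut(ages, bins, labels=["Youth", "YoungAdult", "MiddleAged", "Senior"])

# qcut bins by sample quantiles, so each bin gets the same number of points
data = np.random.default_rng(0).standard_normal(1000)
quartiles = pd.qcut(data, 4)
```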
[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, object): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]
array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)
Index(['(18, 25]', '(25, 35]', '(35, 60]', '(60, 100]'], dtype='object')
(18, 25] 5
(35, 60] 3
(25, 35] 3
(60, 100] 1
dtype: int64
[[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]
Length: 12
Categories (4, object): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]
[Youth, Youth, Youth, YoungAdult, Youth, ..., YoungAdult, Senior, MiddleAged, MiddleAged, YoungAdult]
Length: 12
Categories (4, object): [Youth < YoungAdult < MiddleAged < Senior]
[(0.25, 0.49], (0.25, 0.49], (0.73, 0.98], (0.25, 0.49], (0.25, 0.49], ..., (0.25, 0.49], (0.73, 0.98], (0.49, 0.73], (0.49, 0.73], (0.49, 0.73]]
Length: 20
Categories (4, object): [(0.0032, 0.25] < (0.25, 0.49] < (0.49, 0.73] < (0.73, 0.98]]
[(0.636, 3.26], [-3.745, -0.648], (0.636, 3.26], (-0.022, 0.636], (-0.648, -0.022], ..., (0.636, 3.26], (-0.022, 0.636], [-3.745, -0.648], (-0.022, 0.636], (-0.022, 0.636]]
Length: 1000
Categories (4, object): [[-3.745, -0.648] < (-0.648, -0.022] < (-0.022, 0.636] < (0.636, 3.26]]
(0.636, 3.26] 250
(-0.022, 0.636] 250
(-0.648, -0.022] 250
[-3.745, -0.648] 250
dtype: int64
[(-0.022, 1.298], [-3.745, -1.274], (-0.022, 1.298], (-0.022, 1.298], (-1.274, -0.022], ..., (-0.022, 1.298], (-0.022, 1.298], [-3.745, -1.274], (-0.022, 1.298], (-0.022, 1.298]]
Length: 1000
Categories (4, object): [[-3.745, -1.274] < (-1.274, -0.022] < (-0.022, 1.298] < (1.298, 3.26]]
Detecting and Filtering Outliers
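A sketch of outlier detection and capping, assuming a frame of standard-normal draws; the generator and seed differ from the notebook, so the specific rows and values will not match the output below:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.default_rng(12345).standard_normal((1000, 4)))

col = data[3]
big_in_col = col[np.abs(col) > 3]                         # outliers in one column
rows_with_outlier = data[(np.abs(data) > 3).any(axis=1)]  # rows where any column exceeds 3

# Cap values to [-3, 3]: np.sign gives -1.0 or 1.0 for each entry
capped = data.copy()
capped[np.abs(capped) > 3] = np.sign(capped) * 3
```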
| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
count | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 |
mean | -0.067684 | 0.067924 | 0.025598 | -0.002298 |
std | 0.998035 | 0.992106 | 1.006835 | 0.996794 |
min | -3.428254 | -3.548824 | -3.184377 | -3.745356 |
25% | -0.774890 | -0.591841 | -0.641675 | -0.644144 |
50% | -0.116401 | 0.101143 | 0.002073 | -0.013611 |
75% | 0.616366 | 0.780282 | 0.680391 | 0.654328 |
max | 3.366626 | 2.653656 | 3.260383 | 3.927528 |

97 3.927528
305 -3.399312
400 -3.745356
Name: 3, dtype: float64
| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
5 | -0.539741 | 0.476985 | 3.248944 | -1.021228 |
97 | -0.774363 | 0.552936 | 0.106061 | 3.927528 |
102 | -0.655054 | -0.565230 | 3.176873 | 0.959533 |
305 | -2.315555 | 0.457246 | -0.025907 | -3.399312 |
324 | 0.050188 | 1.951312 | 3.260383 | 0.963301 |
400 | 0.146326 | 0.508391 | -0.196713 | -3.745356 |
499 | -0.293333 | -0.242459 | -3.056990 | 1.918403 |
523 | -3.428254 | -0.296336 | -0.439938 | -0.867165 |
586 | 0.275144 | 1.179227 | -3.184377 | 1.369891 |
808 | -0.362528 | -3.548824 | 1.553205 | -2.186301 |
900 | 3.366626 | -2.372214 | 0.851010 | 1.332846 |

| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
count | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 |
mean | -0.067623 | 0.068473 | 0.025153 | -0.002081 |
std | 0.995485 | 0.990253 | 1.003977 | 0.989736 |
min | -3.000000 | -3.000000 | -3.000000 | -3.000000 |
25% | -0.774890 | -0.591841 | -0.641675 | -0.644144 |
50% | -0.116401 | 0.101143 | 0.002073 | -0.013611 |
75% | 0.616366 | 0.780282 | 0.680391 | 0.654328 |
max | 3.000000 | 2.653656 | 3.000000 | 3.000000 |

Permutation and Random Sampling
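A sketch of permutation and sampling, assuming NumPy's Generator API; the random draws will differ from the notebook's (which printed array([1, 0, 2, 3, 4]) as its permutation). take reorders rows by position, and DataFrame.sample is a shortcut for sampling rows without replacement. The bag values are copied from the outputs below:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))
rng = np.random.default_rng(0)

sampler = rng.permutation(5)   # a random ordering of 0..4
shuffled = df.take(sampler)    # reorder rows by position (like df.iloc[sampler])

subset = df.take(rng.permutation(len(df))[:3])  # 3 rows without replacement
also_subset = df.sample(n=3, random_state=0)    # built-in shortcut

# With replacement: draw random positions with integers()
bag = np.array([5, 7, -1, 6, 4])
draws = bag[rng.integers(0, len(bag), size=10)]
```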
array([1, 0, 2, 3, 4])
| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | 0 | 1 | 2 | 3 |
1 | 4 | 5 | 6 | 7 |
2 | 8 | 9 | 10 | 11 |
3 | 12 | 13 | 14 | 15 |
4 | 16 | 17 | 18 | 19 |

| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
1 | 4 | 5 | 6 | 7 |
0 | 0 | 1 | 2 | 3 |
2 | 8 | 9 | 10 | 11 |
3 | 12 | 13 | 14 | 15 |
4 | 16 | 17 | 18 | 19 |

| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
1 | 4 | 5 | 6 | 7 |
0 | 0 | 1 | 2 | 3 |
4 | 16 | 17 | 18 | 19 |

array([3, 0, 4, 1, 1, 2, 3, 0, 1, 2])
array([ 6, 5, 4, 7, 7, -1, 6, 5, 7, -1])
Computing Indicator / Dummy Variables
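A sketch of indicator variables, assuming pandas. For the movie genres, where one row can belong to several categories separated by '|', this uses Series.str.get_dummies in place of filling a zero matrix in a loop; that is a swapped-in technique, not necessarily how the notebook did it. The last step combines get_dummies with cut, using the first five random values printed below:

```python
import pandas as pd

df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

dummies = pd.get_dummies(df["key"], prefix="key", dtype=float)
df_with_dummy = df[["data1"]].join(dummies)

# Multi-membership strings: split on '|' and build one column per genre
genres = pd.Series(["Animation|Children's|Comedy",
                    "Adventure|Children's|Fantasy",
                    "Comedy|Romance"])
genre_dummies = genres.str.get_dummies(sep="|").add_prefix("Genre_")

# Indicator columns for bins produced by cut
values = pd.Series([0.9296, 0.3164, 0.1839, 0.2046, 0.5677])
binned = pd.get_dummies(pd.cut(values, [0, 0.2, 0.4, 0.6, 0.8, 1]), dtype=float)
```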
| | data1 | key |
---|---|---|
0 | 0 | b |
1 | 1 | b |
2 | 2 | a |
3 | 3 | c |
4 | 4 | a |
5 | 5 | b |

| | a | b | c |
---|---|---|---|
0 | 0.0 | 1.0 | 0.0 |
1 | 0.0 | 1.0 | 0.0 |
2 | 1.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 1.0 |
4 | 1.0 | 0.0 | 0.0 |
5 | 0.0 | 1.0 | 0.0 |

| | key_a | key_b | key_c |
---|---|---|---|
0 | 0.0 | 1.0 | 0.0 |
1 | 0.0 | 1.0 | 0.0 |
2 | 1.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 1.0 |
4 | 1.0 | 0.0 | 0.0 |
5 | 0.0 | 1.0 | 0.0 |

| | data1 | key_a | key_b | key_c |
---|---|---|---|---|
0 | 0 | 0.0 | 1.0 | 0.0 |
1 | 1 | 0.0 | 1.0 | 0.0 |
2 | 2 | 1.0 | 0.0 | 0.0 |
3 | 3 | 0.0 | 0.0 | 1.0 |
4 | 4 | 1.0 | 0.0 | 0.0 |
5 | 5 | 0.0 | 1.0 | 0.0 |

| | movie_id | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
5 | 6 | Heat (1995) | Action\|Crime\|Thriller |
6 | 7 | Sabrina (1995) | Comedy\|Romance |
7 | 8 | Tom and Huck (1995) | Adventure\|Children's |
8 | 9 | Sudden Death (1995) | Action |
9 | 10 | GoldenEye (1995) | Action\|Adventure\|Thriller |

['Action',
'Adventure',
'Animation',
"Children's",
'Comedy',
'Crime',
'Documentary',
'Drama',
'Fantasy',
'Film-Noir',
'Horror',
'Musical',
'Mystery',
'Romance',
'Sci-Fi',
'Thriller',
'War',
'Western']
| | Action | Adventure | Animation | Children's | Comedy |
---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

| | Action | Adventure | Animation | Children's | Comedy |
---|---|---|---|---|---|
0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
1 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
5 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
6 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
7 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
8 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
9 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |

movie_id 1
title Toy Story (1995)
genres Animation|Children's|Comedy
Genre_Action 0
Genre_Adventure 0
Genre_Animation 1
Genre_Children's 1
Genre_Comedy 1
Genre_Crime 0
Genre_Documentary 0
Genre_Drama 0
Genre_Fantasy 0
Genre_Film-Noir 0
Genre_Horror 0
Genre_Musical 0
Genre_Mystery 0
Genre_Romance 0
Genre_Sci-Fi 0
Genre_Thriller 0
Genre_War 0
Genre_Western 0
Name: 0, dtype: object
array([ 0.9296, 0.3164, 0.1839, 0.2046, 0.5677, 0.5955, 0.9645,
0.6532, 0.7489, 0.6536])
| | (0, 0.2] | (0.2, 0.4] | (0.4, 0.6] | (0.6, 0.8] | (0.8, 1] |
---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
5 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
6 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
7 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
8 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
9 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |

String Manipulation
String Object Methods
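A minimal sketch of the built-in str methods behind the outputs below, assuming the example string 'a,b,  guido' (inferred from those outputs):

```python
val = "a,b,  guido"

pieces = [x.strip() for x in val.split(",")]  # split, then trim whitespace
joined = "::".join(pieces)                    # 'a::b::guido'

found = "guido" in val           # substring membership test
pos = val.index(",")             # raises ValueError if the substring is absent
missing = val.find(":")          # returns -1 instead of raising
count = val.count(",")
replaced = val.replace(",", "::")
removed = val.replace(",", "")
```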
['a', 'b', ' guido']
['a', 'b', 'guido']
'a::b::guido'
Surprise :P
'a::b::guido'
True
1
-1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-110-280f8b2856ce> in <module>()
----> 1 val.index(':')
ValueError: substring not found
2
'a::b:: guido'
'ab guido'
Regular Expressions
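A sketch using Python's re module; the sample text and the e-mail pattern are assumptions reconstructed from the outputs below. Compiling a pattern once with re.compile gives a reusable object whose split, findall, search, match and sub methods mirror the module-level functions:

```python
import re

text = "foo    bar\t baz  \tqux"
regex = re.compile(r"\s+")   # one or more whitespace characters
words = regex.split(text)    # ['foo', 'bar', 'baz', 'qux']
gaps = regex.findall(text)   # the whitespace runs themselves

emails = """Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com
Ryan ryan@yahoo.com
"""
pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"
email_re = re.compile(pattern, flags=re.IGNORECASE)

all_matches = email_re.findall(emails)  # with groups: (user, domain, suffix) tuples
first = email_re.search(emails)         # first match only
at_start = email_re.match(emails)       # None: the text does not start with an address
redacted = email_re.sub("REDACTED", emails)
swapped = email_re.sub(r"Username: \1, Domain: \2, Suffix: \3", emails)
```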
['foo', 'bar', 'baz', 'qux']
['foo', 'bar', 'baz', 'qux']
[' ', '\t ', ' \t']
['dave@google.com', 'steve@gmail.com', 'rob@gmail.com', 'ryan@yahoo.com']
search returns only the first match:
<_sre.SRE_Match object; span=(5, 20), match='dave@google.com'>
'dave@google.com'
match only matches a pattern at the start of the string, so here it returns None:
None
Substitution with sub:
Dave REDACTED
Steve REDACTED
Rob REDACTED
Ryan REDACTED
('wesm', 'bright', 'net')
[('dave', 'google', 'com'),
('steve', 'gmail', 'com'),
('rob', 'gmail', 'com'),
('ryan', 'yahoo', 'com')]
Dave Username: dave, Domain: google, Suffix: com
Steve Username: steve, Domain: gmail, Suffix: com
Rob Username: rob, Domain: gmail, Suffix: com
Ryan Username: ryan, Domain: yahoo, Suffix: com
{'domain': 'bright', 'suffix': 'net', 'username': 'wesm'}
Vectorized String Functions in pandas
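A sketch of the vectorized .str methods on a Series with a missing value, assuming the small name-to-email mapping shown in the outputs below. Unlike a plain Python loop over the values, these methods skip NaN instead of raising; dtype=object is an assumption made so the missing value behaves the same across pandas versions:

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({"Dave": "dave@google.com", "Steve": "steve@gmail.com",
                  "Rob": "rob@gmail.com", "Wes": np.nan}, dtype=object)

missing = data.isna()                   # True only for Wes
has_gmail = data.str.contains("gmail")  # NaN propagates for Wes

pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"
parts = data.str.findall(pattern, flags=re.IGNORECASE)      # list of tuples per element
extracted = data.str.extract(pattern, flags=re.IGNORECASE)  # one column per group
first_five = data.str[:5]                                   # vectorized slicing
```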
Dave dave@google.com
Rob rob@gmail.com
Steve steve@gmail.com
Wes NaN
dtype: object
Dave False
Rob False
Steve False
Wes True
dtype: bool
Dave False
Rob True
Steve True
Wes NaN
dtype: object
'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\\.([A-Z]{2,4})'
Dave [(dave, google, com)]
Rob [(rob, gmail, com)]
Steve [(steve, gmail, com)]
Wes NaN
dtype: object
Dave (dave, google, com)
Rob (rob, gmail, com)
Steve (steve, gmail, com)
Wes NaN
dtype: object
Dave google
Rob gmail
Steve gmail
Wes NaN
dtype: object
Dave dave
Rob rob
Steve steve
Wes NaN
dtype: object
Dave dave@
Rob rob@g
Steve steve
Wes NaN
dtype: object
Example: USDA Food Database
6636
dict_keys(['id', 'tags', 'portions', 'nutrients', 'description', 'group', 'manufacturer'])
{'description': 'Protein',
'group': 'Composition',
'units': 'g',
'value': 25.18}
| | description | group | units | value |
---|---|---|---|---|
0 | Protein | Composition | g | 25.18 |
1 | Total lipid (fat) | Composition | g | 29.20 |
2 | Carbohydrate, by difference | Composition | g | 3.06 |
3 | Ash | Other | g | 3.28 |
4 | Energy | Energy | kcal | 376.00 |
5 | Water | Composition | g | 39.28 |
6 | Energy | Energy | kJ | 1573.00 |

| | description | group | id | manufacturer |
---|---|---|---|---|
0 | Cheese, caraway | Dairy and Egg Products | 1008 | |
1 | Cheese, cheddar | Dairy and Egg Products | 1009 | |
2 | Cheese, edam | Dairy and Egg Products | 1018 | |
3 | Cheese, feta | Dairy and Egg Products | 1019 | |
4 | Cheese, mozzarella, part skim milk | Dairy and Egg Products | 1028 | |

Vegetables and Vegetable Products 812
Beef Products 618
Baked Products 496
Breakfast Cereals 403
Legumes and Legume Products 365
Fast Foods 365
Lamb, Veal, and Game Products 345
Sweets 341
Pork Products 328
Fruits and Fruit Juices 328
Name: group, dtype: int64
| | description | group | units | value | id |
---|---|---|---|---|---|
0 | Protein | Composition | g | 25.18 | 1008 |
1 | Total lipid (fat) | Composition | g | 29.20 | 1008 |
2 | Carbohydrate, by difference | Composition | g | 3.06 | 1008 |
3 | Ash | Other | g | 3.28 | 1008 |
4 | Energy | Energy | kcal | 376.00 | 1008 |
5 | Water | Composition | g | 39.28 | 1008 |
6 | Energy | Energy | kJ | 1573.00 | 1008 |
7 | Fiber, total dietary | Composition | g | 0.00 | 1008 |
8 | Calcium, Ca | Elements | mg | 673.00 | 1008 |
9 | Iron, Fe | Elements | mg | 0.64 | 1008 |

14179
| | food | fgroup | id | manufacturer |
---|---|---|---|---|
0 | Cheese, caraway | Dairy and Egg Products | 1008 | |
1 | Cheese, cheddar | Dairy and Egg Products | 1009 | |
2 | Cheese, edam | Dairy and Egg Products | 1018 | |
3 | Cheese, feta | Dairy and Egg Products | 1019 | |
4 | Cheese, mozzarella, part skim milk | Dairy and Egg Products | 1028 | |
5 | Cheese, mozzarella, part skim milk, low moisture | Dairy and Egg Products | 1029 | |
6 | Cheese, romano | Dairy and Egg Products | 1038 | |
7 | Cheese, roquefort | Dairy and Egg Products | 1039 | |
8 | Cheese spread, pasteurized process, american, … | Dairy and Egg Products | 1048 | |
9 | Cream, fluid, half and half | Dairy and Egg Products | 1049 | |

| | nutrient | nutgroup | units | value | id |
---|---|---|---|---|---|
0 | Protein | Composition | g | 25.18 | 1008 |
1 | Total lipid (fat) | Composition | g | 29.20 | 1008 |
2 | Carbohydrate, by difference | Composition | g | 3.06 | 1008 |
3 | Ash | Other | g | 3.28 | 1008 |
4 | Energy | Energy | kcal | 376.00 | 1008 |
5 | Water | Composition | g | 39.28 | 1008 |
6 | Energy | Energy | kJ | 1573.00 | 1008 |
7 | Fiber, total dietary | Composition | g | 0.00 | 1008 |
8 | Calcium, Ca | Elements | mg | 673.00 | 1008 |
9 | Iron, Fe | Elements | mg | 0.64 | 1008 |

| | nutrient | nutgroup | units | value | id | food | fgroup | manufacturer |
---|---|---|---|---|---|---|---|---|
0 | Protein | Composition | g | 25.18 | 1008 | Cheese, caraway | Dairy and Egg Products | |
1 | Total lipid (fat) | Composition | g | 29.20 | 1008 | Cheese, caraway | Dairy and Egg Products | |
2 | Carbohydrate, by difference | Composition | g | 3.06 | 1008 | Cheese, caraway | Dairy and Egg Products | |
3 | Ash | Other | g | 3.28 | 1008 | Cheese, caraway | Dairy and Egg Products | |
4 | Energy | Energy | kcal | 376.00 | 1008 | Cheese, caraway | Dairy and Egg Products | |
5 | Water | Composition | g | 39.28 | 1008 | Cheese, caraway | Dairy and Egg Products | |
6 | Energy | Energy | kJ | 1573.00 | 1008 | Cheese, caraway | Dairy and Egg Products | |
7 | Fiber, total dietary | Composition | g | 0.00 | 1008 | Cheese, caraway | Dairy and Egg Products | |
8 | Calcium, Ca | Elements | mg | 673.00 | 1008 | Cheese, caraway | Dairy and Egg Products | |
9 | Iron, Fe | Elements | mg | 0.64 | 1008 | Cheese, caraway | Dairy and Egg Products | |

nutrient Glycine
nutgroup Amino Acids
units g
value 0.04
id 6158
food Soup, tomato bisque, canned, condensed
fgroup Soups, Sauces, and Gravies
manufacturer
Name: 30000, dtype: object
nutrient fgroup
Adjusted Protein Sweets 12.900
Vegetables and Vegetable Products 2.180
Alanine Baby Foods 0.085
Baked Products 0.248
Beef Products 1.550
Beverages 0.003
Breakfast Cereals 0.311
Cereal Grains and Pasta 0.373
Dairy and Egg Products 0.271
Ethnic Foods 1.290
Name: value, dtype: float64
nutrient
Alanine Gelatins, dry powder, unsweetened
Arginine Seeds, sesame flour, low-fat
Aspartic acid Soy protein isolate
Cystine Seeds, cottonseed flour, low fat (glandless)
Glutamic acid Soy protein isolate
Glycine Gelatins, dry powder, unsweetened
Histidine Whale, beluga, meat, dried (Alaska Native)
Hydroxyproline KENTUCKY FRIED CHICKEN, Fried Chicken, ORIGINA...
Isoleucine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Leucine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Lysine Seal, bearded (Oogruk), meat, dried (Alaska Na...
Methionine Fish, cod, Atlantic, dried and salted
Phenylalanine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Proline Gelatins, dry powder, unsweetened
Serine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Threonine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Tryptophan Sea lion, Steller, meat with fat (Alaska Native)
Tyrosine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Valine Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Name: food, dtype: object