Data loading, storage, and file formats
'C:\\Users\\Ewan\\Downloads\\pydata-book-master'
Reading and Writing Data in Text Format
a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
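A minimal sketch of calls that yield the two identical frames above, assuming the comma-separated text listed earlier is the file's complete contents. `io.StringIO` stands in for the file on disk so the snippet runs on its own. `read_csv` splits on commas by default, while `read_table` defaults to tabs and needs `sep=','`.

```python
import io

import pandas as pd

# Inline copy of the comma-separated text shown above
# (assumed to be the complete file contents).
ex1 = """a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# read_csv splits on commas by default and uses the first line as the header.
df = pd.read_csv(io.StringIO(ex1))

# read_table defaults to tab-separated input, so the comma must be passed.
df2 = pd.read_table(io.StringIO(ex1), sep=',')

print(df)
print(df.equals(df2))  # True
```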
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
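For the header-less text above, a sketch of the two options that appear to produce these tables: let pandas number the columns with `header=None`, or supply labels with `names=`. The data is inlined via `io.StringIO` as an assumption about the file contents.

```python
import io

import pandas as pd

# The same rows without a header line, as in the text shown above.
ex2 = """1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# header=None: pandas assigns integer column labels 0..4.
default_cols = pd.read_csv(io.StringIO(ex2), header=None)

# names=...: supply the column labels yourself.
named = pd.read_csv(io.StringIO(ex2), names=['a', 'b', 'c', 'd', 'message'])

print(list(default_cols.columns))  # [0, 1, 2, 3, 4]
print(list(named.columns))         # ['a', 'b', 'c', 'd', 'message']
```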
message | a | b | c | d |
---|---|---|---|---|
hello | 1 | 2 | 3 | 4 |
world | 5 | 6 | 7 | 8 |
foo | 9 | 10 | 11 | 12 |
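A sketch of how a column becomes the row index, assuming the same header-less text: `index_col` takes a column name (or position) and uses it in place of the default 0..n-1 labels.

```python
import io

import pandas as pd

ex2 = """1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""
names = ['a', 'b', 'c', 'd', 'message']

# index_col names the column to use as the row index.
indexed = pd.read_csv(io.StringIO(ex2), names=names, index_col='message')

print(indexed.loc['world', 'b'])  # 6
```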
key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16
key1 | key2 | value1 | value2 |
---|---|---|---|
one | a | 1 | 2 |
one | b | 3 | 4 |
one | c | 5 | 6 |
one | d | 7 | 8 |
two | a | 9 | 10 |
two | b | 11 | 12 |
two | c | 13 | 14 |
two | d | 15 | 16 |
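Passing a list of column names to `index_col` builds a hierarchical (MultiIndex) row index, which is what the table above shows. A minimal sketch, assuming the key/value text listed above is the whole file:

```python
import io

import pandas as pd

# The key/value text shown above.
csv_mindex = """key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16
"""

# A list for index_col produces a two-level MultiIndex on the rows.
parsed = pd.read_csv(io.StringIO(csv_mindex), index_col=['key1', 'key2'])

print(parsed.loc[('two', 'c'), 'value1'])  # 13
```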
[' A B C\n',
'aaa -0.264438 -1.026059 -0.619500\n',
'bbb 0.927272 0.302904 -0.032399\n',
'ccc -0.264273 -0.386314 -0.217601\n',
'ddd -0.871858 -0.348382 1.100491\n']
Using a Regular Expression as the Delimiter
| | A | B | C |
---|---|---|---|
aaa | -0.264438 | -1.026059 | -0.619500 |
bbb | 0.927272 | 0.302904 | -0.032399 |
ccc | -0.264273 | -0.386314 | -0.217601 |
ddd | -0.871858 | -0.348382 | 1.100491 |
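A sketch of the regex-delimiter read, assuming fields separated by runs of spaces (the listing above collapses that spacing). `sep=r'\s+'` matches one or more whitespace characters. Because the header has one fewer name than each data row has fields, pandas uses the first field of each row as the index.

```python
import io

import pandas as pd

# Whitespace-separated text like the file listed above: three header names,
# four fields per data row.
ex3 = """            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491
"""

# r'\s+' splits on any run of whitespace.
result = pd.read_csv(io.StringIO(ex3), sep=r'\s+')

print(result)
```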
# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
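Two ways to get past the comment lines in the file above, sketched under the assumption that the text shown is the full file: drop lines by 0-based position with `skiprows`, or mark `#` as a comment character so fully commented lines are ignored.

```python
import io

import pandas as pd

# The commented file shown above.
ex4 = """# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# Skip the first, third, and fourth lines (positions 0, 2, 3).
by_position = pd.read_csv(io.StringIO(ex4), skiprows=[0, 2, 3])

# Alternative: ignore any line that starts with '#'.
by_comment = pd.read_csv(io.StringIO(ex4), comment='#')

print(by_position.equals(by_comment))  # True
```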
something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
| | something | a | b | c | d | message |
---|---|---|---|---|---|---|
0 | one | 1 | 2 | 3.0 | 4 | NaN |
1 | two | 5 | 6 | NaN | 8 | world |
2 | three | 9 | 10 | 11.0 | 12 | foo |
| | something | a | b | c | d | message |
---|---|---|---|---|---|---|
0 | False | False | False | False | False | True |
1 | False | False | False | True | False | False |
2 | False | False | False | False | False | False |
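A sketch of the missing-value handling behind the two tables above, assuming the text shown is the whole file. By default, an empty field and markers such as `NA` become `NaN`, and `pd.isnull` returns a Boolean mask of the same shape. The `NaN` in column `c` is also why that column is read as floats.

```python
import io

import pandas as pd

# The text shown above: one empty field and one literal "NA".
ex5 = """something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
"""

result = pd.read_csv(io.StringIO(ex5))

# True wherever pandas parsed a missing value.
mask = pd.isnull(result)

print(mask)
print(int(mask.values.sum()))  # 2
```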
| | something | a | b | c | d | message |
---|---|---|---|---|---|---|
0 | one | 1 | 2 | 3.0 | 4 | NaN |
1 | two | 5 | 6 | NaN | 8 | world |
2 | three | 9 | 10 | 11.0 | 12 | foo |
| | something | a | b | c | d | message |
---|---|---|---|---|---|---|
0 | one | 1 | 2 | 3.0 | 4 | NaN |
1 | NaN | 5 | 6 | NaN | 8 | world |
2 | three | 9 | 10 | 11.0 | 12 | NaN |
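A sketch of `na_values`, reusing the same inlined text. A list adds markers to the defaults. `NULL` does not occur in this file, so the first table above is unchanged. A dict applies different sentinels per column, which is how `two` and `foo` become `NaN` in the last table.

```python
import io

import pandas as pd

ex5 = """something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
"""

default = pd.read_csv(io.StringIO(ex5))

# A list extends the default markers; this file contains no "NULL".
same = pd.read_csv(io.StringIO(ex5), na_values=['NULL'])

# A dict maps each column to its own list of sentinels.
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
custom = pd.read_csv(io.StringIO(ex5), na_values=sentinels)

print(custom)
```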
Reading Text Files in Pieces
| | one | two | three | four | key |
---|---|---|---|---|---|
0 | 0.467976 | -0.038649 | -0.295344 | -1.824726 | L |
1 | -0.358893 | 1.404453 | 0.704965 | -0.200638 | B |
2 | -0.501840 | 0.659254 | -0.421691 | -0.057688 | G |
3 | 0.204886 | 1.074134 | 1.388361 | -0.982404 | R |
4 | 0.354628 | -0.133116 | 0.283763 | -0.837063 | Q |
5 | 1.817480 | 0.742273 | 0.419395 | -2.251035 | Q |
6 | -0.776764 | 0.935518 | -0.332872 | -1.875641 | U |
7 | -0.913135 | 1.530624 | -0.572657 | 0.477252 | K |
8 | 0.358480 | -0.497572 | -0.367016 | 0.507702 | S |
9 | -1.740877 | -1.160417 | -1.637830 | 2.172201 | G |
10 | 0.240564 | -0.328249 | 1.252155 | 1.072796 | 8 |
11 | 0.764018 | 1.165476 | -0.639544 | 1.495258 | R |
12 | 0.571035 | -0.310537 | 0.582437 | -0.298765 | 1 |
13 | 2.317658 | 0.430710 | -1.334216 | 0.199679 | P |
14 | 1.547771 | -1.119753 | -2.277634 | 0.329586 | J |
15 | -1.310608 | 0.401719 | -1.000987 | 1.156708 | E |
16 | -0.088496 | 0.634712 | 0.153324 | 0.415335 | B |
17 | -0.018663 | -0.247487 | -1.446522 | 0.750938 | A |
18 | -0.070127 | -1.579097 | 0.120892 | 0.671432 | F |
19 | -0.194678 | -0.492039 | 2.359605 | 0.319810 | H |
20 | -0.248618 | 0.868707 | -0.492226 | -0.717959 | W |
21 | -1.091549 | -0.867110 | -0.647760 | -0.832562 | C |
22 | 0.641404 | -0.138822 | -0.621963 | -0.284839 | C |
23 | 1.216408 | 0.992687 | 0.165162 | -0.069619 | V |
24 | -0.564474 | 0.792832 | 0.747053 | 0.571675 | I |
25 | 1.759879 | -0.515666 | -0.230481 | 1.362317 | S |
26 | 0.126266 | 0.309281 | 0.382820 | -0.239199 | L |
27 | 1.334360 | -0.100152 | -0.840731 | -0.643967 | 6 |
28 | -0.737620 | 0.278087 | -0.053235 | -0.950972 | J |
29 | -1.148486 | -0.986292 | -0.144963 | 0.124362 | Y |
… | … | … | … | … | … |
9970 | 0.633495 | -0.186524 | 0.927627 | 0.143164 | 4 |
9971 | 0.308636 | -0.112857 | 0.762842 | -1.072977 | 1 |
9972 | -1.627051 | -0.978151 | 0.154745 | -1.229037 | Z |
9973 | 0.314847 | 0.097989 | 0.199608 | 0.955193 | P |
9974 | 1.666907 | 0.992005 | 0.496128 | -0.686391 | S |
9975 | 0.010603 | 0.708540 | -1.258711 | 0.226541 | K |
9976 | 0.118693 | -0.714455 | -0.501342 | -0.254764 | K |
9977 | 0.302616 | -2.011527 | -0.628085 | 0.768827 | H |
9978 | -0.098572 | 1.769086 | -0.215027 | -0.053076 | A |
9979 | -0.019058 | 1.964994 | 0.738538 | -0.883776 | F |
9980 | -0.595349 | 0.001781 | -1.423355 | -1.458477 | M |
9981 | 1.392170 | -1.396560 | -1.425306 | -0.847535 | H |
9982 | -0.896029 | -0.152287 | 1.924483 | 0.365184 | 6 |
9983 | -2.274642 | -0.901874 | 1.500352 | 0.996541 | N |
9984 | -0.301898 | 1.019906 | 1.102160 | 2.624526 | I |
9985 | -2.548389 | -0.585374 | 1.496201 | -0.718815 | D |
9986 | -0.064588 | 0.759292 | -1.568415 | -0.420933 | E |
9987 | -0.143365 | -1.111760 | -1.815581 | 0.435274 | 2 |
9988 | -0.070412 | -1.055921 | 0.338017 | -0.440763 | X |
9989 | 0.649148 | 0.994273 | -1.384227 | 0.485120 | Q |
9990 | -0.370769 | 0.404356 | -1.051628 | -1.050899 | 8 |
9991 | -0.409980 | 0.155627 | -0.818990 | 1.277350 | W |
9992 | 0.301214 | -1.111203 | 0.668258 | 0.671922 | A |
9993 | 1.821117 | 0.416445 | 0.173874 | 0.505118 | X |
9994 | 0.068804 | 1.322759 | 0.802346 | 0.223618 | H |
9995 | 2.311896 | -0.417070 | -1.409599 | -0.515821 | L |
9996 | -0.479893 | -0.650419 | 0.745152 | -0.646038 | E |
9997 | 0.523331 | 0.787112 | 0.486066 | 1.093156 | K |
9998 | -0.362559 | 0.598894 | -1.843201 | 0.887292 | G |
9999 | -0.096376 | -1.012999 | -0.657431 | -0.573315 | 0 |
10000 rows × 5 columns
| | one | two | three | four | key |
---|---|---|---|---|---|
0 | 0.467976 | -0.038649 | -0.295344 | -1.824726 | L |
1 | -0.358893 | 1.404453 | 0.704965 | -0.200638 | B |
2 | -0.501840 | 0.659254 | -0.421691 | -0.057688 | G |
3 | 0.204886 | 1.074134 | 1.388361 | -0.982404 | R |
4 | 0.354628 | -0.133116 | 0.283763 | -0.837063 | Q |
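To read only the start of a large file, pass `nrows`. The sketch below builds a hypothetical stand-in for the 10,000-row file (random floats and a one-letter key), so its values will not match the table above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 10,000-row file: random floats plus a
# one-letter key column.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((10000, 4)),
                     columns=['one', 'two', 'three', 'four'])
frame['key'] = rng.choice(list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), size=10000)
csv_text = frame.to_csv(index=False)

# nrows stops parsing after the first five data rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(head.shape)  # (5, 5)
```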
<pandas.io.parsers.TextFileReader at 0x2035229de80>
E 368.0
X 364.0
L 346.0
O 343.0
Q 340.0
M 338.0
J 337.0
F 335.0
K 334.0
H 330.0
dtype: float64
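The `TextFileReader` above comes from passing `chunksize`. Iterating over it yields DataFrames of up to that many rows, so per-key counts can be accumulated without holding the whole file in memory. A sketch on the same hypothetical data (letter keys only, so the counts differ from those shown):

```python
import io

import numpy as np
import pandas as pd

# Same hypothetical data as a CSV string.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((10000, 4)),
                     columns=['one', 'two', 'three', 'four'])
frame['key'] = rng.choice(list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), size=10000)
csv_text = frame.to_csv(index=False)

# chunksize returns a TextFileReader yielding 1,000-row DataFrames.
chunker = pd.read_csv(io.StringIO(csv_text), chunksize=1000)

tot = pd.Series([], dtype='float64')
for piece in chunker:
    # fill_value=0 covers keys absent from either side of the addition.
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)
print(tot.head(10))
```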
Writing Data to Text Format
| | something | a | b | c | d | message |
---|---|---|---|---|---|---|
0 | one | 1 | 2 | 3.0 | 4 | NaN |
1 | two | 5 | 6 | NaN | 8 | world |
2 | three | 9 | 10 | 11.0 | 12 | foo |
,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo
|something|a|b|c|d|message
0|one|1|2|3.0|4|
1|two|5|6||8|world
2|three|9|10|11.0|12|foo
,something,a,b,c,d,message
0,one,1,2,3.0,4,NULL
1,two,5,6,NULL,8,world
2,three,9,10,11.0,12,foo
one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo
a,b,c
1,2,3.0
5,6,
9,10,11.0
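A sketch of `to_csv` options that match the outputs above, assuming the frame was read from the inlined text: `sep` changes the delimiter, `na_rep` spells missing values, `index`/`header` drop labels, and `columns` selects and orders a subset. With no target, `to_csv` returns the text as a string.

```python
import io
import sys

import pandas as pd

ex5 = """something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
"""
data = pd.read_csv(io.StringIO(ex5))

# Any file-like object works; sys.stdout prints the result.
data.to_csv(sys.stdout, sep='|')
# Write missing values as a sentinel instead of an empty string.
data.to_csv(sys.stdout, na_rep='NULL')
# Omit the row and column labels.
data.to_csv(sys.stdout, index=False, header=False)
# Keep only columns a, b, c and return the text instead of writing it.
subset = data.to_csv(index=False, columns=['a', 'b', 'c'])
print(subset)
```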
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
'2000-01-05', '2000-01-06', '2000-01-07'],
dtype='datetime64[ns]', freq='D')
2000-01-01 0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
2000-01-06 5
2000-01-07 6
Freq: D, dtype: int32
2000-01-01,0
2000-01-02,1
2000-01-03,2
2000-01-04,3
2000-01-05,4
2000-01-06,5
2000-01-07,6
2000-01-01 0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
2000-01-06 5
2000-01-07 6
dtype: int64
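A sketch of the Series round trip above: write a date-indexed Series without a header, then read it back with the first column parsed as a `DatetimeIndex`. The `int32` in the first listing versus `int64` here most likely reflects NumPy's platform-dependent default integer (32-bit on Windows before NumPy 2.0). `read_csv` parses integers as `int64`.

```python
import io

import numpy as np
import pandas as pd

# Seven daily timestamps starting 2000-01-01, as in the DatetimeIndex above.
dates = pd.date_range('1/1/2000', periods=7)
ts = pd.Series(np.arange(7), index=dates)

# header=False writes only the date,value lines.
text = ts.to_csv(header=False)

# First column becomes a parsed DatetimeIndex; take the remaining column
# as a Series.
back = pd.read_csv(io.StringIO(text), header=None, index_col=0,
                   parse_dates=True).iloc[:, 0]
print(back)
```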
Working with Delimited Formats
"a","b","c"
"1","2","3"
"1","2","3","4"
['a', 'b', 'c']
['1', '2', '3']
['1', '2', '3', '4']
('1', '1')
('2', '2')
('3', '3')
a ('1', '1')
b ('2', '2')
c ('3', '3')
{'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
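A sketch with the standard-library `csv` module, assuming the quoted text above is the whole file. `csv.reader` yields each line as a list of strings with the quotes removed. `zip(*values)` then transposes the data rows into columns. `zip` stops at the shortest row, which is why the extra `'4'` never appears in the dict.

```python
import csv
import io

# The quoted text shown above; the last row has an extra field.
ex7 = '"a","b","c"\n"1","2","3"\n"1","2","3","4"\n'

lines = list(csv.reader(io.StringIO(ex7)))
for line in lines:
    print(line)

# Separate the header, then pair each name with its column of values.
header, values = lines[0], lines[1:]
data_dict = {h: v for h, v in zip(header, zip(*values))}
print(data_dict)  # {'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
```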
14
6
6
6
one;two;three
1;2;3
4;5;6
7;8;9
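A sketch of writing semicolon-delimited output with a custom `csv.Dialect`; an `io.StringIO` buffer stands in for the output file. The `14`, `6`, `6`, `6` printed above are most likely the return values of `writerow`. It returns whatever the underlying `write()` returns, which for a text stream is the number of characters written (`'one;two;three\n'` is 14 characters).

```python
import csv
import io

# Custom dialect: semicolon delimiter and "\n" line endings.
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL

buf = io.StringIO()  # stands in for an open output file
writer = csv.writer(buf, dialect=my_dialect)

rows = [('one', 'two', 'three'), ('1', '2', '3'),
        ('4', '5', '6'), ('7', '8', '9')]
counts = [writer.writerow(row) for row in rows]

print(counts)  # [14, 6, 6, 6]
print(buf.getvalue())
```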
JSON Data
{'name': 'Wes',
'pet': None,
'places_lived': ['United States', 'Spain', 'Germany'],
'siblings': [{'age': 25, 'name': 'Scott', 'pet': 'Zuko'},
{'age': 33, 'name': 'Katie', 'pet': 'Cisco'}]}
| | name | age |
---|---|---|
0 | Scott | 25 |
1 | Katie | 33 |
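A sketch of the JSON steps, using a JSON string written to be consistent with the parsed object above (the original text is not shown). `json.loads` converts JSON to Python dicts, lists, strings, numbers, and `None`. `json.dumps` reverses it. A list of dicts maps naturally to DataFrame rows, and `columns=` selects which fields to keep.

```python
import json

import pandas as pd

# JSON text consistent with the parsed object shown above.
obj = """
{"name": "Wes",
 "places_lived": ["United States", "Spain", "Germany"],
 "pet": null,
 "siblings": [{"name": "Scott", "age": 25, "pet": "Zuko"},
              {"name": "Katie", "age": 33, "pet": "Cisco"}]
}
"""

result = json.loads(obj)    # JSON null becomes Python None
asjson = json.dumps(result)  # back to a JSON string

# One row per sibling, keeping only the name and age fields.
siblings = pd.DataFrame(result['siblings'], columns=['name', 'age'])
print(siblings)
```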
XML and HTML: Web Scraping
NB. The Yahoo! Finance API has changed, so this example no longer works.
[<Element a at 0x20352cad598>,
<Element a at 0x20352cad5e8>,
<Element a at 0x20352cad638>,
<Element a at 0x20352cad688>,
<Element a at 0x20352cad6d8>]
<Element a at 0x20352cad9a8>
'/quote/NFLX?p=NFLX'
'NFLX'
['//finance.yahoo.com/broker-comparison?bypass=true',
'https://help.yahoo.com/kb/index?page=content&y=PROD_MAIL_ML&locale=en_US&id=SLN2310&actp=productlink',
'http://help.yahoo.com/l/us/yahoo/finance/',
'https://yahoo.uservoice.com/forums/382977',
'http://info.yahoo.com/privacy/us/yahoo/',
'http://info.yahoo.com/relevantads/',
'http://info.yahoo.com/legal/us/yahoo/utos/utos-173.html',
'http://twitter.com/YahooFinance',
'http://facebook.com/yahoofinance',
'http://yahoofinance.tumblr.com']
1
[]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-87-7d371ed47023> in <module>()
1 _unpack(rows[0], kind='th')
----> 2 _unpack(rows[1], kind='td')
IndexError: list index out of range
Parsing XML with lxml.objectify
| | INDICATOR_UNIT | MONTHLY_ACTUAL | MONTHLY_TARGET | PERIOD_MONTH | PERIOD_YEAR | YTD_ACTUAL | YTD_TARGET |
---|---|---|---|---|---|---|---|
0 | % | 96.9 | 95 | 1 | 2008 | 96.9 | 95 |
1 | % | 95 | 95 | 2 | 2008 | 96 | 95 |
2 | % | 96.9 | 95 | 3 | 2008 | 96.3 | 95 |
3 | % | 98.3 | 95 | 4 | 2008 | 96.8 | 95 |
4 | % | 95.8 | 95 | 5 | 2008 | 96.6 | 95 |
5 | % | 94.4 | 95 | 6 | 2008 | 96.2 | 95 |
6 | % | 96 | 95 | 7 | 2008 | 96.2 | 95 |
7 | % | 96.4 | 95 | 8 | 2008 | 96.2 | 95 |
8 | % | 93.7 | 95 | 9 | 2008 | 95.9 | 95 |
9 | % | 96.4 | 95 | 10 | 2008 | 96 | 95 |
10 | % | 96.9 | 95 | 11 | 2008 | 96.1 | 95 |
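A sketch of turning repeated XML records into a DataFrame. Note that it uses the standard library's `xml.etree.ElementTree` in place of `lxml.objectify`, so no third-party parser is needed. The element names `PERFORMANCE` and `INDICATOR` and the fragment itself are hypothetical; the field names and values come from the first two rows above. Values stay as strings unless converted, for example with `pd.to_numeric`.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical fragment; only the field names and values come from the table.
xml_text = """<PERFORMANCE>
  <INDICATOR>
    <INDICATOR_UNIT>%</INDICATOR_UNIT>
    <MONTHLY_ACTUAL>96.9</MONTHLY_ACTUAL>
    <MONTHLY_TARGET>95</MONTHLY_TARGET>
    <PERIOD_MONTH>1</PERIOD_MONTH>
    <PERIOD_YEAR>2008</PERIOD_YEAR>
    <YTD_ACTUAL>96.9</YTD_ACTUAL>
    <YTD_TARGET>95</YTD_TARGET>
  </INDICATOR>
  <INDICATOR>
    <INDICATOR_UNIT>%</INDICATOR_UNIT>
    <MONTHLY_ACTUAL>95</MONTHLY_ACTUAL>
    <MONTHLY_TARGET>95</MONTHLY_TARGET>
    <PERIOD_MONTH>2</PERIOD_MONTH>
    <PERIOD_YEAR>2008</PERIOD_YEAR>
    <YTD_ACTUAL>96</YTD_ACTUAL>
    <YTD_TARGET>95</YTD_TARGET>
  </INDICATOR>
</PERFORMANCE>"""

root = ET.fromstring(xml_text)

# One dict per record, mapping each child tag to its text.
records = [{child.tag: child.text for child in elt}
           for elt in root.findall('INDICATOR')]

perf = pd.DataFrame(records)
print(perf)
```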
Binary Data Formats
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
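A sketch of the pickle round trip that yields the identical frames above, writing to a temporary directory; the file name is hypothetical. Pickle is convenient for short-term storage, but the format is not guaranteed to stay readable across library versions.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({'a': [1, 5, 9], 'b': [2, 6, 10], 'c': [3, 7, 11],
                      'd': [4, 8, 12],
                      'message': ['hello', 'world', 'foo']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'frame_pickle')  # hypothetical file name
    frame.to_pickle(path)          # serialize with Python's pickle
    restored = pd.read_pickle(path)

print(restored.equals(frame))  # True
```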
Using HDF5 Format
<class 'pandas.io.pytables.HDFStore'>
File path: mydata.h5
/obj1 frame (shape->[3,5])
/obj1_col series (shape->[3])
| | a | b | c | d | message |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | hello |
1 | 5 | 6 | 7 | 8 | world |
2 | 9 | 10 | 11 | 12 | foo |
Interacting with Databases
<sqlite3.Cursor at 0x2035487c880>
<sqlite3.Cursor at 0x2035487c810>
[('Atlanta', 'Georgia', 1.25, 6),
('Tallahassee', 'Florida', 2.6, 3),
('Sacramento', 'California', 1.7, 5)]
| | a | b | c | d |
---|---|---|---|---|
0 | Atlanta | Georgia | 1.25 | 6 |
1 | Tallahassee | Florida | 2.60 | 3 |
2 | Sacramento | California | 1.70 | 5 |
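A sketch of the database steps behind the outputs above, using an in-memory SQLite database; the table schema is an assumption consistent with the rows shown. The manual route builds the DataFrame from `fetchall()` and takes column names from `cursor.description`. `pd.read_sql` does the same in one call.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(':memory:')
con.execute("""CREATE TABLE test
               (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER);""")

data = [('Atlanta', 'Georgia', 1.25, 6),
        ('Tallahassee', 'Florida', 2.6, 3),
        ('Sacramento', 'California', 1.7, 5)]
con.executemany('INSERT INTO test VALUES(?, ?, ?, ?)', data)
con.commit()

cursor = con.execute('SELECT * FROM test')
rows = cursor.fetchall()
# description holds one tuple per column; element 0 is the column name.
manual = pd.DataFrame(rows, columns=[x[0] for x in cursor.description])

# pandas can run the query and build the frame in one step.
via_pandas = pd.read_sql('SELECT * FROM test', con)
con.close()

print(manual)
```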