Reading a CSV file with pandas in Python
To read a CSV file with pandas in Python, use the read_csv function.
import pandas as pd

# Load population.csv from the current directory into a DataFrame
df = pd.read_csv('population.csv')
print(df)
The first line of the CSV file is automatically recognized as the header.
The variable name df is short for DataFrame.
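The header detection described above can be checked on a small in-memory sample. The following is a minimal sketch, assuming a hypothetical two-row excerpt in the same layout as population.csv; csv_text and raw are names introduced only for this illustration. It compares the default behaviour with header=None, which treats the first line as data.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as population.csv.
csv_text = '''市区町村,世帯数,総数
千代田区,"35,830","63,635"
中央区,"91,852","162,502"
'''

# Default behaviour: the first line becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # ['市区町村', '世帯数', '総数']
print(len(df))           # 2

# header=None treats the first line as data and numbers the columns 0, 1, 2.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(raw.columns))  # [0, 1, 2]
print(len(raw))           # 3
```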
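Reading Tokyo's population by municipality with pandas read_csv
Put the population data for the municipalities of Tokyo into population.csv. The data is quoted from the Statistics Division, Bureau of General Affairs, Tokyo Metropolitan Government (東京都総務局統計部). This population.csv will also be used in later pandas explanations.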
市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
中央区,"91,852","162,502","77,241","85,261","15,916"
港 区,"145,865","257,426","121,326","136,100","12,638"
新宿区,"219,639","346,162","173,743","172,419","18,999"
文京区,"121,128","221,489","105,462","116,027","19,618"
台東区,"118,858","199,292","101,917","97,375","19,712"
墨田区,"150,855","271,859","134,678","137,181","19,743"
江東区,"267,262","518,479","256,116","262,363","12,910"
品川区,"220,678","394,700","193,644","201,056","17,281"
目黒区,"156,583","279,342","132,206","147,136","19,042"
大田区,"391,146","729,534","362,653","366,881","11,993"
世田谷区,"479,792","908,907","431,026","477,881","15,657"
渋谷区,"137,582","226,594","108,768","117,826","14,996"
中野区,"204,613","331,658","167,378","164,280","21,274"
杉並区,"321,531","569,132","273,057","296,075","16,710"
豊島区 ,"179,880","289,508","145,334","144,174","22,253"
北 区,"196,580","351,976","174,910","177,066","17,078"
荒川区,"115,944","215,966","107,283","108,683","21,256"
板橋区,"309,133","566,890","278,662","288,228","17,594"
練馬区,"370,567","732,433","356,279","376,154","15,234"
足立区,"346,739","688,512","345,291","343,221","12,930"
葛飾区,"233,158","462,591","231,272","231,319","13,293"
江戸川区,"342,016","698,031","351,914","346,117","13,989"
八王子市,"267,736","562,460","281,506","280,954","3,018"
立川市,"91,270","183,822","91,460","92,362","7,546"
武蔵野市,"76,765","146,399","70,120","76,279","13,333"
三鷹市,"93,665","187,199","91,624","95,575","11,401"
青梅市,"63,142","134,086","67,393","66,693","1,298"
府中市,"125,060","260,011","130,582","129,429","8,835"
昭島市,"53,827","113,215","56,384","56,831","6,529"
調布市,"118,804","235,169","114,909","120,260","10,898"
町田市,"195,643","428,685","209,971","218,714","5,991"
小金井市,"60,367","121,443","59,955","61,488","10,747"
小平市,"91,602","193,596","95,312","98,284","9,439"
日野市,"88,402","185,393","92,983","92,410","6,729"
東村山市,"72,676","150,789","73,621","77,168","8,797"
国分寺市,"60,111","123,689","60,901","62,788","10,793"
国立市,"37,728","76,038","37,161","38,877","9,330"
福生市,"30,506","58,243","29,132","29,111","5,733"
狛江市,"42,157","82,481","40,005","42,476","12,908"
東大和市,"38,852","85,565","42,208","43,357","6,376"
清瀬市,"35,454","74,737","36,092","38,645","7,306"
東久留米市,"54,257","116,896","57,066","59,830","9,076"
武蔵村山市,"31,640","72,546","36,177","36,369","4,735"
多摩市,"71,851","148,745","72,927","75,818","7,080"
稲城市,"39,991","90,585","45,589","44,996","5,041"
羽村市,"25,718","55,607","28,251","27,356","5,617"
あきる野市,"35,519","80,851","40,304","40,547","1,100"
西東京市,"97,350","202,817","98,839","103,978","12,877"
瑞穂町,"14,912","33,213","16,922","16,291","1,971"
日の出町,"7,383","16,732","8,224","8,508",596
檜原村,"1,181","2,217","1,100","1,117",21
奥多摩町,"2,685","5,179","2,601","2,578",23
大島町,"4,635","7,716","3,971","3,745",85
利島村,174,323,175,148,78
新島村,"1,381","2,722","1,325","1,397",99
神津島村,917,"1,898",975,923,102
三宅村,"1,620","2,481","1,356","1,125",45
御蔵島村,170,317,167,150,15
八丈町,"4,365","7,465","3,720","3,745",103
青ヶ島村,109,159,92,67,27
小笠原村,"1,492","2,625","1,451","1,174",25
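Source: Households and Population of Tokyo Based on the Basic Resident Register, by town block and age (住民基本台帳による東京都の世帯と人口(町丁別・年齢別))
The columns are municipality (市区町村), number of households (世帯数), total population (総数), male population (男), female population (女), and population density (人口密度). Because the numbers contain thousands-separator commas, values of 1,000 or more are enclosed in double quotation marks. Reading this population.csv with pandas displays the following DataFrame.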
市区町村 世帯数 総数 男 女 人口密度
0 千代田区 35,830 63,635 31,935 31,700 5,458
1 中央区 91,852 162,502 77,241 85,261 15,916
2 港 区 145,865 257,426 121,326 136,100 12,638
3 新宿区 219,639 346,162 173,743 172,419 18,999
4 文京区 121,128 221,489 105,462 116,027 19,618
5 台東区 118,858 199,292 101,917 97,375 19,712
6 墨田区 150,855 271,859 134,678 137,181 19,743
7 江東区 267,262 518,479 256,116 262,363 12,910
8 品川区 220,678 394,700 193,644 201,056 17,281
9 目黒区 156,583 279,342 132,206 147,136 19,042
10 大田区 391,146 729,534 362,653 366,881 11,993
11 世田谷区 479,792 908,907 431,026 477,881 15,657
12 渋谷区 137,582 226,594 108,768 117,826 14,996
13 中野区 204,613 331,658 167,378 164,280 21,274
14 杉並区 321,531 569,132 273,057 296,075 16,710
15 豊島区 179,880 289,508 145,334 144,174 22,253
16 北 区 196,580 351,976 174,910 177,066 17,078
17 荒川区 115,944 215,966 107,283 108,683 21,256
18 板橋区 309,133 566,890 278,662 288,228 17,594
19 練馬区 370,567 732,433 356,279 376,154 15,234
20 足立区 346,739 688,512 345,291 343,221 12,930
21 葛飾区 233,158 462,591 231,272 231,319 13,293
22 江戸川区 342,016 698,031 351,914 346,117 13,989
23 八王子市 267,736 562,460 281,506 280,954 3,018
24 立川市 91,270 183,822 91,460 92,362 7,546
25 武蔵野市 76,765 146,399 70,120 76,279 13,333
26 三鷹市 93,665 187,199 91,624 95,575 11,401
27 青梅市 63,142 134,086 67,393 66,693 1,298
28 府中市 125,060 260,011 130,582 129,429 8,835
29 昭島市 53,827 113,215 56,384 56,831 6,529
.. ... ... ... ... ... ...
32 小金井市 60,367 121,443 59,955 61,488 10,747
33 小平市 91,602 193,596 95,312 98,284 9,439
34 日野市 88,402 185,393 92,983 92,410 6,729
35 東村山市 72,676 150,789 73,621 77,168 8,797
36 国分寺市 60,111 123,689 60,901 62,788 10,793
37 国立市 37,728 76,038 37,161 38,877 9,330
38 福生市 30,506 58,243 29,132 29,111 5,733
39 狛江市 42,157 82,481 40,005 42,476 12,908
40 東大和市 38,852 85,565 42,208 43,357 6,376
41 清瀬市 35,454 74,737 36,092 38,645 7,306
42 東久留米市 54,257 116,896 57,066 59,830 9,076
43 武蔵村山市 31,640 72,546 36,177 36,369 4,735
44 多摩市 71,851 148,745 72,927 75,818 7,080
45 稲城市 39,991 90,585 45,589 44,996 5,041
46 羽村市 25,718 55,607 28,251 27,356 5,617
47 あきる野市 35,519 80,851 40,304 40,547 1,100
48 西東京市 97,350 202,817 98,839 103,978 12,877
49 瑞穂町 14,912 33,213 16,922 16,291 1,971
50 日の出町 7,383 16,732 8,224 8,508 596
51 檜原村 1,181 2,217 1,100 1,117 21
52 奥多摩町 2,685 5,179 2,601 2,578 23
53 大島町 4,635 7,716 3,971 3,745 85
54 利島村 174 323 175 148 78
55 新島村 1,381 2,722 1,325 1,397 99
56 神津島村 917 1,898 975 923 102
57 三宅村 1,620 2,481 1,356 1,125 45
58 御蔵島村 170 317 167 150 15
59 八丈町 4,365 7,465 3,720 3,745 103
60 青ヶ島村 109 159 92 67 27
61 小笠原村 1,492 2,625 1,451 1,174 25
[62 rows x 6 columns]
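The numeric columns are displayed with their commas because pandas reads the comma-formatted values as strings, not as numbers.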
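If the columns are needed as numbers, one option is the thousands parameter of read_csv. The following is a minimal sketch, assuming a two-row excerpt of population.csv held in memory; csv_text, as_text, and as_numbers are names introduced only for this illustration.

```python
import io

import pandas as pd

# Two rows copied from population.csv: one with quoted values, one without.
csv_text = '''市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
利島村,174,323,175,148,78
'''

# Without thousands=, the comma-formatted columns are kept as strings.
as_text = pd.read_csv(io.StringIO(csv_text))
print(as_text['総数'].tolist())  # ['63,635', '323']

# thousands=',' tells the parser to treat commas inside numbers as separators.
as_numbers = pd.read_csv(io.StringIO(csv_text), thousands=',')
print(as_numbers['総数'].tolist())  # [63635, 323]
print(as_numbers['総数'].sum())     # 63958
```

With numeric columns, operations such as sum, sort_values, and comparisons behave as expected. The rest of this article keeps the default string columns.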
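Selecting a row (by integer position)
To select a row of a DataFrame by its position in pandas, use iloc. Let's select only the data for 文京区 from the table above. 文京区 is at position 4 (positions start at 0, so it is the fifth row), so write:
import pandas as pd
df = pd.read_csv('population.csv')
# iloc selects by zero-based integer position
r = df.iloc[4]
print(r)
The result is as follows.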
市区町村 文京区
世帯数 121,128
総数 221,489
男 105,462
女 116,027
人口密度 19,618
Name: 4, dtype: object
The number of households, total population, and the other values for 文京区 are output.
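iloc also accepts negative positions and slices. The following is a minimal sketch, assuming the first five rows of population.csv held in memory; csv_text and first_two are names introduced only for this illustration.

```python
import io

import pandas as pd

# First five rows of population.csv, reduced to three columns.
csv_text = '''市区町村,世帯数,総数
千代田区,"35,830","63,635"
中央区,"91,852","162,502"
港 区,"145,865","257,426"
新宿区,"219,639","346,162"
文京区,"121,128","221,489"
'''
df = pd.read_csv(io.StringIO(csv_text))

# A single integer returns one row as a Series.
print(df.iloc[4]['市区町村'])   # 文京区

# Negative positions count from the end.
print(df.iloc[-1]['市区町村'])  # 文京区

# A slice returns a DataFrame; the stop position is excluded.
first_two = df.iloc[0:2]
print(first_two['市区町村'].tolist())  # ['千代田区', '中央区']
```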
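Using the municipality names as the index
The DataFrame displayed above had an integer index. We used that index to select the data for 文京区, but in practice we want to look up a row by municipality name, such as 目黒区. To do so, first replace the integer index with the 市区町村 column by passing index_col=0 to read_csv. The index is not removed; the municipality names become the index.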
import pandas as pd
# index_col=0 makes the first column (市区町村) the row index
df = pd.read_csv('population.csv', index_col=0)
print(df)
The result is as follows.
世帯数 総数 男 女 人口密度
市区町村
千代田区 35,830 63,635 31,935 31,700 5,458
中央区 91,852 162,502 77,241 85,261 15,916
港 区 145,865 257,426 121,326 136,100 12,638
新宿区 219,639 346,162 173,743 172,419 18,999
文京区 121,128 221,489 105,462 116,027 19,618
台東区 118,858 199,292 101,917 97,375 19,712
墨田区 150,855 271,859 134,678 137,181 19,743
江東区 267,262 518,479 256,116 262,363 12,910
品川区 220,678 394,700 193,644 201,056 17,281
目黒区 156,583 279,342 132,206 147,136 19,042
大田区 391,146 729,534 362,653 366,881 11,993
世田谷区 479,792 908,907 431,026 477,881 15,657
渋谷区 137,582 226,594 108,768 117,826 14,996
中野区 204,613 331,658 167,378 164,280 21,274
杉並区 321,531 569,132 273,057 296,075 16,710
豊島区 179,880 289,508 145,334 144,174 22,253
北 区 196,580 351,976 174,910 177,066 17,078
荒川区 115,944 215,966 107,283 108,683 21,256
板橋区 309,133 566,890 278,662 288,228 17,594
練馬区 370,567 732,433 356,279 376,154 15,234
足立区 346,739 688,512 345,291 343,221 12,930
葛飾区 233,158 462,591 231,272 231,319 13,293
江戸川区 342,016 698,031 351,914 346,117 13,989
八王子市 267,736 562,460 281,506 280,954 3,018
立川市 91,270 183,822 91,460 92,362 7,546
武蔵野市 76,765 146,399 70,120 76,279 13,333
三鷹市 93,665 187,199 91,624 95,575 11,401
青梅市 63,142 134,086 67,393 66,693 1,298
府中市 125,060 260,011 130,582 129,429 8,835
昭島市 53,827 113,215 56,384 56,831 6,529
... ... ... ... ... ...
小金井市 60,367 121,443 59,955 61,488 10,747
小平市 91,602 193,596 95,312 98,284 9,439
日野市 88,402 185,393 92,983 92,410 6,729
東村山市 72,676 150,789 73,621 77,168 8,797
国分寺市 60,111 123,689 60,901 62,788 10,793
国立市 37,728 76,038 37,161 38,877 9,330
福生市 30,506 58,243 29,132 29,111 5,733
狛江市 42,157 82,481 40,005 42,476 12,908
東大和市 38,852 85,565 42,208 43,357 6,376
清瀬市 35,454 74,737 36,092 38,645 7,306
東久留米市 54,257 116,896 57,066 59,830 9,076
武蔵村山市 31,640 72,546 36,177 36,369 4,735
多摩市 71,851 148,745 72,927 75,818 7,080
稲城市 39,991 90,585 45,589 44,996 5,041
羽村市 25,718 55,607 28,251 27,356 5,617
あきる野市 35,519 80,851 40,304 40,547 1,100
西東京市 97,350 202,817 98,839 103,978 12,877
瑞穂町 14,912 33,213 16,922 16,291 1,971
日の出町 7,383 16,732 8,224 8,508 596
檜原村 1,181 2,217 1,100 1,117 21
奥多摩町 2,685 5,179 2,601 2,578 23
大島町 4,635 7,716 3,971 3,745 85
利島村 174 323 175 148 78
新島村 1,381 2,722 1,325 1,397 99
神津島村 917 1,898 975 923 102
三宅村 1,620 2,481 1,356 1,125 45
御蔵島村 170 317 167 150 15
八丈町 4,365 7,465 3,720 3,745 103
青ヶ島村 109 159 92 67 27
小笠原村 1,492 2,625 1,451 1,174 25
[62 rows x 5 columns]
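The column count drops from 6 to 5 because 市区町村 has moved from the columns into the index. The following is a minimal sketch that inspects this, assuming a two-row excerpt of population.csv held in memory; csv_text, by_position, by_name, and loaded are names introduced only for this illustration. It also shows that index_col accepts a column name, and that set_index does the same job on a DataFrame that is already loaded.

```python
import io

import pandas as pd

# Two rows of population.csv, reduced to three columns.
csv_text = '''市区町村,世帯数,総数
千代田区,"35,830","63,635"
中央区,"91,852","162,502"
'''

# index_col accepts a column position or a column name.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
by_name = pd.read_csv(io.StringIO(csv_text), index_col='市区町村')
print(by_position.equals(by_name))  # True

# The index is still there; it now holds the municipality names.
print(by_position.index.name)     # 市区町村
print(list(by_position.index))    # ['千代田区', '中央区']
print(list(by_position.columns))  # ['世帯数', '総数']

# set_index gives the same result for a DataFrame that is already loaded.
loaded = pd.read_csv(io.StringIO(csv_text)).set_index('市区町村')
print(loaded.equals(by_position))  # True
```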
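Selecting a row (by index label)
To select a row by its index label (here, the municipality name) rather than by integer position, use loc instead of iloc.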
import pandas as pd
df = pd.read_csv('population.csv', index_col=0)
# loc selects a row by its index label
row = df.loc['目黒区']
print(row)
The result is as follows.
世帯数 156,583
総数 279,342
男 132,206
女 147,136
人口密度 19,042
Name: 目黒区, dtype: object
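loc can also take a column name as a second argument, and labels must match exactly. Some names in population.csv contain spaces (for example 港 区, and 豊島区 followed by a trailing space), so a lookup with the plain name may fail. The following is a minimal sketch, assuming a three-row excerpt held in memory and assuming that stripping all whitespace from the names is acceptable for your data; csv_text is a name introduced only for this illustration, and thousands=',' is used so that the values are integers.

```python
import io

import pandas as pd

# Three rows of population.csv, including the names that contain spaces.
csv_text = '''市区町村,世帯数,総数
港 区,"145,865","257,426"
目黒区,"156,583","279,342"
豊島区 ,"179,880","289,508"
'''
df = pd.read_csv(io.StringIO(csv_text), index_col=0, thousands=',')

# A row label plus a column name returns a single value.
print(df.loc['目黒区', '総数'])  # 279342

# A list of labels returns those rows in the requested order.
print(df.loc[['目黒区', '港 区'], '世帯数'].tolist())  # [156583, 145865]

# Labels must match exactly, including stray spaces.
print('豊島区' in df.index)  # False

# Removing whitespace from the labels makes lookups less fragile.
df.index = df.index.str.replace(r'\s+', '', regex=True)
print(df.loc['豊島区', '総数'])  # 289508
print(df.loc['港区', '総数'])    # 257426
```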