Let's select only the rows of a pandas DataFrame that match a condition. As before, we use the population data for the municipalities of Tokyo.
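import pandas as pd
# thousands=',' parses quoted values such as "35,830" as the integer 35830
df = pd.read_csv('population.csv', thousands=',')
# Keep only the rows where men outnumber women
rows = df.loc[df['男'] > df['女']]
print(rows)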
The result looks like this.
市区町村 世帯数 総数 男 女 人口密度
0 千代田区 35830 63635 31935 31700 5458
3 新宿区 219639 346162 173743 172419 18999
5 台東区 118858 199292 101917 97375 19712
13 中野区 204613 331658 167378 164280 21274
15 豊島区 179880 289508 145334 144174 22253
20 足立区 346739 688512 345291 343221 12930
22 江戸川区 342016 698031 351914 346117 13989
23 八王子市 267736 562460 281506 280954 3018
27 青梅市 63142 134086 67393 66693 1298
28 府中市 125060 260011 130582 129429 8835
34 日野市 88402 185393 92983 92410 6729
38 福生市 30506 58243 29132 29111 5733
45 稲城市 39991 90585 45589 44996 5041
46 羽村市 25718 55607 28251 27356 5617
49 瑞穂町 14912 33213 16922 16291 1971
52 奥多摩町 2685 5179 2601 2578 23
53 大島町 4635 7716 3971 3745 85
54 利島村 174 323 175 148 78
56 神津島村 917 1898 975 923 102
57 三宅村 1620 2481 1356 1125 45
58 御蔵島村 170 317 167 150 15
60 青ヶ島村 109 159 92 67 27
61 小笠原村 1492 2625 1451 1174 25
As in the previous installments, population.csv contains the following data.
市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
中央区,"91,852","162,502","77,241","85,261","15,916"
港 区,"145,865","257,426","121,326","136,100","12,638"
新宿区,"219,639","346,162","173,743","172,419","18,999"
文京区,"121,128","221,489","105,462","116,027","19,618"
台東区,"118,858","199,292","101,917","97,375","19,712"
墨田区,"150,855","271,859","134,678","137,181","19,743"
江東区,"267,262","518,479","256,116","262,363","12,910"
品川区,"220,678","394,700","193,644","201,056","17,281"
目黒区,"156,583","279,342","132,206","147,136","19,042"
大田区,"391,146","729,534","362,653","366,881","11,993"
世田谷区,"479,792","908,907","431,026","477,881","15,657"
渋谷区,"137,582","226,594","108,768","117,826","14,996"
中野区,"204,613","331,658","167,378","164,280","21,274"
杉並区,"321,531","569,132","273,057","296,075","16,710"
豊島区 ,"179,880","289,508","145,334","144,174","22,253"
北 区,"196,580","351,976","174,910","177,066","17,078"
荒川区,"115,944","215,966","107,283","108,683","21,256"
板橋区,"309,133","566,890","278,662","288,228","17,594"
練馬区,"370,567","732,433","356,279","376,154","15,234"
足立区,"346,739","688,512","345,291","343,221","12,930"
葛飾区,"233,158","462,591","231,272","231,319","13,293"
江戸川区,"342,016","698,031","351,914","346,117","13,989"
八王子市,"267,736","562,460","281,506","280,954","3,018"
立川市,"91,270","183,822","91,460","92,362","7,546"
武蔵野市,"76,765","146,399","70,120","76,279","13,333"
三鷹市,"93,665","187,199","91,624","95,575","11,401"
青梅市,"63,142","134,086","67,393","66,693","1,298"
府中市,"125,060","260,011","130,582","129,429","8,835"
昭島市,"53,827","113,215","56,384","56,831","6,529"
調布市,"118,804","235,169","114,909","120,260","10,898"
町田市,"195,643","428,685","209,971","218,714","5,991"
小金井市,"60,367","121,443","59,955","61,488","10,747"
小平市,"91,602","193,596","95,312","98,284","9,439"
日野市,"88,402","185,393","92,983","92,410","6,729"
東村山市,"72,676","150,789","73,621","77,168","8,797"
国分寺市,"60,111","123,689","60,901","62,788","10,793"
国立市,"37,728","76,038","37,161","38,877","9,330"
福生市,"30,506","58,243","29,132","29,111","5,733"
狛江市,"42,157","82,481","40,005","42,476","12,908"
東大和市,"38,852","85,565","42,208","43,357","6,376"
清瀬市,"35,454","74,737","36,092","38,645","7,306"
東久留米市,"54,257","116,896","57,066","59,830","9,076"
武蔵村山市,"31,640","72,546","36,177","36,369","4,735"
多摩市,"71,851","148,745","72,927","75,818","7,080"
稲城市,"39,991","90,585","45,589","44,996","5,041"
羽村市,"25,718","55,607","28,251","27,356","5,617"
あきる野市,"35,519","80,851","40,304","40,547","1,100"
西東京市,"97,350","202,817","98,839","103,978","12,877"
瑞穂町,"14,912","33,213","16,922","16,291","1,971"
日の出町,"7,383","16,732","8,224","8,508",596
檜原村,"1,181","2,217","1,100","1,117",21
奥多摩町,"2,685","5,179","2,601","2,578",23
大島町,"4,635","7,716","3,971","3,745",85
利島村,174,323,175,148,78
新島村,"1,381","2,722","1,325","1,397",99
神津島村,917,"1,898",975,923,102
三宅村,"1,620","2,481","1,356","1,125",45
御蔵島村,170,317,167,150,15
八丈町,"4,365","7,465","3,720","3,745",103
青ヶ島村,109,159,92,67,27
小笠原村,"1,492","2,625","1,451","1,174",25
Source: 住民基本台帳による東京都の世帯と人口(町丁別・年齢別) (households and population of Tokyo based on the Basic Resident Register, by town and by age)
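The key is this line.
rows = df.loc[df['男'] > df['女']]
The condition goes inside the brackets of loc. The comparison df['男'] > df['女'] is evaluated row by row and produces a True/False value for each row, and loc keeps only the rows where it is True. Here that selects the municipalities where men outnumber women. This notation is peculiar to pandas, and you do not need to think about it too deeply at first.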
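To see what happens inside the brackets, it can help to look at the condition on its own. The following is a minimal sketch on a three-row sample copied from the data above. The names `sample` and `mask` are chosen here for illustration and do not appear in the original code.

```python
import pandas as pd

# Three rows copied from population.csv, for illustration only
sample = pd.DataFrame({
    '市区町村': ['千代田区', '中央区', '新宿区'],
    '男': [31935, 77241, 173743],
    '女': [31700, 85261, 172419],
})

# The comparison is evaluated row by row and yields a boolean Series
mask = sample['男'] > sample['女']
print(mask.tolist())  # [True, False, True]

# loc keeps only the rows whose mask value is True
rows = sample.loc[mask]
print(rows['市区町村'].tolist())  # ['千代田区', '新宿区']
```

Writing the condition directly inside `loc[...]`, as the article does, is the same thing without the intermediate variable.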
Problem
Select the municipalities with a population under 1,000.
Answer
import pandas as pd
df = pd.read_csv('population.csv', thousands=',')
rows = df.loc[df['総数'] < 1000]
print(rows)
The result is as follows.
市区町村 世帯数 総数 男 女 人口密度
54 利島村 174 323 175 148 78
58 御蔵島村 170 317 167 150 15
60 青ヶ島村 109 159 92 67 27
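According to this data, Tokyo has three municipalities with fewer than 1,000 residents.
Note: pandas's read_csv reads the contents of a file into a DataFrame. Passing thousands=',' makes it treat the commas in values such as "63,635" as thousands separators, so those columns are read as numbers.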
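The effect of `thousands=','` can be checked without the file. This is a small sketch that feeds `read_csv` a two-line string in the same format as population.csv through `io.StringIO`. The names `csv_text`, `raw`, and `parsed` are illustrative.

```python
import io
import pandas as pd

# A header and one row in the same format as population.csv
csv_text = '市区町村,世帯数,総数\n千代田区,"35,830","63,635"\n'

# Without thousands=',' the quoted numbers stay as strings
raw = pd.read_csv(io.StringIO(csv_text))
print(raw['総数'].iloc[0])     # 63,635

# With thousands=',' they are parsed as integers
parsed = pd.read_csv(io.StringIO(csv_text), thousands=',')
print(parsed['総数'].iloc[0])  # 63635
```

Without the option, a comparison such as `df['総数'] < 1000` would be comparing strings with a number rather than comparing populations.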
Problem
Select the municipalities with a population of at least 100,000 in which men outnumber women.
Answer
import pandas as pd
df = pd.read_csv('population.csv', thousands=',')
rows = df.loc[(df['総数'] >= 100000) & (df['男'] > df['女'])]
print(rows)
The result is as follows.
市区町村 世帯数 総数 男 女 人口密度
3 新宿区 219639 346162 173743 172419 18999
5 台東区 118858 199292 101917 97375 19712
13 中野区 204613 331658 167378 164280 21274
15 豊島区 179880 289508 145334 144174 22253
20 足立区 346739 688512 345291 343221 12930
22 江戸川区 342016 698031 351914 346117 13989
23 八王子市 267736 562460 281506 280954 3018
27 青梅市 63142 134086 67393 66693 1298
28 府中市 125060 260011 130582 129429 8835
34 日野市 88402 185393 92983 92410 6729
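Key point: & combines two conditions with a logical and. Wrap each condition in parentheses, because & binds more tightly than comparison operators such as > in Python.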
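The article only uses `&`, but pandas boolean Series also support `|` (or) and `~` (not). The sketch below tries all three on a four-row sample copied from the data. Storing each condition in a variable first (`large` and `more_men` are names made up here) is one way to keep longer filters readable.

```python
import pandas as pd

# Four rows copied from population.csv, for illustration only
sample = pd.DataFrame({
    '市区町村': ['千代田区', '中央区', '新宿区', '利島村'],
    '総数': [63635, 162502, 346162, 323],
    '男': [31935, 77241, 173743, 175],
    '女': [31700, 85261, 172419, 148],
})

large = sample['総数'] >= 100000        # population of at least 100,000
more_men = sample['男'] > sample['女']  # men outnumber women

both = sample.loc[large & more_men]    # and
either = sample.loc[large | more_men]  # or
not_large = sample.loc[~large]         # not

print(both['市区町村'].tolist())       # ['新宿区']
print(either['市区町村'].tolist())     # all four rows
print(not_large['市区町村'].tolist())  # ['千代田区', '利島村']
```

Python's own `and`, `or`, and `not` keywords do not work element by element on a Series, which is why the symbol operators are used.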
Problem
Select the municipalities with a population of at least 100,000 in which women outnumber men by more than 10%.
Answer
import pandas as pd
df = pd.read_csv('population.csv', thousands=',')
rows = df.loc[(df['総数'] >= 100000) & (df['女'] > df['男'] * 1.1)]
print(rows)
The result is roughly what you might expect: all five are wards (区).
市区町村 世帯数 総数 男 女 人口密度
1 中央区 91852 162502 77241 85261 15916
2 港 区 145865 257426 121326 136100 12638
4 文京区 121128 221489 105462 116027 19618
9 目黒区 156583 279342 132206 147136 19042
11 世田谷区 479792 908907 431026 477881 15657
Note: you can use arithmetic (addition, subtraction, multiplication, division) inside a condition.
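As another illustration of arithmetic inside a condition, the same filter can be written as a ratio of two columns. This is only a sketch on three rows copied from the data, and `ratio` is a name introduced here.

```python
import pandas as pd

# Three rows copied from population.csv, for illustration only
sample = pd.DataFrame({
    '市区町村': ['中央区', '新宿区', '渋谷区'],
    '男': [77241, 173743, 108768],
    '女': [85261, 172419, 117826],
})

# Women per man; dividing two columns works element by element
ratio = sample['女'] / sample['男']

# For this data, selects the same rows as sample['女'] > sample['男'] * 1.1
rows = sample.loc[ratio > 1.1]
print(rows['市区町村'].tolist())  # ['中央区']
```

The multiplication form used in the answer avoids a division, but both express the same idea: the result of the arithmetic is itself a Series that can be compared.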