Random Number Generation and Visualizing the Central Limit Theorem

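The central limit theorem (CLT) applies to independent, identically distributed samples from a distribution with finite mean $\mu$ and finite variance $\sigma^2$. It says that, whatever the shape of the original distribution, the sample mean $\bar{X}_n$ is approximately distributed as $N(\mu, \sigma^2/n)$ for large $n$. More precisely, the standardized mean $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges in distribution to $N(0, 1)$. Let's run simulations in Python to see this important theorem visually.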
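As a minimal sketch of the standardized form of the theorem, the code below draws sample means from $U(0, 1)$ and standardizes them with $\mu = 0.5$ and $\sigma = \sqrt{1/12}$. It assumes NumPy's `default_rng` generator, and the helper name `standardized_means` is hypothetical. The mean and standard deviation of the result are close to 0 and 1 for any $n$ simply because of how the scaling is chosen. The check that actually reflects normality is that about 95% of the values fall within $\pm 1.96$.

```python
import numpy as np


def standardized_means(n, num_simulations=100_000, seed=0):
    """Hypothetical helper: sqrt(n) * (sample mean - mu) / sigma for U(0, 1) samples of size n."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.5, np.sqrt(1 / 12)  # mean and standard deviation of U(0, 1)
    # Each row is one sample of size n; averaging along axis 1 gives the sample means
    means = rng.uniform(0.0, 1.0, size=(num_simulations, n)).mean(axis=1)
    return np.sqrt(n) * (means - mu) / sigma


z = standardized_means(n=30)
print(f"mean = {z.mean():.3f}, std = {z.std():.3f}")
# For a standard normal variable, P(|Z| < 1.96) is about 0.95
print(f"fraction within ±1.96: {np.mean(np.abs(z) < 1.96):.3f}")
```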

Sample means from a uniform distribution

First, draw samples from the uniform distribution $U(0, 1)$ and examine the distribution of the sample mean.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

# Theoretical values for the uniform distribution U(0, 1)
# mean = 0.5, variance = 1/12
mu = 0.5
sigma = np.sqrt(1/12)

# Compute the sample mean 10000 times for each sample size
sample_sizes = [1, 2, 5, 30]
num_simulations = 10000

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for ax, n in zip(axes.flatten(), sample_sizes):
    # Draw a (num_simulations, n) array and average each row to get the sample means
    sample_means = np.random.uniform(0, 1, size=(num_simulations, n)).mean(axis=1)

    # Histogram of the simulated sample means
    ax.hist(sample_means, bins=50, density=True, alpha=0.7, label='Simulation')

    # Normal approximation predicted by the CLT: N(mu, sigma^2 / n)
    x = np.linspace(0, 1, 200)
    theoretical_std = sigma / np.sqrt(n)
    normal_pdf = stats.norm(loc=mu, scale=theoretical_std).pdf(x)
    ax.plot(x, normal_pdf, 'r-', linewidth=2, label='Normal approximation')

    ax.set_title(f'n = {n}')
    ax.set_xlabel('Sample mean')
    ax.set_ylabel('Density')
    ax.legend()

plt.tight_layout()
plt.show()

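For $n = 1$ the histogram simply reproduces the original uniform distribution. At $n = 2$ it is already triangular, and as $n$ increases further it moves visibly closer to the normal curve.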
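To put a number on "closer to normal", one option is the Kolmogorov–Smirnov distance. This is our own choice of measure, not part of the original walkthrough. It is the largest gap between the empirical CDF of the simulated sample means and the CDF of the normal approximation $N(0.5, 1/(12n))$. The sketch below computes it with NumPy and the standard library's `math.erf`. The helper names `normal_cdf` and `ks_distance_to_normal` are hypothetical.

```python
import math

import numpy as np


def normal_cdf(x, mu, sd):
    """CDF of N(mu, sd^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def ks_distance_to_normal(n, num_simulations=10_000, seed=42):
    """Hypothetical helper: sup-distance between the empirical CDF of
    U(0, 1) sample means (sample size n) and the CLT normal approximation."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)
    means = np.sort(rng.uniform(0.0, 1.0, size=(num_simulations, n)).mean(axis=1))
    sd = sigma / math.sqrt(n)
    cdf = np.array([normal_cdf(m, mu, sd) for m in means])
    # The empirical CDF jumps from i/N to (i+1)/N at the i-th sorted value,
    # so the gap is checked on both sides of every jump
    i = np.arange(num_simulations)
    upper = (i + 1) / num_simulations - cdf
    lower = cdf - i / num_simulations
    return max(upper.max(), lower.max())


for n in [1, 2, 5, 30]:
    print(n, round(float(ks_distance_to_normal(n)), 4))
```

The distance should shrink as $n$ grows. With 10000 simulations, some residual distance of order $1/\sqrt{10000}$ remains from sampling noise alone.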

Sample means from an exponential distribution

The central limit theorem also holds for asymmetric (skewed) distributions. Next, we use the exponential distribution.

# Reuses the imports and num_simulations from the first block
np.random.seed(42)

# Theoretical values for the exponential distribution Exp(λ=1)
# mean = 1/λ = 1, variance = 1/λ² = 1
mu_exp = 1
sigma_exp = 1

sample_sizes = [1, 5, 10, 30]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for ax, n in zip(axes.flatten(), sample_sizes):
    # NumPy parameterizes the exponential by scale = 1/λ, so scale=1 means λ=1
    sample_means = np.random.exponential(1, size=(num_simulations, n)).mean(axis=1)

    ax.hist(sample_means, bins=50, density=True, alpha=0.7, label='Simulation')

    # Normal approximation from the CLT: N(mu_exp, sigma_exp^2 / n)
    theoretical_std = sigma_exp / np.sqrt(n)
    x = np.linspace(0, 3, 200)
    normal_pdf = stats.norm(loc=mu_exp, scale=theoretical_std).pdf(x)
    ax.plot(x, normal_pdf, 'r-', linewidth=2, label='Normal approximation')

    ax.set_title(f'n = {n}')
    ax.set_xlabel('Sample mean')
    ax.set_ylabel('Density')
    ax.legend()

plt.tight_layout()
plt.show()

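The exponential distribution is strongly right-skewed, yet by around $n = 30$ the sample mean is approximately normal. A slight right skew still remains. The mean of $n$ draws from Exp(1) has skewness $2/\sqrt{n}$, which is still about 0.37 at $n = 30$. As a result, skewed distributions approach normality more slowly than symmetric ones such as the uniform distribution.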
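To see how much skew remains at each $n$, the sketch below estimates the skewness of simulated Exp(1) sample means and prints it next to $2/\sqrt{n}$. That theoretical value follows because the mean of $n$ Exp(1) draws is gamma-distributed with shape $n$, and a gamma distribution with shape $n$ has skewness $2/\sqrt{n}$. The code assumes NumPy's `default_rng`, and the helper name `sample_mean_skewness` is hypothetical.

```python
import numpy as np


def sample_mean_skewness(n, num_simulations=100_000, seed=42):
    """Hypothetical helper: empirical skewness of the mean of n Exp(1) draws."""
    rng = np.random.default_rng(seed)
    # scale=1.0 corresponds to rate λ = 1
    means = rng.exponential(scale=1.0, size=(num_simulations, n)).mean(axis=1)
    centered = means - means.mean()
    # Skewness = third central moment divided by the cube of the standard deviation
    return np.mean(centered**3) / np.std(means) ** 3


for n in [1, 5, 10, 30]:
    print(f"n = {n:2d}: empirical {sample_mean_skewness(n):.3f}, theory {2 / np.sqrt(n):.3f}")
```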

Checking with sums of dice rolls

Next, let's look at how the distribution of the total changes as the number of dice rolls increases.
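The values `mu_dice = 3.5` and `var_dice = 35/12` used below come from the definitions $E[X] = \sum_{k=1}^{6} k/6$ and $\mathrm{Var}(X) = E[X^2] - E[X]^2$ for a fair six-sided die. The short sketch below recomputes them exactly with Python's `fractions` module. Because the rolls are independent, the total of $n$ rolls then has mean $3.5n$ and variance $35n/12$. The helper `sum_moments` is a hypothetical name.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die has probability 1/6

mean = sum(k * p for k in faces)               # E[X] = 21/6 = 7/2
second_moment = sum(k * k * p for k in faces)  # E[X^2] = 91/6
variance = second_moment - mean**2             # 91/6 - 49/4 = 35/12


def sum_moments(n):
    """Hypothetical helper: mean and variance of the total of n independent rolls."""
    return n * mean, n * variance


print(mean, variance)  # 7/2 35/12
```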

# Reuses the imports and num_simulations from the first block
np.random.seed(42)

# Mean and variance of a single fair die
mu_dice = 3.5
var_dice = 35/12

num_rolls_list = [1, 2, 10, 50]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for ax, n in zip(axes.flatten(), num_rolls_list):
    # Simulate the total of n rolls 10000 times (randint excludes 7, so faces are 1 to 6)
    totals = np.random.randint(1, 7, size=(num_simulations, n)).sum(axis=1)

    # Bin edges at half-integers so each bar is centered on an integer total
    ax.hist(totals, bins=np.arange(n - 0.5, 6*n + 1.5), density=True, alpha=0.7, label='Simulation')

    # Normal approximation from the CLT: mean n * mu_dice, variance n * var_dice
    theoretical_mean = n * mu_dice
    theoretical_std = np.sqrt(n * var_dice)
    x = np.linspace(n, 6*n, 200)
    normal_pdf = stats.norm(loc=theoretical_mean, scale=theoretical_std).pdf(x)
    ax.plot(x, normal_pdf, 'r-', linewidth=2, label='Normal approximation')

    ax.set_title(f'Sum of {n} dice rolls')
    ax.set_xlabel('Sum')
    ax.set_ylabel('Density')
    ax.legend()

plt.tight_layout()
plt.show()

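Even for a discrete distribution, the distribution of the total approaches a normal distribution as the number of rolls increases. This is the essence of the central limit theorem. It is also a major reason the normal distribution appears so widely in statistics: many quantities of interest are sums or averages of many independent random effects.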
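Because a die has only six faces, the exact distribution of the total can also be computed without simulation. One way is to repeatedly convolve the single-die probabilities. The sketch below does this with `np.convolve`. It then measures the largest gap between the exact probabilities and the normal density $N(3.5n, 35n/12)$ evaluated at each integer total. The helper names `exact_dice_sum_pmf` and `max_gap_to_normal` are hypothetical.

```python
import math

import numpy as np


def exact_dice_sum_pmf(n):
    """Hypothetical helper: exact PMF of the total of n fair dice.

    Returns (totals, probabilities) with totals running from n to 6n.
    """
    die = np.full(6, 1 / 6)  # P(face = 1), ..., P(face = 6)
    pmf = np.array([1.0])    # distribution of an empty sum (always 0)
    for _ in range(n):
        pmf = np.convolve(pmf, die)  # adding one die convolves the PMFs
    totals = np.arange(n, 6 * n + 1)
    return totals, pmf


def max_gap_to_normal(n):
    """Largest |exact PMF - normal density| over the integer totals."""
    totals, pmf = exact_dice_sum_pmf(n)
    mean, sd = 3.5 * n, math.sqrt(n * 35 / 12)
    normal = np.exp(-((totals - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))
    return np.max(np.abs(pmf - normal))


for n in [1, 2, 10, 50]:
    print(n, f"{max_gap_to_normal(n):.5f}")
```

Because the bin width is 1, the exact PMF can be compared directly with the density values. This is the same reason the histogram above, with `density=True` and unit-width bins, lines up with the red curve.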