In this post we show how to plot a histogram of the Weibull distribution and of the distribution of its sample averages, as approximated by the Normal (Gaussian) distribution. We’ll show how the accuracy of the approximation changes as the sample size increases.
One may get the full .ipynb file here.
Weibull distribution
The SciPy reference for the Weibull distribution is here.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sts
%matplotlib inline
Let’s generate samples from a Weibull distribution and plot their histogram.
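k = 1  # shape parameter
weibull = np.random.weibull(k, 1000)  # 1000 draws from Weibull(k) with scale λ = 1
N = 10  # number of samples to show
print('First', N, 'Weibull samples:\n', weibull[:N])
print('\nMean of Weibull samples:', np.mean(weibull))
plt.rcParams["font.size"] = "15"  # set before plotting so the axes and labels use it
plt.hist(weibull, 25)  # 25 bins; without density=True the y-axis shows counts
plt.ylabel('Number of samples')
plt.xlabel('x')
plt.show()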
First 10 Weibull samples:
 [1.75457963e-01 2.14619778e+00 1.57551698e+00 3.61385649e+00 1.57228150e-01 6.72606679e-02 9.19993986e+00 1.43535032e+00 5.04072476e-03 1.95541219e-01]

Mean of Weibull samples: 1.0489102684286717
Samples histogram
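`np.random.weibull` draws from the standard Weibull distribution with scale λ = 1; a different scale is obtained by multiplying the draws by λ. Below is a minimal sketch of the same sampling step with NumPy's newer `Generator` API (`np.random.default_rng`). The seed, the sample size and the scale `lam = 2.0` are illustrative assumptions, not values from this post.

```python
import math

import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible

k = 1.0      # shape parameter, as in the post
lam = 2.0    # illustrative scale λ (assumption; the post uses λ = 1)
n = 200_000  # illustrative number of draws

# Generator.weibull samples the standard (λ = 1) Weibull; multiplying by λ rescales it
samples = lam * rng.weibull(k, n)

theoretical_mean = lam * math.gamma(1 + 1 / k)  # λ·Γ(1 + 1/k)
print('Sample mean:', samples.mean())
print('Theoretical mean:', theoretical_mean)
```

With this many draws the sample mean should land close to λ·Γ(1 + 1/k), which is 2 for the values above.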
from scipy.stats import weibull_min
fig, ax = plt.subplots(1, 1)
c = 1  # shape parameter
mean, var, skew, kurt = weibull_min.stats(c, moments='mvsk')
# var -> variance = σ^2 (sigma squared)
print('Weibull mean:', mean, '\nWeibull variance:', var, '\n')
# ppf - percent point function (inverse of cdf, i.e. percentiles)
x = np.linspace(weibull_min.ppf(0.01, c),
                weibull_min.ppf(0.99, c), 100)
ax.plot(x, weibull_min.pdf(x, c),
        'r-', lw=5, alpha=0.6, label='Weibull_min PDF')
# Frozen distribution: the shape is fixed once and its methods are reused
rv = weibull_min(c)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='Frozen PDF')
# Samples of Weibull variates (empirical)
r_weibull = weibull_min.rvs(c, size=1000)
print('Random Weibull variates, first', N, ':\n', r_weibull[:N])
# density=True normalizes the histogram so it is comparable with the PDF; alpha must lie in [0, 1]
ax.hist(r_weibull, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.ylabel('Probability density')
plt.xlabel('Weibull variates')
plt.show()
Weibull mean: 1.0
Weibull variance: 1.0

Random Weibull variates, first 10 :
 [3.80287246 0.65022075 1.43278651 2.84351598 0.29277214 0.20368528 0.16138952 1.71049147 1.12921154 2.39365237]
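The comment in the code above describes `ppf` as the inverse of `cdf`. Here is a small check of that with the frozen distribution: for shape c = 1 the Weibull CDF is F(x) = 1 - exp(-x), so the percent point function has the closed form -ln(1 - q). The probabilities in `q` are arbitrary choices for this sketch.

```python
import numpy as np
from scipy.stats import weibull_min

c = 1
rv = weibull_min(c)  # frozen Weibull distribution with scale 1

q = np.array([0.01, 0.25, 0.5, 0.75, 0.99])  # arbitrary probabilities
x_q = rv.ppf(q)

# ppf inverts cdf: cdf(ppf(q)) recovers q
print(np.allclose(rv.cdf(x_q), q))
# for c = 1, F(x) = 1 - exp(-x), so ppf(q) = -ln(1 - q)
print(np.allclose(x_q, -np.log1p(-q)))
# moments of the frozen distribution
print(rv.mean(), rv.var())
```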
Now we will estimate the distribution of the sample averages of random Weibull variates for different sample sizes. To do this, we take four values of N (5, 10, 50, 100), generate 1000 samples of size N for each and plot histograms of their sample averages.
import math
N = [5, 10, 50, 100]  # sample sizes
weibull_mean = {}
colors = iter(['b', 'y', 'r', 'g', 'pink'])  # iterable colors
shape = 1
# x is the grid defined in the previous cell (from weibull_min.ppf(0.01) to ppf(0.99))
for k in range(len(N)):
    temp_mean = []
    for i in range(1000):
        weibull = np.random.weibull(shape, N[k])
        temp_mean.append(np.mean(weibull))
    weibull_mean[N[k]] = np.asarray(temp_mean)
    # let's build a histogram of the Weibull sample averages
    plt.hist(weibull_mean[N[k]], color=next(colors), density=True, label=f'N = {N[k]}')
    plt.ylabel('Probability density')
    plt.xlabel('Sample average')
    plt.title(f'Weibull sample averages histogram, shape k={shape}, sample size N = {N[k]}')
    # let's build the corresponding Normal distribution PDF for the current N
    norm_rv = sts.norm(loc=1, scale=math.sqrt(1/N[k]))  # scale is the standard deviation
    pdf = norm_rv.pdf(x)
    plt.plot(x, pdf, label=f'PDF of N(1, {1/N[k]})')
    plt.legend()
    plt.show()
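The double loop above can also be written without Python-level loops: drawing a 2-D array of shape (1000, N) and averaging along axis 1 gives all 1000 sample averages for one N at once. This is a minimal sketch with a seeded `Generator`; the dictionary name `weibull_mean` mirrors the post, while the seed is an assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # assumed seed, for reproducibility
shape = 1          # Weibull shape k
n_repeats = 1000   # number of samples per sample size
sizes = [5, 10, 50, 100]

weibull_mean = {}
for n in sizes:
    # each row is one sample of size n; the row mean is its sample average
    draws = rng.weibull(shape, size=(n_repeats, n))
    weibull_mean[n] = draws.mean(axis=1)

for n in sizes:
    print(n, weibull_mean[n].mean(), weibull_mean[n].var())
```

The vectorized form produces the same kind of array as the loop (one average per sample) but avoids 1000 separate calls per sample size.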
The Central limit theorem allows us to approximate the distribution of sample averages with a Normal distribution. Using the mean and variance of the original (Weibull) distribution, we can calculate the parameters of that Normal distribution.
For the given Weibull distribution we have:
λ = 1 (scale) and k = 1 (shape)
Then its mean and variance are (Γ is the Gamma function):
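$$\text{mean}_w = \lambda\,\Gamma(1 + 1/k) = \Gamma(2) = 1$$
$$\sigma_w^2 = \lambda^2\left[\Gamma(1 + 2/k) - \Gamma(1 + 1/k)^2\right] = \Gamma(3) - \Gamma(2)^2 = 2 - 1 = 1$$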
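The mean and variance formulas can be checked numerically: `math.gamma` evaluates Γ, and `scipy.stats.weibull_min.stats` returns the same moments. The helper name `weibull_moments` is hypothetical, and the extra case k = 2, λ = 3 is only an additional test value, not something used in this post.

```python
import math

import numpy as np
from scipy.stats import weibull_min


def weibull_moments(k, lam=1.0):
    """Hypothetical helper: mean and variance of Weibull(k, λ) via the Gamma function."""
    g1 = math.gamma(1 + 1 / k)
    g2 = math.gamma(1 + 2 / k)
    mean = lam * g1
    var = lam ** 2 * (g2 - g1 ** 2)
    return mean, var


for k, lam in [(1, 1.0), (2, 3.0)]:
    mean, var = weibull_moments(k, lam)
    sp_mean, sp_var = weibull_min.stats(k, scale=lam, moments='mv')
    print(k, lam, mean, var, np.isclose(mean, sp_mean), np.isclose(var, sp_var))
```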
According to the Central limit theorem, the distribution of sample averages is approximately Normal, $N(\text{mean}_N, \sigma_N^2)$, with the following parameters:
$$\text{mean}_N = \text{mean}_w = 1, \qquad \sigma_N^2 = \frac{\sigma_w^2}{N} = \frac{1}{N}$$
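By the Central limit theorem, the sample averages should have mean close to 1 and variance close to 1/N. The sketch below builds the predicted Normal distribution with `scipy.stats.norm` (its `scale` argument is the standard deviation σ_w/√N, not the variance) and compares the predicted variance with the variance of simulated averages. The function name `clt_normal` and the seed are assumptions made for illustration.

```python
import math

import numpy as np
import scipy.stats as sts


def clt_normal(mean_w, var_w, n):
    """Hypothetical helper: CLT approximation N(mean_w, var_w / n) for averages of n draws."""
    return sts.norm(loc=mean_w, scale=math.sqrt(var_w / n))  # scale is a standard deviation


rng = np.random.default_rng(seed=2)  # assumed seed
results = {}
for n in [5, 10, 50, 100]:
    predicted = clt_normal(1.0, 1.0, n)  # Weibull k = 1, λ = 1: mean 1, variance 1
    averages = rng.weibull(1, size=(1000, n)).mean(axis=1)
    results[n] = (predicted.var(), averages.var())
    print(f'N = {n}: predicted variance {predicted.var():.4f}, simulated {averages.var():.4f}')
```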
Comparison of sample averages approximations
Let’s compare the obtained distributions for different values of N (sample size). The table below shows how the variance σ² and the standard deviation σ of the approximating Normal distribution decrease as N increases:
N (sample size) | σ² | σ |
---|---|---|
5 | 0.2 | 0.45 |
10 | 0.1 | 0.32 |
50 | 0.02 | 0.14 |
100 | 0.01 | 0.1 |
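The table values follow directly from σ² = σ_w²/N = 1/N and σ = 1/√N; a few lines reproduce them, rounded to two decimals as in the table.

```python
import math

rows = []
for n in [5, 10, 50, 100]:
    var = 1 / n          # σ² = σ_w² / N with σ_w² = 1
    sd = math.sqrt(var)  # σ = 1 / √N
    rows.append((n, round(var, 2), round(sd, 2)))

print('N | σ² | σ')
for n, var, sd in rows:
    print(f'{n} | {var} | {sd}')
```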
Conclusion
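The distribution of sample averages (of a distribution with finite mean and variance) can be closely approximated by a Normal distribution N(mean, σ²/N), whose mean equals the mean of the original distribution and whose variance is the original variance σ² divided by the sample size N.
As N grows, the Normal approximation of the sample averages becomes more accurate, and the averages concentrate more tightly around the mean, since their standard deviation σ/√N decreases in proportion to 1/√N.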
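One way to put a number on "the approximation becomes more accurate" is the largest gap between the CDF of the sample average and the CDF of N(1, 1/N). For shape k = 1 and λ = 1 the Weibull distribution is the exponential distribution, and the average of N independent exponential draws follows a Gamma distribution with shape N and scale 1/N, so the gap can be computed without simulation. The helper name `max_cdf_gap` and the evaluation grid are assumptions of this sketch, and the result is a grid approximation of the Kolmogorov distance rather than an exact value.

```python
import math

import numpy as np
import scipy.stats as sts


def max_cdf_gap(n, num_points=20_001):
    """Hypothetical helper: max |F_avg - F_normal| on a grid, for Weibull k = 1, λ = 1."""
    exact = sts.gamma(a=n, scale=1 / n)               # exact law of the average of n Exp(1) draws
    approx = sts.norm(loc=1, scale=math.sqrt(1 / n))  # CLT approximation N(1, 1/n)
    # grid covering ±8 standard deviations around the mean
    x = np.linspace(1 - 8 / math.sqrt(n), 1 + 8 / math.sqrt(n), num_points)
    return np.max(np.abs(exact.cdf(x) - approx.cdf(x)))


sizes = [5, 10, 50, 100]
gaps = [max_cdf_gap(n) for n in sizes]
for n, gap in zip(sizes, gaps):
    print(f'N = {n}: max CDF gap ≈ {gap:.4f}')
```

The gap shrinks as N grows, which is the numerical counterpart of the histograms getting closer to the Normal PDF.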