In this post we show how to plot a histogram of the Weibull distribution and the distribution of its sample averages, as approximated by the Normal (Gaussian) distribution. We’ll also show how the accuracy of the approximation changes as the sample size increases.

One may get the full .ipynb file here.

Weibull distribution

The scipy reference for the Weibull distribution is here.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sts
%matplotlib inline

Let’s generate Weibull-distributed samples and plot their histogram.

k = 1  # shape parameter
weibull = np.random.weibull(k, 1000)
N = 10  # slice to show
print('First', N, 'Weibull samples:\n', weibull[:N])
print('\nMean of Weibull samples:', np.mean(weibull))
plt.rcParams["font.size"] = "15"  # set the font size before drawing
plt.hist(weibull, 25)  # 25 bins; bar heights are raw counts
plt.ylabel('Number of samples')
plt.xlabel('x')
plt.show()

First 10 Weibull samples:

 [1.75457963e-01 2.14619778e+00 1.57551698e+00 3.61385649e+00
 1.57228150e-01 6.72606679e-02 9.19993986e+00 1.43535032e+00
 5.04072476e-03 1.95541219e-01]

Mean of Weibull samples:

1.0489102684286717

Samples histogram

The histogram together with the theoretical Weibull probability density function (PDF)

Let’s draw the empirical histogram together with the theoretical Weibull probability density function.

fig, ax = plt.subplots(1, 1)
c = 1
from scipy.stats import weibull_min
mean, var, skew, kurt = weibull_min.stats(c, moments='mvsk')
# var -> variance = σ^2 (sigma square)
print('Weibull mean:', mean, '\nWeibull variance:', var, '\n')
x = np.linspace(weibull_min.ppf(0.01, c), 
                weibull_min.ppf(0.99, c), 100)
# ppf - Percent point function (inverse of cdf — percentiles).
ax.plot(x, weibull_min.pdf(x, c),
       'r-', lw=5, alpha=0.6, label='Weibull_min PDF')
# Frozen PDF
rv = weibull_min(c) 
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='Frozen PDF')
# Samples of Weibull variates (empirical)
r_weibull = weibull_min.rvs(c, size=1000)
print('Random Weibull variates, first',N,':\n', r_weibull[:N]) 
ax.hist(r_weibull, density=True, histtype='stepfilled', alpha=0.2)  # alpha must lie in [0, 1]
ax.legend(loc='best', frameon=False)
plt.ylabel('Fraction of samples, normalized')
plt.xlabel('Weibull variates')
plt.show()

Weibull mean: 1.0 
Weibull variance: 1.0

Random Weibull variates, first 10 :

 [3.80287246 0.65022075 1.43278651 2.84351598 0.29277214 0.20368528
 0.16138952 1.71049147 1.12921154 2.39365237]

Estimation of the sample averages

Now we will estimate the distribution of the sample averages of random Weibull variates for different sample sizes. To do this, we take four values of N (5, 10, 50, 100), generate 1000 samples of size N for each value, and plot histograms of the distributions of their sample averages.

N = [5, 10, 50, 100]  # sample sizes
weibull_mean = {}
colors = ['b', 'y', 'r', 'g']  # one histogram color per sample size
shape = 1
for n, color in zip(N, colors):
    # averages of 1000 independent samples of size n
    temp_mean = []
    for i in range(1000):
        weibull = np.random.weibull(shape, n)
        temp_mean.append(np.mean(weibull))
    weibull_mean[n] = np.asarray(temp_mean)
    # histogram of the Weibull sample averages
    plt.hist(weibull_mean[n], color=color, density=True, label=f'N = {n}')
    plt.ylabel('Probability density')
    plt.xlabel('$x, average$')
    plt.title(f'Weibull sample averages histogram, shape k={shape}, sample size N = {n}')

    # corresponding Normal distribution PDF for the current n: N(1, 1/n)
    sigma = np.sqrt(1 / n)
    norm_rv = sts.norm(loc=1, scale=sigma)
    x = np.linspace(1 - 4 * sigma, 1 + 4 * sigma, 200)  # grid centered on the mean
    plt.plot(x, norm_rv.pdf(x), label=f'PDF of N(1, {1/n})')
    plt.legend()
    plt.show()
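
The same experiment can also be sketched in vectorized form, without the inner Python loop. This is only an illustration, not the notebook's code: the helper name sample_means, the fixed seed, and the use of np.random.default_rng are assumptions made for the example.

```python
import numpy as np

def sample_means(n, n_samples=1000, shape=1.0, seed=0):
    """Return n_samples averages, each taken over n Weibull(shape) draws.

    Hypothetical helper; the seed is fixed only to make the run repeatable.
    """
    rng = np.random.default_rng(seed)
    draws = rng.weibull(shape, size=(n_samples, n))  # one row per sample
    return draws.mean(axis=1)                        # average each row

for n in (5, 10, 50, 100):
    means = sample_means(n)
    print(f"N={n:>3}: mean={means.mean():.3f}, "
          f"std={means.std(ddof=1):.3f}, CLT std={1 / np.sqrt(n):.3f}")
```

Drawing a (1000, N) array at once and averaging along axis 1 gives the same kind of data as the loop above, and the printed standard deviations should land close to 1/√N.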

Theoretical calculations

Calculate the Normal distribution parameters: mean and variance

The Central limit theorem allows us to approximate the distribution of sample averages by a Normal distribution. Using the mean and variance of the original [Weibull] distribution, we calculate the parameters of that resulting Normal distribution.

For the given Weibull distribution, the scale and shape parameters are:
λ = 1 and k = 1

Then the mean and variance of the Weibull distribution are the following (Γ is the gamma function):
mean_w = λΓ(1+1/k) = Γ(2) = 1
σ_w² = λ²(Γ(1+2/k) – Γ(1+1/k)²) = Γ(3) – Γ(2)² = 2 – 1 = 1

According to the Central limit theorem, the Normal distribution N(mean_N, σ_N²) of sample averages should have the following parameters:

mean_N = mean_w = 1
σ_N² = σ_w²/N = 1/N
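
As a quick check of this arithmetic, the moments can be evaluated with the gamma function from the standard library. This is a minimal sketch; the function names weibull_moments and clt_parameters are made up for this example and do not come from the notebook.

```python
import math

def weibull_moments(k, lam=1.0):
    """Mean and variance of a Weibull distribution with shape k and scale lam."""
    g1 = math.gamma(1 + 1 / k)
    g2 = math.gamma(1 + 2 / k)
    mean = lam * g1
    var = lam ** 2 * (g2 - g1 ** 2)
    return mean, var

def clt_parameters(k, n, lam=1.0):
    """Mean and variance of the Normal approximation for averages of n draws."""
    mean, var = weibull_moments(k, lam)
    return mean, var / n

print(weibull_moments(1))     # moments of the original distribution
print(clt_parameters(1, 10))  # parameters of N(mean_N, sigma_N^2) for N = 10
```

For k = 1 and λ = 1 this reproduces mean_w = 1 and σ_w² = 1, and the second call gives σ_N² = 1/N for the chosen N.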

Comparison of sample averages approximations

Let’s compare the distributions obtained for different sample sizes N. The table below shows how the variance σ² and the standard deviation σ of the approximating Normal distribution change as N increases:

N (sample size)	σ²	σ
5	0.2	0.45
10	0.1	0.32
50	0.02	0.14
100	0.01	0.1
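
The table lists only the spread of the approximating Normal distribution. One possible way to put a number on the accuracy of the approximation itself is the Kolmogorov-Smirnov distance between the simulated sample averages and N(1, 1/N). The sketch below is an assumption-laden illustration rather than part of the original notebook: the seed is arbitrary, and it uses 20000 samples per N (instead of 1000) so that the estimate is less noisy.

```python
import numpy as np
import scipy.stats as sts

rng = np.random.default_rng(0)  # fixed seed (assumption) for repeatability
n_samples = 20000               # more than the post's 1000, to reduce noise

ks = {}
for n in (5, 10, 50, 100):
    # averages of n_samples independent Weibull(k=1) samples of size n
    means = rng.weibull(1.0, size=(n_samples, n)).mean(axis=1)
    sigma = 1 / np.sqrt(n)      # standard deviation predicted by the CLT
    # largest gap between the empirical CDF and the CDF of N(1, sigma^2)
    ks[n] = sts.kstest(means, 'norm', args=(1.0, sigma)).statistic
    print(f"N={n:>3}: sigma={sigma:.2f}, KS distance={ks[n]:.4f}")
```

A smaller KS distance means the histogram of sample averages sits closer to the Normal curve, so the distance is expected to shrink as N grows.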

Conclusion

The distribution of sample averages (of a smooth distribution) can be approximated quite accurately by a Normal distribution with the following parameters:

mean_N = mean_original
σ_N = σ_original/√N
As N grows, the approximation of the sample averages becomes more accurate, and the standard deviation σ_N of the sample averages shrinks in proportion to 1/√N.