In this post we show how to plot a histogram of the Weibull distribution and of the distribution of its sample averages, as approximated by the Normal (Gaussian) distribution. We’ll also show how the accuracy of the approximation changes as the sample size increases.

One may get the full .ipynb file here.

## Weibull distribution

The *scipy* reference for the Weibull distribution is here.

```
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sts
%matplotlib inline
```

Let’s generate samples from the Weibull distribution and plot their histogram.

```
k = 1  # shape parameter
weibull = np.random.weibull(k, 1000)
N = 10  # number of samples to print
print('First', N, 'Weibull samples:\n', weibull[:N])
print('\nMean of Weibull samples:', np.mean(weibull))
plt.rcParams["font.size"] = "15"  # set the font size before plotting
plt.hist(weibull, 25)  # 25 bins, raw counts (not normalized)
plt.ylabel('Number of samples')
plt.xlabel('x')
plt.show()
```

First 10 Weibull samples:

[1.75457963e-01 2.14619778e+00 1.57551698e+00 3.61385649e+00 1.57228150e-01 6.72606679e-02 9.19993986e+00 1.43535032e+00 5.04072476e-03 1.95541219e-01]

Mean of Weibull samples: 1.0489102684286717

### Samples histogram

```
from scipy.stats import weibull_min

fig, ax = plt.subplots(1, 1)
c = 1  # shape parameter
mean, var, skew, kurt = weibull_min.stats(c, moments='mvsk')
# var -> variance = σ^2 (sigma squared)
print('Weibull mean:', mean, '\nWeibull variance:', var, '\n')
# ppf: percent point function (inverse of the CDF, i.e. percentiles)
x = np.linspace(weibull_min.ppf(0.01, c),
                weibull_min.ppf(0.99, c), 100)
ax.plot(x, weibull_min.pdf(x, c),
        'r-', lw=5, alpha=0.6, label='Weibull_min PDF')
# Frozen PDF
rv = weibull_min(c)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='Frozen PDF')
# Samples of Weibull variates (empirical); N comes from the previous cell
r_weibull = weibull_min.rvs(c, size=1000)
print('Random Weibull variates, first', N, ':\n', r_weibull[:N])
# alpha (opacity) must lie in [0, 1]
ax.hist(r_weibull, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.ylabel('Fraction of samples, normalized')
plt.xlabel('Weibull variates')
plt.show()
```

Weibull mean: 1.0

Weibull variance: 1.0

Random Weibull variates, first 10 :

[3.80287246 0.65022075 1.43278651 2.84351598 0.29277214 0.20368528 0.16138952 1.71049147 1.12921154 2.39365237]

Now we will estimate the distribution of the sample averages of random Weibull variates for different sample sizes. To do this, we take four values of **N** (5, 10, 50, 100), **generate 1000 samples of size N each**, and plot histograms of their sample averages.

```
import math

N = [5, 10, 50, 100]  # sample sizes
weibull_mean = {}
colors = iter(['b', 'y', 'r', 'g', 'pink'])  # iterable colors
shape = 1
for k in range(len(N)):
    temp_mean = []
    for i in range(1000):
        weibull = np.random.weibull(shape, N[k])
        temp_mean.append(np.mean(weibull))
    weibull_mean[N[k]] = np.asarray(temp_mean)
    # let's build histogram for Weibull sample averages
    plt.hist(weibull_mean[N[k]], color=next(colors), density=True, label=f'N = {N[k]}')
    plt.ylabel('Fraction of samples')
    plt.xlabel('$x, average$')
    plt.title(f'Weibull sample averages histogram, shape k={shape}, sample size N = {N[k]}')
    # let's build a corresponding Normal distribution PDF for the current N
    norm_rv = sts.norm(loc=1, scale=math.sqrt(1/N[k]))
    pdf = norm_rv.pdf(x)  # x is the grid defined in the previous cell
    plt.plot(x, pdf, label=f'PDF of N(1, {1/N[k]})')
    plt.legend()
    plt.show()
```

The Central limit theorem allows us to approximate the distribution of sample averages with a **Normal distribution**. Using the *mean* and *variance* of the original (Weibull) distribution, we can calculate the parameters of the resulting Normal distribution.

For the Weibull distribution used here, the scale and shape parameters are **λ = 1** and **k = 1**.
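As a quick numerical check, the mean and variance for these parameters can be evaluated from the standard closed-form Weibull formulas, mean = λ·Γ(1 + 1/k) and variance = λ²·[Γ(1 + 2/k) − Γ(1 + 1/k)²], and compared with what *scipy* reports. This is only a sketch; the variable names `lam`, `mean_formula` and `var_formula` are introduced here for illustration and are not part of the original notebook.

```python
import math
from scipy.stats import weibull_min

lam, k = 1.0, 1.0  # scale (lambda) and shape (k), as stated in the text

# Closed-form Weibull moments expressed through the gamma function
mean_formula = lam * math.gamma(1 + 1 / k)
var_formula = lam**2 * (math.gamma(1 + 2 / k) - math.gamma(1 + 1 / k) ** 2)

# In scipy, the shape is the first argument `c` and lambda is `scale`
mean_scipy, var_scipy = weibull_min.stats(k, scale=lam, moments='mv')

print('Formula: mean =', mean_formula, ' variance =', var_formula)
print('scipy:   mean =', float(mean_scipy), ' variance =', float(var_scipy))
```

Note that `np.random.weibull` takes only the shape parameter; for a scale other than 1, its output would have to be multiplied by λ.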

Then the mean and variance of the Weibull distribution are the following:

$$\text{mean}_w = \lambda\,\Gamma(1 + 1/k) = \Gamma(2) = 1$$

$$\sigma_w^2 = \lambda^2\left[\Gamma(1 + 2/k) - \Gamma(1 + 1/k)^2\right] = \Gamma(3) - \Gamma(2)^2 = 2 - 1 = 1$$

According to the *Central limit theorem*, the Normal distribution $N(\text{mean}_N, \sigma_N^2)$ of sample averages should have the following parameters:

$$\text{mean}_N = \text{mean}_w = 1$$

$$\sigma_N^2 = \sigma_w^2 / N = 1/N$$

### Comparison of sample averages approximations

Let’s compare the resulting distributions for different values of **N** (the sample size). The table below shows how the variance and standard deviation of the approximating Normal distribution change as **N** increases:

| N (sample size) | $\sigma^2$ | $\sigma$ |
|---|---|---|
| 5 | 0.2 | 0.45 |
| 10 | 0.1 | 0.32 |
| 50 | 0.02 | 0.14 |
| 100 | 0.01 | 0.1 |
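The values in the table can be reproduced, and compared with simulation, along the following lines. This is a minimal sketch assuming shape k = 1 and 1000 samples per N, as in the post; the fixed seed and the names `rng`, `results` and `n_repeats` are illustrative additions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
shape = 1         # Weibull shape k, as in the post
var_w = 1.0       # variance of the Weibull distribution for k = 1, lambda = 1
n_repeats = 1000  # number of samples drawn for each N

results = {}
print('N     1/N      sigma    empirical sigma')
for n in [5, 10, 50, 100]:
    theo_var = var_w / n            # variance predicted by the CLT
    theo_sigma = np.sqrt(theo_var)  # standard deviation predicted by the CLT
    # n_repeats samples of size n, averaged along each row
    means = rng.weibull(shape, size=(n_repeats, n)).mean(axis=1)
    emp_sigma = means.std(ddof=1)   # standard deviation of the sample averages
    results[n] = (theo_var, theo_sigma, emp_sigma)
    print(f'{n:<5d} {theo_var:<8.2f} {theo_sigma:<8.2f} {emp_sigma:.3f}')
```

The empirical standard deviation of the sample averages should land close to the theoretical value in each row, with some run-to-run scatter because only 1000 averages are used.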

### Conclusion

The distribution of sample averages (of a distribution with finite variance) can be approximated well by a Normal distribution with the following parameters:

$$\text{mean}_N = \text{mean}_{\text{original}}$$

$$\sigma_N = \sigma_{\text{original}} / \sqrt{N}$$

As **N** grows, the approximation becomes more accurate and the sample averages concentrate around the mean, since the standard deviation $\sigma_N$ is smaller than $\sigma_{\text{original}}$ by a factor of $\sqrt{N}$.
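One possible way to put a number on the accuracy of the approximation is the Kolmogorov-Smirnov distance between the simulated sample averages and the Normal distribution N(1, 1/N). The sketch below is not part of the original notebook: the choice of `scipy.stats.kstest` as the accuracy measure, the fixed seed, and the larger number of averages (20000 instead of 1000, to reduce noise in the estimate) are all assumptions made for illustration.

```python
import numpy as np
import scipy.stats as sts

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
shape = 1        # Weibull shape k, as in the post
n_means = 20000  # number of sample averages per N (assumption, see above)

ks = {}
for n in [5, 10, 50, 100]:
    # n_means samples of size n, averaged along each row
    means = rng.weibull(shape, size=(n_means, n)).mean(axis=1)
    # KS distance to the Normal distribution with mean 1 and std sqrt(1/n)
    ks[n] = sts.kstest(means, 'norm', args=(1.0, np.sqrt(1.0 / n))).statistic
    print(f'N = {n:3d}: KS distance = {ks[n]:.4f}')
```

A smaller distance means the empirical distribution of the averages is closer to the Normal curve, so the value for N = 100 is expected to be clearly below the value for N = 5.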