In this article, we will verify the Central Limit Theorem, which says that the distribution of the means of samples drawn from the distribution of a random variable approaches a normal distribution as the sample size increases. Put simply, if multiple samples are taken from a distribution (normal or otherwise, as long as its variance is finite) and the mean of each sample is computed, then the collection of sample means will itself form a distribution, and that distribution will be approximately the Normal Distribution (provided the sample size is large). A closely related result, the law of large numbers, is that the sample mean approaches the population mean as the sample size goes to infinity (or the population limit). One way to check these statements is to do the sampling using random numbers generated by R and then calculate the sample mean for each set of random numbers.
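As a quick illustration of the "normal or otherwise" point, here is a minimal sketch that draws from a strongly skewed exponential distribution and compares the skewness of the raw draws with that of the sample means. The helper `skew`, the seed, the sample size of 50 and the 5000 repetitions are all assumptions made for this illustration and are not part of the procedure used in the rest of the article.

```r
# Sample means of a skewed (exponential) distribution are far more
# symmetric than the raw draws, as the Central Limit Theorem suggests.
set.seed(123)                                        # arbitrary seed, for repeatability

# Simple sample skewness (hypothetical helper, not a base R function)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw   <- rexp(5000, rate = 1)                        # 5000 raw exponential draws
means <- replicate(5000, mean(rexp(50, rate = 1)))   # 5000 means of samples of size 50

cat("skewness of raw draws:   ", skew(raw), "\n")
cat("skewness of sample means:", skew(means), "\n")
```

The skewness of the sample means should come out much closer to 0 (the value for a normal distribution) than the skewness of the raw draws.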
Using R we will generate a sample of N normal random numbers and repeat that sampling 20 times, each time finding the mean of the N random numbers in the sample. We then plot the 20 means as a histogram. With increasing sample size N, the standard deviation of the sample means, i.e. the width of the histogram, should become narrower and narrower, because the standard deviation of a sample mean is the population standard deviation divided by the square root of N. If the histograms keep getting narrower with increasing sample size, then we have verified this prediction of the Central Limit Theorem.
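The expected narrowing can be put into numbers before running any simulation. The sketch below assumes the population standard deviation of 15 that is passed to rnorm() in this article. The particular sample sizes and the names `sigma`, `sizes` and `se` are chosen only for illustration.

```r
# Theoretical standard deviation of the sample mean (the standard error)
# for a population standard deviation of 15, at several sample sizes.
sigma <- 15                    # population sd, as passed to rnorm() in this article
sizes <- c(10, 100, 1000)      # illustrative sample sizes (an assumption)
se    <- sigma / sqrt(sizes)   # standard error = sigma / sqrt(N)

print(data.frame(N = sizes, standard_error = se))
```

Each tenfold increase in N shrinks the expected width by a factor of about 3.16, the square root of 10.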
Set the sample size
> N = 10
Initialise array of size 20 and fill it with 0
> mn = array(rep(0,20), dim=c(20,1))
Initialize an array of size (20,N) and fill it with 0
> nr = array(rep(0,20*N), dim=c(20,N))
Create a loop which generates N random numbers, calculates their mean and stores it in the array mn. As an example of the function used, rnorm(100, 50, 15) generates 100 random numbers which are normally distributed with a mean of 50 and a standard deviation of 15; in the loop we call rnorm(N, 50, 15) to draw N such numbers on each of the 20 iterations.
> for(i in seq(1,20)){
+ nr[i,] = rnorm(N,50,15)
+ mn[i] = mean(nr[i,])
+ }
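The same 20 sample means can also be produced without preallocating arrays or writing an explicit loop. This is a minimal alternative sketch using replicate() from base R. The variable name `mn2` and the fixed seed are assumptions made here so that the result does not overwrite mn and the run is repeatable.

```r
# Compact alternative to the loop: draw N normal numbers and take their
# mean, 20 times over, collecting the 20 means in a numeric vector.
set.seed(1)                                    # arbitrary seed, for repeatability
N   <- 10                                      # sample size
mn2 <- replicate(20, mean(rnorm(N, 50, 15)))   # 20 sample means

length(mn2)   # number of sample means
mean(mn2)     # average of the sample means, expected to be near 50
```

Unlike the loop, this version does not keep the individual random numbers (the nr array), so it suits the case where only the means are needed.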
Plot the histogram of the collection of sample means. xlim fixes the range of the x-axis, which makes it easier to compare histograms drawn for different sample sizes. xlab sets the label for the x-axis. ylab does the same for the y-axis, but R automatically chooses the label "Frequency" for a histogram, so we leave it out. main sets the title of the plot, while paste() concatenates the current value of the variable N, the sample size, to the text.
> hist(mn, xlim=c(40,60), xlab = "Sample means",
+ main = paste("Histogram with sample size",N))
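To compare several sample sizes numerically rather than by eye, the experiment can be repeated for a few values of N and the spread of the sample means set against 15 divided by the square root of N. The sketch below is one way to do this, with some assumptions: it uses 2000 repetitions instead of the 20 used above so that the estimated standard deviation is stable, and the names `reps`, `sizes`, `observed` and `expected` are illustrative.

```r
# Standard deviation of the sample means for several sample sizes,
# compared with the theoretical value sigma / sqrt(N).
set.seed(42)                  # arbitrary seed, for repeatability
reps  <- 2000                 # assumption: more repetitions than the 20 used above
sizes <- c(10, 100, 1000)     # illustrative sample sizes

# For each N: draw `reps` samples of size N, take each mean, then the sd of the means
observed <- sapply(sizes, function(N) sd(replicate(reps, mean(rnorm(N, 50, 15)))))
expected <- 15 / sqrt(sizes)  # theoretical standard error

print(data.frame(N = sizes, observed_sd = observed, expected_sd = expected))
```

The observed and expected columns should agree closely, and both should shrink as N grows, which is the narrowing of the histogram described earlier.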