Sample variance and variance of the sampling distribution of means.

Hi David,

I am very confused by these; they are on pages 35 and 41 of the notes. Could you please explain them in more depth? I don't know what the difference between them is, yet they have very different formulas. It seems the sample mean is just the sample mean, but the variance is somehow different.


Also, at the top of page 41 you use the term standard deviation of a sampling distribution.

Then you use variance of the sampling distribution of means. What's the difference between the two?

I am really lost here, please help, thanks.


edit: I am especially confused about the idea that the variance of the sampling distribution of means will be close to zero with a large enough sample. What is this "variance"? Looking at the histogram on page 42, I just can't see how the variance goes to zero with more samples.

I tried using a standard normal random number generator (in MATLAB) to sample from a standard normal distribution, and I really can't see how the sampled distribution could have a smaller variance as the number of samples goes up.
 

David Harper CFA FRM

Subscriber
Hi chuganc,

Imagine the (unknown) population consists of the equity returns for all ~10,000 stocks (N = 10,000) in 2010, with (I am making this up) a population average return of +5% and a population standard deviation (of returns) of 8%.

Now draw a random sample of only ten equities.
The sample variance of that sample of ten is per the formula on p. 35: S^2 = 1/(n-1) * Sum[(r(i) - average return)^2]; this is the easier idea, a static sample variance if you will.
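(Editor's note: a minimal Python sketch of the p. 35 formula, using a made-up sample of ten returns; the numbers are illustrative only.)

```python
def sample_variance(returns):
    """Unbiased sample variance per the p. 35 formula: divide by n - 1, not n."""
    n = len(returns)
    mean = sum(returns) / n
    return sum((r - mean) ** 2 for r in returns) / (n - 1)

# A hypothetical sample of ten equity returns (illustrative values)
sample = [0.06, -0.02, 0.11, 0.04, 0.08, -0.05, 0.03, 0.09, 0.01, 0.07]
print(sample_variance(sample))  # about 0.002551
```

Dividing by n - 1 rather than n is what makes this an unbiased estimator of the population variance.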

This first sample (a random sample of ten), a first draw or trial, has a sample average; say, +6% average and +11% sigma.
Now draw a second, different sample--a second trial--of ten returns; this sample has its own sample variance and its own sample average (say, +4% sample mean and +9% sample sigma).
… and so on, we can run multiple trials or draws:
First sample (n=10 out of population = X) has sample average return of +6%
Second sample (n = 10) has a different sample average = +4%
Third sample (n=10) has a different sample average = X%
... we expect sampling variation: each sample will produce a different sample average and sample sigma
... we are collecting a set of sample means: sample mean[1], sample mean[2], ... sample mean[n]

The sample mean (average), sample mean[X], is itself a random variable (!), that we expect to cluster around +5%. As a random variable, per the CLT, the sample mean itself has variance = population variance/n; i.e., standard deviation (aka, standard error) = SQRT(pop variance/n).
If n grows to the size of the population, then our sample mean = 5% guaranteed (variance drops to 0).
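(Editor's note: the MATLAB experiment described above goes wrong if you look at the spread of the individual draws, which stays near sigma no matter how many you take. What shrinks is the spread of the sample *means* across repeated trials. A Python sketch, using the made-up +5%/8% population from this thread:)

```python
import random
import statistics

random.seed(42)
POP_MEAN, POP_SD = 0.05, 0.08  # made-up population: +5% mean, 8% sigma

def sd_of_sample_means(n, trials=2000):
    """Draw `trials` samples of size n; return the std dev of their sample means."""
    means = [statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# The simulated standard error tracks the CLT prediction sigma/sqrt(n)
for n in (10, 100, 400):
    print(n, round(sd_of_sample_means(n), 4), round(POP_SD / n ** 0.5, 4))
```

Each row shows the simulated standard deviation of the sample means next to the CLT's sigma/sqrt(n); both fall toward zero as n grows.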

... so in regard to that first sample of n=10, we observed a sample average of +6%. We do not expect the population average return (+5%) to necessarily equal that. Instead, we ask: how many standard deviations away is the 6% from the 5%? Such standard deviations (i.e., the distance from the sample mean to the hypothesized population mean) are called standard errors, and the CLT tells us they should shrink as n increases.

Hope that helps, David
 
David, thanks a lot. I think I get it.

I'm still a little confused by the standard deviation of a sampling distribution (this should be "of the mean" as well, right?), or standard error. I think it's the error/difference between the sample mean and the population mean, right?
 

David Harper CFA FRM

Subscriber
Sure, you've just about got it there. But *before* the difference between sample mean and population mean, you just have the sample mean as a random variable. Gujarati writes, "sampling distribution of an estimator."

The estimator can be many sample statistics, here it is the sample mean, so we are referring to:
The sampling distribution of the sample mean

… just like a set of numbers has a sample mean. But here we have to generate the set of numbers (the set of sample means) by running several trials/simulations/draws. If we do that, we have a distribution that itself has a mean and a standard deviation (which, because it is the standard deviation of an estimator, as opposed to an observed sample, we call the standard error).

… so your "sampling distribution of the mean" is quite true. CLT applies.
We can also refer to a "sampling distribution of the sample variance"; i.e., the set of several sample variances, one for each drawn sample, that itself constitutes a distribution. Only now we are over to the chi-square distribution, not the Z/t per the CLT.
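(Editor's note: the same repeated-draws idea, sketched in Python for the sample variance. For a normal population, (n-1)*S^2/sigma^2 follows a chi-square distribution with n-1 degrees of freedom, so the S^2 values should average out to sigma^2; the +5%/8% population is this thread's made-up example.)

```python
import random
import statistics

random.seed(7)
POP_MEAN, POP_SD = 0.05, 0.08
n, trials = 10, 5000

# Sampling distribution of the sample variance: one S^2 per drawn sample
variances = [statistics.variance(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
             for _ in range(trials)]

# The mean of the chi-square(n-1) distribution is n-1, which is exactly why
# the average of the S^2 values lands near sigma^2 (the n-1 divisor is unbiased)
print(statistics.fmean(variances), POP_SD ** 2)
```

Unlike the sample mean's Z/t story, the chi-square is skewed, which is why confidence intervals for a variance are not symmetric around S^2.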

David
 