Binomial distribution

Ramd

New Member
Dear Friends,

I have a query on the question below:

A sales manager of a firm believes that 20% of the firm’s orders come from first time customers. A simple random sample of 100 orders will be used to estimate the proportion of the first time customers. Assume that the sales manager is correct and the proportion is 20%. What is the probability that the sample proportion will be between 0.10 and 0.30?

What I did :

=BINOMDIST(30,100,0.2,TRUE)-BINOMDIST(10,100,0.2,TRUE)

which results in 0.988244
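For anyone without Excel to hand, the same calculation can be reproduced with a short Python sketch. The helper names `binom_pmf` and `binom_cdf` are illustrative only (they are not from this thread); `binom_cdf(k, n, p)` is meant to play the role of BINOMDIST(k, n, p, TRUE). One thing worth noticing: the difference of the cumulative values at 30 and 10 covers the counts 11 through 30. If a count of exactly 10 is meant to be included, the lower term would be the cumulative value at 9, so the sketch prints both.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), the analogue of BINOMDIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 100, 0.2
excl = binom_cdf(30, n, p) - binom_cdf(10, n, p)  # P(11 <= X <= 30)
incl = binom_cdf(30, n, p) - binom_cdf(9, n, p)   # P(10 <= X <= 30)
print(f"P(11 <= X <= 30) = {excl:.6f}")
print(f"P(10 <= X <= 30) = {incl:.6f}")
```

The first printed value should come out close to the 0.988244 reported above; the second is slightly larger because it also counts samples with exactly 10 first-time orders.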



Am I doing this correctly?
Thanks again for your advice...
 

ShaktiRathore

Well-Known Member
Subscriber
Suppose there are 3 orders. The ways in which exactly 2 first-time orders (F) can occur among them are:
F NF F, F F NF, NF F F = 3 ways, or 3C2
These arrangements are mutually exclusive, so the probability of exactly 2 first-time orders is the sum of their probabilities:
prob(F NF F, F F NF, NF F F) = prob(F NF F) + prob(F F NF) + prob(NF F F) [the orders are independent, so whether one order is first-time does not affect whether another is]
= prob(F)*prob(NF)*prob(F) + prob(F)*prob(F)*prob(NF) + prob(NF)*prob(F)*prob(F)
= 3*prob(F)^2*prob(NF) = 3C2 * prob(F)^2 * prob(NF)^(3-2), which can be generalised:
probability that n out of N orders are first-time orders = NCn * prob(F)^n * prob(NF)^(N-n)
probability of exactly 10 first-time orders = 100C10 * 0.2^10 * 0.8^90
probability of exactly 30 first-time orders = 100C30 * 0.2^30 * 0.8^70
So the probability that the number of first-time orders lies between 10 and 30 is the sum of these terms over that range, i.e. the cumulative probability up to 30 minus the cumulative probability up to 10:
= sum over n = 11 to 30 of 100Cn * 0.2^n * 0.8^(100-n)
= BINOMDIST(30,100,0.2,TRUE)-BINOMDIST(10,100,0.2,TRUE). Check whether the value you calculated matches this.
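The counting argument can be checked by brute force. The sketch below (function names are illustrative, not from this thread) lists every F/NF sequence for a small number of orders, adds up the probabilities of the sequences with exactly n first-time orders, and compares the total with NCn * prob(F)^n * prob(NF)^(N-n). It assumes independent orders with the same prob(F) for each.

```python
from itertools import product
from math import comb

def prob_by_enumeration(N, n, p):
    """Sum the probabilities of all F/NF sequences of length N with exactly n F's."""
    total = 0.0
    for seq in product("FN", repeat=N):  # "N" marks a non-first-time order
        if seq.count("F") == n:
            total += p**n * (1 - p)**(N - n)
    return total

def prob_by_formula(N, n, p):
    """NCn * p^n * (1 - p)^(N - n)."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

p = 0.2
print(prob_by_enumeration(3, 2, p))  # sums F F N, F N F and N F F
print(prob_by_formula(3, 2, p))
```

Enumeration is only practical for small N (there are 2^N sequences), which is why the closed-form expression is used for N = 100.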

thanks
 

David Harper CFA FRM

David Harper CFA FRM
Subscriber
At first glance, 98.8% looks high. But of course you both are correct. An interesting "gut check" we can use is the realization that the sample proportion is the sample average of a set of Bernoulli trials. And we have a great example of the central limit theorem: the distribution of this sample average (even though the underlying random variables are Bernoullis) tends toward a normal distribution. 100 is a large sample, so the distribution of the sample proportion is approximately normal (the awesomeness of this CLT never bores me). So (just like Jorion resorts to a normal to approximate the VaR backtest; same idea):
  • the standard deviation of a binomial with p = 20% and n =100 is SQRT[20%*80%*100] = 4.0
  • the standardized interval [10,30] when mean is 20 is [(10-20)/4, (30-20)/4] = [-2.5,+2.5]
  • again z = (x - pT)/SQRT[p(1-p)T] ~= N(0,1); Jorion 6.2; I reference because again this is much like a VaR backtest (if the VaR were only 80% confident!)
  • In this case, z = (10 - 20%*100)/SQRT[20%*80%*100] = -2.5 and z = (30- 20%*100)/SQRT[20%*80%*100] = +2.5.
  • In short, the exact answer given by the binomial (above) should be approximated by =1-NORMSDIST(-2.5)*2 = 98.76%. Indeed, difference of only 0.07%!
So ....
... even without a calculator, armed only with cursory knowledge of the standard normal [since N(2.33) = 99%, such that two-tailed Z +/- 2.33 occupies 98%] we can eyeball that the question is asking for +/- 2.5 standard deviations and so the answer must be (approximately) greater than 98%.
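The gut check above can be sketched in a few lines of Python. This is only an illustration: `norm_cdf` is a helper name chosen here to stand in for Excel's NORMSDIST, built from the standard library's `math.erf`.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, the analogue of NORMSDIST(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 100, 0.2
mean = n * p                   # expected number of first-time orders
sd = sqrt(n * p * (1 - p))     # standard deviation of the binomial count
z_lo = (10 - mean) / sd        # standardized lower bound
z_hi = (30 - mean) / sd        # standardized upper bound
approx = norm_cdf(z_hi) - norm_cdf(z_lo)
print(f"z interval = [{z_lo}, {z_hi}], normal approximation = {approx:.4%}")
```

Because the interval is symmetric around the mean, `norm_cdf(z_hi) - norm_cdf(z_lo)` is the same quantity as 1 minus twice the lower tail.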
 

Ramd

New Member
Thanks a lot Shakti and David.... It really makes sense, and it's good to know about the CLT and how to calculate things in fractions....
 

bhar

Active Member
Hi, could you please explain the difference, if any, between the Bernoulli and binomial distributions?
 

David Harper CFA FRM

David Harper CFA FRM
Subscriber
bhar, a binomial distribution describes the number of successes in a sequence of independent and identically distributed (i.i.d.) Bernoullis, see http://en.wikipedia.org/wiki/Binomial_distribution
e.g., a series of coin tosses; a basket credit default swap where unrealistically all CDS in the basket have the same PD and are independent of each other

Here is a question I used in the focus review, from GARP's 2011 practice exam. True/False is a Bernoulli, if random with p = 50%. A series (n=10) is a binomial:
GARP 2011 Practice Exam Part 1. Question #6.
Suppose that a quiz consists of 10 true-false questions. A student has not studied for the exam and just randomly guesses the answers.
What is the probability that the student will get at least three questions correct?
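A minimal sketch of how this question can be set up, assuming each guess is an independent Bernoulli with p = 0.5 so that the number of correct answers is binomial with n = 10. "At least three" is easiest through the complement, i.e. one minus the probability of 0, 1 or 2 correct. The variable names are illustrative.

```python
from math import comb

n, p = 10, 0.5
# P(X >= 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]
p_at_most_2 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
p_at_least_3 = 1 - p_at_most_2
print(f"P(at least 3 correct) = {p_at_least_3:.4f}")
```

With p = 0.5 every one of the 2^10 = 1024 answer patterns is equally likely, so the complement is just (1 + 10 + 45)/1024.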
 

bhar

Active Member
... even without a calculator, armed only with cursory knowledge of the standard normal [since N(2.33) = 99%, such that two-tailed Z +/- 2.33 occupies 98%] we can eyeball that the question is asking for +/- 2.5 standard deviations and so the answer must be (approximately) greater than 98%.

Could you please help me understand how you got N(2.33) = 99%?

Thanks
 

bhar

Active Member
Also, could you please help me understand the formula that you quoted from Jorion for the z value?
 

ShaktiRathore

Well-Known Member
Subscriber
The normal distribution is symmetric, so the area to the right of z = 2.33 is 1% and the area to the left of z = -2.33 is also 1%. For a one-tailed test, N(2.33) = 99%; by symmetry, for a two-tailed test 1% lies in each of the left and right tails, so the area between -2.33 and +2.33 is 98%. Similarly, the area to the right of z = 1.96 is 2.5%, so for a one-tailed test N(1.96) = 97.5%, and for a two-tailed test the tails z > 1.96 and z < -1.96 each hold 2.5%, so the area between -1.96 and +1.96 is 95%. I hope it's clear and you can picture the situation graphically.
For z > 2.58 the tail area is about 0.5%, so for a two-tailed test simply subtract double that area from 100% to get about 99%. Since 2.5 lies between 2.33 and 2.58, the two-tailed area for z = 2.5 lies between 98% and 99%, a little below 99%.
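These tail areas can be tabulated with a short Python sketch instead of a z-table. The helper names `norm_cdf` and `central_area` are illustrative; `norm_cdf` is built from the standard library's `math.erf` and stands in for N(z).

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, N(z): the area to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def central_area(z):
    """Area between -z and +z: 100% minus the two equal tails."""
    upper_tail = 1.0 - norm_cdf(z)
    return 1.0 - 2.0 * upper_tail

for z in (1.96, 2.33, 2.5, 2.58):
    print(f"z = {z:<4}  N(z) = {norm_cdf(z):.4f}  "
          f"upper tail = {1.0 - norm_cdf(z):.4f}  "
          f"between -z and +z = {central_area(z):.4f}")
```

The one-tailed column corresponds to N(z) and the last column to the two-tailed area discussed above.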

thanks
 

jeff-1984

Member
Hi David,

I could see that this example is a binomial case, which means n = 10 and we have k = 0, 1 and 2. That's true, right? But can you help me with a way to know p automatically?
 

David Harper CFA FRM

David Harper CFA FRM
Subscriber
Hi Jeff - Right, well, when it is a financial asset or exposure, the (p) will usually be given. For example, PD/EDF (the probability of a loan/exposure/bond default) is the most common Bernoulli and a set of exposures with identical p (prob of default or not default) that are independent (i.i.d., a critical assumption) is characterized by a binomial.

But it's popular to ask a question about "a student who guesses randomly on a multiple choice quiz where each answer has three/four choices." You just need to assume that, if there are N choices, the probability of a correct answer is 1/N (the relevant phrasing in GARP's question above is "just randomly guesses." ... it is a better question for this phrase). So, for a true/false, p = 0.5; if three choices (A, B, C), p = 1/3; if four choices (like the FRM: a - d), p = 1/4. Hope that helps,
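As a small illustration of the 1/N rule, the sketch below assumes a hypothetical 10-question quiz and asks for the chance of at least 3 correct answers under pure guessing, for two, three and four choices per question. The function name `prob_at_least` and the quiz size are assumptions made for the example, not part of any GARP question.

```python
from math import comb

def prob_at_least(k, n, choices):
    """P(at least k correct out of n) when each guess is right with p = 1/choices."""
    p = 1 / choices
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for choices in (2, 3, 4):
    print(f"{choices} choices: p = 1/{choices}, "
          f"P(at least 3 of 10 correct) = {prob_at_least(3, 10, choices):.4f}")
```

The only thing that changes between the three cases is p; the binomial machinery is identical.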
 

ShaktiRathore

Well-Known Member
Subscriber
To be more exact in the method suggested by David of approximating the binomial by the normal, a continuity correction term of 0.5 should also be included:
http://www.statisticshowto.com/what-is-the-continuity-correction-factor/
http://en.m.wikipedia.org/wiki/Continuity_correction
z = (10 - 0.5 - 20%*100)/SQRT[20%*80%*100] = -2.5 - 0.125 = -2.625 and z = (30 + 0.5 - 20%*100)/SQRT[20%*80%*100] = +2.5 + 0.125 = +2.625
The corrected answer is =1-NORMSDIST(-2.625)*2 = 0.99135, which is further from 0.98824 than what David is getting. Here the correction moves the approximation away from the Excel result, but in most cases it gives the more accurate answer; for example, you can check that with p = 0.3 the 0.5 correction gives a more accurate answer.
I mean the binomial gives each integer x a rectangle with base from x - 0.5 to x + 0.5 when x is included, but the normal approximation evaluated at x stops at the midpoint and leaves out the region from x to x + 0.5. So add or subtract 0.5 accordingly.
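To see what the correction is doing, here is a rough Python sketch that compares the normal approximation, with and without the 0.5 adjustment, against exact binomial sums. All helper names are illustrative. One possible reason the correction looks worse in this thread: the corrected interval [9.5, 30.5] approximates the inclusive probability P(10 <= X <= 30), whereas BINOMDIST(30,...) minus BINOMDIST(10,...) covers only 11 through 30, so the sketch prints both exact values.

```python
from math import comb, erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, the analogue of NORMSDIST(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_range(lo, hi, n, p):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

def normal_range(lo, hi, n, p, correction=0.0):
    """Normal approximation to P(lo <= X <= hi), widening each end by `correction`."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    upper = norm_cdf((hi + correction - mean) / sd)
    lower = norm_cdf((lo - correction - mean) / sd)
    return upper - lower

n, p = 100, 0.2
print("exact, 10..30 inclusive:", round(binom_range(10, 30, n, p), 5))
print("exact, 11..30          :", round(binom_range(11, 30, n, p), 5))
print("normal, no correction  :", round(normal_range(10, 30, n, p), 5))
print("normal, 0.5 correction :", round(normal_range(10, 30, n, p, 0.5), 5))
```

Changing `p` to 0.3 (and the interval to suit) is a quick way to try the other case mentioned above.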
Thanks
 