Quantitative Methods
Module 9: More Distributions
Statistical inference is the collection of methods by which
sample data can be turned into general information about a population.
Two main types of inference:
1. Estimation: predicting confidence intervals for parameters.
2. Significance testing: judging whether sample evidence is consistent with a hypothesis.
The Poisson is a standard distribution similar to the binomial but with an infinite sample size. It is a discrete distribution whose shape varies from right-skewed to almost symmetrical.
The binomial is based on taking samples from a population whose elements are split into two types: occurrence of an event or non-occurrence of an event.
With the Poisson, the total number of elements in the sample is not known. Occurrences of the event can be counted but non-occurrences cannot, because the number of events that could have occurred but did not is infinite.
Poisson probabilities are the probabilities that a given number of events occur (usually in a period of time), whereas binomial probabilities are the probabilities that a sample contains a given number of elements of one type. The mathematics of the Poisson is based on the binomial but allows for an infinite sample size.
Typical applications:
Arrival of calls at a telephone switchboard (the arrival, or non-arrival, of calls from a possibly infinite number)
Telegraph cable flaws
Mechanical breakdowns of machinery, cars
Clerical errors
The Poisson is derived from the binomial; the starting point is the binomial formula for probabilities. The result is:
$P(r \text{ events}) = \dfrac{e^{-\lambda}\,\lambda^{r}}{r!}$
λ is the parameter of the distribution: the average number of events per sample.
e is a constant (approximately 2.718).
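The Poisson formula can be evaluated directly instead of being read from a table. A minimal Python sketch, assuming a hypothetical average of 2 events per sample (the function name `poisson_probability` and the value of `lam` are illustrative, not from the notes):

```python
import math

def poisson_probability(r, lam):
    """P(r events) when the average number of events per sample is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

# Hypothetical example: on average 2 calls arrive per minute.
lam = 2.0
for r in range(6):
    print(f"P({r} events) = {poisson_probability(r, lam):.4f}")
```

Changing `lam` alone changes every probability, which is what it means for the distribution to have a single parameter.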
1) Along the top of the table find λ. (This corresponds to the average number of events.)
2) Down the side, look up the probability for the given r value. (r is the variable: the number of events per sample.)
The Poisson has one parameter, the average occurrence of events; once this is known, the shape of the distribution is fixed exactly. This can be verified from the Poisson formula.
The theoretical basis of the Poisson, which lies in the way the sample is taken, is unlikely to be matched exactly in practice. The question is whether it is a reasonable approximation.
Two tests check whether the Poisson is applicable.
1. The actual situation is compared qualitatively with that on which the distribution is based.
The binomial can be approximated by the Poisson. This is done purely for convenience.
Rule of thumb: the approximation can be used if n > 20 and p < 0.05. The parameter of the approximating Poisson is easily found: λ is defined as being equal to the mean of the binomial, np.
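The quality of the approximation can be checked numerically. A rough Python sketch, assuming hypothetical values n = 100 and p = 0.02 (chosen only because they satisfy the rule of thumb), compares each binomial probability with the Poisson probability for λ = np:

```python
import math

def binomial_probability(r, n, p):
    """P(r elements of one type) in a binomial sample of size n."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_probability(r, lam):
    """P(r events) when the average number of events per sample is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

# Hypothetical case that meets the rule of thumb (n > 20, p < 0.05).
n, p = 100, 0.02
lam = n * p  # parameter of the approximating Poisson

for r in range(5):
    print(r, round(binomial_probability(r, n, p), 4),
          round(poisson_probability(r, lam), 4))

# Largest disagreement between the two distributions over all r.
max_gap = max(abs(binomial_probability(r, n, p) - poisson_probability(r, lam))
              for r in range(n + 1))
print("largest difference:", round(max_gap, 4))
```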
Degrees of freedom are defined as the number of observations that are free to vary in estimating a parameter from a sample.
Mean (Σx/n), where n is the sample size:
The estimate is made from n observations, and the observations are not restricted; all are free to vary. The arithmetic mean from a sample of size n therefore has n degrees of freedom.
In a sample of two, the second deviation from the mean depends on the first, being its negative, and is therefore not free to vary.
The standard deviation has n-1 degrees of freedom, since measures of deviation require a fixed reference point (the mean) from which to measure the deviation.
In general, the degrees of freedom associated with the estimate of a parameter are the sample size minus the number of observations ‘used up’ because of the need to measure other statistics (such as the mean) before the estimate can be made.
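The loss of one degree of freedom can be seen in a small calculation. A sketch in Python with a hypothetical three-value sample: once the mean is fixed, the deviations must sum to zero, so the last deviation is determined by the others, and the standard deviation is divided by n-1 rather than n.

```python
import statistics

sample = [4.0, 7.0, 10.0]  # hypothetical observations
n = len(sample)
mean = sum(sample) / n

# Deviations are measured from the mean, the fixed reference point.
deviations = [x - mean for x in sample]

# The deviations sum to zero, so the last one is fixed by the others.
last_from_others = -sum(deviations[:-1])

# Standard deviation with n - 1 degrees of freedom.
variance = sum(d ** 2 for d in deviations) / (n - 1)
std_dev = variance ** 0.5

print(deviations, last_from_others, std_dev)
```

Python's `statistics.stdev` uses the same n-1 divisor, so it should agree with `std_dev` here.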
The t-distribution allows the standard deviation to be estimated from samples that are smaller than 30.
It is similar to the normal distribution except that it has longer tails. It is continuous and its shape is symmetrical.
All of the following conditions apply, for estimation and statistical inference alike:
a) The standard deviation is unknown and has to be estimated from the sample.
b) The sample size is less than 30.
c) The underlying distribution is normal. Because the sample size is less than 30, the central limit theorem cannot be invoked. The t-distribution removes only one of the two reasons for requiring a sample size exceeding 30 (the need for a reliable estimate of the standard deviation); it does not remove the need for normality.
Credit to W.S. Gosset
The t-distribution extends the methods of estimation and inference to small samples.
Tables give the required probabilities.
It is wider than the normal distribution, but the larger the sample size, the more certain the standard deviation estimate and the closer the t-distribution is to the normal.
It is slightly different for each sample size: the distribution differs according to the degrees of freedom.
Before the t-distribution can be used, the number of degrees of freedom must be specified.
Because it is based on estimating the standard deviation from a sample, it has the same number of degrees of freedom as the standard deviation: the sample size minus 1.
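Use of the normal distribution table starts with the calculation of a z-value.
Use of the t-distribution starts with the calculation of this same quantity, but labeled “t” to denote that the distribution being used is not normal:
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu$ the population mean, s the sample standard deviation and n the sample size. The only difference is that the sample size is smaller than 30.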
Table: rows refer to degrees of freedom, columns refer to probabilities, and the body of the table contains t values.
The 95% confidence limits for a t-distribution are:
$\bar{x} \pm t \times s / \sqrt{n}$
The t value varies according to sample size, unlike the z value, which does not change with sample size. Different confidence levels can be substituted.
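As an illustration of confidence limits for a small sample, here is a Python sketch with hypothetical data (n = 10). The t value 2.262 is assumed to be read from a standard t table for 9 degrees of freedom at the 95% level; it is typed in by hand, not computed.

```python
import math
import statistics

# Hypothetical small sample, n = 10, assumed to come from a normal population.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 14.0, 13.0, 10.0]
n = len(sample)

mean = statistics.mean(sample)
s = statistics.stdev(sample)  # estimated with n - 1 degrees of freedom

# Assumed table value: t for 9 degrees of freedom, 95% confidence (two-sided).
t_value = 2.262

half_width = t_value * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

For a different sample size or confidence level, only `t_value` has to be looked up again.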
1) Specify Hypothesis
2) Collect Sample Evidence
3) Select a Significance Level
4) Calculate the t value related to the sample evidence
5) Compare the observed t value with the t value associated with the significance level. Accept or reject the hypothesis accordingly
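The five steps can be sketched in Python. Everything numeric here is hypothetical: the hypothesized mean of 50, the nine observations, and the critical value 2.306, which is assumed to be read from a standard t table for 8 degrees of freedom at the 5% level (two-tailed).

```python
import math
import statistics

# 1) Hypothesis: the population mean is 50 (hypothetical value).
hypothesized_mean = 50.0

# 2) Sample evidence (hypothetical, n = 9, population assumed normal).
sample = [52.0, 48.0, 55.0, 51.0, 49.0, 53.0, 50.0, 54.0, 47.0]
n = len(sample)

# 3) Significance level of 5%; critical t assumed from a table, 8 d.f.
critical_t = 2.306

# 4) Observed t value for the sample evidence.
mean = statistics.mean(sample)
s = statistics.stdev(sample)
t_observed = (mean - hypothesized_mean) / (s / math.sqrt(n))

# 5) Compare observed and critical values.
reject = abs(t_observed) > critical_t
print(f"t = {t_observed:.3f}:", "reject" if reject else "accept", "the hypothesis")
```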
When the sample size exceeds 30, the sampling distribution of the mean is almost normal and has two parameters, the arithmetic mean and the standard deviation.
The t-distribution has one extra parameter, the degrees of freedom. With the three parameters specified, the distribution probabilities are fixed.
NOTE: Underlying distribution must be normal or you can’t
use t!
To check that the individual distribution is indeed normal, take a small sample and compare observed with expected frequencies.
Four standard distributions have been encountered so far and are the most widely used:
1. Binomial
2. Normal
3. Poisson
4. t-distribution
Chi-squared compares an observed sample variance with a hypothesized population variance. It answers the question, “Is the observed scatter of the sample in accord with what is thought to be the scatter of the population?” (Remember, the normal, binomial, and Poisson compare means, not scatter.) The application of chi-squared is important to statisticians but less so to managers.
$\chi^2 = (n-1) \times \dfrac{\text{observed sample variance}}{\text{population variance}}$
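A short Python sketch of the chi-squared calculation, using a hypothetical sample of 10 and a hypothesized population variance of 2.0. The critical value 16.92 is assumed to be read from a standard chi-squared table (9 degrees of freedom, 5% in the upper tail); it is not computed here.

```python
import statistics

# Hypothetical sample, n = 10, assumed drawn at random from a normal population.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 14.0, 13.0, 10.0]
n = len(sample)
degrees_of_freedom = n - 1

sample_variance = statistics.variance(sample)  # divisor is n - 1
population_variance = 2.0                      # hypothesized value

chi_squared = degrees_of_freedom * sample_variance / population_variance

# Assumed table value: upper 5% point of chi-squared with 9 d.f.
critical_upper = 16.92
print(f"chi-squared = {chi_squared:.2f}, exceeds critical: {chi_squared > critical_upper}")
```

This checks the upper tail only; the lower tail would need its own table value.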
The assumptions are either:
· The sample has been taken at random from a normal population and the sample size is less than 30, or
· The population is non-normal and the sample size is 30 or more.
The two tails have to be treated separately. In the table, the rows refer to degrees of freedom, the columns to the area in the upper (right-hand) tail of the distribution, and the values in the body are chi-squared values.
Significance test steps:
Compare the observed value with the value from the chi-squared table, using the usual significance test steps.
The most common managerial use is to test for differences in proportions.
The F-distribution compares the variance in one sample with the variance in another.
The variable of an F-distribution is the ratio between two variance estimates: the scatter of two samples is compared through the ratio of their variances.
$F = \dfrac{\text{variance of sample 1}}{\text{variance of sample 2}}$
A pair of degrees of freedom is associated with the F-distribution (each is ‘sample size - 1’): one for the numerator (the upper part of the ratio) and one for the denominator (the lower part of the ratio).
Based on taking just one pair of samples and using them in
conjunction with theoretically derived F-distribution tables.
Compare the difference in variances of two samples
Major application is in analysis of variance.
Two important assumptions:
1) The samples should be selected at random.
2) The populations from which the samples come should be normally distributed.
Use of F tables:
Rows refer to the degrees of freedom in the denominator.
Columns refer to the degrees of freedom in the numerator.
Degrees of freedom are sample size - 1 (n - 1).
The critical 5% value is the F ratio that leaves 5% in the tail of the distribution; the critical 1% value leaves 1% in the tail.
The larger of the variances is placed on top in the ratio so that only one tail has to be considered.
5 stages of the significance test:
a) The hypothesis is that the two samples come from populations with the same variance (no difference in quality).
b) The evidence is the two samples and their variances, from which an F ratio can be calculated.
c) The significance level is set at the conventional 5%.
d) The critical F value for a 5% significance level and (11,17) degrees of freedom is 2.41.
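The comparison of the observed F ratio with the critical value can be sketched in Python. The critical value 2.41 for (11,17) degrees of freedom is the one quoted above; the sample sizes (12 and 18) and the two variances are hypothetical, chosen only so that the degrees of freedom match, and the helper name `f_ratio` is illustrative.

```python
def f_ratio(var_1, n_1, var_2, n_2):
    """Return (F, numerator d.f., denominator d.f.), larger variance on top."""
    if var_1 >= var_2:
        return var_1 / var_2, n_1 - 1, n_2 - 1
    return var_2 / var_1, n_2 - 1, n_1 - 1

# Hypothetical evidence: sample of 18 with variance 3.0, sample of 12 with variance 9.0.
f_value, df_numerator, df_denominator = f_ratio(3.0, 18, 9.0, 12)

# Critical value quoted in the notes for 5% significance and (11,17) d.f.
critical_f = 2.41

reject = f_value > critical_f
print(f"F = {f_value:.2f} with ({df_numerator},{df_denominator}) d.f.:",
      "reject" if reject else "accept", "the hypothesis of equal variances")
```

Putting the larger variance on top keeps F at 1 or above, so only the upper tail of the table is needed.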
Negative binomial: Applicable to situations that are almost, but not quite, modeled by the Poisson. Occurs, for instance, when λ varies from family to family in market research. Increasingly important to marketing.
Beta-binomial: Used when the binomial parameter p (the proportion of type 1) varies in a certain way across different parts of the population. (For instance, the binomial does not predict the number of all-boy families in a population well, since some families are more prone to boys.)
The areas of application of the Poisson, t, chi-squared, F, negative binomial and beta-binomial are problems of inference, specifically estimation and significance testing.
Standard distributions have two advantages:
1) They reduce the need for data collection compared with the alternative of collecting a one-off distribution for each and every problem.
2) Each standard distribution brings established knowledge that can widen and speed the analysis.
To summarize, using a standard distribution is often a better alternative than collecting large amounts of data.
The principal use of standard distributions is in significance testing.
Distribution | Situation
Normal | Observations taken (or measurements made) of some quantity which is essentially constant but is subject to many small, additive, independent disturbances.
Binomial | Samples taken from a population in which the elements are of two types. The variable is the number of elements of one of the types in the sample.
Poisson | Sample taken of a continuum (e.g. time, length). The variable is the number of ‘events’ in the sample.
t | Similar to the normal but where the standard deviation is estimated from a sample of size < 30.
Chi-squared | Sample taken from a normal population. The variable is based on the ratio between the sample variance and the population variance.
F | Two samples taken from a normal population. The variable is the ratio between the variances of the two samples.
Negative binomial | Like the Poisson, but with the parameter, λ, itself subject to variation across the population.
Beta-binomial | Like the binomial, but with the parameter, p, subject to variation across the population.
Significance test | Distribution
Comparing a sample mean with a population mean | Normal (if sample size >= 30); t (if sample size < 30)
Comparing one sample mean with another sample mean | Normal (if combined sample size >= 30); t (if combined sample size < 30)
Comparing a sample variance with a population variance | Chi-squared
Comparing one sample variance with another sample variance | F