Quantitative Methods

Module 9:  More Distributions

Introduction

Statistical inference is the collection of methods by which sample data can be turned into general information about a population.

Two main types of inference:

1. Estimation

Calculating confidence intervals for parameters

2. Significance testing

Judging whether sample evidence is consistent with a hypothesis

The Poisson distribution

Characteristics

A standard distribution similar to the binomial but with an infinite sample size. It is a discrete distribution whose shape varies from right-skewed to almost symmetrical.

 

Situations in which the Poisson Occurs

 

Isolated events within a continuum

The binomial is based on taking samples from a population whose elements are of two types, which amounts to counting the occurrence or non-occurrence of an event.

For the Poisson, the total number of elements in the sample is not known. Occurrences of the event can be counted, but non-occurrences cannot, because the number of events that could have occurred but did not is infinite.

Poisson probabilities give the chance that a given number of events occur (usually in a period of time). Binomial probabilities, by comparison, give the chance that a sample contains a given number of elements of one type. The mathematics of the Poisson is based on the binomial but allows for an infinite sample size.

 

A typical application is the arrival of calls at a telephone switchboard (calls arrive, or fail to arrive, from a possibly infinite number of potential calls).

Flaws in telegraph cable

Mechanical breakdown of machinery, cars

Clerical errors

Deriving the Poisson

 

The Poisson is derived from the binomial; the starting point is the binomial formula for probabilities.

The important assumption is that the sample is taken at random, just as in the binomial.

The Poisson Formula

$P(r \text{ events}) = e^{-\lambda} \lambda^{r} / r!$

λ is the parameter of the distribution: the average number of events per sample.

e is a constant, approximately 2.718.
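As a minimal sketch of the formula, the function below evaluates the Poisson probability directly. The function name `poisson_probability` and the switchboard figure of 2 calls per minute are assumptions made for illustration, not part of the module.

```python
import math

def poisson_probability(r: int, lam: float) -> float:
    """P(r events) when the average number of events per sample is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

# Hypothetical example: a switchboard averaging 2 calls per minute.
lam = 2.0
for r in range(5):
    print(f"P({r} events) = {poisson_probability(r, lam):.4f}")
```

The printed values play the same role as one column of a Poisson table: one probability for each value of r at a fixed λ.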

Using Poisson Tables

1) Along the top of the table find λ. (This corresponds to the average number of events per sample.)

2) Down the side find the given r value (the variable: the number of events per sample). The table entry at that row and column is the probability.

 

Parameters

There is one parameter, the average number of events per sample (λ). Once it is known, the shape of the distribution is fixed exactly. This can be verified from the Poisson formula, in which λ is the only quantity other than r.

Deciding whether Data Fit a Poisson

The theoretical basis of the Poisson, in particular the way the sample is taken, is unlikely to be matched exactly in practice. The question is whether the Poisson is a reasonable approximation.

Two tests check whether the Poisson is applicable:

1. The actual situation is compared qualitatively with that on which the distribution is based.

2. Observed data are compared with the theoretically expected values to see if they match.
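The second check can be sketched as follows. The observed counts are hypothetical, and estimating λ as the observed average number of events per sample is an assumption of this sketch.

```python
import math

def poisson_probability(r, lam):
    """P(r events) when the average number of events per sample is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

# Hypothetical data: number of samples (e.g. minutes) in which r events were seen.
observed = {0: 27, 1: 36, 2: 22, 3: 10, 4: 5}

total_samples = sum(observed.values())
total_events = sum(r * count for r, count in observed.items())
lam = total_events / total_samples  # estimated average events per sample

# Expected frequency for exactly r events (not "r or more").
expected = {r: total_samples * poisson_probability(r, lam) for r in observed}

for r in observed:
    print(f"r={r}: observed {observed[r]:3d}, expected {expected[r]:6.1f}")
```

If the two columns are close, the Poisson is a reasonable approximation for these data; a formal comparison would use a significance test.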

 

Using the Poisson to Approximate the Binomial

The binomial can be approximated by the Poisson. This is done purely for convenience.

Rule of thumb: the approximation is acceptable if n > 20 and p < 0.05.

The parameter of the approximating Poisson is easily found: λ is defined as being equal to the mean of the binomial, np.
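A short sketch comparing the two sets of probabilities shows how close the approximation is. The values n = 100 and p = 0.02 are hypothetical and chosen to satisfy the rule of thumb.

```python
import math

def binomial_probability(r, n, p):
    """P(r elements of one type) in a binomial sample of size n."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_probability(r, lam):
    """P(r events) when the average number of events per sample is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

n, p = 100, 0.02   # hypothetical values with n > 20 and p < 0.05
lam = n * p        # parameter of the approximating Poisson

for r in range(5):
    b = binomial_probability(r, n, p)
    q = poisson_probability(r, lam)
    print(f"r={r}: binomial {b:.4f}, Poisson {q:.4f}")

# Largest gap between the two distributions over r = 0..10.
max_gap = max(
    abs(binomial_probability(r, n, p) - poisson_probability(r, lam))
    for r in range(11)
)
print(f"largest difference: {max_gap:.4f}")
```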

Degrees of Freedom

 

Defined as the number of observations that are free to vary in estimating a parameter from a sample.

 

Mean = $\sum x / n$, where n is the sample size

 

The estimate of the mean is made from n observations, none of which is restricted; all are free to vary.

The arithmetic mean from a sample of size n therefore has n degrees of freedom.

Deviations from the mean are different. In a sample of two, for example, the second deviation depends on the first, being its negative, and is therefore not free to vary.

 

The standard deviation has n-1 degrees of freedom, since measures of deviation require a fixed reference point (the mean) from which to measure the deviation.

 

The number of degrees of freedom associated with the estimate of a parameter is the sample size minus the number of observations ‘used up’ because of the need to measure other statistics (such as the mean) before the estimate can be made.
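The idea can be illustrated with a small sketch. The three observations are hypothetical; the point is that the deviations from the mean always sum to zero, so the last one is fixed by the others and only n-1 are free to vary.

```python
import statistics

sample = [4.0, 7.0, 10.0]  # hypothetical observations
n = len(sample)
mean = sum(sample) / n

deviations = [x - mean for x in sample]

# The deviations sum to zero, so the last is determined by the rest.
last_from_others = -sum(deviations[:-1])

# Sample standard deviation, dividing by the n-1 degrees of freedom.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
sample_sd = sample_variance ** 0.5

print(deviations, last_from_others)
print(sample_sd, statistics.stdev(sample))  # stdev also divides by n-1
```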

 

t-Distribution

 

Characteristics

 

This distribution is used when the standard deviation has to be estimated from a sample smaller than 30.

It is similar to the normal distribution except that it has longer tails. It is continuous and its shape is symmetrical.

 

Situations in which the t-Distribution Occurs

All of the following conditions apply, in estimation and statistical inference:

a) The standard deviation is unknown and has to be estimated from the sample.

b) The sample size is less than 30.

c) The underlying distribution is normal. Because the sample size is less than 30, the central limit theorem cannot be invoked. There are two reasons for wanting a sample size of 30 or more (a reliable estimate of the standard deviation, and the central limit theorem); the t-distribution deals with only the first of them.

 

Derivation of the t-Distribution

Credited to W.S. Gosset, who extended methods of estimation and inference to small samples.

 

Tables give the required probabilities.

The t-distribution is wider than the normal distribution, but the larger the sample size, the more certain the estimate of the standard deviation and the closer the t-distribution is to the normal.

It is slightly different for each sample size: the distribution differs according to the degrees of freedom.

Before the t-distribution can be used, the number of degrees of freedom must be specified.

Because it is based on estimating the standard deviation from a sample, it has the same number of degrees of freedom as the standard deviation: the sample size minus 1.

 

Using t-Distribution Tables

 

Use of the normal distribution table starts with the calculation of a z-value.

Use of the t-distribution starts with the calculation of this same quantity, but labeled “t” to denote that the distribution being used is not normal:

$t = (\bar{x} - \mu) / (s / \sqrt{n})$

where $\bar{x}$ is the sample mean, $\mu$ the population mean, s the sample standard deviation and n the sample size. The only difference is that the sample size is smaller than 30.

Table: rows refer to degrees of freedom, columns refer to probabilities, and the body of the table contains t values.

 

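The 95% confidence limits for a t-distribution are: $\bar{x} \pm t \times s / \sqrt{n}$, where t is the table value for n-1 degrees of freedom that leaves 2.5% in each tail.

- The t value varies according to sample size, unlike the z value, which does not change with sample size. Different confidence levels can be substituted.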
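A minimal sketch of 95% confidence limits for a small sample follows. The ten observations are hypothetical, the limits are taken as the sample mean plus or minus t times s/sqrt(n), and the value 2.262 is the standard t-table entry for 9 degrees of freedom with 2.5% in each tail; it is typed in by hand rather than computed.

```python
import math
import statistics

# Hypothetical sample of 10 measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n-1 divisor)
df = n - 1                    # degrees of freedom

t_critical = 2.262            # from t tables: 9 df, 2.5% in each tail

half_width = t_critical * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(f"95% confidence limits: {lower:.3f} to {upper:.3f}")
```

With a different sample size the table value must be looked up again, because t depends on the degrees of freedom.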

t-Test Procedure

 

1) Specify Hypothesis

2) Collect Sample Evidence

3) Select a Significance Level

4) Calculate the t value related to the sample evidence

5) Compare the observed t value with the t value associated with the significance level, and accept or reject the hypothesis accordingly.
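The five steps can be sketched as below. The hypothesis, the sample of three observations and the function name `t_value` are all hypothetical; the critical value 4.303 is the standard t-table entry for 2 degrees of freedom with 2.5% in each tail.

```python
import math
import statistics

def t_value(sample, hypothesised_mean):
    """Observed t: (sample mean - hypothesised mean) / (s / sqrt(n))."""
    n = len(sample)
    s = statistics.stdev(sample)  # n-1 divisor
    return (statistics.mean(sample) - hypothesised_mean) / (s / math.sqrt(n))

# 1) Hypothesis: the population mean is 10 (hypothetical).
hypothesised_mean = 10
# 2) Sample evidence (hypothetical).
sample = [10, 12, 14]
# 3) Significance level: 5%, two-tailed.
t_critical = 4.303  # from t tables: n-1 = 2 df, 2.5% in each tail
# 4) Observed t value.
t = t_value(sample, hypothesised_mean)
# 5) Compare observed with critical.
reject = abs(t) > t_critical
print(f"t = {t:.3f}, reject hypothesis: {reject}")
```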

 

Parameters

When the sample size exceeds 30, the sampling distribution of the mean is almost normal and has two parameters: the arithmetic mean and the standard deviation.

The t-distribution has one extra parameter: the degrees of freedom.

With the three parameters specified, the distribution probabilities are fixed.

 

NOTE: Underlying distribution must be normal or you can’t use t!

 

Deciding whether Data have a t-Distribution

The check is whether the underlying individual distribution is indeed normal: take a small sample and compare observed with expected frequencies.

The four standard distributions encountered so far are the most widely used:

1. Binomial

2. Normal

3. Poisson

4. t-Distribution

 

 

Chi-squared  Distribution

 

Chi-squared compares an observed sample variance with a hypothesized population variance. It answers the question, “Is the observed scatter of the sample in accord with what is thought to be the scatter of the population?” (Remember, the normal, binomial and Poisson compare means, not scatter.) The application of chi-squared is important to statisticians but less so to managers.

 

Characteristics

$\chi^2 = (n-1) \times \text{observed sample variance} / \text{population variance}$

 

Situations in which the Chi-squared occurs

Assumptions are either

·        The sample has been taken at random from a normal population and the sample size is less than 30, or

·        The population is non-normal and the sample size is 30 or more.

 

Use of Chi-squared Tables

The two tails have to be treated separately, because the distribution is not symmetrical. The rows refer to degrees of freedom, the columns to the probability in the upper (right-hand) tail of the distribution, and the values in the body are chi-squared values.

 

Significance Test Steps

Compare the observed chi-squared value with the critical value from the chi-squared table, using the usual significance test steps.
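A sketch of the comparison, assuming a hypothetical sample of ten observations and a hypothesized population variance of 4. Chi-squared is calculated as (n-1) times the sample variance divided by the population variance. Only the upper tail is tested here (is the scatter larger than hypothesized?), and 16.919 is the standard chi-squared table value for 9 degrees of freedom with 5% in the upper tail.

```python
import statistics

def chi_squared_statistic(sample, population_variance):
    """(n-1) x observed sample variance / hypothesized population variance."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / population_variance

# Hypothetical sample and hypothesized population variance.
sample = [48, 52, 50, 55, 45, 51, 49, 53, 47, 50]
hypothesised_variance = 4.0

chi2 = chi_squared_statistic(sample, hypothesised_variance)
df = len(sample) - 1

critical_upper_5pct = 16.919  # chi-squared table: 9 df, 5% in upper tail
reject = chi2 > critical_upper_5pct
print(f"chi-squared = {chi2:.2f} on {df} df, reject hypothesis: {reject}")
```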

 

Using Chi-squared to Test differences in Proportions

The most common managerial use is to test for differences in proportions.
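The notes do not spell out the method, so the sketch below assumes the standard contingency-table form of the test: expected counts are calculated as if the proportions were equal, and chi-squared is the sum of (observed - expected)² / expected. The counts are hypothetical, and 3.841 is the standard table value for 1 degree of freedom at the 5% level.

```python
# Hypothetical counts: rows are two regions, columns are buyers / non-buyers.
observed = [
    [30, 70],
    [50, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the proportion were the same in both rows.
        exp = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_5pct = 3.841  # chi-squared table: 1 df, 5% in upper tail
reject = chi2 > critical_5pct
print(f"chi-squared = {chi2:.3f} on {df} df, proportions differ: {reject}")
```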

 

F Distribution

 

The F distribution compares the variance in one sample with the variance in another.

The variable of an F-distribution is the ratio between two variance estimates.

The scatter of two samples is compared through the ratio of their variances.


Characteristics

F = Variance of sample 1/ Variance of sample 2

 

A pair of degrees of freedom is associated with the F distribution (each is ‘sample size - 1’): one for the numerator (the upper part of the ratio) and one for the denominator (the lower part of the ratio).

 

Situations in which the F-Distribution Occurs

Based on taking just one pair of samples and using them in conjunction with theoretically derived F-distribution tables. 

Compare the difference in variances of two samples

Major application is in analysis of variance. 

Two important assumptions:

1)     The samples should be selected at random.

2)     The populations from which the samples come should be normally distributed.

 

Use of F Tables

Rows refer to the degrees of freedom in the denominator.

Columns refer to the degrees of freedom in the numerator.

Degrees of freedom are sample size - 1 (n - 1).

The critical 5% value is the F ratio that leaves 5% in the tail of the distribution; the critical 1% value leaves 1% in the tail. The larger of the variances is put on top in the ratio so that only one tail has to be considered.

 

5 Stages of Significance Test

a)     The hypothesis is that the two samples come from populations with the same variance (no difference in quality).

b)     The evidence is the two samples and their variances, from which an F ratio can be calculated.

c)     The significance level is set at the conventional 5%.

d)     The critical F value for a 5% significance level and (11,17) degrees of freedom is 2.41.

e)     The observed F ratio is compared with the critical value, and the hypothesis is accepted or rejected accordingly.
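The stages can be sketched as follows. The two sample variances and the function name `f_ratio` are hypothetical; the sample sizes of 12 and 18 are chosen so that the degrees of freedom are (11,17), matching the critical value of 2.41 quoted above.

```python
def f_ratio(variance_1, n_1, variance_2, n_2):
    """Return (F, numerator df, denominator df), larger variance on top."""
    if variance_1 >= variance_2:
        return variance_1 / variance_2, n_1 - 1, n_2 - 1
    return variance_2 / variance_1, n_2 - 1, n_1 - 1

# Hypothetical evidence: sample 1 has 18 observations with variance 4.0,
# sample 2 has 12 observations with variance 9.0.
f, df_numerator, df_denominator = f_ratio(4.0, 18, 9.0, 12)

critical_f = 2.41  # 5% level, (11,17) degrees of freedom, as quoted above
reject = f > critical_f
print(f"F = {f:.2f} on ({df_numerator},{df_denominator}) df, reject: {reject}")
```

Putting the larger variance on top keeps F at 1 or above, which is why only the upper tail of the table is needed.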

 

Other Distributions

 

Negative Binomial:

Applicable to situations that are almost, but not quite, modeled by the Poisson. It occurs, for instance, when λ varies from family to family in market research. It is increasingly important in marketing.

 

 

Beta-binomial:

Used when the binomial parameter p (the proportion of type 1) varies in a certain way across different parts of the population. (For instance, the binomial does not predict the number of all-boy families in a population well, since some families are more prone to boys.)

 

Key Message from Module

The areas of application of the Poisson, t, chi-squared, F, negative binomial and beta-binomial distributions are problems of inference, specifically estimation and significance testing.

The advantages are twofold:

1) They reduce the need for data collection compared with the alternative of collecting a one-off distribution for each and every problem.

2) Each standard distribution brings established knowledge that can widen and speed up the analysis.

In summary, using a standard distribution is often a better alternative than collecting large amounts of data.

The principal use of standard distributions is in significance testing.


 

Distribution | Situation
-------------|----------
Normal | Observations taken (or measurements made) of some quantity which is essentially constant but is subject to many small, additive, independent disturbances.
Binomial | Samples taken from a population in which the elements are of two types. The variable is the number of elements of one of the types in the sample.
Poisson | Sample taken of a continuum (e.g. time, length). The variable is the number of ‘events’ in the sample.
t | Similar to the normal but where the standard deviation is estimated from a sample of size < 30.
Chi-squared | Sample taken from a normal population. The variable is based on the ratio between the sample variance and the population variance.
F | Two samples taken from a normal population. The variable is the ratio between the variances of the two samples.
Negative binomial | Like the Poisson, but with the parameter, λ, itself subject to variation across the population.
Beta-binomial | Like the binomial, but with the parameter, p, subject to variation across the population.

 

Summary of Significance tests

 

Significance test | Distribution
------------------|-------------
Comparing a sample mean with a population mean | Normal (if sample size >= 30); t (if sample size < 30)
Comparing one sample mean with another sample mean | Normal (if combined sample size >= 30); t (if combined sample size < 30)
Comparing a sample variance with a population variance | Chi-squared
Comparing one sample variance with another sample variance | F