Quantitative Methods:

Module 7:  Distributions

Introduction

There are two general types of distributions:

Observed

Derived from data collected in a particular situation; the distribution is peculiar to that one situation.

Standard (Theoretical or Probability)

Derived mathematically and, theoretically, applicable to all situations exhibiting the appropriate mathematical characteristics.

The advantage is that the time and effort spent on data collection are saved.

Observed Distributions

1) A collection of numbers, which are measurements of a variable.

2) The numbers (or observations, or readings, or data points) can be classed according to frequency of occurrence, and the classes (or ranges) so noted, or alternatively the data can be pictured graphically in a frequency histogram.

3) The frequency histogram is a descriptive device only.

4) To suit analytical objectives, a probability histogram can be developed.

5) Probability histograms are created by applying the following formula to the frequency classes in a frequency histogram:

P(number lies in class x) = (Frequency of class x) / (Total frequency)

6) The frequency histogram becomes a probability histogram by making the units of the vertical axis probabilities instead of frequencies. Once the histogram is in probability form, it is referred to as a distribution.

7) An alternative representation of a frequency distribution is a cumulative frequency distribution. Instead of showing the frequency for each class, a cumulative frequency distribution shows the frequency for that class and all smaller classes.

8) Cumulative frequencies put into the form of graphs are known as ogives.
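
Points 5) to 7) above can be sketched in a few lines of Python. The class ranges and frequencies below are hypothetical, chosen only to illustrate the arithmetic; they do not come from any data set in this module.

```python
# Hypothetical class frequencies (illustrative only, not from the notes).
frequencies = {"0-9": 4, "10-19": 10, "20-29": 16, "30-39": 8, "40-49": 2}

total = sum(frequencies.values())

# P(number lies in class x) = frequency of class x / total frequency
probabilities = {cls: f / total for cls, f in frequencies.items()}

# Cumulative frequency: frequency of the class plus all smaller classes.
cumulative = {}
running = 0
for cls, f in frequencies.items():
    running += f
    cumulative[cls] = running

for cls in frequencies:
    print(f"{cls:>6}  freq={frequencies[cls]:>2}  "
          f"prob={probabilities[cls]:.3f}  cum={cumulative[cls]:>2}")
```

The probabilities sum to 1, and the last cumulative frequency equals the total frequency. Plotting the cumulative column against the class boundaries would give the ogive.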

Probability Concepts

Probability is the likelihood of an event taking place; it takes a value between 0 and 1.

Measured by three methods:

1) A priori

2) Relative frequency

3) Subjective assessment

Further Properties of Probability

1) Mutually exclusive

The occurrence of one event rules out the occurrence of another.

Addition law of probability for mutually exclusive events:

P(A or B or C or ...) = P(A) + P(B) + P(C) + ...

2) Conditional probability

Probability of an event under the condition that another event has occurred or will occur.

P(A|B)= probability of A given the occurrence of B

3) Independent events

An event is independent if it is unaffected by the occurrence or non-occurrence of the other events, that is, if P(A) = P(A|B).

Multiplication law of probability for independent events: if events are independent, their probabilities can be multiplied.

P(A and B and C) = P(A) x P(B) x P(C)

4) Combinations

Closely related to probability calculations is the idea of a combination. The number of combinations, nCr, is the number of different ways in which r objects can be chosen from a total of n objects, without regard to order.

 

nCr = n! / (r! x (n - r)!)
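
As a rough illustration of the probability laws and the combinations formula above, the sketch below uses a fair six-sided die. The die is an assumed example, not one taken from these notes, and n_choose_r is a hypothetical helper name.

```python
import math
from fractions import Fraction

p_face = Fraction(1, 6)   # a priori probability of any one face of a fair die

# Mutually exclusive: one roll cannot show 1, 2 and 3 at once, so add.
p_1_or_2_or_3 = p_face + p_face + p_face

# Independent: three separate rolls do not affect each other, so multiply.
p_three_sixes = p_face * p_face * p_face

# Conditional probability by counting: P(roll is 2 | roll is even).
even = {2, 4, 6}
p_two_given_even = Fraction(len({2} & even), len(even))

def n_choose_r(n, r):
    """nCr = n! / (r! x (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(p_1_or_2_or_3, p_three_sixes, p_two_given_even)
print(n_choose_r(5, 2), math.comb(5, 2))   # formula and built-in agree
```

Fractions are used so the results stay exact. In Python 3.8 and later, math.comb computes nCr directly.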

 

Standard Distributions

 

Observed distribution

Implies that data have been collected, probabilities calculated and histograms formed. This must be done for each and every situation.

Standard distribution

Implies that the situation in which the data are being generated resembles closely a theoretical situation for which a distribution has been constructed mathematically.

Normal

The theoretical situation is that of a variable generated by a process which should give the variable a constant value, but does not because of many small disturbances.

 

Binomial Distribution

Characteristics

Discrete (the values taken by the variable are distinct), giving a stepped shape. The shape can be right-skewed, symmetrical or left-skewed, depending upon the situation in which the data were collected.

Situations in which the Binomial Occurs

The binomial is constructed mathematically from a theoretical situation.

The binomial relates to situations in which a sample is taken from a population, which consists of two types of element (hence the name ‘binomial’).

Binomial probabilities are the probabilities of obtaining different numbers of each type in the sample.

Actual situations in which the binomial is used are:

1) Inspection schemes

2) Opinion polls (the two way split of the population results from agreement/disagreement with statements made by the pollster);

3) Selling (the split is the outcome of a contact being a sale/no sale)

 

Deriving the Binomial Distribution

The probabilities calculated for such a sample are binomial probabilities. The histogram formed from them is a binomial distribution.

General formula by which binomial probabilities are calculated is:

            P(r of type 1 in sample) = nCr x p^r x (1-p)^(n-r)

 

where:

n = sample size

p = proportion of type 1 in population

1-p = proportion of type 2 in population

nCr = n!/(r! x (n - r)!)

n! (pronounced ‘n factorial’) = n x (n-1) x (n-2) x ... x 2 x 1
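
The general formula can be evaluated directly. The following is a minimal sketch; the sample size of 5 and the 10% population proportion are assumed values for illustration, and binomial_probability is a hypothetical function name.

```python
import math

def binomial_probability(r, n, p):
    """P(r of type 1 in sample) = nCr x p^r x (1-p)^(n-r)"""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

# Assumed example: 10% of the population is type 1, sample size 5.
n, p = 5, 0.10

# One probability for each possible number of type 1 in the sample.
distribution = [binomial_probability(r, n, p) for r in range(n + 1)]

for r, prob in enumerate(distribution):
    print(f"P({r} of type 1) = {prob:.5f}")
```

The list holds the heights of the bars in the binomial histogram for this n and p, and its entries sum to 1.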

Using Binomial Tables

Questions are typically of the form:

“If the population is composed of  x % of  one of two types, then what is the probability that a randomly assembled group of size y will contain a (or b,c,d, whatever) of type one?”

 

To solve:

1) Locate binomial table

2) Find sample size on left side of table. In the generalized example above, n=y. The table includes sample sizes from 1 to 8.

3) Find the proportion p = x% along the top of the table.

4) Read down that column to the entry for r = a (or b, c, d, ...).

5) Reminder: n = sample size. r = number of objects within the sample of a given type. p = proportion of that type in the population.
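
The lookup steps above can be imitated by computing one block of a binomial table. This is only a sketch: binomial_table is a hypothetical helper, and the question answered (a population with 20% of type one, a group of 8, exactly 2 of type one) is an assumed example.

```python
import math

def binomial_table(n, p_values):
    """Return {p: [P(r) for r = 0..n]}: one column per proportion p."""
    return {
        p: [math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
        for p in p_values
    }

# Steps 2 and 3: choose the block for n = 8 and the column for p = 0.20.
table = binomial_table(8, [0.10, 0.20, 0.30])

# Step 4: read the entry for r = 2 in that column.
answer = table[0.20][2]
print(round(answer, 4))
```

A printed table would give the same figure, rounded to however many decimal places the table carries.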

Parameters

 

Binomial distributions have two parameters:

1) Sample size: n

2) Population proportion p of the first type.

Probabilities and Histograms

Right Skewed: When p is small (close to 0).

Left Skewed: When p is large (close to 1).

Symmetrical: When p is near 0.5 or n is large.
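
One way to check these three cases numerically is to compute a skewness coefficient from the binomial probabilities (positive means right-skewed, negative means left-skewed). The coefficient used here, the third central moment divided by the cube of the standard deviation, is an assumption of this sketch and is not defined in these notes.

```python
import math

def binomial_skewness(n, p):
    """Skewness of the binomial, computed from its probabilities."""
    pmf = [math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(pmf))
    variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
    third_moment = sum((r - mean) ** 3 * pr for r, pr in enumerate(pmf))
    return third_moment / variance ** 1.5

for n, p in [(10, 0.1), (10, 0.9), (10, 0.5), (200, 0.1)]:
    print(f"n={n:>3}, p={p}: skewness = {binomial_skewness(n, p):+.3f}")
```

With p = 0.1 the sign is positive, with p = 0.9 it is negative, and with p = 0.5 it is essentially zero. Raising n from 10 to 200 at p = 0.1 brings the value much closer to zero, which matches the statement that a large n makes the distribution roughly symmetrical.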

 

Deciding whether Data Fit a Binomial

The binomial assumes that the sample is selected at random.

Take some observations and compare them with what would be expected theoretically if the binomial applied.

This check is not part of the analysis itself; its purpose is to decide whether the situation is binomial.
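
A sketch of such a comparison follows. The observed counts are invented for illustration (100 samples of size 4, with the number of type 1 recorded for each), and p = 0.5 is assumed to be known.

```python
import math

n, p = 4, 0.5                       # assumed sample size and proportion

# Hypothetical observations: how many of the 100 samples contained
# r items of type 1, for r = 0..4.
observed = {0: 7, 1: 24, 2: 39, 3: 23, 4: 7}
num_samples = sum(observed.values())

# Expected frequency = number of samples x binomial probability of r.
expected = {
    r: num_samples * math.comb(n, r) * p**r * (1 - p)**(n - r)
    for r in range(n + 1)
}

for r in range(n + 1):
    print(f"r={r}: observed={observed[r]:>3}  expected={expected[r]:6.2f}")
```

If the observed and expected columns are reasonably close, as they are for these invented figures, it is plausible to treat the situation as binomial. How close is close enough is a matter of judgment at this level.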

The Normal Distribution

Characteristics

The normal is bell-shaped and symmetrical. It is also continuous. (The binomial is discrete.)

The situation in which the normal occurs is constructed mathematically.

A binomial variable takes distinct values, e.g. 1, 2, 3.

A continuous variable can take any value in a range, e.g. 4.156, 4.157, 4.158.

Probabilities are measured not by the height of the distribution above the x-axis (as in the discrete case), but by the areas under the curve. 

Situations in which the Normal Occurs

The normal is constructed mathematically from the following theoretical situation:

Repeated observations or measurements are taken of the same constant quantity. Each time an observation is taken, the quantity is subject to many sources of disturbance. Each source gives rise to a small variation in the value of the quantity. The variations are equally likely to be positive or negative, at random. They are independent of one another and they can be added together.

Because positive and negative variations tend to cancel out, most values tend to be close to the central value.

Actual situations in which the normal distribution has been found to apply include:

1) IQs of children

2) Heights of people of the same sex

3) Dimensions of mechanically produced components

4) Weights of machine produced items

5) Arithmetic means of large samples.

Deriving the Normal Distribution

Each time an observation is taken, it is subject to many sources of variation, and each source gives rise to a small change in value. The changes are equally likely to be positive or negative, and the variations are independent and additive.

Using the Normal Curve Table

Probabilities are measured not by the height of the distribution above the x-axis (as in the discrete case) but by areas under the curve.

 

1) Find mean of data under study

2) Find SD of data under study

3) Subtract the mean from the data point, and divide by the SD to determine how many SDs away from the mean the data point is.

4) The number found in 3) above is z on the Normal table. Use z to look up the proportion of the population (the area under the curve) between the mean and that point.

5) For questions asking “How many in the population are greater than a certain number?” (for a number above the mean), subtract the found area from .5, since half of the total area lies on each side of the mean.

6) Since the bell curve is symmetrical, the table can be used for areas on either side of the mean.

7) The normal curve table can also be used in reverse, for instance “Within how many standard deviations can be found an area of y?”
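
The steps above can be sketched with Python's statistics.NormalDist in place of a printed table. The mean of 100, SD of 15 and data point of 130 are assumed values, and the sketch assumes the table lists the area between the mean and z, as step 5) implies.

```python
from statistics import NormalDist

mean, sd = 100.0, 15.0      # steps 1 and 2 (assumed values)
x = 130.0                   # data point of interest (assumed)

# Step 3: number of SDs the data point lies from the mean.
z = (x - mean) / sd

# Step 4: area between the mean and z, as such a table would list it.
standard = NormalDist()     # standard normal: mean 0, SD 1
area_mean_to_z = standard.cdf(z) - 0.5

# Step 5: proportion of the population greater than x.
area_above = 0.5 - area_mean_to_z

# Step 7: reverse lookup. Within how many SDs of the mean
# (on both sides) does 95% of the area lie?
z_for_95 = standard.inv_cdf(0.5 + 0.95 / 2)

print(f"z = {z:.2f}, area above x = {area_above:.4f}, "
      f"z for 95% = {z_for_95:.2f}")
```

Here z is 2, so roughly 2.3% of the population lies above 130, and about 95% of the area falls within 1.96 SDs of the mean.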

 

Parameters

The normal curve has two parameters, the arithmetic mean and the standard deviation. Contrast this with the binomial distribution, whose two parameters are the sample size and the population proportion. Two normal distributions with the same mean and the same standard deviation will be identical. Two normal distributions with different means and standard deviations will still have the characteristics of a bell curve, but they will be centered differently and be of different widths.

 

Deciding whether Data Fit a Normal Distribution

To decide if data fit the normal curve, find the mean and SD of the data, and determine whether the percentages of the data lying within 1, 2 and 3 SDs of the mean closely match the theoretical percentages.

Range             Theoretical % expected from the normal distribution

Mean +/- 1 SD     68%

Mean +/- 2 SD     95%

Mean +/- 3 SD     99.7%

 

The normal distribution is constructed on the basis of a theoretical situation and a set of assumptions which are not likely to match exactly the real situation to which it is being applied. When it is suspected that a standard distribution can be applied to a situation, it may be prudent to check that the variable does approximately fit the distribution. To do this, the observations should be compared with what is theoretically expected.
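
The check described in this section might look like the sketch below. The data are simulated from a normal distribution with an assumed mean of 50 and SD of 5, purely so that the example has something to test; real observations would replace the data list. share_within is a hypothetical helper name.

```python
import random
import statistics

# Simulated stand-in for observed data (assumed mean 50, SD 5).
rng = random.Random(0)
data = [rng.gauss(50.0, 5.0) for _ in range(10_000)]

mean = statistics.fmean(data)
sd = statistics.pstdev(data)

def share_within(k):
    """Proportion of observations within k SDs of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

for k, theoretical in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(f"Mean +/- {k} SD: observed {share_within(k):.3f}, "
          f"theoretical {theoretical}")
```

Observed shares close to 68%, 95% and 99.7% suggest the normal is a reasonable fit. Large departures suggest it is not.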

 

Approximating the Binomial with the Normal

 

The binomial is difficult to use because of its complicated formula and lengthy probability tables. (A separate table is needed for each sample size.) The normal is easier to use because of its single, simple table. For certain parameter values the shape of the binomial is similar to the normal.

The binomial is roughly symmetrical when p is close to 0.5, or when n is large.

 

Rule of Thumb: The binomial can be approximated by the normal when np and n(1 - p) both exceed 5. When these conditions hold, the approximating normal has the following parameters:

 

Arithmetic Mean = np

Standard Deviation = sqrt(np(1 - p))

Parameters are calculated from the available data and then used as if the distribution were truly normal
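
To see how close the approximation is, the sketch below compares exact binomial probabilities with normal areas for assumed values n = 50 and p = 0.3. Treating each binomial bar as covering the interval from r - 0.5 to r + 0.5 is an assumption of this sketch, not something stated in these notes.

```python
import math
from statistics import NormalDist

def binomial_pmf(r, n, p):
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 50, 0.3                          # assumed example values

# Rule of thumb: np and n(1 - p) must both exceed 5.
rule_holds = n * p > 5 and n * (1 - p) > 5

mean = n * p                            # np
sd = math.sqrt(n * p * (1 - p))         # sqrt(np(1 - p))
normal = NormalDist(mean, sd)

def normal_bar_area(r):
    """Normal area over the interval covered by the bar at r."""
    return normal.cdf(r + 0.5) - normal.cdf(r - 0.5)

largest_gap = max(
    abs(binomial_pmf(r, n, p) - normal_bar_area(r)) for r in range(n + 1)
)
print(f"rule holds: {rule_holds}, mean = {mean:.1f}, sd = {sd:.3f}")
print(f"largest difference between exact and approximate: {largest_gap:.4f}")
```

For these values the rule of thumb holds, and the largest difference between an exact probability and its normal approximation is small compared with the probabilities themselves.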


Instead of relating to the numbers of each type in the sample, the distribution could also refer to the proportion; instead of the number of defective parts in a sample, the distribution could be based on the proportion of defectives.  The parameters then become (just dividing by n):

 

Arithmetic Mean = p

Standard Deviation = sqrt(p(1 - p) / n)
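
A short numerical check of the "just dividing by n" step, using assumed values n = 400 and p = 0.2:

```python
import math

n, p = 400, 0.2                         # assumed example values

# Parameters for the number of type 1 in the sample.
mean_count = n * p                      # np
sd_count = math.sqrt(n * p * (1 - p))   # sqrt(np(1 - p))

# Parameters for the proportion of type 1 in the sample.
mean_proportion = p
sd_proportion = math.sqrt(p * (1 - p) / n)

print(mean_count, sd_count)             # count scale
print(mean_proportion, sd_proportion)   # proportion scale
print(sd_count / n)                     # same as sd_proportion
```

Dividing the count parameters by n reproduces the proportion parameters, up to floating-point rounding.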

Key Message

Analysis of management problems often involves probabilities. The analysis is frequently based on the use of observed or standard distributions, two of the most important and useful tools:

 

Observed distribution

 Usually entails the collection of large amounts of data from which to form histograms and estimate probabilities.

Standard distribution

Mathematically derived from a theoretical situation.

If an actual situation matches (to a reasonable approximation) the theoretical then the standard distribution can be used both to describe and analyze the situation.

As a result fewer data need be collected.

 

The principles behind the use of any standard distribution are the same, but each is associated with a different situation.