Quantitative Methods: Module 7: Distributions
There are two general types of distribution:
Observed |
Derived from data collected in some situation; the distribution is peculiar to that one situation. |
Standard (Theoretical or Probability) |
Derived mathematically and, theoretically, applicable to all situations exhibiting the appropriate mathematical characteristics. The advantage is that the time and effort spent on data collection are saved. |
1) A collection of numbers, which are measurements of a variable.
2) The numbers (or observations, or readings,
or data points) can be classed according to frequency of occurrence, and
the classes (or ranges) so noted, or alternatively the data can be pictured
graphically in a frequency histogram.
3) The frequency histogram is a descriptive
device only.
4) To suit analytical objectives, a probability
histogram can be developed.
5) Probability histograms are created by applying the following
formula to the frequency classes in a frequency histogram:
P(number lies in class x) = (Frequency of class x) / (Total frequency)
6) The frequency histogram becomes a probability
histogram by making the units of the vertical axis probabilities
instead of frequencies. Once the
histogram is in probability form, it is referred to as a distribution.
7) An alternative representation of a frequency distribution is
a cumulative frequency distribution.
Instead of showing the frequency for each class, a cumulative frequency
distribution shows the frequency for that class and all smaller classes.
8) Cumulative frequencies put into the form of graphs are
known as ogives.
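The steps in the list above can be sketched in a few lines of Python. This is only an illustration: the class labels and counts are invented for the example and do not come from the module.

```python
from itertools import accumulate

# Hypothetical frequency table: class label -> number of observations.
frequencies = {"0-9": 4, "10-19": 10, "20-29": 5, "30-39": 1}

total = sum(frequencies.values())

# P(number lies in class x) = frequency of class x / total frequency
probabilities = {label: count / total for label, count in frequencies.items()}

# Cumulative frequency: the frequency of a class plus all smaller classes.
# Plotting these values against the classes would give the ogive.
cumulative = dict(zip(frequencies, accumulate(frequencies.values())))

for label in frequencies:
    print(label, frequencies[label], probabilities[label], cumulative[label])
```

The probabilities sum to 1, and the last cumulative frequency equals the total number of observations.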
Probability is the likelihood of an event taking place; it takes a value between 0 and 1.
It is measured by three methods:
A priori
Relative frequency
Subjective assessment
Further Properties of Probability
|
1) Mutually exclusive |
The occurrence of one rules out the occurrence of another |
P(A or B or C or ...) = P(A) + P(B) + P(C) + ... |
|
2) Conditional probability
|
Probability of an event under the condition that another
event has occurred or will occur. |
P(A|B)=
probability of A given the occurrence of B |
|
3) Independent events
Multiplication law of probability for independent
events: |
Events are independent if each is unaffected by the occurrence or non-occurrence of the
others. If events are independent, their probabilities can be
multiplied. |
If P(A) = P(A|B), then A and B are independent. For independent events A, B and C: P(A and B and C) = P(A) x P(B) x P(C) |
|
5) Combinations |
Closely related to probability calculations is the idea
of a combination. The number of combinations is the number of different ways in
which r objects can be chosen from
a total of n objects. |
nCr = n! / (r! x (n - r)!) |
|
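A short Python sketch may help make the addition law, the multiplication law and the combination count concrete. The fair die is an assumed example, and the function name `combinations` is chosen here for illustration only.

```python
from fractions import Fraction
from math import factorial

# Assumed example: a fair six-sided die, each face with probability 1/6.
p_face = Fraction(1, 6)

# Mutually exclusive events: P(1 or 2) = P(1) + P(2)
p_one_or_two = p_face + p_face

# Independent events: P(6 on first throw and 6 on second) = P(6) x P(6)
p_double_six = p_face * p_face


def combinations(n: int, r: int) -> int:
    """Number of ways to choose r objects from n: n! / (r! x (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


print(p_one_or_two, p_double_six, combinations(5, 2))
```

Exact fractions are used so that the sums and products are not blurred by floating-point rounding.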
Observed distribution |
Implies that data have been collected, probabilities
calculated and histograms formed;
Collected for each and every situation |
|
Standard distribution |
Implies that the situation in which the data are being
generated resembles closely a theoretical situation for which a distribution
has been constructed mathematically. |
|
Normal |
Theoretical situation for a variable generated by a process which should give the
variable a constant value, but does
not because of many small disturbances. |
The binomial is discrete (the values taken by the variable are distinct), so its histogram has a stepped shape.
It can be right-skewed, symmetrical or left-skewed, depending on the situation in
which the data were collected.
It is constructed mathematically from a theoretical situation.
The binomial relates to situations in which a sample is
taken from a population, which consists of two types of element (hence the name
‘binomial’).
Binomial probabilities are the probabilities of obtaining
different numbers of each type in the sample.
Actual situations in which the
binomial is used are:
1) Inspection schemes
2) Opinion polls (the two way
split of the population results from agreement/disagreement with statements
made by the pollster);
3) Selling (the split is the
outcome of a contact being a sale/no sale)
The probabilities so calculated are binomial probabilities. The histogram formed from them is a binomial distribution.
General formula by which binomial probabilities are
calculated is:
P(r of type 1 in sample) = nCr x p^r x (1 - p)^(n - r)
where:
n = sample size
p = proportion of type 1 in population
1 - p = proportion of type 2 in population
nCr = n! / (r! x (n - r)!)
n! (pronounced ‘n factorial’) = n x (n - 1) x (n - 2) x ... x 2 x 1
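As a rough illustration, the binomial formula translates directly into Python. The function name `binomial_probability` and the 20% / sample-of-5 question are assumptions made for this example, not figures from the module.

```python
from math import comb


def binomial_probability(r: int, n: int, p: float) -> float:
    """P(r of type 1 in a sample of n) = nCr x p^r x (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Hypothetical question: 20% of the population is type 1.
# What is the probability that a random sample of 5 contains exactly 2?
answer = binomial_probability(2, 5, 0.2)
print(round(answer, 4))

# The probabilities for r = 0, 1, ..., n together cover every outcome.
total = sum(binomial_probability(r, 5, 0.2) for r in range(6))
```

Here nCr = 10, p^r = 0.04 and (1 - p)^(n - r) = 0.512, so the answer is 0.2048.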
Questions are typically of the form:
“If the population is composed of x % of one of two types,
then what is the probability that a randomly assembled group of size y will
contain a (or b,c,d, whatever) of type one?”
To solve:
1) Locate binomial table
2) Find sample size on left side
of table. In the generalized example above, n=y. The table includes sample
sizes from 1 to 8.
3) Find probability of x% on top of table.
4) Read down that column to the row corresponding
to r = a (or b, c, d, ...).
5) Reminder: n = sample size; r =
number of objects of the given type within the sample; p
= population proportion of that type.
Binomial distributions have two parameters:
1) Sample size: n
2) Population proportion p of the first type.
Probabilities and Histograms
Right skewed: when p is small (close to 0).
Left skewed: when p is large (close to 1).
Symmetrical: when p is near 0.5 or n is large.
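The effect of p and n on the shape can be checked numerically. The sketch below uses the third standardised moment as a skewness measure; that measure is an assumption of this example and is not defined in the module. A positive value indicates a right skew and a negative value a left skew.

```python
from math import comb


def binomial_distribution(n: int, p: float) -> list:
    """Binomial probabilities for r = 0, 1, ..., n."""
    return [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]


def skewness(n: int, p: float) -> float:
    """Third standardised moment, computed directly from the probabilities."""
    dist = binomial_distribution(n, p)
    mean = sum(r * pr for r, pr in enumerate(dist))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(dist))
    third = sum((r - mean) ** 3 * pr for r, pr in enumerate(dist))
    return third / var**1.5


for p in (0.1, 0.5, 0.9):
    print(p, skewness(8, p))

# A larger sample with the same p is closer to symmetrical.
print(skewness(100, 0.1))
```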
Use of the binomial rests on the assumption that the sample is selected at random.
To check that the binomial applies, take some observations and compare them with what would be expected
theoretically if the binomial applied.
This check is not part of the analysis itself; its purpose is to decide whether the situation is binomial.
The normal distribution is bell-shaped and symmetrical; it is also continuous. (The
binomial is discrete.)
The situation in which the normal occurs is constructed mathematically.
Binomial: distinct values, e.g. 1, 2, 3
Normal: continuous variable, e.g. 4.156, 4.157, 4.158
Probabilities are measured not by the height of the
distribution above the x-axis (as in the discrete case), but by the areas under
the curve.
The normal distribution is constructed mathematically from the following theoretical situation:
Repeated observations or measurements are taken of the same constant quantity. Each time an observation is taken, the quantity is subject to many sources of disturbance. Each source gives rise to a small variation in the value of the quantity. The variations are equally likely to be positive or negative, at random. They are independent of one another and can be added together.
Because positive and negative variations tend to cancel out, most values tend to be close to the central value.
Actual situations to which the normal distribution has been found to apply include:
1) IQs of children
2) Heights of people of the same sex
3) Dimensions of mechanically
produced components
4) Weights of machine produced
items
5) Arithmetic means of large
samples.
In summary: each time an observation is taken it is subject to many sources
of variation, and each source gives rise to a change in value. The changes are equally
likely to be positive or negative, and the variations are independent and additive.
Probabilities are measured not by the height of the distribution above
the x-axis (as in the discrete case) but by areas under the curve.
1) Find the mean of the data under study.
2) Find the SD of the data under study.
3) Subtract the mean from the data point, and divide by the SD to
determine how many SDs away from the mean the data point is: z = (data point - mean) / SD.
4) The number found in 3) above is z on the Normal table. Use z
to look up the area under the curve between the mean and that point.
5) For questions asking “How many in the population are greater than
a certain number?”, subtract the found area from 0.5, since half of
the total area lies on each side of the mean.
6) Since the bell curve is symmetrical, the table can be used for areas on
either side of the mean.
7) The normal curve can also be used in reverse,
for instance “Within how many standard deviations can be found an area of y?”
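The table lookup in the steps above can be imitated in Python with the error function from the standard library. This sketch assumes the Normal table gives the area between the mean and z, which is what step 5 implies; the mean of 100, SD of 15 and data point of 130 are invented for the example.

```python
from math import erf, sqrt


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs by which x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean and z.

    The curve is symmetrical, so only the size of z matters.
    """
    return 0.5 * erf(abs(z) / sqrt(2))


# Hypothetical question: mean 100, SD 15. What proportion exceeds 130?
z = z_score(130, 100, 15)
area = area_mean_to_z(z)      # area between the mean and 130
upper_tail = 0.5 - area       # area above 130

print(z, round(area, 4), round(upper_tail, 4))
```

With z = 2 the area between the mean and the data point is about 0.4772, so roughly 2.3% of the population lies above 130.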
The normal curve has two parameters: the arithmetic mean
and the standard deviation. Contrast
this with the binomial distribution, whose two parameters are the sample size
and the population proportion. Two
normal distributions with the same mean and the same standard deviation will be
identical; two normal distributions with different means and standard
deviations, while still having the characteristics of a bell curve, will be
centered differently and be of different widths.
To decide if data fit the normal curve, find the mean and SD of the
data, and determine whether the % of total data within 1, 2, and 3 SDs of the mean closely matches
the theoretical % or not.
Theoretically expected from the normal distribution:
Theoretical %
Mean +/- 1 SD = 68%
Mean +/- 2 SD = 95%
Mean +/- 3 SD = 99.7%
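The comparison with the theoretical percentages could be carried out along the following lines. The data values are invented for illustration, the function name `share_within` is hypothetical, and the population form of the standard deviation (`pstdev`) is an assumption; the sample form would serve equally well for a rough check.

```python
from statistics import mean, pstdev


def share_within(data: list, k: float) -> float:
    """Proportion of observations lying within k SDs of the mean."""
    m = mean(data)
    s = pstdev(data)
    inside = sum(1 for x in data if abs(x - m) <= k * s)
    return inside / len(data)


# Hypothetical measurements, for illustration only.
data = [47, 49, 50, 50, 51, 51, 52, 52, 52, 53,
        53, 53, 54, 54, 55, 55, 56, 57, 58, 60]

# Theoretical proportions for 1, 2 and 3 SDs under the normal distribution.
theoretical = {1: 0.68, 2: 0.95, 3: 0.997}

for k, expected in theoretical.items():
    print(k, share_within(data, k), expected)
```

If the observed proportions are far from the theoretical ones, the normal distribution is probably not a good description of the data.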
The normal distribution is constructed on the basis of a
theoretical situation and a set of assumptions which are not likely to match
exactly the real situation to which it is being applied. When it is suspected that a standard
distribution can be applied to a situation, it may be prudent to check that the
variable does approximately fit the distribution. To do this, the observations should be compared with what is
theoretically expected.
The binomial is difficult to use because of its complicated formula and lengthy probability tables. (A separate table is needed for each sample size.) The normal is easier to use because of its simple single table. For certain parameter values the shape of the binomial is similar to the normal.
The binomial is roughly symmetrical when p is close to 0.5, or when n is large.
Rule of Thumb: The binomial can be approximated by the normal
when np and n(1 - p) both exceed 5.
When these conditions hold, the following approximations apply:
|
Arithmetic Mean = |
np |
|
Standard Deviation = |
sqrt(np(1 - p)) |
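A quick numerical comparison shows how close the approximation can be when the rule of thumb holds. The values n = 50 and p = 0.3 are invented for the example, and the half-unit continuity correction is an addition that the module does not mention. The standard deviation used is sqrt(np(1 - p)), the usual binomial result.

```python
from math import comb, erf, sqrt


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(at most k of type 1 in a sample of n)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


n, p = 50, 0.3                      # hypothetical; np = 15 and n(1 - p) = 35
mean = n * p                        # arithmetic mean = np
sd = sqrt(n * p * (1 - p))          # standard deviation = sqrt(np(1 - p))

exact = binomial_cdf(12, n, p)
# The +0.5 is a continuity correction (an assumption of this sketch).
approx = normal_cdf(12.5, mean, sd)

print(round(exact, 4), round(approx, 4))
```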
Parameters are calculated from the available data and then
used as if the distribution were truly normal
Instead of relating to the numbers of each type in the sample, the distribution
could also refer to the proportion;
instead of the number of defective parts in a sample, the distribution could be
based on the proportion of defectives.
The parameters then become (just dividing by n):
|
Arithmetic Mean = |
p |
|
Standard Deviation = |
sqrt(p(1 - p) / n) |
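The relationship between the two sets of parameters can be verified with a small sketch. The function names and the values n = 100, p = 0.1 are assumptions made for illustration.

```python
from math import sqrt


def count_parameters(n: int, p: float) -> tuple:
    """Mean and SD of the number of type 1 items in a sample of n."""
    return n * p, sqrt(n * p * (1 - p))


def proportion_parameters(n: int, p: float) -> tuple:
    """Mean and SD of the proportion of type 1 items in a sample of n."""
    return p, sqrt(p * (1 - p) / n)


n, p = 100, 0.1  # hypothetical: 10% defectives, samples of 100
count_mean, count_sd = count_parameters(n, p)
prop_mean, prop_sd = proportion_parameters(n, p)

# Dividing the count parameters by n gives the proportion parameters.
print(count_mean / n, prop_mean)
print(count_sd / n, prop_sd)
```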
Analysis of management problems often involves
probabilities. The analysis is
frequently based on the use of distributions, of which observed and standard distributions are two of
the most important and useful:
|
Observed distribution |
Usually entails
the collection of large amounts of data from which to form histograms and
estimate probabilities. |
|
Standard distribution |
Mathematically derived from a theoretical situation. If an actual situation matches (to a reasonable
approximation) the theoretical one, then the standard distribution can be used
both to describe and to analyze the situation. As a result, fewer data need be collected. |
The principles behind the use of any standard distribution
are the same, but each is associated with a different situation.