Quantitative Methods:
Module 5: Summary Measures
1) Location of Numbers
2) Scatter
3) Shape of Data
| Arithmetic Mean | Sum of readings / Number of readings: $\bar{x} = \sum x / n$ |
| Median | The middle value of a set of numbers arranged in order. (Note: for an even number of readings, take the arithmetic mean of the middle two.) |
| Mode | Most frequent value. |
| Symmetrical Distribution: | The mean is most useful. |
| U-shaped Distribution: | Since there is no middle-of-the-road data, neither the mean nor the median is very useful. Better to quote two modes, one for each cluster. (Common for TV show viewing statistics.) |
| Reverse J Distribution: | Truncated at one end, with no value less than zero. (Common with sickness records.) The median is best. |
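As a rough illustration of the three measures of location, the sketch below uses Python's standard `statistics` module. The readings are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import statistics

# Hypothetical readings, not taken from the notes.
readings = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(readings)      # sum of readings / number of readings
median = statistics.median(readings)  # even count: mean of the middle two (3 and 5)
mode = statistics.mode(readings)      # most frequent value

# Hypothetical U-shaped data: two clusters with little in between.
u_shaped = [1, 1, 1, 2, 5, 8, 9, 9, 9]
modes = statistics.multimode(u_shaped)  # one mode per cluster

print(mean, median, mode, modes)
```

For the U-shaped data the mean and median both fall at 5, where there are almost no readings, which is why quoting the two modes describes it better.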
1) Focus for the eye
2) For Comparison Purposes
3) The Mean is Pre-eminent
The choice between mean, median and mode is often an easy one. As a measure of location the arithmetic mean is pre-eminent: it is easy to calculate and use, and it is widely understood and recognized. It is always used unless there is a good reason not to.
| Outliers. | Use the median when outlier distortion is present. |
| Clusters. | Use the mode when cluster distortion is present. |
| Average of Averages: | Return to the original data when asked for an average of averages. |
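Two of these exceptions can be sketched with made-up numbers. The salary figures and group sizes below are assumptions for illustration only.

```python
import statistics

# Hypothetical salaries with one outlier (210).
salaries = [20, 22, 23, 25, 210]
mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # unaffected by the outlier

# Hypothetical groups of unequal size.
group_a = [10, 20, 30]
group_b = [50]

# Averaging the two group means ignores how many readings each group has.
average_of_averages = statistics.mean(
    [statistics.mean(group_a), statistics.mean(group_b)]
)
# Returning to the original data gives the correct overall mean.
true_average = statistics.mean(group_a + group_b)

print(mean_salary, median_salary, average_of_averages, true_average)
```

The mean salary (60) describes none of the readings well, while the median (23) sits among the typical values. The average of averages (35) differs from the mean of the pooled data (27.5) because the groups are of unequal size.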
Measures of scatter describe the extent to which the readings are grouped closely together or scattered over a wide interval.
| Range | Largest value - smallest value |
| Interquartile Range | Range after eliminating the highest and lowest 25% of the readings. |
| Mean Absolute Deviation (MAD) | Sum of absolute differences between each reading and the mean / Number of readings: $\sum \lvert x - \bar{x} \rvert / n$ |
| Variance | Sum of squares of the deviation of each reading from the mean / (Number of readings - 1): $\sum (x - \bar{x})^2 / (n - 1)$ |
| Standard Deviation | $\sqrt{\text{Variance}} = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$ |
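The measures of scatter defined above can be computed directly, as in this minimal sketch. The readings are hypothetical. The quartiles come from `statistics.quantiles`, which is one of several quartile conventions, so the interquartile range may differ slightly from a hand calculation that uses another rule.

```python
import math
import statistics

# Hypothetical readings, not taken from the notes.
readings = [2, 4, 4, 5, 7, 8]
n = len(readings)
mean = sum(readings) / n

# Range: largest value minus smallest value.
value_range = max(readings) - min(readings)

# Interquartile range: spread of the middle 50% of the readings.
q1, _, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1

# Mean absolute deviation: average distance from the mean.
mad = sum(abs(x - mean) for x in readings) / n

# Variance: squared deviations divided by (n - 1).
variance = sum((x - mean) ** 2 for x in readings) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(value_range, iqr, mad, variance, std_dev)
```

The hand-written variance and standard deviation agree with `statistics.variance` and `statistics.stdev`, which also divide by n - 1.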
Calculating Measure of Dispersion
$\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$
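A quick numerical check of this calculating shortcut, using hypothetical readings: the sum of squared deviations is computed once directly and once from the sum of squares and the mean.

```python
import math

# Hypothetical readings, not taken from the notes.
readings = [2, 4, 4, 5, 7, 8]
n = len(readings)
mean = sum(readings) / n

# Direct form: subtract the mean from every reading, then square and add.
direct = sum((x - mean) ** 2 for x in readings)

# Shortcut form: sum of squares minus n times the squared mean.
shortcut = sum(x ** 2 for x in readings) - n * mean ** 2

print(direct, shortcut, math.isclose(direct, shortcut))
```

The shortcut needs only the running totals of x and x squared, which is convenient for hand calculation. With floating-point data whose mean is large relative to its spread, the direct form is usually the more accurate of the two.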
| | Advantage | Disadvantage |
| Range | Easily understood / familiar. | Outlier distortion. Descriptive only. |
| Inter-Quartile Range | Easily understood. | Not well known. Descriptive only. |
| Mean Absolute Deviation | Intuitively sensible. | Unfamiliar. Difficult math to handle. |
| Variance | Easy-to-handle math. Used in other theories. | Wrong units. No intuitive appeal. |
| Standard Deviation | Easy-to-handle math. Used in other theories. | Too involved for descriptive purposes. |
When there are differences in the means of two groups, a measure of scatter must be ‘standardized’ before a comparison of relative variation can be made. The coefficient of variation does this.
Coefficient of Variation = Standard Deviation / Arithmetic Mean
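A minimal sketch of why the standardization matters, with two hypothetical groups that have the same standard deviation but very different means. The helper name `coefficient_of_variation` is introduced here for illustration.

```python
import statistics

def coefficient_of_variation(readings):
    """Standard deviation divided by the arithmetic mean."""
    return statistics.stdev(readings) / statistics.mean(readings)

# Hypothetical groups: same scatter in absolute terms, different means.
group_a = [10, 12, 14]
group_b = [100, 102, 104]

sd_a = statistics.stdev(group_a)
sd_b = statistics.stdev(group_b)
cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(sd_a, sd_b, cv_a, cv_b)
```

Both groups have a standard deviation of 2, but relative to its mean group A varies far more than group B (roughly 17% against roughly 2%).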
| Skew: | Extent to which a distribution is non-symmetrical: left-skewed (-), right-skewed (+), or zero-skewed. |
| Kurtosis | Measures the extent to which the distribution is “punched in” or “filled out” (low, medium, high). |
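The notes give no formula for skew. One common choice, assumed here, is the moment coefficient of skewness (the third central moment divided by the second central moment raised to the power 1.5). The sketch below shows only that its sign matches the left (-) and right (+) convention above. The function name and data are hypothetical.

```python
def skewness(readings):
    """Moment coefficient of skewness (an assumed definition, not from the notes)."""
    n = len(readings)
    mean = sum(readings) / n
    m2 = sum((x - mean) ** 2 for x in readings) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in readings) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets.
right_skewed = [1, 2, 2, 3, 10]          # long tail to the right
left_skewed = [-x for x in right_skewed]  # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed), skewness(left_skewed), skewness(symmetric))
```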
| 1) Twyman’s Law: | Interesting or unusual data is usually wrong. Look for mistakes and correct them. |
| 2) Part of the Pattern: | Decide whether the unusual reading is part of the usual data pattern. If it is, include it in the calculation. |
| 3) Isolated Events: | If the unusual reading is an isolated event and not part of the usual data pattern, exclude it and note the reason in the summary. |
| Indices/Index: | Summarize the movement of a variable over time. |
| Simple index | Conversion of one series into another based on 100. 1) The base year is set to 100. 2) Years before or after the base year are expressed as a percentage of it. 3) Example: the base year is 12.4 and the next year is 8.6. If the base-year value of 12.4 is set to 100, then the next year is 8.6/12.4 x 100 = 69. |
| Simple Aggregate Index: | Add together the multiple factors under consideration (for instance, the aggregate price of beef, pork, and lamb), and then baseline the total to 100 as in the simple index method. | Disadvantage: a severe price drop in a single factor can bring down the entire index. To counter this, a price relative index can be constructed: first convert each price into an individual index, then average these individual indices to give the overall index. |
| Weighted Aggregate index: | Allows different weights to be given to the different prices. |
| Laspeyres Index | Prices are first weighted by quantity, and the final index is formed from the resulting totals. The quantities (the base-period weights) should be the same for each month. | Disadvantage: the weights in the base year may soon become out of date and no longer representative. |
| Paasche Index | Takes weights from the most recent time period, so the weighting changes from each time period to the next. Always uses the most up-to-date weights. | Disadvantage: when new weightings arrive, the entire past series must be revised. |
| Fixed weight index | Uses the weights of neither the base period (Laspeyres) nor the most recent period (Paasche), but a weighting from some intermediate period, possibly an average weighting of several periods. |
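The index types described above can be compared side by side in a short sketch. Only the figures 12.4 and 8.6 come from the notes. The prices, quantities, and helper names (`simple_index`, `basket_cost`) are hypothetical, and the Laspeyres and Paasche lines assume the standard definitions: the cost of a fixed basket at new prices divided by its cost at base prices, using base-period and most recent quantities respectively.

```python
def simple_index(value, base_value):
    """Express a value as a percentage of the base-period value."""
    return value / base_value * 100

def basket_cost(prices, quantities):
    """Total cost of buying the given quantities at the given prices."""
    return sum(prices[item] * quantities[item] for item in prices)

# Worked example from the notes: base year 12.4, next year 8.6.
next_year_index = simple_index(8.6, 12.4)

# Hypothetical prices and quantities for two periods.
base_prices = {"beef": 10.0, "pork": 5.0, "lamb": 20.0}
new_prices = {"beef": 12.0, "pork": 5.0, "lamb": 18.0}
base_qty = {"beef": 3, "pork": 4, "lamb": 1}
new_qty = {"beef": 2, "pork": 5, "lamb": 1}

# Simple aggregate index: add the prices, then baseline the total to 100.
aggregate = simple_index(sum(new_prices.values()), sum(base_prices.values()))

# Price relative index: index each price separately, then average.
relatives = [simple_index(new_prices[i], base_prices[i]) for i in base_prices]
price_relative = sum(relatives) / len(relatives)

# Laspeyres: base-period quantities are the weights in both periods.
laspeyres = simple_index(
    basket_cost(new_prices, base_qty), basket_cost(base_prices, base_qty)
)

# Paasche: most recent quantities are the weights in both periods.
paasche = simple_index(
    basket_cost(new_prices, new_qty), basket_cost(base_prices, new_qty)
)

print(round(next_year_index), aggregate, price_relative, laspeyres, paasche)
```

With these made-up figures the four indices give four different answers for the same price changes, which is the practical reason the choice of weighting matters.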
Form a model of the data (a pattern or summary). Simple or complex summary measures can provide a model by specifying the data set in terms of:
| 1) Number of readings | Easily supplied |
| 2) A measure of location | Discussed above |
| 3) A measure of scatter | Discussed above |
| 4) The shape of the distribution | Draw a histogram and literally describe the shape: a short verbal statement about shape (symmetrical, U-shaped, or reverse J). |
The verbal statement should be short (one sentence). Use it in two ways:
1) Where quantitative measures are inadequate
2) To point out an important feature of the data
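Where no plotting tool is at hand, a text histogram is often enough to see the shape before writing the verbal statement. The sketch below assumes hypothetical sickness records (days absent per employee), which tend to give a reverse J shape.

```python
from collections import Counter

# Hypothetical days absent per employee, not taken from the notes.
readings = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5]
counts = Counter(readings)

# One row per possible value; the bar length is the number of readings.
lines = [
    f"{value:>2} | {'#' * counts[value]}"
    for value in range(min(readings), max(readings) + 1)
]
print("\n".join(lines))
```

The bars are longest at zero and shrink steadily, so a suitable one-sentence statement would be that the distribution is reverse J shaped, truncated at zero.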