Quantitative Methods:

Module 5:  Summary Measures

 

Summary Measures Show:

1) Location of Numbers

2) Scatter

3) Shape of Data

 

Measures of Location (Also called measures of Central tendency)

They show, in general terms, the size of the data in question. They are useful summaries, but can be misleading.

 

Arithmetic Mean

 Arithmetic mean = (sum of readings) / (number of readings)

 \bar{x} = \frac{\sum x}{n}

Median

 The middle value of a set of numbers arranged in order. (Note: For an even number of readings, take the arithmetic mean of the middle two numbers.)

Mode

 Most frequent value.
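
The three measures of location can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the readings are hypothetical and chosen only so that the three measures differ.

```python
from statistics import mean, median, multimode

# Hypothetical readings, chosen only for illustration.
readings = [2, 3, 3, 5, 7, 10]

arithmetic_mean = mean(readings)     # sum of readings / number of readings
middle_value = median(readings)      # even count: mean of the two middle values
most_frequent = multimode(readings)  # list of the most frequent value(s)

print(arithmetic_mean, middle_value, most_frequent)
```

`multimode` returns a list because a data set may have more than one mode.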

 

 

Symmetrical Distribution:

 Mean is most useful

U shaped Distribution:

 Since there is no middle-of-the-road data, neither the mean nor the median is very useful. It is better to quote two modes, one for each cluster. (Common for TV show viewing statistics.)

Reverse J Distribution:

 Truncated at one end, with no value less than zero (common with sickness records). The median is best.

 

Other Uses for Measures of Location:

1) Focus for the eye

2) For Comparison Purposes

3) The Mean is Pre-eminent

 

The choice between mean, median and mode is often an easy one. Among measures of location the arithmetic mean is pre-eminent: it is easy to calculate and use, and is widely understood and recognized. It is always used unless there is good reason not to.

Distortion

 

Outliers:

 Use the median when outlier distortion is present.

Clusters:

 Use the mode when cluster distortion is present.
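
Both kinds of distortion are easy to see on made-up numbers. In this rough sketch the salary and viewing figures are hypothetical: one outlier drags the mean away from the bulk of the readings, and in clustered data the mean lands where no readings lie.

```python
from statistics import mean, median, multimode

# Hypothetical salaries (in thousands); the last reading is an outlier.
salaries = [20, 22, 23, 25, 27, 200]
salary_mean = mean(salaries)      # pulled upward by the single outlier
salary_median = median(salaries)  # stays near the bulk of the readings

# Hypothetical hours of viewing, clustered at the two extremes.
viewing_hours = [1, 1, 1, 2, 8, 9, 9, 9]
viewing_mean = mean(viewing_hours)        # falls between the two clusters
viewing_modes = multimode(viewing_hours)  # one mode per cluster

print(salary_mean, salary_median)
print(viewing_mean, viewing_modes)
```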

Average of Averages:

When asked for an average of averages, return to the original data: averaging the group averages ignores the number of readings behind each one.
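
A small illustration of why the original data matters, using two hypothetical groups of different sizes. The average of the two group averages differs from the mean of all the readings taken together.

```python
from statistics import mean

# Hypothetical readings for two groups of different sizes.
group_a = [10, 10, 10, 10]  # mean 10, from four readings
group_b = [40]              # mean 40, from one reading

average_of_averages = mean([mean(group_a), mean(group_b)])
overall_mean = mean(group_a + group_b)  # computed from the original data

print(average_of_averages, overall_mean)
```

The two results agree only when every group contains the same number of readings.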

 

Measures of Scatter (Also Called Measures of Dispersion)

They measure the extent to which the readings are grouped closely or scattered over a wide interval.

 

Range

Largest Value - Smallest Value

 

Interquartile Range

Range of the readings that remain after eliminating the highest 25% and the lowest 25%.
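
A minimal sketch of both measures on hypothetical readings. Conventions for locating quartiles differ between textbooks and software; the `method="inclusive"` setting of `statistics.quantiles` used here is an assumption, not something these notes specify.

```python
from statistics import quantiles

# Hypothetical readings, already sorted for readability.
readings = [1, 3, 4, 6, 7, 9, 10, 12, 15]

data_range = max(readings) - min(readings)  # largest value - smallest value

# Cut points that split the data into four equal parts.
q1, q2, q3 = quantiles(readings, n=4, method="inclusive")
interquartile_range = q3 - q1  # spread of the middle 50% of readings

print(data_range, interquartile_range)
```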

 

Mean Absolute Deviation (MAD)

MAD = (sum of absolute differences between each reading and the mean) / (number of readings)

 \text{MAD} = \frac{\sum |x - \bar{x}|}{n}
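
The MAD formula translates directly into code. The readings below are hypothetical.

```python
from statistics import mean

# Hypothetical readings.
readings = [2, 4, 6, 8, 10]

x_bar = mean(readings)
# Average distance of the readings from the mean, ignoring sign.
mad = sum(abs(x - x_bar) for x in readings) / len(readings)

print(mad)
```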

Variance

Variance = (sum of squares of the deviation of each reading from the mean) / (number of readings - 1)

 \text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1}

Standard Deviation

 \text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
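
As a quick check, the variance and standard deviation can be computed by hand, using the n - 1 divisor, and compared with the standard library's `variance` and `stdev`, which use the same divisor. The readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical readings.
readings = [2, 4, 6, 8, 10]

x_bar = mean(readings)
sum_of_squares = sum((x - x_bar) ** 2 for x in readings)

manual_variance = sum_of_squares / (len(readings) - 1)  # divisor is n - 1
manual_std_dev = sqrt(manual_variance)                  # back in the original units

print(manual_variance, variance(readings))
print(manual_std_dev, stdev(readings))
```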

 

Calculating Measures of Dispersion

 \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2
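
The identity can be verified numerically. This sketch computes the sum of squared deviations both ways on hypothetical readings. The shortcut form is convenient by hand, but with floating-point data it can lose precision when the mean is large relative to the scatter.

```python
from statistics import mean

# Hypothetical readings.
readings = [2, 4, 6, 8, 10]
n = len(readings)
x_bar = mean(readings)

# Left-hand side: subtract the mean from each reading, then square and sum.
direct = sum((x - x_bar) ** 2 for x in readings)

# Right-hand side: sum of squares minus n times the squared mean.
shortcut = sum(x ** 2 for x in readings) - n * x_bar ** 2

print(direct, shortcut)
```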

Advantages and Disadvantages of Various Scatter Measurements

 

 

Measure                  Advantages                                   Disadvantages

Range                    Easily understood; familiar                  Outlier distortion; descriptive only
Interquartile range      Easily understood                            Not well known; descriptive only
Mean absolute deviation  Intuitively sensible                         Unfamiliar; difficult math to handle
Variance                 Easy-to-handle math; used in other theories  Wrong units; no intuitive appeal
Standard deviation       Easy-to-handle math; used in other theories  Too involved for descriptive purposes

 

Coefficient of Variation

When there are differences in the means of the two groups, a measure of scatter must be ‘standardized’ before comparison of relative variation can be made. The coefficient of variation does this.

                                               

Coefficient of Variation = Standard Deviation / Arithmetic Mean
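
A short illustration with two hypothetical groups whose means differ by a factor of ten. Their standard deviations differ by the same factor, yet the coefficient of variation shows that the relative scatter is identical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by the arithmetic mean."""
    return stdev(values) / mean(values)

# Hypothetical groups with very different means.
group_small = [9, 10, 11]
group_large = [90, 100, 110]

cv_small = coefficient_of_variation(group_small)
cv_large = coefficient_of_variation(group_large)

print(stdev(group_small), stdev(group_large))  # differ by a factor of ten
print(cv_small, cv_large)                      # the same relative variation
```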

Other Summary Measures

Skew:

 Extent to which a distribution is non-symmetrical. [Left-skewed (-), right-skewed (+), or zero skew]

Kurtosis

 Measures the extent to which the distribution is “pinched in” or “filled out” (low, medium, high).
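
These notes give no formulas for skew or kurtosis. As a rough sketch only, the code below uses the moment-based coefficients (third and fourth central moments scaled by powers of the second), which are one common convention and an assumption here. The helper names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness: negative = left, positive = right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment coefficient of kurtosis (fourth moment / squared variance)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]
left_skewed = [-x for x in right_skewed]  # mirror image of the right-skewed data

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(kurtosis(symmetric))
```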

 

Dealing with Outliers

1) Twyman’s Law:

 Interesting or unusual data is usually wrong. Look for mistakes and correct them.

2) Part of the Pattern:

 Decide whether the outlier is part of the pattern of the usual data. If it is, include it in the calculation.

3) Isolated Events:

 If the outlier is an isolated event, not part of the usual data pattern, exclude it and note the reason in the summary.

 

Indices

Index:

 Summarizes the movement of a variable over time.

Simple index

Conversion of one series into another based on 100.

How to:

1) The base-year value is set to 100.

2) Values for years before or after the base year are expressed as a percentage of the base-year value.

3) Example: The base-year value is 12.4 and the next year's value is 8.6. If the base-year value of 12.4 is set to 100, then ‘next year’ is 8.6/12.4 x 100 = 69.
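
The steps above can be sketched as a small function, applied to the figures in the example. The function name and the `base_position` parameter are illustrative choices, not part of the notes.

```python
def simple_index(series, base_position=0):
    """Re-express a series as percentages of its base-period value."""
    base = series[base_position]
    return [value / base * 100 for value in series]

# Figures from the example: base year 12.4, next year 8.6.
values = [12.4, 8.6]

index = simple_index(values)
rounded_index = [round(v) for v in index]

print(rounded_index)
```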

 

 

 

Simple Aggregate Index:

 In this case, add together the multiple factors under consideration (for instance, the aggregate price of beef, pork, and lamb), and then baseline the total to 100 as in the simple index method.

Disadvantage: A severe price drop in a single factor can bring down the entire index. To counter this, a price relative index can be constructed: first convert each price series into an individual index, then average these individual indices to give the overall index.
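
The difference between the two constructions shows up on made-up prices. In this sketch the prices are hypothetical; lamb, the most expensive item, halves in price and pulls the simple aggregate index down further than the price relative index.

```python
# Hypothetical prices in a base period and one later period.
prices = {
    "beef": [10.0, 12.0],
    "pork": [5.0, 5.0],
    "lamb": [20.0, 10.0],  # severe price drop in a single item
}

def to_index(series):
    """Rebase a series so that its first value equals 100."""
    return [value / series[0] * 100 for value in series]

periods = range(2)

# Simple aggregate index: add the prices, then rebase the totals to 100.
totals = [sum(p[t] for p in prices.values()) for t in periods]
aggregate_index = to_index(totals)

# Price relative index: rebase each item first, then average the indices.
relatives = [to_index(p) for p in prices.values()]
price_relative_index = [
    sum(r[t] for r in relatives) / len(relatives) for t in periods
]

print(aggregate_index)
print(price_relative_index)
```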

Weighted Aggregate index:

Allows different weights to be given to the different prices. 

 

Laspeyres Index.

Prices are first weighted by quantity, and the final index is formed from the resulting totals. The quantities should be the same for each month.

Disadvantage: the base-year weights may soon become out of date and no longer representative.

Paasche Index

Takes weights from the most recent time period, so the weighting changes from each time period to the next. Always uses the most up-to-date weights.

Disadvantage: when new weightings arrive, the entire past series must be revised.
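
The two weighting schemes differ only in which period's quantities are used as weights. The sketch below uses the standard textbook forms (weighted current prices divided by weighted base prices, times 100); the prices, quantities, and function name are hypothetical.

```python
# Hypothetical prices and quantities for two items.
base_prices = [10.0, 5.0]
current_prices = [15.0, 5.0]
base_quantities = [2, 4]     # Laspeyres weights (base period)
current_quantities = [1, 6]  # Paasche weights (most recent period)

def weighted_aggregate_index(prices_now, prices_base, weights):
    """Ratio of weighted current prices to weighted base prices, times 100."""
    weighted_now = sum(p * w for p, w in zip(prices_now, weights))
    weighted_base = sum(p * w for p, w in zip(prices_base, weights))
    return weighted_now / weighted_base * 100

laspeyres = weighted_aggregate_index(current_prices, base_prices, base_quantities)
paasche = weighted_aggregate_index(current_prices, base_prices, current_quantities)

print(laspeyres, paasche)
```

Because the Paasche weights change every period, each new set of quantities means recomputing the whole series, which is the disadvantage noted above.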

Fixed weight index

 Uses weights from neither the base period (Laspeyres) nor the most recent period (Paasche), but from some intermediate period, possibly an average weighting of several periods.

 

 

Key Message from Module

 

Form a model of the data (a pattern or summary). Simple or complex summary measures can provide a model, based on specifying the data set by:

 

1) The number of readings

Easily supplied

2) A measure of location

Discussed above

3) A measure of scatter

Discussed above

4) The shape of the distribution

Draw a histogram and literally describe the shape in a short verbal statement (symmetrical, U-shaped, or reverse J).

 

A short verbal statement (one sentence) is useful in two ways:

1) Where quantitative measures are inadequate

2) To point out an important feature of the data