Descriptive Statistics

Describing data sets includes discussing the shape, center, and spread.  The shape of a data set is based on inspection of stem-and-leaf charts, frequency distributions, and histograms. These visual representations are only usable if the data set is small enough to create the charts by hand, or if graphing calculators or computer software is used to create the displays. 

When the data sets are large, or visual representations are not accurate enough, there is a need for compact and precise ways of describing and characterizing the center and spread of the data.  Summarizing and conveying important features of a data set can be completed by using a few numerical summary statistics computed from the data. 

Measures of central tendency of a data set describe the location of the center of the data with three different statistics: mean, median, and midrange.

The mean of a data set is the arithmetic average of the individual data.  The mean is computed by summing up all of the data and dividing the sum by the number of data observations. 

The median of a data set is the middle value of the data set, when the data has been listed in order from smallest to largest.  Any repeated values are included so that every observation appears in the ordered list.  The median divides the data set into two equal parts.  When the number of data observations, n, is odd, the median is the single middle value.  When the number of data observations, n, is even, there are two middle values in the ordered list, and the median is the average of these two values. 

The midrange of the data set is the average of the largest and smallest observations.  The midrange is calculated by adding the largest value to the smallest value and dividing by 2. 

Example: Use the following data set to calculate the mean, median, and midrange. 
 

1, 3, 5, 7, 8, 9, 11, 13

The number of observations is  n =  8. 
The mean is calculated as (1 + 3 + 5 + 7 + 8 + 9 + 11 + 13)/8 = 7.125 
The median is found by averaging the middle two numbers, (7 + 8)/2 = 7.5 
The midrange is calculated as (1 + 13)/2 = 7. 
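
The same three calculations can be checked with a short program. The following is a minimal sketch in Python, assuming the standard library's statistics module; the variable names are chosen here for illustration only.

```python
from statistics import mean, median

# The data set from the example above.
data = [1, 3, 5, 7, 8, 9, 11, 13]

data_mean = mean(data)                        # sum of the data divided by n
data_median = median(data)                    # n is even, so the two middle values are averaged
data_midrange = (min(data) + max(data)) / 2   # average of the smallest and largest values

print("mean =", data_mean)
print("median =", data_median)
print("midrange =", data_midrange)
```

The printed values should agree with the hand calculations: 7.125, 7.5, and 7.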
 
Mean, median, and midrange provide different measures of the center of a data set.  A measure of center alone can be misleading.  Two nations with the same median family income are very different if one has extreme wealth and poverty, while the other nation has little variation among the family incomes.  The descriptive statistics of a data set should include both a measurement of the center and a measurement of spread. 
 

Measures of spread of a data set describe how the data values vary around the center.  The simplest measure of spread is the range, which is the difference between the largest and smallest values.  Generally, more variability will be reflected in a larger range.  The two most commonly used measures of spread are the variance and the standard deviation of the data set.  The variance must be calculated first; the standard deviation is the square root of the variance.  The variance is calculated by subtracting the mean from each individual data value, squaring each result, summing the squares, and dividing the sum by the number of observations minus 1.  (Dividing by n - 1 gives what is called the sample variance.)
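
The verbal recipe for the variance can also be written as a formula.  The symbols are notation assumed here for illustration and do not appear elsewhere on this page: x_i is the i-th data value, x-bar is the mean, n is the number of observations, s^2 is the variance, and s is the standard deviation.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 ,
\qquad
s = \sqrt{s^2}
```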

Example:  Use the following data set to calculate the variance and the standard deviation. 
 

data      data - mean      (data - mean)^2
1         -6.125           37.515625
3         -4.125           17.015625
5         -2.125           4.515625
7         -0.125           0.015625
8         0.875            0.765625
9         1.875            3.515625
11        3.875            15.015625
13        5.875            34.515625

mean = 7.125
total of (data - mean) = 0
total of (data - mean)^2 = 112.875

Note: The differences between the data values and the mean always sum to zero when they are calculated correctly.  This is a useful check before squaring.

The variance is calculated by  112.875/(8 - 1) = 112.875/7 = 16.125. 

The standard deviation is the square root of the variance: the square root of 16.125 is 4.015594601, or approximately 4.016.
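
The steps of this example can be reproduced in a few lines of code. This is a minimal sketch in Python that follows the hand calculation step by step; the variable names are illustrative, not part of the original example.

```python
from math import sqrt

# The data set from the example above.
data = [1, 3, 5, 7, 8, 9, 11, 13]
n = len(data)
data_mean = sum(data) / n

# Step 1: subtract the mean from each data value.
deviations = [x - data_mean for x in data]

# Step 2: square each difference.
squared = [d ** 2 for d in deviations]

# Step 3: sum the squares and divide by n - 1 (sample variance).
variance = sum(squared) / (n - 1)

# Step 4: the standard deviation is the square root of the variance.
std_dev = sqrt(variance)

print("sum of deviations =", sum(deviations))
print("sum of squares =", sum(squared))
print("variance =", variance)
print("standard deviation =", round(std_dev, 3))
```

Python's standard library also provides statistics.variance and statistics.stdev, which use the same n - 1 divisor and can serve as a cross-check.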
 

The range, introduced above, is calculated by subtracting the minimum value from the maximum value of the data set.  (range = max. value - min. value)  Because it depends only on the two most extreme values, the range is sensitive to outliers.

Another measure of spread is the interquartile range.  The interquartile range is resistant to the effects of outliers.  It is based on quantities called quartiles.  The lower quartile separates the bottom 25% of the data set from the upper 75%, and the upper quartile separates the top 25% from the bottom 75% of the data set.  (The middle quartile is the median, since it separates the bottom 50% of the data set from the top 50%.)  The quartiles are obtained by dividing the n ordered observations into a lower half and an upper half.  If n is odd, the median is included in both halves.  The lower and upper quartiles are the medians of the lower and upper halves, respectively.

The interquartile range, "IQR", is calculated by subtracting the lower quartile from the upper quartile.  (IQR = upper quartile - lower quartile) 
 
Example: Use the following data set to calculate the range and the interquartile range. 
 

1, 5, 8, 13, 15, 19, 22, 26, 30

The maximum value is 30 and the minimum value is 1. 
The range of this data set is 30 - 1  = 29. 
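The median of this data set is 15. 
Since n = 9 is odd, the median is included in both halves: the lower half is 1, 5, 8, 13, 15 and the upper half is 15, 19, 22, 26, 30. 
The lower quartile is 8 and the upper quartile is 22. 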
The interquartile range is 22 - 8 = 14. 
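
The quartile rule described above (the median is included in both halves when n is odd) can be sketched as a short Python function. The function name quartiles_inclusive is hypothetical and introduced here only for illustration. Calculators and software packages may use other quartile conventions, so their results can differ slightly from this rule on some data sets.

```python
from statistics import median

def quartiles_inclusive(values):
    """Return (lower quartile, upper quartile) using the rule on this page:
    split the ordered data into halves, and when n is odd include the
    median in both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[:half + 1]   # includes the median
    else:
        lower = ordered[:half]
    upper = ordered[half:]           # includes the median when n is odd
    return median(lower), median(upper)

# The data set from the example above.
data = [1, 5, 8, 13, 15, 19, 22, 26, 30]

data_range = max(data) - min(data)
q1, q3 = quartiles_inclusive(data)
iqr = q3 - q1

print("range =", data_range)
print("lower quartile =", q1, "upper quartile =", q3)
print("IQR =", iqr)
```

For this data set the function reproduces the hand calculation: a range of 29, quartiles of 8 and 22, and an interquartile range of 14.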

For information on how to use the graphing calculators to find the measures of center and measures of spread, please see the pages in the left margin titled "One Variable Statistics" and "Two Variable Statistics."