Descriptive Statistics

Describing a data set means discussing its shape, center, and spread. The shape of a data set is judged by inspecting stem-and-leaf charts, frequency distributions, and histograms. These visual representations are practical only if the data set is small enough to chart by hand, or if a graphing calculator or computer software is used to create the displays. When a data set is large, or a visual representation is not precise enough, compact and precise ways of characterizing the center and spread of the data are needed. A few numerical summary statistics computed from the data can summarize and convey its important features.

Measures of central tendency describe the location of the center of a data set. Three such statistics are the mean, the median, and the midrange.

The mean of a data set is the arithmetic average of the individual data values. It is computed by summing all of the data values and dividing the sum by the number of observations.

The median of a data set is the middle value when the data are listed in order from smallest to largest. Any repeated values are included, so that every observation appears in the ordered list. The median divides the data set into two equal parts. When the number of observations, n, is odd, the median is the single middle value. When n is even, there are two middle values in the ordered list, and the median is the average of these two values.

The midrange of a data set is the average of the largest and smallest observations. It is calculated by adding the largest value to the smallest value and dividing by 2.

Example: Use the following data set to calculate the mean, median, and midrange.
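
The three definitions above translate directly into a few lines of code. The following is a minimal Python sketch for illustration only: the function names and the values in `sample` are made up and are not the data set from the example.

```python
def mean(data):
    """Arithmetic average: sum of the observations divided by their count."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(data)          # repeated values stay in the list
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd n: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values


def midrange(data):
    """Average of the largest and smallest observations."""
    return (max(data) + min(data)) / 2


# Hypothetical data (n = 8), chosen only to exercise the functions.
sample = [4, 8, 15, 16, 23, 42, 8, 10]

print(mean(sample))      # 15.75
print(median(sample))    # 12.5
print(midrange(sample))  # 23.0
```

In this made-up sample the single large value 42 pulls the midrange (23.0) well above the median (12.5), which shows how strongly the midrange depends on the two extreme observations.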

The number of observations is n = 8.

Measures of spread describe how the data values vary around the center of the data set. The simplest measure of spread is the range, which is the difference between the largest and smallest values. Generally, more variability is reflected in a larger range.

The two most commonly used measures of spread are the variance and the standard deviation. The standard deviation is the square root of the variance, so the variance must be calculated first. The variance is calculated by subtracting the mean from each individual data value, squaring each difference, summing the squares, and dividing by the number of observations minus 1.

Example: Use the following data set to calculate the variance and the standard deviation.
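
The steps for the variance and standard deviation can be sketched in Python as follows. This is an illustrative sketch only: the function names and the values in `sample` are hypothetical and are not the data set from the example. It uses the divisor n - 1, as described in the text.

```python
import math


def deviations(data):
    """Difference between each observation and the mean."""
    m = sum(data) / len(data)
    return [x - m for x in data]


def sample_variance(data):
    """Sum of squared deviations divided by (n - 1)."""
    squares = [d ** 2 for d in deviations(data)]
    return sum(squares) / (len(data) - 1)


def sample_std_dev(data):
    """Square root of the variance."""
    return math.sqrt(sample_variance(data))


# Hypothetical data (n = 8, mean = 5), chosen only to exercise the functions.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Sanity check: the deviations from the mean should sum to zero
# (up to floating-point rounding for less convenient data).
assert math.isclose(sum(deviations(sample)), 0.0, abs_tol=1e-9)

print(sample_variance(sample))           # 32 / 7, about 4.571
print(round(sample_std_dev(sample), 3))  # 2.138
```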

Note: When calculated correctly, the differences between the data values and the mean always sum to zero.

The variance is 112.875/(8 - 1) = 112.875/7 = 16.125. The standard deviation is the square root of the variance: 4.015594601, or 4.016 rounded to three decimal places.

The range of the data set is calculated by subtracting the minimum value from the maximum value. (range = max. value - min. value)

Another measure of spread is the interquartile range, which is resistant to the effects of outliers. It is based on quantities called quartiles. The lower quartile separates the bottom 25% of the data set from the upper 75%, and the upper quartile separates the top 25% from the bottom 75%. (The middle quartile is the median, since it separates the bottom 50% of the data set from the top 50%.) The quartiles are obtained by dividing the n ordered observations into a lower half and an upper half. If n is odd, the median is included in both halves. The lower and upper quartiles are the medians of the lower and upper halves, respectively. The interquartile range, "IQR", is calculated by subtracting the lower quartile from the upper quartile. (IQR = upper quartile - lower quartile)
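
The halving procedure for the quartiles can be sketched in Python as follows. This sketch follows the convention stated in the text (for odd n, the median is included in both halves); calculators and software packages often use other quartile conventions and may report slightly different values. The function names and data values are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (lower quartile, median, upper quartile) using the halving method."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2              # for odd n, both halves include the median
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(ordered), median(upper_half)


def iqr(data):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


def data_range(data):
    """Range: maximum value minus minimum value."""
    return max(data) - min(data)


# Hypothetical data sets, chosen only to exercise the functions.
odd_sample = [2, 5, 6, 9, 12, 14, 18, 21, 40]   # n = 9
even_sample = [2, 5, 6, 9, 12, 14, 18, 21]      # n = 8

print(quartiles(odd_sample))    # (6, 12, 18)
print(iqr(odd_sample))          # 12
print(data_range(odd_sample))   # 38
print(quartiles(even_sample))   # (5.5, 10.5, 16.0)
```

In the made-up odd sample, replacing the largest value 40 with 22 would shrink the range from 38 to 20 but leave the IQR at 12, which illustrates why the interquartile range is described as resistant to outliers.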

The maximum value is 30 and the minimum value is 1, which gives a range of 30 - 1 = 29.

For information on how to use graphing calculators to find the measures of center and measures of spread, please see the pages in the left margin titled "One Variable Statistics" and "Two Variable Statistics."