DESCRIPTIVE MEASURES-Numbers that are used to describe data sets are called descriptive measures.

MEASURES OF CENTER-Descriptive measures that indicate where the center or most typical value of a data set lies are called MEASURES OF CENTRAL TENDENCY or, more simply, MEASURES OF CENTER. Measures of center are often referred to as AVERAGES. The three most important measures of center are the MEAN, MEDIAN, and MODE. The mean and median apply only to quantitative data, whereas the mode can be used with either quantitative or qualitative data.

THE MEAN-The most commonly used measure of center is the mean. When people speak of an average, they usually mean the mean. The MEAN of a data set is the sum of the observations divided by the number of observations.

THE MEDIAN-The MEDIAN of a data set is the number that divides the bottom 50% of the data from the top 50%. To obtain the median of a data set, we arrange the data in increasing order and then determine the middle value in the ordered list. If the number of observations is odd, then the MEDIAN is the observation exactly in the middle of the ordered list. If the number of observations is even, then the MEDIAN is the mean of the two middle observations in the ordered list. If we let n denote the number of observations, then the median is at position (n + 1)/2 in the ordered list; when n is even, this position falls halfway between the two middle observations. Constructing a stem-and-leaf diagram can be a helpful preliminary step to ordering the data.

THE MODE-The final measure of center is the MODE. The MODE is the value that occurs most frequently in a data set. More exactly, obtain the frequency of occurrence of each value and note the greatest frequency. If the greatest frequency is 1 (no value occurs more than once), then the data set has no mode. If the greatest frequency is 2 or greater, then any value that occurs with that greatest frequency is called a MODE of the data set. To obtain the modes of a data set, we first construct a frequency distribution for the data using classes based on a single value. The modes can then be determined easily from the frequency distribution.
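The definitions of the mean, median, and mode can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are made up for the example, and the mode function follows the rule stated in these notes (no mode when the greatest frequency is 1).

```python
from collections import Counter


def mean(data):
    """Sum of the observations divided by the number of observations."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the ordered list, or the mean of the two
    middle values when the number of observations is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(data):
    """All values that occur with the greatest frequency.
    Returns an empty list when no value occurs more than once."""
    freq = Counter(data)  # frequency distribution with single-value classes
    greatest = max(freq.values())
    if greatest == 1:
        return []
    return sorted(value for value, count in freq.items() if count == greatest)


# Hypothetical data set, used only for illustration.
sample = [2, 3, 3, 5, 7, 7, 8]
print(mean(sample))    # 5.0
print(median(sample))  # 5
print(modes(sample))   # [3, 7]
```

Here 3 and 7 each occur twice, so the data set has two modes, while the mean and median both equal 5.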

COMPARISON OF THE MEAN, MEDIAN, AND MODE-The mean, median, and mode of a data set are often different. The mean is sensitive to extreme (very large or very small) observations, whereas the median is not. The median is therefore usually preferred for data sets that have extreme observations. The relative positions of the mean and median depend on whether the distribution is right-skewed, symmetric, or left-skewed. The mean is pulled in the direction of skewness, that is, in the direction of the extreme observations. For a right-skewed distribution, the mean is greater than the median; for a symmetric distribution, the mean and median are equal; and for a left-skewed distribution, the mean is less than the median. A descriptive measure is called RESISTANT if it is not sensitive to the influence of a few extreme observations. The median is a resistant measure of center; the mean is not. The resistance of the mean can be improved by using TRIMMED MEANS, in which a percentage of the smallest and largest observations are removed before computing the mean.
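A small sketch can show how one extreme observation pulls the mean but not the median, and how trimming helps. The data are hypothetical, and the trimming convention used here (drop the given percentage from each end, rounding the count down) is an assumption; the notes do not fix one.

```python
import statistics


def trimmed_mean(data, percent):
    """Mean after removing `percent`% of the observations from each end.
    Assumed convention: the number removed per end is rounded down."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)  # observations dropped from each end
    trimmed = ordered[k:len(ordered) - k]
    return sum(trimmed) / len(trimmed)


# Hypothetical right-skewed data: one very large observation.
data = [10, 12, 13, 15, 16, 18, 19, 20, 22, 200]

print(statistics.mean(data))    # 34.5   (pulled toward the extreme value)
print(statistics.median(data))  # 17.0   (resistant)
print(trimmed_mean(data, 10))   # 16.875 (smallest and largest value removed)
```

The mean exceeds the median, as expected for right-skewed data, and the 10% trimmed mean lands close to the median.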

MEASURES OF VARIATION; THE SAMPLE STANDARD DEVIATION-Data sets with the same center can still differ in how much their observations vary. To describe that difference quantitatively, we use a descriptive measure that indicates the amount of variation or spread in a data set.

The RANGE of a data set is the difference between its maximum (largest) and minimum (smallest) observations: Range = Max - Min. In using the range, a great deal of information is ignored; only the largest and smallest observations are considered. Two other measures of variation are the STANDARD DEVIATION and the INTERQUARTILE RANGE.

The STANDARD DEVIATION measures variation by indicating how far, on average, the observations are from the mean. We need to know whether the data are population data or sample data, because the formulas for the standard deviations of sample data and population data differ slightly. The first step in computing a SAMPLE STANDARD DEVIATION is to find how far each observation is from the mean, that is, its DEVIATION FROM THE MEAN. To obtain the deviation from the mean for a particular observation, we subtract the mean from it; that is, we compute x - x̄, where x̄ denotes the sample mean. The second step is to obtain a measure of the total deviation from the mean for all the observations. The sum of the deviations cannot serve this purpose because it always equals 0, so we square each deviation before summing. The sum of the squared deviations from the mean is called the SUM OF SQUARED DEVIATIONS and provides a measure of total deviation from the mean for all the observations. The third step is to take an average of the squared deviations by dividing the sum of squared deviations by n - 1. The resulting quantity is called a SAMPLE VARIANCE and is denoted by s^2. Since it is desirable to have descriptive measures in the original units, the final step in computing a sample standard deviation is to take the square root of the sample variance.

Chebychev’s rule: For any data set and any number k > 1, at least 100(1 - 1/k^2)% of the observations lie within k standard deviations to either side of the mean. Two special cases are k = 2 and k = 3. At least 75% of the observations in any data set lie within two standard deviations to either side of the mean. At least 89% of the observations in any data set lie within three standard deviations to either side of the mean.

The empirical rule: For data sets that have approximately bell-shaped distributions, we can improve these estimates. Roughly 68% of the observations lie within one standard deviation to either side of the mean. Roughly 95% of the observations lie within two standard deviations to either side of the mean. Roughly 99.7% of the observations lie within three standard deviations to either side of the mean.
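The range, the steps for the sample standard deviation, and Chebychev's bound can be traced in a short Python sketch. The function names and the data set are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import math


def data_range(data):
    """Range = Max - Min."""
    return max(data) - min(data)


def sample_std(data):
    """Sample standard deviation, following the steps in the notes."""
    n = len(data)
    xbar = sum(data) / n                      # sample mean
    deviations = [x - xbar for x in data]     # step 1: deviations from the mean
    sum_sq = sum(d ** 2 for d in deviations)  # step 2: sum of squared deviations
    variance = sum_sq / (n - 1)               # step 3: sample variance s^2
    return math.sqrt(variance)                # final step: back to original units


def chebychev_percent(k):
    """Minimum percentage of observations within k standard deviations (k > 1)."""
    return 100 * (1 - 1 / k ** 2)


def percent_within(data, k):
    """Observed percentage of observations within k sample standard deviations."""
    xbar = sum(data) / len(data)
    s = sample_std(data)
    inside = [x for x in data if abs(x - xbar) <= k * s]
    return 100 * len(inside) / len(data)


# Hypothetical sample: mean 3, deviations -2, -2, 0, 2, 2.
sample = [1, 1, 3, 5, 5]
print(data_range(sample))         # 4
print(sample_std(sample))         # 2.0  (sum of squares 16, divided by n - 1 = 4)
print(chebychev_percent(2))       # 75.0
print(percent_within(sample, 2))  # 100.0, consistent with the 75% lower bound
```

Chebychev's rule gives only a lower bound, so the observed percentage can be much larger, as it is for this sample.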

GROUPED DATA FORMULAS-When data are grouped in a frequency distribution, we use formulas different from the ones we have previously discussed.
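These notes do not state the grouped-data formulas themselves. A common textbook version, assumed here, weights each class value x (or class midpoint) by its frequency f: the mean is the sum of x*f divided by n, and the sample standard deviation is the square root of the sum of (x - x̄)^2 * f divided by n - 1, where n is the sum of the frequencies. The sketch below uses hypothetical names and data.

```python
import math


def grouped_mean_and_std(values, freqs):
    """Sample mean and standard deviation from a frequency distribution.
    `values` holds the class values (or class midpoints, an approximation);
    `freqs` holds the matching class frequencies."""
    n = sum(freqs)  # total number of observations
    xbar = sum(x * f for x, f in zip(values, freqs)) / n
    sum_sq = sum((x - xbar) ** 2 * f for x, f in zip(values, freqs))
    return xbar, math.sqrt(sum_sq / (n - 1))


# Hypothetical frequency distribution: the value 1 occurs twice,
# 3 occurs once, and 5 occurs twice (the raw data 1, 1, 3, 5, 5).
values = [1, 3, 5]
freqs = [2, 1, 2]
xbar, s = grouped_mean_and_std(values, freqs)
print(xbar, s)  # 3.0 2.0
```

With single-value classes the results match the ungrouped formulas exactly; with class midpoints they are only approximations of the ungrouped values.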

PERCENTILES-The PERCENTILES of a data set divide it into hundredths, or 100 equal parts. P1 is the number that divides the bottom 1% of the data from the top 99%. The DECILES of a data set divide it into tenths, or 10 equal parts. D1 is the number that divides the bottom 10% of the data from the top 90%. The QUARTILES of a data set divide it into quarters, or four equal parts. The first quartile is the number that divides the bottom 25% of the data from the top 75%. In the ordered list, the first quartile is at position (n + 1)/4, the second quartile is the median, at position (n + 1)/2, and the third quartile is at position 3(n + 1)/4. The INTERQUARTILE RANGE, the difference between the third and first quartiles, is the preferred measure of variation when the median is used as the measure of center.
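The position rule for the quartiles can be sketched as follows. The notes do not say what to do when a position is not a whole number; linear interpolation between the two neighboring observations is assumed here, and other conventions exist. Function names and data are hypothetical.

```python
def value_at_position(ordered, pos):
    """Value at 1-based position `pos` in an ordered list.
    Assumption: fractional positions are interpolated linearly."""
    lower = int(pos)    # whole part of the position
    frac = pos - lower  # fractional part
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def quartiles(data):
    """Q1, Q2, Q3 at positions (n + 1)/4, (n + 1)/2, and 3(n + 1)/4."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for these positions")
    return tuple(value_at_position(ordered, i * (n + 1) / 4) for i in (1, 2, 3))


# Hypothetical data with n = 7, so the positions are 2, 4, and 6.
data = [13, 2, 8, 5, 10, 4, 7]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 4 7 10
print(q3 - q1)     # interquartile range: 6
```

For n = 7 every position is a whole number, so no interpolation is needed; with n = 4 the positions are 1.25, 2.5, and 3.75 and the assumed interpolation applies.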