DESCRIPTIVE STATISTICS: ORGANIZING DATA

VARIABLES AND DATA-

A characteristic that varies from one person or thing to another is called a variable. Variables that yield numerical information (for example, height or number of siblings) are QUANTITATIVE VARIABLES; variables that yield nonnumeric information (for example, eye color or marital status) are QUALITATIVE VARIABLES, also referred to as CATEGORICAL VARIABLES. Quantitative variables can be classified as either DISCRETE or CONTINUOUS. A DISCRETE VARIABLE is one whose possible values form a finite (or countably infinite) set of numbers, usually some collection of whole numbers. A discrete variable usually involves a count of something.

A CONTINUOUS VARIABLE is a variable whose possible values form some interval of numbers.

VARIABLES-

VARIABLE: A characteristic that varies from one person or thing to another.

QUALITATIVE VARIABLE: A nonnumerically valued variable.

QUANTITATIVE VARIABLE: A numerically valued variable.

DISCRETE VARIABLE: A quantitative variable whose possible values form a finite (or countably infinite) set of numbers.

CONTINUOUS VARIABLE: A quantitative variable whose possible values form some interval of numbers.

Observing the values of a variable for one or more people or things yields data. The information collected, organized, and analyzed by statisticians is data. The terms QUALITATIVE, QUANTITATIVE, DISCRETE, and CONTINUOUS are used to describe data as well as variables: qualitative data are data obtained by observing values of a qualitative variable; quantitative data are data obtained by observing values of a quantitative variable; and so forth.

DATA-

DATA: Information obtained by observing values of a variable.

QUALITATIVE DATA: Data obtained by observing values of a qualitative variable.

QUANTITATIVE DATA: Data obtained by observing values of a quantitative variable.

DISCRETE DATA: Data obtained by observing values of a discrete variable.

CONTINUOUS DATA: Data obtained by observing values of a continuous variable.

Each individual piece of data is called an OBSERVATION, and the collection of all observations for a particular variable is called a DATA SET.

CLASSIFICATION AND THE CHOICE OF STATISTICAL METHOD-Some of the descriptive and inferential procedures that we will study are valid for only certain types of data; that is one reason why it is important to be able to correctly classify data.

GROUPING DATA- Grouping involves, as the term implies, putting data into groups rather than treating each observation individually. It is one of the most common methods for organizing data. By grouping the data into categories, or classes, we can make the data much simpler to comprehend. The first step is to decide on the classes. A shorthand symbol written between the two endpoints of a class stands for "up to, but not including". Some commonsense guidelines for grouping:

1. The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data.
2. Each observation must belong to one, and only one, class.
3. Whenever feasible, all classes should have the same width.
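The grouping step can be sketched in Python. This is only an illustration under stated assumptions: the function names `make_classes` and `assign_class`, the starting value, the class width, and the observations are all made up for the example. Each class runs from its lower endpoint up to, but not including, its upper endpoint, so every observation lands in exactly one class and all classes have the same width.

```python
def make_classes(start, width, count):
    """Return `count` equal-width classes as (lower, upper) pairs."""
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]


def assign_class(value, classes):
    """Return the one class with lower <= value < upper."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError(f"{value} does not fall in any class")


# Hypothetical data: ten observations grouped into classes of width 10.
observations = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70]
classes = make_classes(start=30, width=10, count=5)

grouped = {c: [] for c in classes}
for x in observations:
    grouped[assign_class(x, classes)].append(x)

for (lower, upper), members in grouped.items():
    print(f"{lower} up to, but not including, {upper}: {members}")
```

Note that the observation 50 goes in the class starting at 50, not the class ending at 50, which is what keeps the classes from overlapping.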

FREQUENCY AND RELATIVE-FREQUENCY DISTRIBUTIONS- The number of observations that fall into a particular class is called the FREQUENCY or count of that class. A table listing all classes and their frequencies is called a FREQUENCY DISTRIBUTION. In addition to the frequency of a class, we are often interested in the PERCENTAGE of a class. We find the percentage by dividing the frequency of the class by the total number of observations and multiplying the result by 100. The percentage of a class, expressed as a decimal, is usually referred to as the RELATIVE FREQUENCY of the class. A table listing all classes and their relative frequencies is called a RELATIVE-FREQUENCY DISTRIBUTION. The relative frequencies sum to 1 (100%).

When comparing two data sets, relative-frequency distributions are better than frequency distributions, because relative frequencies are always between 0 and 1 and so provide a standard for comparison. Two data sets having identical frequency distributions will have identical relative-frequency distributions. But two data sets having identical relative-frequency distributions will have identical frequency distributions only if both data sets have the same number of observations.

GROUPING TERMINOLOGY-

CLASSES: Categories for grouping data.

FREQUENCY: The number of observations that fall in a class.

FREQUENCY DISTRIBUTION: A listing of all classes along with their frequencies.

RELATIVE FREQUENCY: The ratio of the frequency of a class to the total number of observations.

RELATIVE-FREQUENCY DISTRIBUTION: A listing of all classes along with their relative frequencies.

LOWER CUTPOINT: The smallest value that can go in a class.

UPPER CUTPOINT: The smallest value that can go in the next higher class. The upper cutpoint of a class is the same as the lower cutpoint of the next higher class.

MIDPOINT: The middle of a class, obtained by taking the average of its lower and upper cutpoints.

WIDTH: The difference between the upper and lower cutpoints of a class.

A table giving the classes, frequencies, relative frequencies, and midpoints of a data set is called a GROUPED-DATA TABLE. When each relative frequency is rounded (to three decimal places, say), the sum of the rounded relative frequencies may differ slightly from 1. This phenomenon is usually referred to as ROUNDING ERROR or ROUNDOFF ERROR.
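A grouped-data table can be built directly from the definitions above. The following is a minimal sketch, assuming classes are given as (lower cutpoint, upper cutpoint) pairs; the function name `grouped_data_table` and the six observations are hypothetical. The data are chosen so that each relative frequency is 1/3, which makes the rounding error visible: 0.333 + 0.333 + 0.333 is about 0.999, not 1.

```python
def grouped_data_table(observations, classes):
    """Return one row per class: (class, frequency, relative frequency, midpoint)."""
    n = len(observations)
    rows = []
    for lower, upper in classes:
        # Frequency: observations from the lower cutpoint up to,
        # but not including, the upper cutpoint.
        frequency = sum(1 for x in observations if lower <= x < upper)
        relative_frequency = frequency / n
        midpoint = (lower + upper) / 2
        rows.append(((lower, upper), frequency, relative_frequency, midpoint))
    return rows


# Hypothetical data: two observations in each of three classes of width 10.
observations = [31, 38, 42, 45, 53, 59]
classes = [(30, 40), (40, 50), (50, 60)]
table = grouped_data_table(observations, classes)

for (lower, upper), freq, rel, mid in table:
    print(f"{lower}-under {upper}  freq={freq}  rel={rel:.3f}  mid={mid}")

# Rounding each relative frequency to three places loses a little each time.
rounded_sum = sum(round(rel, 3) for _, _, rel, _ in table)
print("sum of rounded relative frequencies:", round(rounded_sum, 3))
```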

AN ALTERNATE METHOD FOR DEPICTING CLASSES: SINGLE-VALUE GROUPING- In some cases it is more appropriate to use classes that each represent a single possible value. This is particularly true of discrete data in which there are only relatively few distinct observations.

FREQUENCY AND RELATIVE-FREQUENCY DISTRIBUTIONS FOR QUALITATIVE DATA- The concepts of cutpoints and midpoints apply only to quantitative data, but we can still compute frequencies and relative frequencies for qualitative data.
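With single-value classes, and with qualitative data, each distinct value is its own class, so counting is all that is needed. The sketch below uses `collections.Counter` from the Python standard library; the function names and both data sets (sibling counts and eye colors) are hypothetical.

```python
from collections import Counter


def frequency_distribution(observations):
    """Map each distinct value to the number of times it occurs."""
    return dict(Counter(observations))


def relative_frequency_distribution(observations):
    """Map each distinct value to frequency / total number of observations."""
    n = len(observations)
    return {value: count / n for value, count in Counter(observations).items()}


# Hypothetical discrete data with few distinct values: number of siblings.
siblings = [0, 1, 1, 2, 2, 2, 3, 1]
# Hypothetical qualitative data: eye color. No cutpoints or midpoints apply.
eye_color = ["brown", "blue", "brown", "green", "brown"]

print(frequency_distribution(siblings))
print(relative_frequency_distribution(eye_color))
```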

GRAPHS AND CHARTS- Another method for organizing and summarizing data is to draw a picture of some kind. A graph or chart of a data set often provides the simplest and most efficient display.

HISTOGRAMS- A histogram is a way to display grouped data pictorially, with the classes depicted on the horizontal axis and the frequencies (or relative frequencies) depicted on the vertical axis.

A FREQUENCY HISTOGRAM displays the frequencies of the classes.

A RELATIVE-FREQUENCY HISTOGRAM is similar to a frequency histogram. The difference is that the height of each bar in a relative-frequency histogram is equal to the relative frequency of the class instead of the frequency of the class. For purposes of visually comparing the distributions of two data sets, it is better to use relative-frequency histograms than frequency histograms. The same vertical scale is used for all relative-frequency histograms (a minimum of 0 and a maximum of 1), whereas the vertical scale of a frequency histogram depends on the number of observations.
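The advantage of relative-frequency histograms for comparison can be seen in a small text-only sketch. For simplicity the bars are drawn sideways, one row per class, rather than with the classes on the horizontal axis; the function names, the bar scale, and both data sets are hypothetical. The second data set has 25 times as many observations as the first but the same shape, so the two relative-frequency histograms come out identical even though the frequencies differ.

```python
def relative_frequencies(observations, classes):
    """Relative frequency of each (lower, upper) class, in order."""
    n = len(observations)
    return [sum(1 for x in observations if lo <= x < hi) / n for lo, hi in classes]


def text_histogram(observations, classes, bar_width=40):
    """One row per class; bar length is proportional to relative frequency."""
    rows = []
    for (lo, hi), rel in zip(classes, relative_frequencies(observations, classes)):
        bar = "#" * round(rel * bar_width)
        rows.append(f"{lo:>3} to under {hi:<3} | {bar} {rel:.3f}")
    return rows


classes = [(0, 10), (10, 20), (20, 30)]
small = [5, 12, 15, 25]   # 4 hypothetical observations
large = small * 25        # 100 observations with the same shape

print("\n".join(text_histogram(small, classes)))
print("\n".join(text_histogram(large, classes)))
```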

HISTOGRAMS FOR SINGLE-VALUE GROUPING- We place the middle of each histogram bar directly over the single value represented by the class. The symbol // on a horizontal axis indicates that the zero point on that axis is not in its usual position at the intersection of the horizontal and vertical axes.

STEM-AND-LEAF DIAGRAMS- A stem-and-leaf diagram is often easier to construct than either a frequency distribution or a histogram and generally displays more information. The leading digits of each observation are called stems and the final digits are called leaves. The entire diagram is called a stem-and-leaf diagram.

SHADED STEM-AND-LEAF DIAGRAM- Because the numbers in a shaded stem-and-leaf diagram are still visible under the shading, a shaded stem-and-leaf diagram exhibits the raw (ungrouped) data in addition to providing a graphical display of a frequency distribution.

ORDERED STEM-AND-LEAF DIAGRAM-The leaves in each row are ordered from smallest to largest. This makes it easier to comprehend the data and also facilitates the computation of descriptive measures such as the median.
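An ordered stem-and-leaf diagram can be sketched as follows, assuming the observations are non-negative whole numbers so that the final digit is the leaf and the remaining leading digits are the stem. The function name `stem_and_leaf` and the data are hypothetical, and listing stems that have no leaves is a choice made here so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(observations):
    """Return ordered stem-and-leaf rows for non-negative whole numbers."""
    rows = defaultdict(list)
    # Sorting first means the leaves in each row end up in increasing order.
    for x in sorted(observations):
        stem, leaf = divmod(x, 10)
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Hypothetical data set.
data = [52, 47, 36, 50, 63, 30, 52, 81]
diagram = stem_and_leaf(data)
print("\n".join(diagram))
```

Every observation can be read back from the diagram (stem 5 with leaves 0, 2, 2 stands for 50, 52, 52), which is the sense in which the diagram shows more than a histogram does.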

DISTRIBUTION SHAPES, SYMMETRY AND SKEWNESS- The distribution of a data set is a table, graph, or formula that tells us the values of the observations and how often they occur. An important aspect of the distribution of a quantitative data set is its shape.

DISTRIBUTION SHAPES- Bell-shaped, triangular, uniform, reverse J-shaped, J-shaped, right skewed, left skewed, bimodal, and multimodal.

MODALITY- The modality of a distribution is found by observing the number of peaks. A distribution is UNIMODAL if it has one peak, BIMODAL if it has two peaks, and MULTIMODAL if it has three or more. For example, the distribution of heights is unimodal. Strictly speaking, a distribution is bimodal or multimodal only if the peaks are the same height.

SYMMETRY AND SKEWNESS- A distribution that can be divided into two pieces that are mirror images of one another is called SYMMETRIC. Bell-shaped, triangular, and uniform distributions are symmetric. A right-skewed distribution rises to its peak rapidly and comes back toward the horizontal axis more slowly: its "right tail" is longer than its "left tail". A left-skewed distribution rises to its peak slowly and comes back toward the horizontal axis more rapidly: its "left tail" is longer than its "right tail". J-shaped and reverse J-shaped distributions are special types of left-skewed and right-skewed distributions, respectively.
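The shape descriptions above can be turned into rough checks on a list of class frequencies. These are crude heuristics written for illustration, not standard statistical tests: the function names and frequency lists are hypothetical, `count_peaks` counts every class that is higher than both neighbours regardless of peak height (the looser everyday sense of bimodal), it does not detect flat stretches such as a uniform distribution, and `longer_tail` only makes sense for a distribution with a single highest class.

```python
def count_peaks(frequencies):
    """Count classes whose frequency exceeds that of both neighbours."""
    padded = [0] + list(frequencies) + [0]  # treat outside the data as frequency 0
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


def is_symmetric(frequencies):
    """True if the left half is the mirror image of the right half."""
    freqs = list(frequencies)
    return freqs == freqs[::-1]


def longer_tail(frequencies):
    """Compare the number of classes on each side of the highest class."""
    freqs = list(frequencies)
    peak = freqs.index(max(freqs))
    left_tail, right_tail = peak, len(freqs) - 1 - peak
    if right_tail > left_tail:
        return "right skewed"
    if left_tail > right_tail:
        return "left skewed"
    return "neither"


# Hypothetical frequency lists, one per shape.
bell = [1, 4, 9, 4, 1]
right = [2, 9, 6, 3, 2, 1]
reverse_j = [9, 6, 3, 1]
bimodal = [1, 6, 2, 6, 1]

print(count_peaks(bell), is_symmetric(bell), longer_tail(bell))
print(count_peaks(right), is_symmetric(right), longer_tail(right))
print(count_peaks(bimodal), longer_tail(reverse_j))
```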

POPULATION AND SAMPLE DISTRIBUTIONS- The data set obtained by observing the values of a variable for an entire population is called POPULATION DATA or CENSUS DATA; a data set obtained by observing the values of a variable for a sample of the population is called SAMPLE DATA. The distribution of population data is called the POPULATION DISTRIBUTION or the DISTRIBUTION OF THE VARIABLE. The distribution of sample data is called a SAMPLE DISTRIBUTION. If a random sample is taken from a population, then the distribution of the observed values of the variable under consideration will approximate the distribution of the variable. The larger the sample, the better the approximation tends to be.
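The way a sample distribution approximates the population distribution can be explored with a small simulation. This is a sketch under assumptions: the population (sibling counts for 10,000 people), the sample sizes, the seed, and the function names are all invented for the example, and the "gap" used to measure closeness (the largest difference in relative frequency for any value) is just one convenient choice. Because sampling is random, a larger sample is not guaranteed to be closer on every run; it only tends to be.

```python
import random
from collections import Counter


def relative_frequency_distribution(data):
    """Map each distinct value to its relative frequency."""
    n = len(data)
    counts = Counter(data)
    return {value: counts[value] / n for value in sorted(counts)}


def max_gap(sample_dist, population_dist):
    """Largest absolute difference in relative frequency over all population values."""
    return max(abs(sample_dist.get(v, 0.0) - p) for v, p in population_dist.items())


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population data: number of siblings for 10,000 people.
population = [0] * 2000 + [1] * 4000 + [2] * 3000 + [3] * 1000
population_dist = relative_frequency_distribution(population)

gaps = {}
for size in (10, 100, 5000):
    sample = rng.sample(population, size)  # simple random sample without replacement
    gaps[size] = max_gap(relative_frequency_distribution(sample), population_dist)

for size, gap in gaps.items():
    print(f"sample size {size:>4}: largest gap from population = {gap:.3f}")
```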