Statistical Measures of Location: Mathematical Formula Versus Geometric Approach

Graphical methods and mathematical formulas are the two approaches for estimating measures of location. Many instructors of introductory statistics classes hold that the mean cannot be determined graphically and that the numerical (formula) approach is more precise than the geometrical technique. Contrary to this understanding, this study estimates the mean of a dataset geometrically by determining the centroid of the histogram drawn from that dataset. In addition, we show that the mathematical formulas for the mean, median, and mode were derived geometrically (from either the ogive or the histogram). Finally, the study illustrates the two techniques with survey data and establishes that the two approaches produce the same results.


INTRODUCTION
Data with large numbers of observations, depending on the nature and depth of the inquiry, are generated in all areas of human endeavor, such as business, sports, academic institutions, research institutions, and internet services. Whatever their size (large, medium, or small), it is impossible to grasp or retrieve information by merely looking at all the observations. It is advisable to summarize the dataset, if possible with a single number, provided that this number is a good representative of all the observations; that is, it summarizes with relatively high precision the characteristics of interest in the entire set of observations. Such a representative number could be a central value for all the observations. This central value is called a measure of central tendency, also known as a measure of location. The value could be the mean (arithmetic mean, geometric mean, harmonic mean, weighted mean, etc.), the median, or the mode of the distribution. In a nutshell, the study of measures of central tendency is the study of how a dataset clusters around a central value, popularly called the average. [1] There are two basic methods of computing any of these measures of location: [2][3][4] the graphical method and the mathematical formula. Many instructors of introductory statistics courses hold that (1) the mathematical formula approach is more precise and exact than the geometrical approach in deducing measures of location [5] and (2) the mean, as a measure of location, cannot be determined graphically. [6] Contrary to these opinions, this study shows that the mathematical formulas of all the measures of location considered (mean, median, and mode) were actually derived geometrically from the histogram and the ogive.
In addition, this study uses the work of Beri (2012) to show that the mean of a distribution or dataset lies at the centroid of the histogram drawn from that distribution. These are the points that this study sets out to make and clarify, in order to enhance people's understanding of the two approaches. The graphical methods have been used in undergraduate introductory statistics classes at Augustine University, Ilara-Epe. Feedback from students on this approach has been positive, and students often appreciate that mathematical formulas and concepts are translated and illustrated in a more visual form. Furthermore, graphical methods (histogram, ogive, frequency polygon, etc.) are sometimes better suited than numerical formulas because they carry detailed information about the pattern or shape of the data. Our interest, however, is not to prioritize one method over the other but to elucidate both. In fact, numerical and graphical approaches complement each other, and it is wise to use both. Finally, the crux of this research is that the abstract mathematical formulas for measures of location are presented graphically and made accessible to all students and users of statistics, regardless of their mathematical background.

Definition of measures of location
Mean: The mean of the set $\{x_1, x_2, \ldots, x_n\}$ of numbers is the quantity
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
which is the centroid of the composite figure X. [7]

Median: Suppose that the numbers in the set $\{x_1, x_2, \ldots, x_n\}$ are arranged so that $x_1 \le x_2 \le \ldots \le x_{n-1} \le x_n$. The median of the set is the number
$$M = \begin{cases} x_{(n+1)/2}, & n \text{ odd}, \\ \dfrac{x_{n/2} + x_{n/2+1}}{2}, & n \text{ even}. \end{cases}$$
In other words, the median of a set of n numbers is the number in the middle of the arrangement $x_1 \le x_2 \le \ldots \le x_{n-1} \le x_n$, if there is a single number in the middle. Otherwise, it is the average of the two numbers in the middle of the arrangement. Geometrically, the median is the abscissa of the ordinate that divides the histogram into two equal parts. That is, the point at which the perpendicular line dividing the total area of the histogram into two equal halves meets the X-axis (upper class boundary) gives the median.

Mode: A number x is called a mode of the set if x occurs at least as frequently as any other number in the set. That is, the mode of a set of data is the value that occurs most frequently among the values of the variable. If a histogram has been drawn for grouped data, the mode of the distribution lies in the tallest bar of the histogram. Figure 1 illustrates a portion of a histogram with MNLU as the tallest bar (modal class). By joining MQ and NP as shown in the diagram, the abscissa $\hat{x}_m$ corresponding to the perpendicular drawn from the point of intersection S is the mode of the distribution.

For a grouped frequency distribution, the mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i c_i}{\sum_{i=1}^{n} f_i},$$
where $c_i$ is the centre or midpoint of the ith interval and $f_i$ is the number of times $x_i$ occurs. Proof: Figure 2 shows a typical histogram with the further construction used in the proof.
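The Figure 1 construction for the mode can be sketched numerically. The labelling of the corner points below is an assumption, since the figure itself is not reproduced here: M and N are taken as the top-left and top-right corners of the modal bar, Q as the point on its right edge at the height of the next bar, and P as the point on its left edge at the height of the previous bar. The function name and the sample frequencies are hypothetical.

```python
def mode_by_construction(lower, width, f_prev, f_modal, f_next):
    """Abscissa of S, the intersection of lines MQ and NP on the modal bar.

    Assumed labelling (not confirmed by the source figure):
      M = (lower, f_modal), Q = (lower + width, f_next)
      N = (lower + width, f_modal), P = (lower, f_prev)
    """
    slope_mq = (f_next - f_modal) / width   # line MQ falls from M to Q
    slope_np = (f_modal - f_prev) / width   # line NP rises from P to N
    if slope_np == slope_mq:
        raise ValueError("lines MQ and NP are parallel; no unique mode")
    # MQ: y = f_modal + slope_mq * t,  NP: y = f_prev + slope_np * t,
    # where t is the horizontal distance from the lower class boundary.
    t = (f_modal - f_prev) / (slope_np - slope_mq)
    return lower + t


if __name__ == "__main__":
    # Hypothetical modal class 30-40 with neighbouring frequencies 12 and 16.
    print(mode_by_construction(30.0, 10.0, 12, 20, 16))
```

Working in coordinates rather than on graph paper removes the reading error of the drawn construction, so the value returned is what an ideal drawing would give.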

Graphical computation of mean from histogram
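Putting equations (4) and (3), in that order, into equation (5), we have
$$\bar{x} = \frac{\sum_{i=1}^{n} k_i f_i c_i}{\sum_{i=1}^{n} k_i f_i} \qquad (6)$$
In a frequency distribution with equal class intervals, that is, $k_i = k \;\forall\, i = 1, 2, \ldots, n$, equation (6) yields
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i c_i}{\sum_{i=1}^{n} f_i}.$$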
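As a rough numerical check of the centroid argument, the sketch below treats each histogram bar as a rectangle whose height is the class frequency, computes the x-coordinate of the centroid of the composite figure, and compares it with the usual grouped-mean formula. The function names and the class table are hypothetical and are not the survey data of this study.

```python
def histogram_centroid_x(boundaries, freqs):
    """x-coordinate of the centroid of a histogram with bar height = frequency."""
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need one more boundary than frequencies")
    total_area = 0.0
    moment = 0.0
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        area = (hi - lo) * f          # area of one rectangular bar
        midpoint = (lo + hi) / 2      # centroid abscissa of that bar
        total_area += area
        moment += area * midpoint
    return moment / total_area


def grouped_mean(boundaries, freqs):
    """Grouped mean: sum(f_i * c_i) / sum(f_i), with c_i the class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    return sum(f * c for f, c in zip(freqs, mids)) / sum(freqs)


if __name__ == "__main__":
    # Hypothetical table with equal class intervals of width 10.
    boundaries = [19.5, 29.5, 39.5, 49.5, 59.5]
    freqs = [10, 20, 15, 5]
    print(histogram_centroid_x(boundaries, freqs))  # 37.5
    print(grouped_mean(boundaries, freqs))          # 37.5
```

With equal class intervals the two values coincide. With unequal intervals and frequencies used as bar heights, the centroid weights each class by its width as well, so the two functions can differ.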

Graphical computation of median from ogive and histogram
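If the grouped data are given as a cumulative frequency distribution, the median is the abscissa of the point on the ogive whose ordinate equals half the total frequency. [8] This can be achieved by either of two methods:
a. First method: Draw only the "less than" cumulative frequency curve and determine the position of the median as the $(N/2)$th value. Locate this value on the cumulative frequency axis (i.e., the Y-axis) and from it draw a perpendicular (straight line) to meet the cumulative frequency curve. From this point, draw another perpendicular to the X-axis; the point where it meets the X-axis is the median.
b. Second method: Draw and superimpose the "less than" and "more than" cumulative frequency curves. From the point of intersection of the two curves, draw a perpendicular to the X-axis. The point where this perpendicular meets the X-axis gives the required value of the median.

Theorem 2.2. Given a grouped frequency distribution table containing class boundaries and their frequencies as shown in Table 1, the formula for computing the median M of a grouped frequency distribution with median class interval $(x_{k-1}, x_k)$ is
$$M = x_{k-1} + \frac{\frac{N}{2} - F_{k-1}}{f_k}\,(x_k - x_{k-1}).$$

Proof. Let the cumulative frequency of the ith class be denoted by $F_i$, so that $F_1 = f_1$, $F_2 = f_1 + f_2$, $F_k = f_1 + f_2 + f_3 + \ldots + f_{k-1} + f_k$, and $F_n = N$. Suppose that $F_{k-1} < \frac{N}{2} < F_k$; consequently, $x_{k-1} < M < x_k$. Figure 3 shows a typical ogive with the additional construction required for the proof. The increment in cumulative frequency between $x_{k-1}$ and $x_k$ is $F_k - F_{k-1} = f_k$, while the increment between $x_{k-1}$ and M is $\frac{N}{2} - F_{k-1}$. Since the ogive is a straight line over this class, similar triangles give
$$\frac{M - x_{k-1}}{x_k - x_{k-1}} = \frac{\frac{N}{2} - F_{k-1}}{f_k},$$
from which the stated formula follows.

Given a grouped frequency distribution as shown in Table 1, the mode is given by
$$\hat{x}_m = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,c,$$
where L is the lower boundary of the modal class, c is the common class interval, $\Delta_1$ is the difference between the frequency of the modal class and that of the class before it, and $\Delta_2$ is the difference between the frequency of the modal class and that of the class after it. From the construction of Figure 1,
$$(\Delta_1 + \Delta_2)\,\hat{x}_m = \Delta_1 U + \Delta_2 L \qquad (12)$$
The upper class boundary U can be expressed as the sum of the lower class boundary (L) and the common class interval (c). That is,
$$U = L + c \qquad (13)$$
Substituting (13) in (12) and making $\hat{x}_m$ the subject of the formula, the procedure is as follows:
$$(\Delta_1 + \Delta_2)\,\hat{x}_m = \Delta_1 (L + c) + \Delta_2 L = (\Delta_1 + \Delta_2) L + \Delta_1 c,$$
$$\hat{x}_m = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,c.$$
It is now clear that the formulas for the mean, median, and mode as statistical measures of location are by-products of the geometrical (graphical) approach.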
Hence, both methods are expected to be equivalent and should, therefore, yield the same result. Any difference in the results is due to the precision of the computing device in the formula method or the precision of reading from the graph (histogram or ogive).
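The equivalence claimed for the median can be checked with a small sketch. It assumes the ogive is drawn as straight segments between class boundaries, and it compares the standard interpolation formula for the grouped median with the second graphical method (intersection of the "less than" and "more than" ogives). The function names and the class table are hypothetical.

```python
from itertools import accumulate


def median_by_formula(boundaries, freqs):
    """Grouped median: lower boundary + (N/2 - F_prev) / f * class width."""
    half = sum(freqs) / 2
    cum = 0
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cum + f >= half:
            return lo + (half - cum) / f * (hi - lo)
        cum += f
    raise ValueError("no median class found")


def median_by_ogives(boundaries, freqs):
    """Abscissa where the 'less than' and 'more than' ogives intersect."""
    n = sum(freqs)
    less = [0] + list(accumulate(freqs))   # cumulative count below each boundary
    more = [n - v for v in less]           # count at or above each boundary
    for i in range(len(freqs)):
        d0 = less[i] - more[i]             # gap between the curves at left boundary
        d1 = less[i + 1] - more[i + 1]     # gap at right boundary
        if d0 <= 0 <= d1 and d1 > d0:
            t = -d0 / (d1 - d0)            # linear interpolation to the crossing
            return boundaries[i] + t * (boundaries[i + 1] - boundaries[i])
    raise ValueError("ogives do not intersect")


if __name__ == "__main__":
    # Hypothetical table, not the survey data of this study.
    boundaries = [19.5, 29.5, 39.5, 49.5, 59.5]
    freqs = [10, 20, 15, 5]
    print(median_by_formula(boundaries, freqs))  # 37.0
    print(median_by_ogives(boundaries, freqs))   # 37.0
```

Both routes reduce to the same linear interpolation within the median class, which is why any disagreement in practice comes from reading the drawn graph rather than from the method.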

RESULTS AND DISCUSSION
Illustration: A sample of 100 individuals was randomly selected in Ilara-Epe for participation in a study of cardiovascular risk factors. The following data represent the ages of the enrolled individuals, measured in years.

Computation of mean from histogram
With reference to Figure 4, an arbitrary axis YY' is chosen at score 19. Thus, the position of the mean on the score scale is the x-value lying at the distance of the centroid C from the arbitrary axis YY'. Therefore, the mean age is 19.5 + 20.7 = 40.2.
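The arbitrary-axis step can be sketched as follows: the centroid distance is measured from a chosen vertical axis, and the mean is that axis position plus the distance. The class table below is hypothetical (the survey data are not reproduced here), so the output does not correspond to the 40.2 obtained above; the function name is also an assumption.

```python
def mean_from_arbitrary_axis(boundaries, freqs, axis):
    """Return (mean, distance), where distance is the centroid's offset from `axis`."""
    total_area = 0.0
    moment = 0.0
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        area = (hi - lo) * f                  # bar area, height = frequency
        offset = (lo + hi) / 2 - axis         # bar centroid measured from the axis
        total_area += area
        moment += area * offset
    distance = moment / total_area
    return axis + distance, distance


if __name__ == "__main__":
    # Hypothetical table with equal class intervals of width 10.
    boundaries = [19.5, 29.5, 39.5, 49.5, 59.5]
    freqs = [10, 20, 15, 5]
    for axis in (19.5, 0.0, 40.0):
        mean, distance = mean_from_arbitrary_axis(boundaries, freqs, axis)
        print(axis, distance, mean)
```

The distance changes with the choice of axis, but the resulting mean does not, which is why the axis can be placed wherever the drawing is easiest to read.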

CONCLUSION
This paper established that the mean, as a measure of location, can be determined graphically and that the formulas for the measures of location (mean, median, and mode) were derived from graphs. Therefore, if all the necessary precautions for drawing the graph are taken, the formula and graphical methods produce the same result. Hence, any observed discrepancy between the results of the two methods is due to human error in reading from the graph (lack of proper pattern recognition) and/or instrumental error (inappropriate handling of the formula).