Statistics is the science of averages and estimates. It is the study of the collection, organization, analysis, and interpretation of data for a specific purpose. We make decisions based on that data.
Data: Every day, we come across a wide variety of information in the form of facts, numerical figures, tables, graphs, etc. E.g. information on the profit or loss of a company, the average cricket score for a country, government expenditure in various sectors in a given year, election results, disease-related information, movie collections, etc.
This information is generally provided by TV, magazines, radio, the internet, etc. These facts or figures, numerical or otherwise, collected with a definite purpose, are called data. The word is derived from the Latin word datum.
Usage of statistics:
- Demography data
- Sales trends
- Unemployment ratio
- Data for business decisions
- Experimental data
A measure of central tendency tells us where the middle of a set of data lies.
The three most common measures of central tendency are the mean, the median, and the mode.
Mean: Let’s find the average of the five scores of two students, Mary and Hari.
- Average score of Mary is (10 + 8 + 9 + 8 + 7)/5 = 8.4
- Average score of Hari is (4 + 7 + 10 + 10 + 10)/5 = 8.2
Based on the mean, Mary has performed better.
The mean (or average) of a number of observations is the sum of the values of all the observations divided by the total number of observations. It is denoted by ‘x bar’.
The mean gives equal importance to every observation, so extreme values are not ignored.
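The definition of the mean can be checked with a few lines of Python. This is a minimal sketch: the function name `mean` and the variable names are illustrative choices, and the scores are the ones from the Mary and Hari example.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

# Scores taken from the example in the text.
mary = [10, 8, 9, 8, 7]
hari = [4, 7, 10, 10, 10]

mary_mean = mean(mary)
hari_mean = mean(hari)

print(mary_mean)  # 8.4
print(hari_mean)  # 8.2
```

Note that Hari's single low score of 4 pulls his mean below Mary's, which is what "extreme values are not ignored" means in practice.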
Median: Let’s arrange their scores in ascending order: Mary’s scores are 7, 8, 8, 9, 10 and Hari’s are 4, 7, 10, 10, 10. Now we take the middle-most score. Since we are looking at the central tendency of the data, the middle-most score should also reflect it. The middle-most score for Mary is 8, while that of Hari is 10.
So, based on the middle score, or median, Hari has performed better.
When the number of observations (n) is odd, the median is the value of the ((n+1)/2)th observation.
When the number of observations (n) is even, the median is the mean of the (n/2)th and (n/2 + 1)th observations.
The median ignores extreme values and marks the point below which 50% of the data lies and above which the other 50% lies.
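The two rules for odd and even n can be sketched in Python as follows. The function name `median` is an illustrative choice, and the four-value list used for the even case is a made-up sample, not data from the text.

```python
def median(values):
    """Middle-most value of the data after sorting in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th observation (index mid when counting from 0).
        return ordered[mid]
    # Even n: mean of the (n/2)th and (n/2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

mary_median = median([10, 8, 9, 8, 7])
hari_median = median([4, 7, 10, 10, 10])
even_median = median([39, 40, 41, 42])  # hypothetical even-sized sample

print(mary_median)  # 8
print(hari_median)  # 10
print(even_median)  # 40.5
```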
Mode: Let’s check the most frequent score. For Mary, the most frequent score is 8 (2 times), and for Hari it is 10 (3 times). Going by the most frequent score, or mode, Hari has performed better.
The mode is the value of the observation that occurs most frequently, i.e., the observation with the maximum frequency. E.g. it can be used to find the favorite subject of a class.
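One way to find the mode in Python is to count how often each value occurs. The sketch below uses `collections.Counter` from the standard library; the function name `mode_with_frequency` is an illustrative choice. When several values tie for the maximum frequency, this sketch simply reports one of them.

```python
from collections import Counter

def mode_with_frequency(values):
    """Return (most frequent value, how many times it occurs)."""
    counts = Counter(values)
    value, frequency = counts.most_common(1)[0]
    return value, frequency

mary_mode = mode_with_frequency([10, 8, 9, 8, 7])
hari_mode = mode_with_frequency([4, 7, 10, 10, 10])

print(mary_mode)  # (8, 2)
print(hari_mode)  # (10, 3)
```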
Numerical: In a mathematics test given to 15 students, the following marks were recorded: 41, 39, 48, 52, 46, 62, 54, 40, 96, 52, 98, 40, 42, 52, 60. Find the mean, median and mode of this data.
The mean is the average of the data, the median is the middle-most value, and the mode is the most frequent value.
Mean = Sum of all observations / number of observations
= (41 + 39 + 48 + 52 + 46 + 62 + 54 + 40 + 96 + 52 + 98 + 40 + 42 + 52 + 60) / 15
= 822 / 15 = 54.8
To find the middle-most value, we arrange the data in ascending order and then pick the middle one.
Data in ascending order: 39, 40, 40, 41, 42, 46, 48, 52, 52, 52, 54, 60, 62, 96, 98.
There are 15 observations, so the middle-most value is the 8th one, which is 52. Thus the median is 52.
We also see that the most frequent value is 52 (it occurs 3 times), so the mode is 52.
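The worked example can be cross-checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics

marks = [41, 39, 48, 52, 46, 62, 54, 40, 96, 52, 98, 40, 42, 52, 60]

marks_mean = statistics.mean(marks)      # sum of marks / 15
marks_median = statistics.median(marks)  # 8th value after sorting
marks_mode = statistics.mode(marks)      # most frequent value

print(round(marks_mean, 1))  # 54.8
print(marks_median)          # 52
print(marks_mode)            # 52
```

The two high marks, 96 and 98, pull the mean (54.8) above the median (52), which illustrates how the mean reacts to extreme values while the median does not.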
Application of Mean, Median & Mode
- Mean, median and mode show different perspectives of the same data.
- Mean gives the average of the data. All data is given equal importance.
- It is used where every observation is important. E.g. the average salary of employees in an organization.
- Median gives the middle-most value. It marks the point above which 50% of the data lies and below which 50% lies. It is used where extreme cases can be ignored.
- E.g. to judge the consistent performance of a cricketer, where his worst and best performances can be ignored.
- Mode is used where we need the most frequent value. E.g. to find the favorite subject of students in a given class.
Mean of Grouped Data
In most real-life situations, the data is so large that it needs to be condensed into grouped data before it can be studied meaningfully.
So, we need to convert the given ungrouped data into grouped data and devise a method to find its mean.
Let’s assume that we have the scores of 100 students. Instead of listing them as ungrouped data, we can group them into class intervals and record the frequency of each class. Presenting data in this form simplifies and condenses it and lets us observe certain important features at a glance. This is called a grouped frequency distribution table.
It is assumed that the frequency of each class interval is centered around its mid-point, called the class mark.
Class mark = (Upper class limit + Lower class limit)/ 2
There are three ways to find the mean of grouped data:
- Direct Mean Method
- Assumed Mean Method
- Step-Deviation Method
The result obtained by all the three methods is the same.
Direct Mean Method: If x1, x2, . . ., xn are observations with respective frequencies f1, f2, . . ., fn, then observation x1 occurs f1 times, x2 occurs f2 times, and so on. For grouped data, xi is the class mark of the i-th class.
Sum of the values of all the observations = f1x1 + f2x2 + . . . + fnxn,
Number of observations = f1 + f2 + . . . + fn.
Mean = (f1x1 + f2x2 + . . . + fnxn) / (f1 + f2 + . . . + fn) = Σfixi / Σfi
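A minimal sketch of the direct method in Python follows. The frequency table here is hypothetical (it is not the table from the original article), and all variable names are illustrative.

```python
# Hypothetical grouped data: class intervals 0-20, 20-40, ..., 80-100.
lower_limits = [0, 20, 40, 60, 80]
upper_limits = [20, 40, 60, 80, 100]
frequencies = [4, 6, 10, 20, 10]

# Class mark = (upper class limit + lower class limit) / 2
class_marks = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Direct method: mean = sum(fi * xi) / sum(fi)
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
sum_f = sum(frequencies)
direct_mean = sum_fx / sum_f

print(class_marks)  # [10.0, 30.0, 50.0, 70.0, 90.0]
print(direct_mean)  # 60.4
```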
Assumed Mean Method: The first step is to choose one among the xi’s as the assumed mean, and denote it by ‘a’. The next step is to find the difference di = xi – a between each of the xi’s and a, that is, the deviation of each xi from ‘a’.
The third step is to find the product of di with the corresponding fi, and take the sum of all the fidi’s. The mean is then given by: Mean = a + Σfidi / Σfi
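The three steps of the assumed mean method can be sketched as below, using the relation mean = a + Σfidi / Σfi. The frequency table is hypothetical (not from the original article), the choice a = 50 is arbitrary, and the variable names are illustrative.

```python
# Hypothetical grouped data: class marks of the intervals 0-20, ..., 80-100.
class_marks = [10, 30, 50, 70, 90]
frequencies = [4, 6, 10, 20, 10]

# Step 1: choose one of the class marks as the assumed mean 'a'.
a = 50

# Step 2: deviation of each class mark from a, di = xi - a.
deviations = [x - a for x in class_marks]

# Step 3: sum of fi * di, then mean = a + sum(fi * di) / sum(fi).
sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
assumed_mean_result = a + sum_fd / sum(frequencies)

print(deviations)                     # [-40, -20, 0, 20, 40]
print(sum_fd)                         # 520
print(round(assumed_mean_result, 2))  # 60.4
```

The products fi * di are much smaller than fi * xi, which is the practical advantage of this method when computing by hand.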
Step-Deviation Method: The first step is to choose one among the xi’s as the assumed mean, and denote it by ‘a’.
The second step is to find ui = (xi – a)/h, where a is the assumed mean and h is the class size. The third step is to find fiui for all i and then use the formula: Mean = a + h × (Σfiui / Σfi)
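A minimal Python sketch of the step-deviation method follows, using the relation mean = a + h × (Σfiui / Σfi). The frequency table is hypothetical (not from the original article), and a = 50 and the variable names are illustrative choices.

```python
# Hypothetical grouped data: class marks of the intervals 0-20, ..., 80-100.
class_marks = [10, 30, 50, 70, 90]
frequencies = [4, 6, 10, 20, 10]

a = 50  # assumed mean, chosen from the class marks
h = 20  # class size (all classes have equal width here)

# ui = (xi - a) / h; integer division is exact because every
# deviation in this table is a multiple of h.
step_deviations = [(x - a) // h for x in class_marks]

sum_fu = sum(f * u for f, u in zip(frequencies, step_deviations))
step_mean = a + h * sum_fu / sum(frequencies)

print(step_deviations)      # [-2, -1, 0, 1, 2]
print(sum_fu)               # 26
print(round(step_mean, 2))  # 60.4
```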
Since all three methods give the same result, the choice of method depends on the numerical values of xi and fi.
- If xi and fi are sufficiently small, then the direct method is an appropriate choice.
- If xi and fi are numerically large, then we can go for the assumed mean method or the step-deviation method.
- If the class sizes are unequal, and xi are large numerically, we can still apply the step-deviation method by taking h to be a suitable divisor of all the di’s.
Mode of Grouped Data
Mode is that value among the observations which occurs most often, that is, the value of the observation having the maximum frequency.
In a grouped frequency distribution, it is not possible to determine the mode just by looking at the frequencies. Here, we can only locate the class with the maximum frequency, called the modal class.
E.g. suppose the maximum frequency, 61, occurs in the class 60-80. We can say that the modal class is 60-80, but we cannot read off the most frequent value (mode) directly.
The mode is then found using the formula: Mode = l + ((f1 – f0) / (2f1 – f0 – f2)) × h
- where l = lower limit of the modal class,
- h = size of the class interval (assuming all class sizes to be equal),
- f1 = frequency of the modal class,
- f0 = frequency of the class preceding the modal class,
- f2 = frequency of the class succeeding the modal class.
For the example above, the modal class is 60-80 with frequency 61.
Therefore, l = 60, h = 80 – 60 = 20, f1 = 61, f0 = 52, f2 = 38
Mode = 60 + ((61 – 52) / (2 × 61 – 52 – 38)) × 20 = 60 + (9/32) × 20
So, Mode = 65.625
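The calculation above can be reproduced with a small Python function. This is a sketch; the function and parameter names are illustrative, and the input values are the ones given in the example.

```python
def grouped_mode(lower, h, f1, f0, f2):
    """Mode of grouped data.

    lower: lower limit of the modal class
    h:     class size (all classes assumed equal in width)
    f1:    frequency of the modal class
    f0:    frequency of the class preceding the modal class
    f2:    frequency of the class succeeding the modal class
    """
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h

mode_value = grouped_mode(lower=60, h=20, f1=61, f0=52, f2=38)
print(mode_value)  # 65.625
```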
Median of Grouped Data
The median is a measure of central tendency which gives the value of the middle-most observation in the data. In the case of ungrouped data, we first arrange the data values of the observations in ascending order.
Then, if n is odd, the median is the ((n+1)/2)th observation. But in the case of grouped data, the individual observations are not known, so we cannot pick out the middle-most observation directly. We use a formula to find the median instead.
We first find the cumulative frequencies and then locate the class whose cumulative frequency is greater than (and nearest to) n/2, where n is the total number of observations. This is called the median class.
The median is then given by: Median = l + ((n/2 – cf) / f) × h
- where l = lower limit of median class,
- n = number of observations,
- cf = cumulative frequency of class preceding the median class,
- f = frequency of median class,
- h = class size (assuming class size to be equal).
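The steps for the grouped median can be sketched in Python as below, using the relation median = l + ((n/2 - cf) / f) × h with the symbols defined above. The frequency table is hypothetical (not from the original article), and the function and variable names are illustrative.

```python
from itertools import accumulate

def grouped_median(lower_limits, frequencies, class_size):
    """Median of grouped data with equal class sizes."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    # Median class: first class whose cumulative frequency exceeds n/2.
    k = next(i for i, c in enumerate(cumulative) if c > n / 2)
    cf = cumulative[k - 1] if k > 0 else 0  # cumulative frequency before it
    f = frequencies[k]                      # frequency of the median class
    lower = lower_limits[k]                 # lower limit of the median class
    return lower + (n / 2 - cf) / f * class_size

# Hypothetical grouped data: class intervals 0-20, 20-40, ..., 80-100.
lower_limits = [0, 20, 40, 60, 80]
frequencies = [4, 6, 10, 20, 10]

median_value = grouped_median(lower_limits, frequencies, class_size=20)
print(median_value)  # 65.0
```

Here n = 50, so n/2 = 25; the cumulative frequencies are 4, 10, 20, 40, 50, which makes 60-80 the median class with cf = 20 and f = 20.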