Mean, Median, Mode, Variance – Discovering Statistical Properties of Data

Data is something that we encounter on a daily basis. When describing a set of data, we usually start by locating the single value that best represents the whole set. This kind of summary is called a "measure of central tendency." In statistics, there are three common measures of central tendency: the mean, the median, and the mode.

A measure of central tendency describes a set of data by assigning a single value to the place in the data set that is considered to be the set's "centre." One way to think about it is as the tendency of the data to cluster around a typical value.

Mean:

To calculate the mean of a set of data, add up all of the values in the set and then divide that sum by the number of values. The mean is also referred to as the average. For example, if a student's marks in five subjects are {70, 75, 80, 78, 93}, then the average mark is:

mean = (70 + 75 + 80 + 78 + 93) / 5 = 396 / 5 = 79.2

For a vector X = {x1, x2, …, xn}, the mean is computed as follows:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$
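As a rough sketch (in Python, which the original article does not use), the formula translates directly into "sum, then divide by the count." The helper name arithmetic_mean is hypothetical; statistics.mean from Python's standard library is shown alongside for comparison.

```python
from statistics import mean


def arithmetic_mean(values):
    """Hypothetical helper: add up all values and divide by how many there are."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


marks = [70, 75, 80, 78, 93]
print(arithmetic_mean(marks))  # 79.2
print(mean(marks))             # standard-library equivalent, also 79.2
```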

Median:

The median is the middle element of the data when the data are sorted in ascending or descending order. If the number of elements is odd, the middle element is the median. If the number of elements is even, the average of the two middle elements is taken as the median.

If the weights of 5 students are {40, 32, 34, 38, 41}, then the median is the 3rd element of the sorted data.

The sorted vector of weights is {32, 34, 38, 40, 41}, so the median is 38.

If the weights of 6 students are {40, 32, 36, 34, 38, 41}, there are two middle elements (the 3rd and the 4th).

The sorted vector of weights is {32, 34, 36, 38, 40, 41}, so the median is (36 + 38) / 2 = 37.
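The odd/even rule above can be sketched in Python as follows. median_of is a hypothetical helper written only for illustration; the standard library's statistics.median applies the same rule.

```python
from statistics import median


def median_of(values):
    """Hypothetical helper: middle value of the sorted data,
    or the average of the two middle values when the count is even."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle element
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle elements


print(median_of([40, 32, 34, 38, 41]))      # 38
print(median_of([40, 32, 36, 34, 38, 41]))  # 37.0
print(median([40, 32, 36, 34, 38, 41]))     # 37.0
```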

Mode:

The mode of a data set is the value that occurs most frequently among the available observations, that is, the observation with the highest frequency.

For the set of weights {40, 32, 35, 34, 23, 40, 34, 40, 35}, the mode is 40, because it occurs three times, more often than any other value in the dataset.

A dataset can also have more than one mode. For example, the set {40, 32, 32, 34, 23, 40, 32, 40, 35} has two modes, 32 and 40, each of which occurs three times.
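A minimal Python sketch of finding every mode, assuming that all values tied for the highest frequency should be reported, as in the second example. modes_of is a hypothetical helper built on collections.Counter; statistics.multimode (Python 3.8+) does the same job but returns the modes in the order they first appear rather than sorted.

```python
from collections import Counter
from statistics import multimode


def modes_of(values):
    """Hypothetical helper: every value that ties for the highest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes_of([40, 32, 35, 34, 23, 40, 34, 40, 35]))   # [40]
print(modes_of([40, 32, 32, 34, 23, 40, 32, 40, 35]))   # [32, 40]
print(multimode([40, 32, 32, 34, 23, 40, 32, 40, 35]))  # [40, 32] (first-seen order)
```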

Variance:

In statistics, the variance and the standard deviation are two of the most widely used measures of spread. The variance measures how far the data points deviate from the mean, and the standard deviation, which is the square root of the variance, expresses that same spread in the original units of the data.

Both measures can be used to analyse data. The primary distinction between them is their units: the standard deviation is expressed in the same units as the data and its mean, whereas the variance is expressed in squared units.

The variance indicates the degree to which a set of data is spread out around its mean or average value. The population variance is represented by the symbol σ².

Population variance is computed as follows, where μ is the mean of X:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

To compute the sample variance, divide the sum by (n - 1) instead of n in the formula above.

Example:

Consider the vector X = {3, 4, 5, 2, 1}

The mean of X is (3 + 4 + 5 + 2 + 1) / 5 = 15 / 5 = 3.

$$\sigma^2 = \frac{1}{5}\left[(3 - 3)^2 + (4 - 3)^2 + (5 - 3)^2 + (2 - 3)^2 + (1 - 3)^2\right] = \frac{1}{5}(0 + 1 + 4 + 1 + 4) = \frac{10}{5} = 2$$
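The population formula and its (n - 1) sample variant can be sketched in Python as below. The function names population_variance and sample_variance are hypothetical; the standard library's statistics.pvariance and statistics.variance compute the population and sample variance respectively.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Hypothetical helper: mean of the squared deviations from the mean (divide by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Hypothetical helper: same sum of squared deviations, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


X = [3, 4, 5, 2, 1]
print(population_variance(X))  # 2.0
print(sample_variance(X))      # 2.5
print(pvariance(X), variance(X))  # standard-library population vs sample variance
```

Dividing by n - 1 makes the sample variance slightly larger than the population variance for the same data, as the two printed results show.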

Properties of variance:

Because every term in the variance sum is squared, the variance is always non-negative: each squared deviation is either positive or zero, so their average can never be negative. It equals zero only when every value is equal to the mean.

The variance is always expressed in squared units. For example, the variance of a group of weights measured in kilogrammes is expressed in kilogrammes squared (kg²). Because the population variance is in squared units, it cannot be compared directly with the mean or with the data themselves.
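To see why the squared units matter, the sketch below expresses the same three weights first in kilogrammes and then in grams; the weights are made-up illustration values, not data from the article. Multiplying every value by 1000 multiplies the standard deviation by 1000 but the variance by 1000², i.e. one million.

```python
import math
from statistics import pstdev, pvariance

weights_kg = [60, 70, 80]                   # illustrative values only
weights_g = [w * 1000 for w in weights_kg]  # the same weights in grams

var_kg, var_g = pvariance(weights_kg), pvariance(weights_g)
sd_kg, sd_g = pstdev(weights_kg), pstdev(weights_g)

# Variance scales with the square of the unit conversion factor...
print(math.isclose(var_g / var_kg, 1000 ** 2))  # True
# ...while the standard deviation scales with the factor itself.
print(math.isclose(sd_g / sd_kg, 1000))         # True
```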

Standard Deviation:

The standard deviation is a measure used to quantify the dispersion of statistical data, that is, how far the data deviate from their mean or average value.

It is computed from the deviations of the individual data points from the mean, and it is one of the measures of dispersion commonly reported in summary statistics. The population standard deviation is denoted by the symbol σ.

Population standard deviation is computed as follows, where μ is the mean of X:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$

To compute the sample standard deviation, divide the sum by (n - 1) instead of n in the formula above.

Example:

Consider the vector X = {3, 4, 5, 2, 1}

The mean of X is (3 + 4 + 5 + 2 + 1) / 5 = 15 / 5 = 3.

$$\sigma^2 = \frac{1}{5}\left[(3 - 3)^2 + (4 - 3)^2 + (5 - 3)^2 + (2 - 3)^2 + (1 - 3)^2\right] = \frac{1}{5}(0 + 1 + 4 + 1 + 4) = \frac{10}{5} = 2$$

Standard deviation = √(variance) = √2 ≈ 1.41
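A short Python sketch of the same computation, taking the square root of the population and sample variances. The helper names population_std and sample_std are hypothetical; statistics.pstdev and statistics.stdev are the standard-library counterparts.

```python
import math
from statistics import pstdev, stdev


def population_std(values):
    """Hypothetical helper: square root of the population variance (divide by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty data set is undefined")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_std(values):
    """Hypothetical helper: square root of the sample variance (divide by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two values")
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


X = [3, 4, 5, 2, 1]
print(round(population_std(X), 2))              # 1.41
print(round(sample_std(X), 2))                  # 1.58
print(round(pstdev(X), 2), round(stdev(X), 2))  # 1.41 1.58
```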

Properties of Standard Deviation:

It is sometimes referred to as the root-mean-square deviation, because it is the square root of the mean of the squared deviations of the values in a data set from their mean.

Since the standard deviation cannot take on a negative value, its minimum possible value is 0, which occurs only when all of the data values are identical.

If the data values of a group are close to one another, the standard deviation will be low, close to zero. If the data values are widely spread out, the standard deviation will be high.
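A quick illustration of this property, using made-up weights chosen only for the example: identical values give a population standard deviation of exactly 0, tightly clustered values give a small one, and widely spread values give a large one.

```python
from statistics import pstdev

identical = [40, 40, 40, 40]  # no spread at all
clustered = [39, 40, 41]      # values close to one another
spread_out = [10, 40, 70]     # values far apart

print(pstdev(identical))             # 0.0
print(round(pstdev(clustered), 2))   # 0.82
print(round(pstdev(spread_out), 2))  # 24.49
```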
