Central tendency is the statistical concept of the center or typical value of a dataset. A measure of central tendency summarizes the whole distribution with a single value around which most of the data points tend to cluster.
There are three main measures of central tendency:
- Mean: The average of all values in the dataset.
- Median: The middle value when the data is arranged in order.
- Mode: The value that appears most frequently in the dataset.
The mean is calculated by adding all the values in the dataset and dividing by the number of values. It is the most commonly used measure of central tendency.
Example:
Dataset: [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
Mean = (3 + 7 + 8 + 5 + 12 + 19 + 15 + 10 + 4 + 6) / 10 = 89 / 10 = 8.9
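The same calculation can be sketched in Python. This is a minimal illustration, not part of the original text; the variable names are chosen here for readability, and `statistics.mean` is the standard-library equivalent of the manual formula.

```python
import statistics

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]

# Mean by hand: sum of the values divided by the number of values.
manual_mean = sum(data) / len(data)

# The standard library performs the same calculation.
library_mean = statistics.mean(data)

print(manual_mean, library_mean)  # both 8.9
```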
The median is the middle value when the data is ordered. If there is an even number of values, the median is the average of the two middle numbers.
Example:
Dataset: [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
Ordered Dataset: [3, 4, 5, 6, 7, 8, 10, 12, 15, 19]
Median = (7 + 8) / 2 = 7.5
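A short Python sketch of the rule above, covering both the odd and even cases. The helper name `median_of` is hypothetical and used only for illustration; `statistics.median` gives the same result.

```python
import statistics

def median_of(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values around the center.
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
print(median_of(data))          # 7.5
print(statistics.median(data))  # 7.5
```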
The mode is the value that appears most frequently in a dataset. There can be more than one mode if several values share the highest frequency.
Example:
Dataset: [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
Mode = No mode (all values appear only once)
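One way to sketch this in Python is to count occurrences and keep every value tied for the highest count. The helper `find_modes` is a hypothetical name, and returning an empty list when no value repeats is an assumption that follows the "no mode" convention used in the example above.

```python
from collections import Counter

def find_modes(values):
    """Return all values tied for the highest frequency.

    Returns an empty list when the input is empty or when every value
    occurs exactly once (treated here as "no mode").
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
print(find_modes(data))             # [] because every value appears once
print(find_modes([1, 2, 2, 3, 3]))  # [2, 3], two modes
```

Note that other conventions exist: Python's own `statistics.multimode`, for instance, returns every value when all of them occur once.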
Measures of dispersion describe the spread or variability of a dataset. They give us an understanding of how spread out or clustered the values are in relation to the central tendency.
The range is the difference between the highest and lowest values in the dataset. It provides a simple way to measure the spread but does not account for how the data is distributed within that range.
Range = Max(X) - Min(X)
Example:
Dataset: [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
Range = 19 - 3 = 16
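In Python the range formula maps directly onto the built-in `max` and `min` functions. The variable name `data_range` is chosen here to avoid shadowing Python's built-in `range`.

```python
data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]

# Range = Max(X) - Min(X)
data_range = max(data) - min(data)

print(data_range)  # 16
```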
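Variance measures the average squared deviation from the mean. It gives an idea of how much the data points vary from the mean. For a whole population, the sum of squared deviations is divided by the number of values N; for a sample, it is divided by n - 1.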
Population Variance Example:
Dataset: [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
Mean = 8.9
Variance = ((3-8.9)^2 + (7-8.9)^2 + ... + (6-8.9)^2) / 10 = 236.9 / 10 = 23.69
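The population variance can be checked with a few lines of Python. This is a sketch with illustrative variable names; `statistics.pvariance` is the standard-library function for population variance, while `statistics.variance` would give the sample version that divides by n - 1.

```python
import statistics

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]

mean = sum(data) / len(data)

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: divide by the number of values.
population_variance = sum(squared_deviations) / len(data)

print(round(population_variance, 2))
print(round(statistics.pvariance(data), 2))  # same result from the library
```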
The standard deviation is the square root of the variance. It provides a more interpretable measure of spread, as it is in the same units as the original data.
Population Standard Deviation Example:
Variance = 23.69
Standard Deviation = √23.69 ≈ 4.87
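As a quick check, the population standard deviation can be computed in Python either by taking the square root of the population variance or by calling `statistics.pstdev` directly. The variable names are illustrative only.

```python
import math
import statistics

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]

# Square root of the population variance.
population_std = math.sqrt(statistics.pvariance(data))

# The library computes the same quantity in one call.
library_std = statistics.pstdev(data)

print(round(population_std, 2), round(library_std, 2))
```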
Percentiles and quartiles help to understand the distribution of data by dividing the dataset into specific parts based on the values of the data.
Percentiles are values that divide an ordered dataset into 100 equal parts. The p-th percentile is the value below which p% of the data falls. For example, the 50th percentile is the value below which 50% of the data falls.
Example: We will find the 25th, 50th (median), and 75th percentiles for the dataset [3, 7, 8, 5, 12, 19, 15, 10, 4, 6].
Dataset: [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
Ordered Dataset: [3, 4, 5, 6, 7, 8, 10, 12, 15, 19]
- 25th Percentile (P25): Position = (25 / 100) * (10 + 1) = 2.75, so P25 lies 0.75 of the way from the 2nd value (4) to the 3rd value (5): P25 = 4 + 0.75 * (5 - 4) = 4.75
- 50th Percentile (P50 or Median): Position = (50 / 100) * (10 + 1) = 5.5, so P50 lies halfway between the 5th value (7) and the 6th value (8): P50 = (7 + 8) / 2 = 7.5
- 75th Percentile (P75): Position = (75 / 100) * (10 + 1) = 8.25, so P75 lies 0.25 of the way from the 8th value (12) to the 9th value (15): P75 = 12 + 0.25 * (15 - 12) = 12.75
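The position rule used in these steps, (p / 100) * (n + 1) with linear interpolation between the two neighboring values, can be sketched as a small Python function. The name `percentile` and the clamping at both ends are assumptions made for this illustration; other percentile conventions exist and can give slightly different values.

```python
def percentile(values, p):
    """Estimate the p-th percentile using position = (p / 100) * (n + 1).

    The fractional part of the position is used to interpolate linearly
    between the two neighboring ordered values.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    position = (p / 100) * (n + 1)  # 1-based position, may be fractional
    lower = int(position)           # index of the value just below
    fraction = position - lower     # how far to move toward the next value
    if lower < 1:
        return ordered[0]           # position falls before the first value
    if lower >= n:
        return ordered[-1]          # position falls after the last value
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]
for p in (25, 50, 75):
    print(p, percentile(data, p))
```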
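Quartiles divide the dataset into four equal parts. The three quartiles are Q1 (the 25th percentile), Q2 (the 50th percentile, which is the median), and Q3 (the 75th percentile).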
For the dataset [3, 7, 8, 5, 12, 19, 15, 10, 4, 6], the quartiles are:
Ordered Dataset: [3, 4, 5, 6, 7, 8, 10, 12, 15, 19]
Q1 (25th Percentile): P25 = 4.75
Q2 (50th Percentile / Median): P50 = 7.5
Q3 (75th Percentile): P75 = 12.75
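Python's standard library can compute the three quartiles in one call. The sketch below assumes Python 3.8 or later, where `statistics.quantiles` is available; its default `method="exclusive"` uses the same (n + 1) position rule as the percentile calculation above, whereas `method="inclusive"` follows a different convention and returns different cut points.

```python
import statistics

data = [3, 7, 8, 5, 12, 19, 15, 10, 4, 6]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)
```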