Fundamentals of Statistics for Beginners
Understanding the Fundamentals of Statistics
Statistics
- Data Collection: Gathering raw data. Example: Asking students about their exam marks.
- Data Analysis: Applying formulas or statistical methods to data. Example: Finding the average marks of students.
- Data Presentation: Displaying processed data. Example: Making a bar chart of the average marks of male and female students.
- Data Interpretation: Explaining the results. Example: Concluding that there is no difference between the marks of both genders.
Branches of Statistics
- Descriptive Statistics
- Inferential Statistics
Descriptive Statistics
Methods used to summarize and describe the main features of data.
Examples:
- Average Marks of Students: Calculating the mean (Average) marks of a group of students to summarize their performance.
- Chart of Top 10 Students: Using a bar chart to display the marks of the top 10 students, providing a clear visualization of the highest achievers.
Inferential Statistics
- Estimation of Population: Suppose there are 100 students in a class, but we only have the marks of 30 of them. By analyzing the marks of these 30 students (the sample), we can estimate the average marks for the entire class (the population).
- Hypothesis: Suppose we want to test the hypothesis (statement) that the average marks of boys and girls are equal. Statistical methods are then used to test this hypothesis and determine whether there is enough evidence to reject it.
Data
Datum
Observation
Primary Data
Secondary Data
Sample
Parameter
Statistic
Census
Survey
Population
Variable
Types of Variables
Qualitative (Categorical) Variable
Nominal
Categories that don’t have any specific order or ranking.
Example: Eye color (black, blue, brown).
Ordinal
Categories that have a specific order or ranking.
Example: Position holders (1st, 2nd, 3rd).
Quantitative (Numerical) Variable
Variable that takes numerical values.
Discrete
Variable that takes only whole-number values.
Example: The number of students in University of Sindh.
Continuous
Variable that can take any value within a range, including fractional values. These values are usually measurements.
Example: Height of people.
Domain of Variable
The complete set of all possible values that the variable can have.
Examples:
- If the variable is gender, then the domain is {male, female}.
- If the variable is percentage in university, then the domain is {0.01%, 0.02%, 0.03%, …, 100%}.
Measurement Scales
Measurement scales are used to classify (categorize) and quantify (measure) data.
Example: Gender can be categorized as male or female, while age can be quantified in years.
Types of Measurement Scales
Nominal Scale
Categorizes data without any order.
Example: Categorizing gender as male and female.
Ordinal Scale
Categorizes data with a meaningful order.
Example: Categorizing position as 1st, 2nd and 3rd.
Interval Scale
Measures data with equal intervals but no true zero point.
Example: Temperature in Celsius.
Ratio Scale
Measures data with equal intervals and a true zero point.
Example: Age in years.
Presentation of Data
Classification
Process of organizing data into groups or classes based on their characteristics.
Qualitative Classification
Classifies data based on their qualities.
Example: Data of students can be classified based on their gender (male, female).
Temporal Classification
Classifies data based on time.
Example: Profit data of companies can be classified by year (profit in 2022, 2023, 2024).
Geographical Classification
Classifies data based on geographical location.
Example: Data of people can be classified based on their location (people who live in Dadu, Hyderabad, Mirpurkhas).
Tabulation
TOP CGPA STUDENTS IN BS STATISTICS (UOS), BATCHES 2K18 TO 2K20
| Name | Surname | CGPA | Batch |
|---|---|---|---|
| Nimra Neha | Qazi | 3.7 | 2k20 |
| Soha | Shaikh | 3.64 | 2k19 |
| Kainat Haroon | Rajput | 3.47 | 2k18 |
| Ariba | Rajput | 3.46 | 2k20 |
| Afra Khalid | Syed | 3.44 | 2k20 |
Frequency (f)
The number of times a particular value or category appears in a dataset.
Example: If the marks of students are 40, 40, 45, 50, 50, 50, then the frequency of 40 is 2, of 45 is 1, and of 50 is 3.
Frequency Distribution
A method of organizing a dataset by showing how often each value or group of values occurs.
Example:
FREQUENCY DISTRIBUTION OF MARKS OF STUDENTS
| Marks (xi) | Frequency (fi) |
|---|---|
| 40 | 2 |
| 45 | 1 |
| 50 | 3 |
| Σ | 6 |
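The tally in this example can be reproduced programmatically. The following is a minimal Python sketch (Python is not part of the original text, and the variable names are illustrative) that counts each distinct mark with the standard library's `collections.Counter`.

```python
from collections import Counter

# Marks from the example: 40, 40, 45, 50, 50, 50
marks = [40, 40, 45, 50, 50, 50]

# Counter maps each distinct value (xi) to how many times it occurs (fi).
frequency = Counter(marks)

for value in sorted(frequency):
    print(value, frequency[value])

# The frequencies always add up to the number of observations.
total = sum(frequency.values())
print("Total", total)
```

The same approach works for categorical data, for example a list of genders or eye colors.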
Grouped Frequency Distribution
Groups data into intervals (or classes) and shows the frequency of each class.
Example:
FREQUENCY DISTRIBUTION OF 2ND SEMESTER’S MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH IN ECONOMICS SUBJECT
| Marks | f |
|---|---|
| 0 - 9 | 8 |
| 10 - 19 | 32 |
| 20 - 29 | 5 |
| 30 - 39 | 4 |
| 40 - 49 | 0 |
| 50 - 59 | 9 |
| 60 - 69 | 9 |
| 70 - 79 | 6 |
| 80 - 89 | 8 |
| Σ | 81 |
Construction of Grouped Frequency Distribution
Step 1
Arrange the data in an array, in ascending or descending order.
Array = 0 2 3 5 5 8 8 9 10 11 11 12 12 12 12 12 12 13 13 13 13 13 13 13 14 14 14 14 14 15 15 15 15 16 16 16 17 17 18 19 25 25 29 29 29 30 32 34 35 50 50 50 50 50 50 50 50 52 60 60 60 60 60 65 65 66 66 70 70 75 75 75 76 80 80 85 86 86 86 88 88
Step 2
Find the range (R) by subtracting the minimum value from the maximum value.
R = Max - Min
R = 88 - 0
R = 88
Step 3
Decide the number of classes (k). As a rule of thumb, no fewer than 5 and no more than 20 classes are generally used. Let’s take 9 classes.
k = 9
Step 4
Find the approximate width (size) of the equal class intervals (h) by dividing the range by the number of classes decided on, then take the next higher integer to make calculation easier.
h = R / k = 88 / 9 ≈ 9.78
h = 10
Step 5
Decide the lower class limit (L) and the upper class limit (U). The lower class limit of the first class must be less than or equal to the minimum value in the dataset. Let’s take 0 as the lower class limit. With this decision, the upper class limit will be 9. The classes become 0-9, 10-19, ….
Step 6
Make the frequency distribution table. An Entries column can be used to tally the values in each class, but it is usually not shown in the final frequency distribution table.
| Marks | Entries | f |
|---|---|---|
| 0-9 | 0, 2, 3, 5, 5, 8, 8, 9 | 8 |
| 10-19 | 10, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18, 19 | 32 |
| 20-29 | 25, 25, 29, 29, 29 | 5 |
| 30-39 | 30, 32, 34, 35 | 4 |
| 40-49 | | 0 |
| 50-59 | 50, 50, 50, 50, 50, 50, 50, 50, 52 | 9 |
| 60-69 | 60, 60, 60, 60, 60, 65, 65, 66, 66 | 9 |
| 70-79 | 70, 70, 75, 75, 75, 76 | 6 |
| 80-89 | 80, 80, 85, 86, 86, 86, 88, 88 | 8 |
| Σ | …. | 81 |
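The six steps can also be traced in code. This is a minimal Python sketch, not the author's method: the variable names are illustrative, and `math.ceil` is used here as one way of taking the next higher integer when the range divided by k is not a whole number.

```python
import math

# Economics marks of the 81 students, copied from the array in Step 1.
marks = [
    0, 2, 3, 5, 5, 8, 8, 9,
    10, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13,
    14, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18, 19,
    25, 25, 29, 29, 29, 30, 32, 34, 35,
    50, 50, 50, 50, 50, 50, 50, 50, 52,
    60, 60, 60, 60, 60, 65, 65, 66, 66,
    70, 70, 75, 75, 75, 76,
    80, 80, 85, 86, 86, 86, 88, 88,
]

data = sorted(marks)                   # Step 1: array in ascending order
data_range = max(data) - min(data)     # Step 2: R = Max - Min
k = 9                                  # Step 3: chosen number of classes
h = math.ceil(data_range / k)          # Step 4: class width, rounded up
first_lower_limit = 0                  # Step 5: chosen lower class limit

# Step 6: count the values that fall inside each class (limits inclusive).
table = []
for i in range(k):
    lower = first_lower_limit + i * h
    upper = lower + h - 1
    f = sum(1 for x in data if lower <= x <= upper)
    table.append((lower, upper, f))

for lower, upper, f in table:
    print(f"{lower:2d} - {upper:2d}  {f}")
print("Total", sum(f for _, _, f in table))
```

Changing `k` or `first_lower_limit` shows how the choice of classes changes the shape of the resulting table.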
Class Boundaries & Midpoints
Class Boundaries
Class boundaries are useful when working with continuous data. When consecutive class limits are 1 unit apart, subtract 0.5 from the lower class limit to find the lower class boundary, and add 0.5 to the upper class limit to find the upper class boundary.
Example: If class limits are 20-24, 25-29, …. then class boundaries become 19.5-24.5, 24.5-29.5, ….
Midpoints or Class Marks
Midpoints or class marks are the middle values of the class boundaries (or class limits) and help in analyzing a grouped frequency distribution. In grouped data, midpoints are denoted by “xi”. To find a midpoint, average the two class boundaries (or class limits).
Example: If class boundaries are 19.5-24.5, 24.5-29.5, …. then midpoints become 22, 27, ….
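As a quick check of the two rules, here is a small Python sketch. The function names and the `gap` parameter are hypothetical, introduced only for illustration; `gap` is the distance between one class's upper limit and the next class's lower limit (1 in this example, which gives the 0.5 adjustment).

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) for one class."""
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


def midpoint(lower, upper):
    """Class mark: the average of the two boundaries (or the two limits)."""
    return (lower + upper) / 2


class_limits = [(20, 24), (25, 29)]

for lower_limit, upper_limit in class_limits:
    lower_boundary, upper_boundary = class_boundaries(lower_limit, upper_limit)
    xi = midpoint(lower_boundary, upper_boundary)
    print(lower_limit, upper_limit, lower_boundary, upper_boundary, xi)
```

Averaging the class limits gives the same midpoint as averaging the class boundaries, because both boundaries move outward by the same amount.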
Class Boundary & Midpoint Example
At University of Sindh, students of BS Statistics (2k23 batch) sat the 2nd semester examination in 8 subjects (100 marks per subject). Suppose one of them got 655 marks out of 800; the percentage is then 81.875. Let’s make a grouped frequency distribution for all students.
- Array of all students marks = 20.125% 22.125% 29.375% 30.250% 31.625% 36.125% 37.625% 40.375% 42.625% 42.750% 43.250% 44.750% 47.500% 47.750% 47.875% 48.625% 49.125% 49.500% 50.625% 50.625% 52.000% 52.125% 52.250% 52.500% 53.000% 53.500% 53.875% 54.000% 54.000% 54.250% 54.250% 54.375% 54.625% 55.000% 55.000% 55.625% 55.875% 56.000% 56.625% 57.375% 57.500% 57.875% 58.000% 58.500% 59.000% 59.250% 59.625% 60.375% 61.375% 61.500% 62.625% 63.125% 63.250% 63.375% 64.000% 64.625% 64.750% 66.000% 66.000% 66.250% 66.875% 67.500% 67.500% 67.625% 67.750% 69.000% 70.000% 71.125% 71.250% 71.375% 71.875% 72.375% 72.750% 75.125% 75.125% 77.500% 78.875% 79.375% 80.750% 81.375% 81.875%
| Marks | Class Boundaries | Entries |
|---|---|---|
| 20 - 24 | 19.5 - 24.5 | 20.125, 22.125 |
| 25 - 29 | 24.5 - 29.5 | 29.375 |
| 30 - 34 | 29.5 - 34.5 | 30.250, 31.625 |
| 35 - 39 | 34.5 - 39.5 | 36.125, 37.625 |
| 40 - 44 | 39.5 - 44.5 | 40.375, 42.625, 42.750, 43.250 |
| 45 - 49 | 44.5 - 49.5 | 44.750, 47.500, 47.750, 47.875, 48.625, 49.125 |
| 50 - 54 | 49.5 - 54.5 | 49.500, 50.625, 50.625, 52.000, 52.125, 52.250, 52.500, 53.000, 53.500, 53.875, 54.000, 54.000, 54.250, 54.250, 54.375 |
| 55 - 59 | 54.5 - 59.5 | 54.625, 55.000, 55.000, 55.625, 55.875, 56.000, 56.625, 57.375, 57.500, 57.875, 58.000, 58.500, 59.000, 59.250 |
| 60 - 64 | 59.5 - 64.5 | 59.625, 60.375, 61.375, 61.500, 62.625, 63.125, 63.250, 63.375, 64.000 |
| 65 - 69 | 64.5 - 69.5 | 64.625, 64.750, 66.000, 66.000, 66.250, 66.875, 67.500, 67.500, 67.625, 67.750, 69.000 |
| 70 - 74 | 69.5 - 74.5 | 70.000, 71.125, 71.250, 71.375, 71.875, 72.375, 72.750 |
| 75 - 79 | 74.5 - 79.5 | 75.125, 75.125, 77.500, 78.875, 79.375 |
| 80 - 84 | 79.5 - 84.5 | 80.750, 81.375, 81.875 |
| Σ | …. | …. |
FREQUENCY DISTRIBUTION OF 2ND SEMESTER’S TOTAL MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH
| Marks | Class Boundaries | Xi | fi |
|---|---|---|---|
| 20 - 24 | 19.5 - 24.5 | 22 | 2 |
| 25 - 29 | 24.5 - 29.5 | 27 | 1 |
| 30 - 34 | 29.5 - 34.5 | 32 | 2 |
| 35 - 39 | 34.5 - 39.5 | 37 | 2 |
| 40 - 44 | 39.5 - 44.5 | 42 | 4 |
| 45 - 49 | 44.5 - 49.5 | 47 | 6 |
| 50 - 54 | 49.5 - 54.5 | 52 | 15 |
| 55 - 59 | 54.5 - 59.5 | 57 | 14 |
| 60 - 64 | 59.5 - 64.5 | 62 | 9 |
| 65 - 69 | 64.5 - 69.5 | 67 | 11 |
| 70 - 74 | 69.5 - 74.5 | 72 | 7 |
| 75 - 79 | 74.5 - 79.5 | 77 | 5 |
| 80 - 84 | 79.5 - 84.5 | 82 | 3 |
| Σ | …. | …. | 81 |
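Sorting continuous values into classes by their boundaries can be automated. The sketch below is one possible approach in Python, under a stated assumption: a value that lies exactly on a boundary (such as 49.500) is placed in the class that starts at that boundary, which matches where the entries are tallied in this example. The variable names are illustrative.

```python
from bisect import bisect_right

# Percentages of the 81 students, copied from the array above.
percentages = [
    20.125, 22.125, 29.375, 30.250, 31.625, 36.125, 37.625, 40.375, 42.625,
    42.750, 43.250, 44.750, 47.500, 47.750, 47.875, 48.625, 49.125, 49.500,
    50.625, 50.625, 52.000, 52.125, 52.250, 52.500, 53.000, 53.500, 53.875,
    54.000, 54.000, 54.250, 54.250, 54.375, 54.625, 55.000, 55.000, 55.625,
    55.875, 56.000, 56.625, 57.375, 57.500, 57.875, 58.000, 58.500, 59.000,
    59.250, 59.625, 60.375, 61.375, 61.500, 62.625, 63.125, 63.250, 63.375,
    64.000, 64.625, 64.750, 66.000, 66.000, 66.250, 66.875, 67.500, 67.500,
    67.625, 67.750, 69.000, 70.000, 71.125, 71.250, 71.375, 71.875, 72.375,
    72.750, 75.125, 75.125, 77.500, 78.875, 79.375, 80.750, 81.375, 81.875,
]

# Class boundaries 19.5, 24.5, ..., 84.5, i.e. 13 classes of width 5.
boundaries = [19.5 + 5 * i for i in range(14)]
frequencies = [0] * (len(boundaries) - 1)

for p in percentages:
    # bisect_right returns the position after any boundary equal to p, so a
    # value on a boundary is counted in the class that starts at that boundary.
    class_index = bisect_right(boundaries, p) - 1
    frequencies[class_index] += 1

for i, f in enumerate(frequencies):
    lower, upper = boundaries[i], boundaries[i + 1]
    xi = (lower + upper) / 2  # class mark
    print(f"{lower:4.1f} - {upper:4.1f}  xi = {xi:4.1f}  fi = {f}")
print("Total", sum(frequencies))
```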
Relative Frequency Distribution
RELATIVE FREQUENCY DISTRIBUTION OF 2ND SEMESTER’S TOTAL MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH
| Marks | Class Boundaries | Xi | fi | Relative Frequency |
|---|---|---|---|---|
| 20 - 24 | 19.5 - 24.5 | 22 | 2 | 2/81 = 0.024691358 |
| 25 - 29 | 24.5 - 29.5 | 27 | 1 | 1/81 = 0.012345679 |
| 30 - 34 | 29.5 - 34.5 | 32 | 2 | 2/81 = 0.024691358 |
| 35 - 39 | 34.5 - 39.5 | 37 | 2 | 2/81 = 0.024691358 |
| 40 - 44 | 39.5 - 44.5 | 42 | 4 | 4/81 = 0.049382716 |
| 45 - 49 | 44.5 - 49.5 | 47 | 6 | 6/81 = 0.074074074 |
| 50 - 54 | 49.5 - 54.5 | 52 | 15 | 15/81 = 0.185185185 |
| 55 - 59 | 54.5 - 59.5 | 57 | 14 | 14/81 = 0.172839506 |
| 60 - 64 | 59.5 - 64.5 | 62 | 9 | 9/81 = 0.111111111 |
| 65 - 69 | 64.5 - 69.5 | 67 | 11 | 11/81 = 0.135802469 |
| 70 - 74 | 69.5 - 74.5 | 72 | 7 | 7/81 = 0.086419753 |
| 75 - 79 | 74.5 - 79.5 | 77 | 5 | 5/81 = 0.061728395 |
| 80 - 84 | 79.5 - 84.5 | 82 | 3 | 3/81 = 0.037037037 |
| Σ | …. | …. | 81 | 0.999999999 ≈ 1 |
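Each relative frequency is the class frequency divided by the total frequency. The Python sketch below (illustrative names, not from the original) computes them as exact fractions, which makes it clear that the column total is exactly 1 and that a displayed total just under 1 comes only from rounding each entry to nine decimal places.

```python
from fractions import Fraction

# Class frequencies (fi) from the table, lowest class first.
frequencies = [2, 1, 2, 2, 4, 6, 15, 14, 9, 11, 7, 5, 3]
n = sum(frequencies)

# Exact relative frequency of each class: fi / n.
relative = [Fraction(f, n) for f in frequencies]

# Decimal versions rounded to nine places, as they are displayed in a table.
rounded = [round(float(r), 9) for r in relative]

for f, r in zip(frequencies, rounded):
    print(f"{f}/{n} = {r:.9f}")

print("Exact total:", sum(relative))
print("Total of rounded values:", f"{sum(rounded):.9f}")
```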
Cumulative Frequency Distribution
CUMULATIVE FREQUENCY DISTRIBUTION OF 2ND SEMESTER’S TOTAL MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH
| Marks | Class Boundaries | Xi | fi | Cumulative Frequency |
|---|---|---|---|---|
| 20 - 24 | 19.5 - 24.5 | 22 | 2 | 2 |
| 25 - 29 | 24.5 - 29.5 | 27 | 1 | (2 + 1) = 3 |
| 30 - 34 | 29.5 - 34.5 | 32 | 2 | (3 + 2) = 5 |
| 35 - 39 | 34.5 - 39.5 | 37 | 2 | (5 + 2) = 7 |
| 40 - 44 | 39.5 - 44.5 | 42 | 4 | (7 + 4) = 11 |
| 45 - 49 | 44.5 - 49.5 | 47 | 6 | (11 + 6) = 17 |
| 50 - 54 | 49.5 - 54.5 | 52 | 15 | (17 + 15) = 32 |
| 55 - 59 | 54.5 - 59.5 | 57 | 14 | (32 + 14) = 46 |
| 60 - 64 | 59.5 - 64.5 | 62 | 9 | (46 + 9) = 55 |
| 65 - 69 | 64.5 - 69.5 | 67 | 11 | (55 + 11) = 66 |
| 70 - 74 | 69.5 - 74.5 | 72 | 7 | (66 + 7) = 73 |
| 75 - 79 | 74.5 - 79.5 | 77 | 5 | (73 + 5) = 78 |
| 80 - 84 | 79.5 - 84.5 | 82 | 3 | (78 + 3) = 81 |
| Σ | …. | …. | 81 | …. |
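The cumulative column is a running total of the frequencies. A minimal Python sketch of that running total, using the standard library's `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies (fi) from the table, lowest class first.
frequencies = [2, 1, 2, 2, 4, 6, 15, 14, 9, 11, 7, 5, 3]

# accumulate() yields the running sum: f1, f1 + f2, f1 + f2 + f3, ...
cumulative = list(accumulate(frequencies))

for f, cf in zip(frequencies, cumulative):
    print(f, cf)

# The last cumulative frequency always equals the total frequency.
print(cumulative[-1] == sum(frequencies))
```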
Stem and Leaf Display
A technique in which each data point is split into a “stem” and a “leaf”. The stem is the leading digit (or digits) and the leaf is the last digit (or digits). This technique organizes the data easily without losing any data point.
Example:
At University of Sindh, the 2nd semester’s marks of students of BS Statistics (2k23 batch) in Economics subject are 60, 29, 13, 85, 29, 75, 15, 13, 80, 25, 17, 10, 66, 3, 50, 11, 18, 70, 15, 70, 12, 75, 60, 12, 50, 88, 50, 86, 8, 14, 15, 16, 32, 29, 50, 65, 34, 60, 50, 13, 14, 2, 14, 52, 5, 12, 17, 65, 12, 30, 9, 50, 13, 13, 13, 75, 86, 88, 25, 12, 5, 15, 76, 66, 86, 12, 16, 0, 14, 60, 11, 13, 14, 8, 50, 80, 35, 60, 50, 19, 16
In the dataset, the 1st data point is 60: 6 is the stem because it is the first digit, and 0 is the leaf because it is the last digit. In the same way, the 2nd data point is 29: 2 is the stem and 9 is the leaf, and so on.
FREQUENCY DISTRIBUTION OF STUDENT MARKS IN ECONOMICS (2ND SEMESTER, BS STATISTICS, UNIVERSITY OF SINDH)
| Stem | Leaf | fi |
|---|---|---|
| 0 | 0, 2, 3, 5, 5, 8, 8, 9 | 8 |
| 1 | 0, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9 | 32 |
| 2 | 5, 5, 9, 9, 9 | 5 |
| 3 | 0, 2, 4, 5 | 4 |
| 4 | | 0 |
| 5 | 0, 0, 0, 0, 0, 0, 0, 0, 2 | 9 |
| 6 | 0, 0, 0, 0, 0, 5, 5, 6, 6 | 9 |
| 7 | 0, 0, 5, 5, 5, 6 | 6 |
| 8 | 0, 0, 5, 6, 6, 6, 8, 8 | 8 |
| Σ | …. | 81 |
The stem and leaf display above is easy to make and easy to read: the frequencies can be counted directly, and no data point is lost.
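A stem and leaf display can also be tallied directly from the raw marks. The following Python sketch is one way to do it, assuming two-digit marks so that the tens digit is the stem and the units digit is the leaf (single-digit marks get stem 0); the variable names are illustrative.

```python
from collections import defaultdict

# Economics marks of the 81 students, in the order given in the example.
marks = [
    60, 29, 13, 85, 29, 75, 15, 13, 80, 25, 17, 10, 66, 3, 50, 11, 18, 70,
    15, 70, 12, 75, 60, 12, 50, 88, 50, 86, 8, 14, 15, 16, 32, 29, 50, 65,
    34, 60, 50, 13, 14, 2, 14, 52, 5, 12, 17, 65, 12, 30, 9, 50, 13, 13,
    13, 75, 86, 88, 25, 12, 5, 15, 76, 66, 86, 12, 16, 0, 14, 60, 11, 13,
    14, 8, 50, 80, 35, 60, 50, 19, 16,
]

# Going through the marks in sorted order keeps each row's leaves sorted.
stems = defaultdict(list)
for mark in sorted(marks):
    stem, leaf = divmod(mark, 10)  # e.g. 60 -> stem 6, leaf 0
    stems[stem].append(leaf)

# Print every stem from the smallest to the largest, including empty ones.
for stem in range(min(stems), max(stems) + 1):
    leaves = stems.get(stem, [])
    leaf_text = ", ".join(str(leaf) for leaf in leaves)
    print(f"{stem} | {leaf_text} | {len(leaves)}")

total = sum(len(leaves) for leaves in stems.values())
print("Total", total)
```

Because every leaf is kept, the original marks can be rebuilt from the display as stem × 10 + leaf.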
Graphical Representation
Displays data using visual elements such as bars, lines, or shapes. Graphical representation can be divided into two parts: diagrams and graphs.
Diagrams
Symbols or shapes are used to represent data; suitable for qualitative and discrete data.
Examples of Diagrams:
- Simple Bar Chart
- Multiple Bar Chart
- Component Bar Chart
- Pie Diagram
Graphs
Dots, lines, or curves are used to represent data. Useful for showing relationships and trends in discrete and continuous data.
Examples of Graphs:
- Historigram
- Histogram
- Frequency Polygon
- Frequency Curve
Simple Bar Chart
A type of graphical representation used to compare a quantitative variable across the categories of a categorical variable. The x-axis (horizontal axis) shows the categories, while the y-axis (vertical axis) shows the values of the quantitative variable.
Example:
TOP 10 CANDIDATES OF BS STATISTICS (2K23 BATCH, 2ND SEMESTER, UNIVERSITY OF SINDH MAIN CAMPUS)
| NAMES | PERCENTAGE |
|---|---|
| HAFSA SHAIKH | 81.88% |
| GHULAM MURTAZA | 81.38% |
| PARAS QURESHI | 80.75% |
| SAVERA GORAR | 79.38% |
| BUSHRA BAJWA | 78.88% |
| RAHOL MEGHWAR | 77.50% |
| ASIF DHAUNROO | 75.13% |
| AISHA BHATTI | 75.13% |
| AANCHAL SIYAL | 72.75% |
| DILAWAR HUSSAIN | 72.38% |
Multiple Bar Chart
Type of graphical representation of data used to compare multiple variables or categories.
Example:
ECO AND STAT MARKS OF TOP 10 CANDIDATES OF BS STATISTICS (2K23 BATCH, 2ND SEMESTER, UNIVERSITY OF SINDH MAIN CAMPUS)
| NAMES | ECONOMICS | MATHEMATICS |
|---|---|---|
| HAFSA SHAIKH | 86 | 94 |
| GHULAM MURTAZA | 88 | 86 |
| PARAS QURESHI | 86 | 89 |
| SAVERA GORAR | 86 | 82 |
| BUSHRA BAJWA | 75 | 85 |
| RAHOL MEGHWAR | 88 | 87 |
| ASIF DHAUNROO | 70 | 85 |
| AISHA BHATTI | 80 | 85 |
| AANCHAL SIYAL | 60 | 85 |
| DILAWAR HUSSAIN | 60 | 75 |
Component Bar Chart
Each bar is divided into segments, proportional in size to the component parts of a total being displayed by each bar.
Example:
THE NUMBER OF STUDENTS STUDYING IN SINDH UNIVERSITY MAIN CAMPUS
| BATCHES | BS ENGLISH | BS MATHEMATICS | BS STATISTICS |
|---|---|---|---|
| 2k21 | 139 | 190 | 83 |
| 2k22 | 156 | 144 | 72 |
| 2k23 | 209 | 137 | 75 |
| 2k24 | 216 | 152 | 53 |
| Σ | 720 | 623 | 283 |
Pie Diagram
Visualizes data in a circle (360°), where each component is a slice. To calculate the angle of each slice, use this formula:
Angle of slice = (Component value / Total) × 360°
Example:
THE NUMBER OF STUDENTS OF BS STATISTICS (SINDH UNIVERSITY MAIN CAMPUS)
| Batches | Number of Students |
|---|---|
| 2k21 | 83 |
| 2k22 | 72 |
| 2k23 | 75 |
| 2k24 | 53 |
| Σ | 283 |
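To turn the counts in this table into slice angles, each count is taken as a share of the total and scaled to 360 degrees. The Python sketch below assumes the usual rule, angle = (component value / total) × 360; the variable names are illustrative.

```python
# Number of BS Statistics students per batch, from the table above.
students = {"2k21": 83, "2k22": 72, "2k23": 75, "2k24": 53}

total = sum(students.values())

# Angle of each slice in degrees: the component's share of the full circle.
angles = {batch: count / total * 360 for batch, count in students.items()}

for batch, angle in angles.items():
    print(f"{batch}: {angle:.1f} degrees")

# The slice angles together make up the whole circle.
print(f"Sum of angles: {sum(angles.values()):.1f}")
```

For these counts the slices come to roughly 105.6, 91.6, 95.4 and 67.4 degrees.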
Historigram (Time Series Graph)
A type of graph that shows changes in a quantitative variable over a period of time. The x-axis shows the time intervals, while the y-axis shows the values of the quantitative variable. Data points are marked with dots, and the dots are then connected with lines.
Example:
NUMBER OF STUDENTS OF BS STATISTICS (2K23 BATCH, SINDH UNIVERSITY MAIN CAMPUS)
| Years | No. of students |
|---|---|
| 2021 | 112 |
| 2022 | 98 |
| 2023 | 91 |
| 2024 | 83 |
Histogram
A type of graph used to show the frequency distribution of continuous data. The x-axis shows the class boundaries, while the y-axis shows the frequencies. Bars show the frequency (or count) of each class, as in a bar chart, but with no gaps between the bars. Class intervals can be equal or unequal.
Histogram with equal class intervals
All the bars have equal width.
Example:
2ND SEMESTER’S TOTAL MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH
| Class Boundaries | fi |
|---|---|
| 19.5 - 24.5 | 2 |
| 24.5 - 29.5 | 1 |
| 29.5 - 34.5 | 2 |
| 34.5 - 39.5 | 2 |
| 39.5 - 44.5 | 4 |
| 44.5 - 49.5 | 6 |
| 49.5 - 54.5 | 15 |
| 54.5 - 59.5 | 14 |
| 59.5 - 64.5 | 9 |
| 64.5 - 69.5 | 11 |
| 69.5 - 74.5 | 7 |
| 74.5 - 79.5 | 5 |
| 79.5 - 84.5 | 3 |
| …. | 81 |
Histogram with unequal class intervals
The bars have different width, depending on the size of class interval.
Example:
2ND SEMESTER’S TOTAL MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH
| Class Boundaries | fi |
|---|---|
| 19.5 - 34.5 | 5 |
| 34.5 - 39.5 | 2 |
| 39.5 - 44.5 | 4 |
| 44.5 - 49.5 | 6 |
| 49.5 - 54.5 | 15 |
| 54.5 - 59.5 | 14 |
| 59.5 - 64.5 | 9 |
| 64.5 - 69.5 | 11 |
| 69.5 - 74.5 | 7 |
| 74.5 - 84.5 | 8 |
| …. | 81 |
Frequency Polygon
A type of graph used to show the frequency distribution of a dataset. It is similar to a histogram, but instead of bars, a frequency polygon uses dots connected with straight lines. If these lines are smoothed, the result is called a frequency curve (not a frequency polygon). The x-axis shows the class marks (averages of the lower and upper class limits), while the y-axis shows the frequencies.
Example:
2ND SEMESTER’S TOTAL MARKS OF STUDENTS OF BS STATISTICS (2K23 BATCH) AT UNIVERSITY OF SINDH
| Class Boundaries | Xi | fi |
|---|---|---|
| 19.5 - 24.5 | 22 | 2 |
| 24.5 - 29.5 | 27 | 1 |
| 29.5 - 34.5 | 32 | 2 |
| 34.5 - 39.5 | 37 | 2 |
| 39.5 - 44.5 | 42 | 4 |
| 44.5 - 49.5 | 47 | 6 |
| 49.5 - 54.5 | 52 | 15 |
| 54.5 - 59.5 | 57 | 14 |
| 59.5 - 64.5 | 62 | 9 |
| 64.5 - 69.5 | 67 | 11 |
| 69.5 - 74.5 | 72 | 7 |
| 74.5 - 79.5 | 77 | 5 |
| 79.5 - 84.5 | 82 | 3 |
| …. | …. | 81 |
