How Economic Resources Shape Educational Outcomes: SES Predictors of Exam Performance

A data science case study examining whether socioeconomic status predicts exam performance among more than 6,000 high school students in the Student Performance Factors dataset.

Introduction

Understanding how socioeconomic status (SES) shapes educational outcomes is a central question in psychology and education research. Aggregate, population-level data reveal broad trends, while student-level data help show how individual and family characteristics relate to academic performance.

This project examines whether socioeconomic status predicts exam performance in a sample of more than 6,000 high school students from the Student Performance Factors dataset on Kaggle. The dataset includes demographic and socioeconomic variables (for example, parental education level, family income, access to resources, and internet access), along with each student's exam score.

Research Questions

  • Does socioeconomic status predict academic performance?
  • Does parental education alone meaningfully predict exam performance?
  • Does a multi-factor SES model provide stronger predictive power than parental education alone?
  • Can a composite SES index summarize multiple SES components effectively?

Methods and Results

The dataset contains 6,590 high school students and 20 recorded variables related to socioeconomic status, study habits, and academic outcomes. The primary outcome variable was Exam Score, treated as a continuous measure of academic performance.

Although the dataset is extensive and organized, it has important limitations. Kaggle datasets are often user-generated and may be synthetic or simulated. The file provides no documentation about how the data was collected, what population it represents, or whether the variables reflect real measurements. Several SES indicators are broad and subjective, and key demographic variables such as age, race or ethnicity, and grade are missing. For that reason, the analysis should be interpreted as exploratory rather than causal.

Before modeling, missing data were reviewed, the SES variables were converted to ordered categorical types, and numeric (ordinal) versions were created for analysis. To build a composite SES measure, the four SES predictors (parental education, family income, internet access, and access to resources) were each scaled to range from 0 to 1 and then averaged into a single SES index.
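The preprocessing and index construction described above can be sketched in pandas as follows. This is a minimal illustration rather than the project's actual code: the column names are assumed to match the Kaggle CSV, the level orderings are assumptions, and `SES_LEVELS` and `add_ses_index` are hypothetical names. Dividing each ordinal code by its number of steps is one simple way to put every component on a 0 to 1 scale; it equals min-max scaling whenever the lowest and highest levels both occur in the data.

```python
import pandas as pd

# Assumed column names and level orderings (lowest to highest); adjust to the actual file.
SES_LEVELS = {
    "Parental_Education_Level": ["High School", "College", "Postgraduate"],
    "Family_Income": ["Low", "Medium", "High"],
    "Internet_Access": ["No", "Yes"],
    "Access_to_Resources": ["Low", "Medium", "High"],
}


def add_ses_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with ordered SES categoricals, 0-1 scaled codes, and SES_Index."""
    out = df.copy()
    scaled_cols = []
    for col, levels in SES_LEVELS.items():
        # Ordered categorical: values not in `levels` (or missing) become NaN.
        out[col] = pd.Categorical(out[col], categories=levels, ordered=True)
        codes = out[col].cat.codes  # 0, 1, 2, ...; pandas uses -1 for missing
        # Map codes onto [0, 1]; missing entries stay NaN.
        out[f"{col}_scaled"] = codes.where(codes >= 0) / (len(levels) - 1)
        scaled_cols.append(f"{col}_scaled")
    # Unweighted mean of the four components; a missing component gives a NaN index.
    out["SES_Index"] = out[scaled_cols].mean(axis=1, skipna=False)
    return out


# Toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "Parental_Education_Level": ["High School", "Postgraduate", "College", None],
    "Family_Income": ["Low", "High", "Medium", "High"],
    "Internet_Access": ["No", "Yes", "Yes", "Yes"],
    "Access_to_Resources": ["Low", "High", "High", "Medium"],
})
result = add_ses_index(toy)
print(result[["SES_Index"]])
```

Treating a missing component as a missing index (`skipna=False`) is a conservative choice; the original analysis may have handled missing values differently.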

Model 1

The first model tested whether parental education alone predicted exam performance. Parental education was a statistically significant predictor of Exam Score (b = 1.04, p < .001), but the model explained only 1.1% of the variance (R² = 0.011). In other words, parental education is reliably associated with exam performance, but on its own it captures little of the variation in scores and is not enough to represent socioeconomic status.

Table 1. Regression Output for Model 1
Term Coefficient Std. Error t-value p-value 95% CI Lower 95% CI Upper
Intercept 66.870 0.065 1032.327 <0.001 66.743 66.997
Parental Education 1.043 0.123 8.467 <0.001 0.801 1.284
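The quantities in Table 1 (coefficient, standard error, t-value, and confidence limits) follow from ordinary least squares. The NumPy sketch below computes them under two stated assumptions: classical (non-robust) standard errors, and a large-sample normal approximation (coefficient ± 1.96 × SE) for the 95% interval, which agrees with the tabled intervals up to rounding. The helper `ols_fit` is a hypothetical name, the data are simulated, and the project itself may have used a library such as statsmodels.

```python
import numpy as np


def ols_fit(X: np.ndarray, y: np.ndarray) -> dict:
    """OLS for a design matrix X that already contains an intercept column.

    Returns coefficients, classical standard errors, t-values,
    normal-approximation 95% confidence limits, and R^2.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # sampling covariance of beta
    se = np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {
        "coef": beta,
        "se": se,
        "t": beta / se,
        "ci_low": beta - 1.96 * se,
        "ci_high": beta + 1.96 * se,
        "r2": r2,
    }


# Model 1 layout: an intercept plus one ordinal parental-education code (0, 1, 2).
# Simulated data for illustration only; substitute the real columns.
rng = np.random.default_rng(0)
parent_ed = rng.integers(0, 3, size=2000).astype(float)
score = 50.0 + 2.0 * parent_ed + rng.normal(scale=3.0, size=2000)
X = np.column_stack([np.ones_like(parent_ed), parent_ed])
fit = ols_fit(X, score)
print(fit["coef"], fit["se"], fit["r2"])
```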

Model 2

The second model included parental education, family income, internet access, and access to resources. Every SES variable was a significant predictor of exam performance in the expected direction. Relative to the reference groups (low family income, no internet access, and low access to resources), higher family income, having internet access, and greater access to resources were each associated with higher exam scores, and parental education remained significant after controlling for the other SES indicators.

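This model explained considerably more variance than Model 1 (R² = 0.052 versus 0.011), although SES as a whole still accounted for only about 5% of the variation in exam scores. This suggests that socioeconomic status is multidimensional and is better captured when multiple indicators are used.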

Table 2. Regression Output for Model 2
Term Coefficient Std. Error t-value p-value 95% CI Lower 95% CI Upper
Intercept 64.692 0.208 310.559 <0.001 64.284 65.100
Family Income (Med) 0.491 0.105 4.684 <0.001 0.285 0.696
Family Income (High) 0.988 0.130 7.582 <0.001 0.733 1.243
Internet Access (Yes) 0.814 0.178 4.585 <0.001 0.466 1.163
Access to Resources (Med) 0.927 0.125 7.436 <0.001 0.682 1.171
Access to Resources (High) 1.902 0.136 13.974 <0.001 1.635 2.169
Parental Education 1.054 0.121 8.734 <0.001 0.817 1.291
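Model 2 combines one ordinal predictor with dummy-coded factors. The sketch below shows one way to build such a design matrix in pandas, assuming the same Kaggle column names as before and treating the lowest level of each factor (Low income, No internet, Low resources) as the reference category, which is consistent with the (Med), (High), and (Yes) terms in Table 2. The names `model2_design` and `fit_coefs_r2` are hypothetical, rows with missing SES values are assumed to have been dropped beforehand, and only coefficients and R² are computed here.

```python
import numpy as np
import pandas as pd

# Assumed column names; the first level of each factor is the reference group.
FACTORS = {
    "Family_Income": ["Low", "Medium", "High"],
    "Internet_Access": ["No", "Yes"],
    "Access_to_Resources": ["Low", "Medium", "High"],
}
EDU_LEVELS = ["High School", "College", "Postgraduate"]


def model2_design(df: pd.DataFrame) -> pd.DataFrame:
    """Intercept + ordinal parental education + dummy-coded SES factors."""
    edu = pd.Categorical(df["Parental_Education_Level"], categories=EDU_LEVELS, ordered=True)
    parts = [
        pd.Series(1.0, index=df.index, name="const"),
        pd.Series(edu.codes, index=df.index, name="Parent_Education", dtype=float),
    ]
    for col, levels in FACTORS.items():
        cat = pd.Series(pd.Categorical(df[col], categories=levels), index=df.index)
        # drop_first=True removes the column for the reference (first) level.
        parts.append(pd.get_dummies(cat, prefix=col, drop_first=True, dtype=float))
    return pd.concat(parts, axis=1)


def fit_coefs_r2(X: pd.DataFrame, y: pd.Series):
    """Least-squares coefficients (labelled by column name) and R^2."""
    Xa, ya = X.to_numpy(), y.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(Xa, ya, rcond=None)
    resid = ya - Xa @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((ya - ya.mean()) ** 2)
    return pd.Series(beta, index=X.columns), r2
```

Dummy coding lets income and resources have unequal steps between levels, while parental education is constrained to a single per-level slope, mirroring the one Parental Education row in Table 2.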

Model 3

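The third model tested a single composite SES index, created by scaling and averaging the four SES predictors from Model 2. The SES index significantly predicted Exam Score (b = 4.86, 95% CI [4.32, 5.39]), with higher SES associated with higher exam performance. Because the index runs from 0 to 1, this coefficient corresponds to a predicted difference of roughly 4.9 points between the lowest and highest possible SES index. The model explained 4.6% of the variance (R² = 0.046), slightly less than the multi-factor model but considerably more than parental education alone.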

This suggests that a composite SES index can provide a compact summary of socioeconomic advantage, even though collapsing several variables into one score gives up a small amount of explained variance (4.6% versus 5.2%).

Table 3. Regression Output for Model 3
Term Coefficient Std. Error t-value p-value 95% CI Lower 95% CI Upper
Intercept 64.542 0.159 406.181 <0.001 64.230 64.853
SES Index 4.856 0.273 17.770 <0.001 4.320 5.392
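With a single predictor, Model 3 reduces to a simple linear regression, so its R² equals the squared Pearson correlation between the SES index and Exam Score. The NumPy sketch below fits that line on simulated data and checks the identity; `fit_ses_index_model` is a hypothetical name, and the simulated values are not the project's data. Because the index runs from 0 to 1, the slope is also the predicted score gap between the lowest and highest possible index values.

```python
import numpy as np


def fit_ses_index_model(ses_index, exam_score):
    """Simple regression of exam score on a 0-1 SES index.

    Returns (intercept, slope, r2_from_residuals, pearson_r).
    """
    x = np.asarray(ses_index, dtype=float)
    y = np.asarray(exam_score, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest power first
    resid = y - (intercept + slope * x)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    r = np.corrcoef(x, y)[0, 1]
    return intercept, slope, r2, r


# Simulated illustration only: index values in [0, 1] and noisy scores.
rng = np.random.default_rng(2)
ses = rng.uniform(0.0, 1.0, size=5000)
scores = 60.0 + 5.0 * ses + rng.normal(scale=4.0, size=5000)
intercept, slope, r2, r = fit_ses_index_model(ses, scores)

# Predicted gap between SES index 1 and SES index 0 equals the slope.
gap = (intercept + slope * 1.0) - (intercept + slope * 0.0)
print(slope, r2, r ** 2)
```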

Results Summary and Embedded Visualizations

Below are the interactive Plotly visualizations associated with the project. These figures render on the live GitHub Pages site and are embedded here for convenience.

Figure 1. Mean Exam Score by Parental Education Level

Mean exam performance increases gradually across parental education levels: students whose parents completed postgraduate education had the highest average scores, followed by students whose parents completed college, and then those whose parents completed high school.
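Group means like those in Figure 1 can be computed with a pandas `groupby` over an ordered categorical, so that levels appear from high school to postgraduate rather than alphabetically. This is a sketch assuming the Kaggle column names; `mean_score_by_education` is a hypothetical name and the toy rows are invented for illustration. The resulting Series could then be passed to a plotting library such as Plotly Express.

```python
import pandas as pd

EDU_ORDER = ["High School", "College", "Postgraduate"]  # assumed level order


def mean_score_by_education(df: pd.DataFrame) -> pd.Series:
    """Mean Exam_Score per parental education level, in education order."""
    edu = pd.Series(
        pd.Categorical(df["Parental_Education_Level"], categories=EDU_ORDER, ordered=True),
        index=df.index,
        name="Parental_Education_Level",
    )
    # observed=True drops empty levels; rows with missing education are excluded.
    return df.groupby(edu, observed=True)["Exam_Score"].mean()


# Toy rows for illustration only.
toy = pd.DataFrame({
    "Parental_Education_Level": ["College", "High School", "Postgraduate", "College", "High School"],
    "Exam_Score": [68, 65, 70, 66, 67],
})
means = mean_score_by_education(toy)
print(means)
```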

Figure 2. SES Index vs Exam Score

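A positive correlation can be seen between the SES index and Exam Score: students with a higher SES index generally had higher exam scores. Because Model 3 has a single predictor, its R² of 0.046 corresponds to a Pearson correlation of roughly r = 0.21, a consistent but modest association.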

Conclusion

This project examined whether different components of socioeconomic status predict exam performance in high school students. Across all three regression models, SES was a statistically significant predictor of academic achievement, although the effects were modest: no model explained more than about 5% of the variance in exam scores. Parental education alone was significant but relatively weak. The multi-factor SES model offered the strongest predictive power, reinforcing the idea that SES is multidimensional. The composite SES index provided a simpler summary measure that still captured most of that relationship.

Several limitations should be considered when interpreting these results. Because the dataset is observational, the analyses cannot establish causal relationships. The SES indicators were broad and likely self-reported, which may reduce measurement accuracy. Exam Score also captures only one element of academic achievement, so the findings should be interpreted as exploratory.

Future research could incorporate more detailed demographic information, additional SES indicators, and multiple measures of academic performance. Applying the SES index to other populations or international datasets would also help assess the generalizability of these findings.

References