
Olayemi_Awofe

Building a Student Dropout Prediction System Using Microsoft Fabric’s Medallion Architecture

Background

This project is inspired by a real-life scenario in which student disengagement went undetected. It simulates how EdTech platforms can use Microsoft Fabric to detect early warning signs and support at-risk learners.

 


 

Architecture Overview

The Fabric workspace integrates:

  • Data Engineering workload (PySpark)

  • Machine Learning workload

  • Delta Tables stored in OneLake

  • SQL Views for analysts and data scientists

Data Generator
    ↓
Bronze Layer (Raw Data)
    ↓
Silver Layer (Cleaned Aggregations)
    ↓
Gold Layer (ML Features + Labels)
    ↓
Model Training / Power BI Dashboard

🟤 Bronze Layer – Synthetic Data Generation

The Bronze layer creates millions of realistic records with attributes such as:

  • Demographics (age, gender, region, device)

  • Engagement (logins, session length, discussion activity)

  • Performance (grades, submissions)

  • Behavioral metrics (motivation, stress, attendance)

Each dataset is stored in Delta format:

students.write.mode("overwrite").saveAsTable("bronze.student_demographics")
activity.write.mode("overwrite").partitionBy("week").saveAsTable("bronze.student_activity_logs")
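The generator itself is not shown in the post, so the following is only a minimal sketch of the idea in plain Python. The function names, value ranges, and category lists are hypothetical; only the column names `student_id`, `week`, and `logins_per_week` come from the snippets in this article.

```python
import random

def generate_students(n, seed=42):
    """Return n synthetic demographic records (hypothetical value ranges)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    regions = ["north", "south", "east", "west"]   # assumed categories
    devices = ["mobile", "desktop", "tablet"]      # assumed categories
    return [
        {
            "student_id": i,
            "age": rng.randint(17, 45),
            "gender": rng.choice(["F", "M"]),
            "region": rng.choice(regions),
            "device": rng.choice(devices),
        }
        for i in range(n)
    ]

def generate_activity(students, weeks=4, seed=42):
    """Return one weekly activity row per student per week."""
    rng = random.Random(seed)
    return [
        {
            "student_id": s["student_id"],
            "week": week,
            "logins_per_week": rng.randint(0, 14),
        }
        for s in students
        for week in range(1, weeks + 1)
    ]

students_rows = generate_students(5)
activity_rows = generate_activity(students_rows, weeks=4)
print(len(students_rows), len(activity_rows))
```

In a Fabric notebook, rows like these would be turned into Spark DataFrames before the `saveAsTable` calls. At millions of records, the generation would more likely run inside Spark itself rather than in driver memory.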

 

Silver Layer – Aggregation & Cleaning

The Silver layer consolidates records per student:

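from pyspark.sql import functions as F

# One row per student: mean weekly logins and how much they fluctuate.
activity_agg = (
    spark.table("bronze.student_activity_logs")
    .groupBy("student_id")
    .agg(
        F.avg("logins_per_week").alias("avg_logins"),
        # Population standard deviation of weekly logins
        F.stddev_pop("logins_per_week").alias("login_volatility")
    )
)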

This ensures conformed, analytics-ready tables for modeling.
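To make the two aggregates concrete, here is a small plain-Python sketch of the same per-student computation. The helper name `aggregate_logins` and the sample rows are made up for illustration; `statistics.pstdev` is the population standard deviation, which is what `stddev_pop` computes.

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate_logins(rows):
    """Group weekly login counts by student, then compute mean and population stddev."""
    by_student = defaultdict(list)
    for row in rows:
        by_student[row["student_id"]].append(row["logins_per_week"])
    return {
        sid: {"avg_logins": mean(values), "login_volatility": pstdev(values)}
        for sid, values in by_student.items()
    }

# Illustrative rows: student 1 is perfectly steady, student 2 fluctuates.
rows = [
    {"student_id": 1, "logins_per_week": 5},
    {"student_id": 1, "logins_per_week": 5},
    {"student_id": 2, "logins_per_week": 2},
    {"student_id": 2, "logins_per_week": 6},
]
agg = aggregate_logins(rows)
print(agg[1])  # steady student: volatility 0
print(agg[2])  # fluctuating student: volatility 2
```

The population form divides by the number of observations rather than one less, so a student with a single week of data gets a volatility of 0 instead of an undefined value.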

 

🟡 Gold Layer – ML Features & Labels

The Gold layer merges the aggregated features and computes a simulated dropout probability from a weighted risk score:

from pyspark.sql import functions as F

# Weighted sum of risk indicators; a higher value means higher dropout risk.
risk_signal = (
    1.2*(1 - F.col("avg_video_completion")) +
    1.0*F.col("login_volatility") +
    0.8*(1 - F.col("attendance_rate")) +
    1.0*(F.when(F.col("avg_score_all") < 55, 1).otherwise(0)) +
    0.6*(F.when(F.col("grade_trend") < -5, 1).otherwise(0)) +
    0.5*(F.when(F.col("stress_avg") > 6, 1).otherwise(0)) +
    0.4*(F.when(F.col("motivation_avg") < 5, 1).otherwise(0))
)
# Logistic (sigmoid) transform maps the score to a probability in (0, 1).
p_dropout = 1 / (1 + F.exp(-(-1.2 + risk_signal)))

Students with p_dropout > 0.35 are labeled as at-risk.
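As a worked example, the formula above can be evaluated for single students in plain Python. The two feature dictionaries are invented for illustration, and the sketch assumes `avg_video_completion`, `attendance_rate`, and `login_volatility` are on a 0 to 1 scale, which the post does not state.

```python
import math

def risk_signal(f):
    """Weighted risk score, using the same weights and thresholds as the Gold layer."""
    return (
        1.2 * (1 - f["avg_video_completion"])
        + 1.0 * f["login_volatility"]
        + 0.8 * (1 - f["attendance_rate"])
        + 1.0 * (1 if f["avg_score_all"] < 55 else 0)
        + 0.6 * (1 if f["grade_trend"] < -5 else 0)
        + 0.5 * (1 if f["stress_avg"] > 6 else 0)
        + 0.4 * (1 if f["motivation_avg"] < 5 else 0)
    )

def logistic(x):
    """Standard sigmoid: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def p_dropout(f):
    return logistic(-1.2 + risk_signal(f))

# Hypothetical students (values invented for illustration).
engaged = {
    "avg_video_completion": 0.9, "login_volatility": 0.1, "attendance_rate": 0.95,
    "avg_score_all": 80, "grade_trend": 1, "stress_avg": 3, "motivation_avg": 8,
}
struggling = {
    "avg_video_completion": 0.4, "login_volatility": 0.5, "attendance_rate": 0.6,
    "avg_score_all": 50, "grade_trend": -8, "stress_avg": 7, "motivation_avg": 4,
}

for name, feats in [("engaged", engaged), ("struggling", struggling)]:
    p = p_dropout(feats)
    print(name, round(p, 3), "at-risk" if p > 0.35 else "ok")
```

With these inputs the engaged student has a risk score of 0.26 and lands below the 0.35 cutoff, while the struggling student has a score of 4.04 and lands well above it. Because of the -1.2 offset, the cutoff corresponds to a risk score of about 0.58.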

 

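⚙️ Performance & Optimization

Fabric’s distributed Spark engine enables scalable synthetic data generation. For this workload, the shuffle partition count is raised from the Spark default of 200 so that large shuffles are spread across more tasks: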

spark.conf.set("spark.sql.shuffle.partitions", "800")

Delta maintenance commands such as OPTIMIZE, which compacts small files, and its ZORDER BY clause, which co-locates related rows, improve read efficiency on large tables.
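The post does not show the maintenance statement itself, so here is a small sketch that builds one. The helper `build_optimize_statement` is hypothetical, and choosing `student_id` as the Z-order column is an assumption; the table name comes from the Gold layer described in this article.

```python
def build_optimize_statement(table, zorder_cols=()):
    """Build a Delta OPTIMIZE statement, optionally with a ZORDER BY clause."""
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += f" ZORDER BY ({', '.join(zorder_cols)})"
    return stmt

# student_id is an assumed Z-order column, chosen because lookups are per student.
stmt = build_optimize_statement("gold.student_dropout_features", ["student_id"])
print(stmt)
```

In a Fabric notebook the resulting string would be passed to `spark.sql(stmt)`. Z-ordering pays off mainly for columns that appear often in filters, so the column choice should follow the actual query patterns.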

 

📈 Consumption & Analytics

The Gold table is published for consumption:

CREATE OR REPLACE VIEW gold.vw_student_dropout_features AS
SELECT * FROM gold.student_dropout_features;

Analysts can query the view through the SQL analytics endpoint, while data scientists can read the same Gold table from the ML workload for model training.

 


 

🔭 Future Enhancements

  • Integrate Microsoft Fabric ML models directly from the Gold dataset

  • Build a Power BI dashboard for cohort-level dropout visualization

  • Add temporal engagement trends for early disengagement signals


Conclusion

This project illustrates how Microsoft Fabric unifies data engineering and data science workflows to build reproducible, ML-ready data systems.

By simulating a realistic EdTech dataset, the pipeline demonstrates Fabric’s ability to:

  • Handle high-volume data efficiently

  • Support feature engineering across layers

  • Enable seamless ML experimentation

Ultimately, it’s a step toward data-driven education, helping platforms identify and re-engage students before they drop out. In Fabric, every dataset can tell a story if you build the right pipeline to listen.


Author: Olayemi O Awofe

GitHub Repository: Click here
Tags: #MicrosoftFabric #DataEngineering #FabricDataEngineering #MachineLearning #DeltaTables #EducationAnalytics #SyntheticData #MedallionArchitecture
