Building a Scalable Learning Analytics Architecture: Integrating Moodle LMS with Microsoft Fabric

Modern learning platforms generate massive volumes of highly relational data that is rich, valuable, and often underutilized. Moodle, one of the most widely deployed LMS platforms, exposes a deep and complex data architecture with hundreds of interlinked tables governing users, courses, enrolments, logs, activities, and grades.

 

This article explains how Microsoft Fabric integrates with the Moodle LMS REST API to create a scalable and reliable learning analytics ecosystem. We will walk through API integration, ingestion, lakehouse storage, Spark optimization, and automated pipelines: the foundation required to operationalize LMS analytics at an enterprise level.

 

Understanding Learning Data Architecture

 

Moodle runs on a relational database (commonly MySQL or MariaDB) and ships with 400+ core and modular tables. This architecture provides incredible flexibility but requires careful modeling for analytics. For analytics-focused workloads, the foundation typically begins with the Users, Courses, and Enrolments tables. These three entities anchor most LMS metrics: learner activity, course uptake, engagement, and progression.

 

A Quick Glimpse Inside the Internal Database

  1. Course Table: Holds course identifiers, metadata, visibility flags, timestamps, and links to related configuration tables.
  2. User Table: Stores demographic and authentication attributes, custom profile fields, role assignments, and linked activity logs.
  3. Enrolment Tables: Map users to courses, enrolment methods, statuses, and timestamps, which is crucial for any learner journey analysis.

Why Integrate Moodle with Microsoft Fabric?

Organizations frequently struggle to operationalize LMS data for:

  • enterprise reporting,
  • data-driven academic strategy,
  • learner performance insights,
  • course optimization metrics, and
  • predictive modeling.

Fabric solves this by:

  • providing direct, secure access to LMS REST APIs such as Moodle's,
  • unifying ingestion, transformation, modeling, and reporting,
  • powering high-scale processing via Apache Spark,
  • offering Delta Lake for governed, ACID-compliant storage,
  • and integrating natively with Power BI.

The result is a modern, maintainable LMS analytics platform with end-to-end automation built in.

 

Now we examine the backend of an online learning platform, Moodle.

 

Step 1: Setting Up the LMS REST API

The first requirement is enabling programmatic access to Moodle:

1. Configure a Custom Web Service:

Navigate to:

Site Administration → Server → Web Services → External Services

Create a custom service (e.g., myservice) and bind required REST functions:

  • core_user_get_users
  • core_course_get_courses
  • core_enrol_get_enrolled_users

2. Generate an Authentication Token:

Under Manage Tokens, generate a token mapped to the custom service.

This token functions as your service identity, allowing Fabric to authenticate API calls in unattended pipelines.

 

3. Validate Capabilities:

Ensure the token user has read-only permissions on:

  • Users
  • Courses
  • Enrolments

Following the principle of least privilege is critical for data governance and compliance.
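
Before wiring anything into Fabric, it is worth confirming that the token actually works. The snippet below is a minimal sketch: the site URL and token are placeholders you supply, and it assumes core_webservice_get_site_info (a standard Moodle web service function) is available to the custom service; if it is not, you can substitute one of the functions bound in the step above.

import requests

# Placeholders: your Moodle site's REST endpoint and the token from Step 1
moodle_url = "https://<your-moodle-site>/webservice/rest/server.php"
token = "<your-token>"

# core_webservice_get_site_info reports the site name and the functions
# this token is allowed to call
resp = requests.post(moodle_url, data={
    "wstoken": token,
    "wsfunction": "core_webservice_get_site_info",
    "moodlewsrestformat": "json",
})
info = resp.json()

if isinstance(info, dict) and "exception" in info:
    # Moodle returns an error payload (not an HTTP error code) when a call fails
    raise RuntimeError(f"Token check failed: {info.get('message')}")

print("Connected to:", info.get("sitename"))
print("Callable functions:", [f["name"] for f in info.get("functions", [])])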

 

Step 2: Extracting LMS Data Using Python in Fabric Notebooks

With API access configured, we can use Python (Requests + Pandas) inside Fabric notebooks to call Moodle REST endpoints.

Each call includes:

  • your Moodle base URL,
  • the token,
  • the function name, and
  • any parameters (e.g., course ID).

Python Example: Fetching Courses & Enrolments

import requests
import pandas as pd

# Moodle API info
moodle_url = ""   # e.g. https://<your-moodle-site>/webservice/rest/server.php
token = ""        # web service token generated in Step 1

# --- COURSES TABLE ---
params_courses = {
    "wstoken": token,
    "wsfunction": "core_course_get_courses",
    "moodlewsrestformat": "json"
}

response_courses = requests.post(moodle_url, data=params_courses)
courses = response_courses.json()
df_courses = pd.DataFrame(courses)

print("Courses fetched:", len(df_courses))
display(df_courses.head())

# --- ENROLMENTS TABLE ---
params_enrol = {
    "wstoken": token,
    "wsfunction": "core_enrol_get_enrolled_users",
    "moodlewsrestformat": "json",
    "courseid": 2
}

response_enrol = requests.post(moodle_url, data=params_enrol)
enrolments = response_enrol.json()
df_enrol = pd.DataFrame(enrolments)

print("Enrolments fetched:", len(df_enrol))
display(df_enrol.head())

 

The calls above return clean JSON that is transformed into Pandas DataFrames, making this an ideal first step for structured ingestion.
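
In practice you will usually want enrolments for every course rather than a single hard-coded course ID. Below is a minimal sketch of that loop, reusing the df_courses DataFrame from above; note that the courseid column added here is my own convention for later joins, not a field Moodle returns.

all_enrolments = []

for course_id in df_courses["id"]:
    # Course id 1 is Moodle's site front page, which has no meaningful enrolments
    if int(course_id) == 1:
        continue

    resp = requests.post(moodle_url, data={
        "wstoken": token,
        "wsfunction": "core_enrol_get_enrolled_users",
        "moodlewsrestformat": "json",
        "courseid": int(course_id),
    })
    users = resp.json()

    if isinstance(users, dict) and "exception" in users:
        print(f"Skipping course {course_id}: {users.get('message')}")
        continue

    for user in users:
        user["courseid"] = int(course_id)   # convention added for downstream joins
        all_enrolments.append(user)

df_enrol_all = pd.DataFrame(all_enrolments)
print("Enrolment rows fetched:", len(df_enrol_all))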

 

Production Security Note: Use Azure Key Vault

Tokens and API keys must never be hard-coded.

Store tokens in Azure Key Vault and access them securely in notebooks.

Key Vault supports:

  • secret encapsulation,
  • RBAC-based access,
  • automatic rotation,
  • integration with Fabric pipelines and notebooks.

Always create the Key Vault resource before referencing it in Fabric.
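
As a minimal sketch of what that looks like in a notebook, assuming a hypothetical vault named lms-analytics-kv holding a secret called moodle-api-token, and using the NotebookUtils credentials helper that Fabric notebooks expose:

# Hypothetical vault and secret names; replace with your own
key_vault_uri = "https://lms-analytics-kv.vault.azure.net/"

# notebookutils is built into Fabric notebooks; the executing identity needs
# secret-read permission (e.g. the Key Vault Secrets User role) on the vault
token = notebookutils.credentials.getSecret(key_vault_uri, "moodle-api-token")

# token is now available to the API calls above without ever being hard-coded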

 

Step 3: Persisting Extracted Data to Fabric Raw Zone

Once extracted, the data first lands in the Raw Data Zone inside the Fabric Lakehouse.

 

Saving CSVs to Raw Zone:

import os

output_path = "/lakehouse/default/Files/raw/"
os.makedirs(output_path, exist_ok=True)

df_courses.to_csv(output_path + "moodle_courses.csv", index=False)
df_enrol.to_csv(output_path + "moodle_enrolments.csv", index=False)

print("All CSVs saved in Lakehouse folder:", output_path)

 

The Raw Zone provides:

  • auditable, immutable storage,
  • replay capability without repeat API calls,
  • complete dataset fidelity.
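
One lightweight way to realize the auditable, replayable snapshots listed above is to land each run in a date-stamped subfolder instead of overwriting a single file. The folder convention below is my own suggestion, not a Fabric requirement:

import os
from datetime import date

# e.g. /lakehouse/default/Files/raw/2024-06-01/ -- one immutable folder per run
snapshot_path = f"/lakehouse/default/Files/raw/{date.today().isoformat()}/"
os.makedirs(snapshot_path, exist_ok=True)

df_courses.to_csv(snapshot_path + "moodle_courses.csv", index=False)
df_enrol.to_csv(snapshot_path + "moodle_enrolments.csv", index=False)

print("Snapshot written to:", snapshot_path)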

 

Step 4: Scaling with Apache Spark

Spark becomes essential as LMS data grows.

 

Loading CSVs into Spark DataFrames:

 

courses_spark = spark.read.format("csv").option("header", "true").load("Files/raw/moodle_courses.csv")
enrol_spark = spark.read.format("csv").option("header", "true").load("Files/raw/moodle_enrolments.csv")

 

Spark enables distributed processing, schema inference, and massive parallel transformations, which are ideal for LMS workloads.
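
As a small illustration of such transformations, the sketch below casts a few columns and converts Moodle's Unix-epoch timestamps into readable dates. The startdate and lastaccess column names are assumptions based on what core_course_get_courses and core_enrol_get_enrolled_users typically return; adjust them to match your extracted schema.

from pyspark.sql import functions as F

# CSVs load as strings, so cast keys and convert epoch seconds to dates
courses_clean = (
    courses_spark
    .withColumn("id", F.col("id").cast("int"))
    .withColumn("startdate", F.from_unixtime(F.col("startdate").cast("long")).cast("date"))
)

enrol_clean = (
    enrol_spark
    .withColumn("id", F.col("id").cast("int"))
    .withColumn("lastaccess", F.from_unixtime(F.col("lastaccess").cast("long")).cast("timestamp"))
)

courses_clean.printSchema()
print("Courses:", courses_clean.count(), "| Enrolled users:", enrol_clean.count())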

 

Step 5: Optimizing Storage Using Parquet & Delta

 

Parquet Format

  • columnar
  • highly compressed
  • optimized reads
  • perfect for analytics

Delta Lake

  • ACID transactions
  • schema enforcement
  • time travel
  • native Power BI connectivity

Save as Parquet

courses_spark.write.mode("overwrite").parquet("Files/moodle_courses_parquet")
enrol_spark.write.mode("overwrite").parquet("Files/moodle_enrolments_parquet")

 

Save as Delta

courses_spark.write.format("delta").mode("overwrite").saveAsTable("moodle_courses_delta")
enrol_spark.write.format("delta").mode("overwrite").saveAsTable("moodle_enrolments_delta")

 

The code above creates a robust, governed LMS data lakehouse.
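
Because the tables are Delta, every scheduled overwrite adds a new version to the transaction log. A quick sketch of inspecting that history and reading an earlier version (assuming a Delta/Spark runtime that supports the VERSION AS OF syntax, as current Fabric runtimes do):

# Show the version history recorded in the Delta transaction log
spark.sql("DESCRIBE HISTORY moodle_courses_delta") \
    .select("version", "timestamp", "operation") \
    .show(truncate=False)

# Time travel: query the table as it looked at an earlier version
courses_v0 = spark.sql("SELECT * FROM moodle_courses_delta VERSION AS OF 0")
print("Rows in version 0:", courses_v0.count())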

 

Step 6: Automating the End-to-End Pipeline with Fabric Data Pipelines

Fabric Pipelines orchestrate the entire workflow from API extraction through Notebooks to Power BI auto-refresh. Pipelines can be scheduled hourly, daily, or weekly depending on your SLA.

 

Integrated monitoring provides:

  • failure alerts,
  • run duration metrics,
  • data volume insights.
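
Those failure alerts are only useful if the notebook actually fails when something is wrong. A small sketch of a guard you could add to the extraction step so an error or empty response from Moodle stops the pipeline run:

def check_moodle_response(payload, expect_rows=True):
    """Raise if Moodle returned an error payload or an unexpectedly empty result."""
    # Moodle web service errors come back as a JSON object with an "exception" key
    if isinstance(payload, dict) and "exception" in payload:
        raise RuntimeError(f"Moodle API error: {payload.get('errorcode')} - {payload.get('message')}")
    if expect_rows and not payload:
        raise RuntimeError("Moodle API returned no rows; failing the run so the pipeline alert fires")

check_moodle_response(courses)
check_moodle_response(enrolments)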

Validation in Lakehouse Explorer

After the automated load completes, we verify:

  • tables exist,
  • schemas are correct,
  • Delta tables are queryable,
  • partitions and metadata are optimized.
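
A few simple spot checks you can run in a notebook cell to confirm the points above, in particular that the Delta tables really are registered and queryable:

# Confirm the Delta tables exist and return data
spark.sql("SHOW TABLES").show()
spark.sql("SELECT COUNT(*) AS course_count FROM moodle_courses_delta").show()
spark.sql("SELECT COUNT(*) AS enrolment_count FROM moodle_enrolments_delta").show()

# Inspect the schema recorded for each table
spark.sql("DESCRIBE TABLE moodle_courses_delta").show(truncate=False)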

Fabric Lakehouse Explorer provides a clean view of your Moodle datasets, now ready for BI, ML, and advanced analytics.

 

This architecture delivers a fully secure and automated pipeline driven by LMS REST APIs, enabling structured ingestion into the Fabric Lakehouse, scalable processing through Apache Spark, and optimized storage using Parquet and Delta formats, all culminating in seamless analytics powered by Power BI.

 

You can expand the pipeline to include gradebook analytics, activity tracking, completion predictions, learning path optimization, cohort analysis, and personalized recommendations, unlocking deeper insights into the learning ecosystem.

 

When Moodle meets Microsoft Fabric, your LMS data transforms from operational records into strategic insights!

 

For a complete end-to-end demonstration, watch my video here:

Transforming LMS Data into Insights with Microsoft Fabric (Fabric Data Days Edition)
