
Sahir_Maharaj

Mastering Regularization for Data Science in Microsoft Fabric

Every data professional eventually runs into the same dilemma: your model fits the training data beautifully, but when you test it on unseen data, the performance collapses. It feels almost like betrayal... the model was supposed to be “smart,” yet it clings to every little noise pattern it found in the training set. This is the infamous problem of overfitting, and it can quietly ruin months of work if left unchecked. I’ve seen this happen countless times, both in my own work (at the start of my career) and in conversations with colleagues. You tweak your model, add more features, and polish the preprocessing pipeline, only to realize your accuracy is deceptive.

 

But the solution isn’t to throw away the model and start from scratch. Instead, the answer lies in a subtle (yet powerful) technique called regularization. Yeah I know... regularization doesn’t sound glamorous at first. It feels like one of those technical buzzwords buried in a machine learning textbook. But once you see how it reshapes your models, making them simpler, more robust, and surprisingly more accurate on real-world data, it feels almost magical. And the coolest thing is that you don’t need to be an academic mathematician to apply it. That’s exactly what I want to unpack with you today.

 

What you will learn: In this edition, we’re exploring regularization and breaking down why it’s the secret weapon against overfitting. By the time you’re through, you’ll understand what regularization actually does in plain terms, how L1, L2, and ElasticNet each bring their own flavor to the table, and when it makes sense to use one over the other.

 

Read Time: 8 minutes

 

Source: Sahir Maharaj (https://sahirmaharaj.com)

 

When I first encountered regularization, it was explained to me as “a penalty term added to the loss function.” Technically true, but that explanation left me cold. What really clicked for me was learning that regularization is like teaching your model self-control.

Let's say you’re preparing a boardroom presentation and you have way too much data. Without discipline, you’ll try to include every detail, every number, every graph, and every bullet point. You already know the result: people get overwhelmed. Now, imagine someone whispers in your ear: “Every extra slide you add will cost you.” Suddenly, you only include the essentials, and the presentation becomes clear and impactful.

 

That’s exactly what regularization does. It penalizes your model for growing too complex. Instead of giving each feature an exaggerated weight, the model is forced to shrink those weights down. The outcome is a model that generalizes better to unseen data. Now let’s move from intuition to something more concrete. Overfitting happens when your model latches onto noise, not signal. Think of a dataset with many irrelevant columns - the model might find “relationships” that look perfect in training but are meaningless in practice. Regularization helps by applying a mathematical leash... which simply means the more complex your coefficients become, the stronger the pull back toward simplicity.
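To make the “penalty” idea concrete, here’s a minimal sketch of how the L1 and L2 terms get bolted onto an ordinary squared-error loss. The coefficient vector and data here are made up purely for illustration, not taken from the example further down:

```python
import numpy as np

# Hypothetical coefficients and data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w = np.array([4.0, -2.0, 0.0, 0.0, 1.0])
y = X @ w + rng.standard_normal(100) * 0.5

alpha = 0.1  # regularization strength: bigger alpha, stronger leash
mse = np.mean((y - X @ w) ** 2)           # the usual squared-error loss
l1_penalty = alpha * np.sum(np.abs(w))    # Lasso adds the absolute values
l2_penalty = alpha * np.sum(w ** 2)       # Ridge adds the squared values

lasso_loss = mse + l1_penalty
ridge_loss = mse + l2_penalty
print("L1 penalty:", l1_penalty)  # 0.1 * (4 + 2 + 0 + 0 + 1) = 0.7
print("L2 penalty:", l2_penalty)  # 0.1 * (16 + 4 + 0 + 0 + 1) = 2.1
```

Notice that the zero coefficients cost nothing: the only way the model can lower the penalty is to keep its weights small, which is exactly the pull toward simplicity described above.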

 

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Synthetic dataset: only 3 of the 15 features actually matter.
np.random.seed(42)
n_samples, n_features = 200, 15
X = np.random.randn(n_samples, n_features)
true_coef = np.array([5, -3, 0, 0, 2] + [0]*(n_features-5))
y = X.dot(true_coef) + np.random.randn(n_samples) * 2  # add Gaussian noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Plain linear regression: no penalty on coefficient size.
baseline = LinearRegression()
baseline.fit(X_train, y_train)

y_pred = baseline.predict(X_test)
mse_baseline = mean_squared_error(y_test, y_pred)

print("Baseline Linear Regression")
print("MSE:", mse_baseline)
print("Coefficients:", baseline.coef_)

 

As a data scientist, I’ve found this to be super helpful. Early in my career, I built regression models with 50+ features. Without regularization, they looked impressive during training but didn't perform as expected in production. The moment I added L1 and L2, the coefficients aligned closer to reality, and the predictions stopped wobbling. That’s when I understood that regularization is essential. Now that takes us to the three flavors of regularization. Each has its own personality, and seeing them side by side makes the differences clear.

 

Source: Sahir Maharaj (https://sahirmaharaj.com)

 

First up we have L1 Regularization (aka Lasso), which stands out because of its ability to perform feature selection automatically. Instead of just shrinking coefficients, it slams many of them to exactly zero. This means irrelevant predictors are eliminated entirely. If you’ve ever struggled with too many features, Lasso is like a strict editor cutting unnecessary sentences from your draft. The trade-off is that Lasso can sometimes be too aggressive. If two features are correlated, it may arbitrarily keep one and drop the other. This is helpful when you need simplicity, but it may frustrate you if interpretability requires all correlated features to be considered.

 

On the other hand, L2 Regularization (aka Ridge) is much kinder. Instead of zeroing out features, it gently nudges all coefficients closer to zero. This keeps every feature in play but prevents any from overpowering the model. I like to think of it like telling a noisy meeting room: “Everyone gets to speak, but no one is allowed to shout.” But what I've learned is that the real strength of Ridge is in stability. If your dataset has multicollinearity (highly correlated predictors), Ridge keeps the coefficients more balanced and prevents the model from going haywire.

 

from sklearn.linear_model import Lasso, Ridge, ElasticNet

# L1 penalty: pushes irrelevant coefficients to exactly zero.
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, lasso_pred)

# L2 penalty: shrinks all coefficients but keeps every feature in play.
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, ridge_pred)

# Blend of both penalties; l1_ratio=0.5 weights L1 and L2 equally.
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic.fit(X_train, y_train)
elastic_pred = elastic.predict(X_test)
mse_elastic = mean_squared_error(y_test, elastic_pred)

# Compare test-set error across the four models.
results = pd.DataFrame({
    "Model": ["Baseline", "Lasso", "Ridge", "ElasticNet"],
    "MSE": [mse_baseline, mse_lasso, mse_ridge, mse_elastic]
})
print(results)

# Plot the coefficients side by side to see shrinkage and sparsity.
plt.figure(figsize=(12,6))
plt.plot(baseline.coef_, label="Baseline", marker='o')
plt.plot(lasso.coef_, label="Lasso", marker='o')
plt.plot(ridge.coef_, label="Ridge", marker='o')
plt.plot(elastic.coef_, label="ElasticNet", marker='o')
plt.axhline(0, color="black", linewidth=0.7)
plt.legend()
plt.title("Comparison of Coefficients Across Models")
plt.show()

 

Now finally we have ElasticNet, which I like to call the compromise. It blends the strengths of Lasso and Ridge. You can tune how much weight you want to give to the L1 (sparse) versus L2 (smooth) penalties. In practice, datasets are messy and you might want some features zeroed out while also keeping the rest stable. ElasticNet handles this by letting you set an l1_ratio to balance the two. When I use ElasticNet in my projects, it’s usually because I don’t trust the dataset to be “clean.” There’s always some noise, some redundancy, and some true signal. ElasticNet feels like a safety net: it won’t over-prune like Lasso, and it won’t keep every feature in play like Ridge. It just finds a middle ground, which often saves hours of feature engineering.
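If you’d rather not hand-pick the l1_ratio, scikit-learn’s ElasticNetCV can choose both alpha and l1_ratio by cross-validation. Here’s a small sketch on synthetic data similar to the example above (the candidate ratios are just illustrative choices):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data: 5 informative features out of 15, plus noise.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 15))
true_coef = np.array([5, -3, 0, 0, 2] + [0] * 10)
y = X @ true_coef + rng.standard_normal(200) * 2

# Try several L1/L2 blends; candidate alphas are generated automatically
# for each ratio, and 5-fold CV picks the best combination.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=42)
model.fit(X, y)

print("Best alpha:", model.alpha_)
print("Best l1_ratio:", model.l1_ratio_)
print("Zeroed coefficients:", int(np.sum(model.coef_ == 0)))
```

A higher chosen l1_ratio means the data rewarded Lasso-like sparsity; a lower one means Ridge-like smoothing won out, which is a useful diagnostic in itself.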

 

Source: Sahir Maharaj (https://sahirmaharaj.com)

 

So far, we’ve tested models in isolation. But you’ll often want to compare multiple regularization strengths (alphas), tune hyperparameters, and validate performance properly. I often use a loop like the one below because it allows me to see how models behave under pressure. For example, if the alpha is too high in Lasso, you’ll notice it deletes nearly everything, leaving only one or two features. In Ridge, the coefficients shrink smoothly but never quite vanish. With ElasticNet, the story is more balanced... some vanish, some remain.

 

from sklearn.model_selection import cross_val_score

# Sweep several regularization strengths and score each model on both
# the held-out test set and 5-fold cross-validation.
alphas = [0.01, 0.1, 1, 10]
results_detailed = []

for alpha in alphas:
    lasso = Lasso(alpha=alpha)
    ridge = Ridge(alpha=alpha)
    elastic = ElasticNet(alpha=alpha, l1_ratio=0.5)

    for name, model in [("Lasso", lasso), ("Ridge", ridge), ("ElasticNet", elastic)]:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mse = mean_squared_error(y_test, preds)
        # cross_val_score returns negative MSE, so flip the sign below.
        cv_score = np.mean(cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error"))
        results_detailed.append({
            "Model": name,
            "Alpha": alpha,
            "MSE": mse,
            "CV Score": -cv_score,
            "Coefficients": model.coef_
        })

results_df = pd.DataFrame(results_detailed)
print(results_df)

 

And there you have it! Of course, regularization might not have the flashy reputation of deep learning or neural networks, but don’t let that fool you - it’s one of the most powerful tricks you can keep in your toolkit. What I like about these methods is how quickly you can see the difference. The coefficients shrink, irrelevant features disappear, and suddenly the model feels lighter, more balanced.

 

Source: Sahir Maharaj (https://sahirmaharaj.com)

 

So here’s a thought - next time you spin up a regression model in Microsoft Fabric, don’t just settle for the plain vanilla version. Try Lasso, test Ridge, mix in ElasticNet. Play with the alphas, visualize the changes, and notice how each one reshapes the narrative of your data. The more you experiment, the more second-nature it becomes... and your future self will thank you for the cleaner, smarter, and more reliable predictions that come out of it.
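One practical tip for that experimentation: scale your features first, because the penalty treats every coefficient equally and raw feature scales will skew it. Here’s a minimal sketch (on made-up data, with an illustrative alpha grid) that wraps scaling and Ridge in a pipeline and lets GridSearchCV pick the alpha:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Made-up data where the features sit on very different scales.
rng = np.random.default_rng(7)
X = rng.standard_normal((150, 10)) * np.arange(1, 11)
y = X[:, 0] * 2 - X[:, 1] + rng.standard_normal(150)

# Scaling happens inside each CV fold, so there is no leakage
# from the validation split into the fitted scaler.
pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])
grid = GridSearchCV(pipe,
                    {"ridge__alpha": [0.01, 0.1, 1, 10]},
                    cv=5, scoring="neg_mean_squared_error")
grid.fit(X, y)
print("Best alpha:", grid.best_params_["ridge__alpha"])
```

The same pattern works for Lasso and ElasticNet: just swap the estimator and extend the parameter grid with l1_ratio.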

 

Thanks for taking the time to read my post! I’d love to hear what you think and connect with you 🙂
