Data Science Community Blog

Sahir_Maharaj

In this edition, we’re exploring causal inference and why it matters once you move beyond basic reporting and prediction. By the time you’re done, you’ll have a clear understanding of how causal thinking differs from traditional analytics and how to reframe everyday business questions around cause and effect instead of simple correlation. And because insights only matter if they’re understood, we’ll look at how to communicate causal findings clearly and responsibly so decision-makers know what they can trust and act on.

Sahir_Maharaj · ‎07-06-2026

In this edition, we’re exploring two regression techniques that every data professional eventually bumps into when the simple models stop telling the full story. You’ll get a clear sense of what quantile regression actually solves, especially when your data behaves in unpredictable or uneven ways. By the time you’re done, you’ll feel more confident choosing the regression approach that truly fits the question you’re trying to answer, instead of defaulting to whatever is familiar.

Sahir_Maharaj · ‎06-26-2026

In this edition, we’re exploring Temporal Fusion Transformers in a way that actually makes sense in the real world. You’ll get a clear walkthrough of the key ideas inside the architecture, like variable selection, gating, and attention, and how they work together to make sense of messy, real-life data. More importantly, you’ll walk away understanding how TFTs can help you handle complexity every day, giving you both clarity and confidence in your forecasting work.

Sahir_Maharaj · ‎06-17-2026

In this edition, we’re exploring how advanced regex can help you make sense of unpredictable text fields that show up in real projects. By the time you get through it, you’ll have a clearer way of spotting patterns that other people miss, expressing those patterns in a structured way, and shaping unstructured data into something that finally behaves. You’ll also get a feel for how this kind of thinking changes the way you approach cleaning work overall, because once regex clicks, you start seeing text differently.

Sahir_Maharaj · ‎05-07-2026

In this edition, we’re exploring the world of word embeddings and finally making sense of why they’ve become the backbone of modern NLP. You’ll get a clear feel for what embeddings actually represent, explore how Word2Vec learns meaning through prediction and why that tiny training task uncovers so much structure. And to bring it all together, you’ll learn how to think like an embedding model itself, giving you the intuition you need before stepping into the world of transformer-based NLP.

Sahir_Maharaj · ‎04-21-2026

In this edition, we’re exploring how TF-IDF helps you discover meaning in language. You’ll see how this technique balances frequency and rarity to spotlight the words that truly matter, instead of the ones that just appear most often. By the time you’re done, you’ll have a solid understanding of how TF-IDF bridges the gap between unstructured text and structured analytics and why it is still relevant despite the rise of Large Language Models (LLMs).

Sahir_Maharaj · ‎04-13-2026

In this edition, we’re exploring how to detect the unusual, the unexpected, and the truly interesting moments hidden in your data using anomaly detection techniques. By the time you’re done, you’ll understand what makes certain data points stand out, how to identify them using Python, and how to visualize those findings in ways that actually make sense to your audience.

Sahir_Maharaj · ‎03-24-2026

In this edition, we will explore Principal Component Analysis (PCA) - what it really means, how it works, and why it’s such a powerful ally. You’ll start with the intuition behind PCA, then see how it works under the hood and when and why it is worth using, especially in real-world data scenarios where features overlap or patterns are hard to see. Finally, you’ll learn how to interpret the results inside a Fabric notebook.

rajendraongole1 · ‎03-18-2026

In this blog, we explored how to build a simple yet effective machine learning workflow using Microsoft Fabric together with MLflow.

By using MLflow within Microsoft Fabric, it becomes much easier to organize experiments, compare model performance, and maintain a clear history of training runs. This approach helps ensure that machine learning experiments remain reproducible, transparent, and easier to manage, especially when multiple models and configurations are involved.

We also saw how experiment tracking enables us to retrieve runs, analyze results, and visualize model performance to identify the best-performing algorithm. Once the optimal model is identified, it can be saved and integrated into downstream analytics workflows, helping organizations turn data into actionable insights.

Sahir_Maharaj · ‎03-11-2026

By the end of this edition, you’ll have a complete, grounded understanding of how to work with XGBoost and LightGBM in Python, especially within the context of applied, professional machine learning. You’ll see how XGBoost’s regularization-driven, level-wise tree growth differs from LightGBM’s leaf-wise design. And by the time you finish, you’ll not only know how to use these algorithms, you’ll know why they behave the way they do, and when to use one over the other.

Sahir_Maharaj · ‎02-19-2026

In this edition, you’ll learn how to extract features like week, quarter, holiday, and season using Python with pandas, datetime, and the holidays library - all within Microsoft Fabric’s Notebooks. By the end, you’ll know exactly how to make time work for you, by teaching your data to understand patterns, rhythms, and real-world cycles. And because time features are often the hidden key to better predictions, we’ll go beyond the code to explore why each step truly matters.

Sahir_Maharaj · ‎02-06-2026

In this edition, we’re exploring model interpretability with SHAP and LIME - two of the most powerful tools for making sense of machine learning predictions. By the time you’re through, you’ll know what these methods are all about, how they differ, and when to use one over the other. And because theory only goes so far, we’ll wrap it up with a Python example so you can put everything into practice and share results in a way your stakeholders will actually understand.

Sahir_Maharaj · ‎01-20-2026

In this edition, we’re exploring ensemble methods, focusing on stacking and voting, and how they can help you get more performance out of your models. By the time you’re done, you’ll have a clear understanding of what voting classifiers are, when to use them, and why they often outperform a single model on its own. And because theory only takes you so far, we’ll also explore how these concepts fit into Microsoft Fabric, so you can go from experimenting in a notebook to applying them in real-world projects.

Sahir_Maharaj · ‎01-13-2026

In this edition, we’re exploring regularization and breaking down why it’s the secret weapon against overfitting. By the time you’re through, you’ll understand what regularization actually does in plain terms, how L1, L2, and ElasticNet each bring their own flavor to the table, and when it makes sense to use one over the other.

Sahir_Maharaj · ‎12-15-2025

In this edition, we're exploring autoencoders and why they’re such a powerful way to reduce dimensionality when your data starts getting a little too wide. By the time you’re through it, you’ll understand how autoencoders stack up against traditional techniques and why they often capture the deeper patterns those older methods miss. You’ll also get a feel for how these models actually learn, step by step, as they compress and reconstruct your data.

Sahir_Maharaj · ‎12-09-2025

In this edition, we will explore recommendation systems and break them down in a way that finally feels clear and approachable. You will get a feel for how these systems think, how they pick up on subtle user signals, and why they have quietly become the backbone of modern data experiences. And by the time you are done, you will have a solid, practical understanding of how to shape a recommendation model from the ground up (of course, without the complexity that usually scares people off!)

Sahir_Maharaj · ‎11-17-2025

In this edition, we’re exploring advanced pipelines and why they’re such a gift for data professionals. You’ll learn how they bring structure and consistency into your work, making every project cleaner and easier to manage. From there, we’ll move into custom transformers - building your own step-by-step so you can add domain-specific logic that no off-the-shelf tool can handle. And to wrap it all up, we’ll put everything together into a full pipeline example that blends standard components with custom ones, showing you exactly how it all fits in a real-world workflow.

Sahir_Maharaj · ‎11-10-2025

In this edition, you will explore hyperparameter tuning and why Optuna is such a powerful tool for the job. By the time you’re done, you’ll understand what tuning really means, see how Optuna makes the process smarter than traditional search methods, and walk through it step by step using Python. Along the way, you’ll also build the confidence to take these ideas and apply them to your own models.

Sahir_Maharaj · ‎10-15-2025

In this edition, we will explore the art of uncovering hidden patterns in your data using KMeans and DBSCAN. By the time you finish reading, you’ll have a clear sense of how each algorithm thinks, how to decide which one fits your data, and how to interpret the clusters they create. You’ll observe how KMeans brings structure and precision, while DBSCAN adds flexibility and adaptability for messier, real-world data. We’ll also bring in UMAP, a powerful tool that turns complex, high-dimensional data into something you can actually understand.

Sahir_Maharaj · ‎10-07-2025

In this edition, we will explore cross-validation strategies and how to put them to work using scikit-learn’s model_selection module. By the time you’re done, you’ll know when to reach for K-Fold versus StratifiedKFold, how to handle grouped data with GroupKFold, and why TimeSeriesSplit is the only safe option for time-based problems. We’ll also walk through practical Python examples in Microsoft Fabric so you can see these strategies in action and apply them right away.

Sahir_Maharaj · ‎09-10-2025

In this edition, we will explore model evaluation metrics. By the time you’re through, you’ll know how to make sense of precision, recall, F1-score, AUC, and MCC in plain language, and more importantly, when to reach for each one depending on the problem in front of you. I’ll also show you how to implement these metrics in Python using scikit-learn, and how to bring them to life with visualizations like precision-recall and ROC curves.

Sahir_Maharaj · ‎08-28-2025

In this edition, we will explore one of the most common challenges in feature engineering - how to handle categorical data. I’ll walk you through three different encoding techniques: one-hot encoding, ordinal encoding, and target encoding. Along the way, I’ll show you how each method works, when it makes sense to use it, and how to put it into practice with pandas and scikit-learn. We’ll start simple, then build up to more advanced approaches, so by the time you’re done, you’ll not only know how to transform categories into numbers but also which encoding strategy gives your model the best shot at success.

Sahir_Maharaj · ‎08-22-2025

What you will learn: In this edition, we’re exploring how to fill in those gaps in your dataset without losing its integrity. By the time you’re through, you’ll know exactly how to handle missing values using sklearn.impute for quick, reliable fixes, fancyimpute for more advanced, context-aware approaches, and KNNImputer when similarity-based estimates make the most sense. You’ll learn when each technique shines, when it’s best to avoid them, and how to put them into action in Python using a Microsoft Fabric notebook.

Sahir_Maharaj · ‎08-19-2025

In this edition, you will learn how to detect outliers and handle them using NumPy, SciPy, and Seaborn inside Microsoft Fabric’s Python environment. By the time you’re done, you’ll know how to spot unusual data points using statistical methods, confirm them visually with clear, informative plots, and decide whether to remove, transform, or cap them based on context. And because finding outliers is only half the story, you’ll also learn how to build the instinct to know when those “odd” values are actually your most valuable insights.

Sahir_Maharaj · ‎08-14-2025

In this edition, I will take you into a deeper layer of Pandas using Microsoft Fabric. By the time you’re done, you’ll know how to reshape your data with precision, using tools like melt() and pivot_table() to get it into exactly the structure you need. You’ll learn how to go beyond basic .groupby() operations, building complex aggregations and transformations that give richer insights without losing important detail. And because clean, maintainable code matters just as much as correct results, we’ll wrap it all together with method chaining and .pipe() so your transformations read like a clear story from start to finish.

Rufyda · ‎06-12-2025

MLflow is a powerful tool that helps you manage your machine learning (ML) projects.

In Microsoft Fabric, MLflow makes it easier to train, track, and use your models to make predictions on new data.

What is MLflow?

MLflow is an open-source platform to manage the ML lifecycle, including:

Tracking experiments

Logging model parameters and metrics

Saving and versioning models

Reusing models for predictions

Using MLflow in Microsoft Fabric helps you organize and reproduce your work easily.

Steps to Use MLflow in Microsoft Fabric:

1. Create an Experiment

Start by creating an experiment. Every time you train a model, it will be saved as a run under that experiment. This helps keep track of each version of your model.

2. Log Parameters and Metrics

During training, use MLflow to log:

Model parameters (like learning rate or depth)

Metrics (like accuracy or RMSE)

This helps you compare different models later.

3. Save the Model
After training, save the model in Microsoft Fabric. MLflow stores it as a package that includes:

The model file (like a .pkl file)

A metadata file called MLmodel

The environment settings to run the model

What is the MLmodel File?
The MLmodel file includes:

Path to the model (where it’s saved)

Flavors (which ML library was used, like scikit-learn)

Signature (what kind of input the model expects and what output it gives)

Customizing Model Behavior

Sometimes your model may need to be adjusted to work with new data. You can customize the input and output schema using MLflow:

Define input columns (e.g., age, gender, BMI)

Define output (e.g., prediction result)

This is important when applying the model to different datasets.

Using the Model for Batch Predictions

After saving the model, you can use it to make batch predictions in Microsoft Fabric:

1. Prepare the New Data
Make sure your data is in the correct format. The column names and types should match what the model expects.

2. Store Data in Delta Tables
Microsoft Fabric uses Delta Tables to store data in the lakehouse. To save or load data:

# Save data
df.write.format("delta").save("Tables/new_table")

# Read data
df = spark.read.format("delta").load("Tables/new_table")

3. Generate Predictions
Once your data is ready, apply the saved model to make predictions. Then, save the results for further use, like showing them in Power BI.

Important: Match Data Types
Make sure the data types in your new dataset match the model’s input schema:

Use String for text

Use Integer or Float for numbers

Use Datetime for dates and times

If the types don’t match, the model may fail to score the data or may return incorrect predictions.
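A quick check like the one below can catch type mismatches before scoring. It is a minimal pandas sketch: the helper `find_type_mismatches`, the expected schema, and the sample rows are hypothetical, and in practice the expected types would come from the model's signature.

```python
import pandas as pd
from pandas.api import types as ptypes

# Maps a type label to a pandas dtype check.
CHECKS = {
    "string": ptypes.is_string_dtype,
    "integer": ptypes.is_integer_dtype,
    "float": ptypes.is_float_dtype,
    "datetime": ptypes.is_datetime64_any_dtype,
}

# Hypothetical expected schema for the new data.
EXPECTED = {"age": "integer", "gender": "string", "BMI": "float"}


def find_type_mismatches(df, expected):
    """Return the columns that are missing or have the wrong kind of dtype."""
    problems = []
    for column, kind in expected.items():
        if column not in df.columns or not CHECKS[kind](df[column]):
            problems.append(column)
    return problems


# "age" arrives as text here, which is a common issue with imported data.
new_data = pd.DataFrame(
    {"age": ["34", "51"], "gender": ["F", "M"], "BMI": [22.5, 27.1]}
)
before = find_type_mismatches(new_data, EXPECTED)
print(before)  # ['age']

# Convert the column to numbers, then check again.
new_data["age"] = pd.to_numeric(new_data["age"])
after = find_type_mismatches(new_data, EXPECTED)
print(after)  # []
```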

Conclusion:

MLflow in Microsoft Fabric helps you manage your machine learning process from start to finish. It makes it easy to:

Track your training process

Save and reuse models

Apply models to new data

Store and share predictions

This helps you build better models and make better decisions using your data.
Let’s connect on LinkedIn: https://www.linkedin.com/in/rufyda-abdelhadirahma/

Sahir_Maharaj · ‎06-02-2025

I was chatting with a colleague earlier this week and they mentioned that Power BI intimidates them. Dashboards, DAX, data models… I get it. It can feel like you need a translator just to get started. But that’s exactly why the PL-300 livestream series is so useful.

Sahir_Maharaj · ‎02-03-2025

In this edition, we’re exploring data relationships and how to make sense of them using Microsoft Fabric and the SemPy library. By the time you’re done with this, you’ll have a clear approach to mapping out your data, visualizing those connections, and making sure everything checks out. And because no dataset is perfect, we’ll also dive into validation - making sure your data is as solid as you need it to be.

Sahir_Maharaj · ‎01-21-2025

In this edition, you'll learn how to transform your approach to big data using the powerful integration of Azure OpenAI, SynapseML, and Microsoft Fabric. You'll explore how Azure OpenAI acts as the brain for natural language understanding and generation, while SynapseML serves as the computational muscle for scalable machine learning. By the end, you'll be equipped with the knowledge and confidence to create AI-driven workflows that deliver actionable insights and drive impactful decisions.

Sahir_Maharaj · ‎12-16-2024

In this edition, you’ll gain an understanding of SemPy and its transformative role within Microsoft Fabric. Whether you're taking your first steps into semantic modeling or you’re a seasoned pro looking to streamline your workflow, this read is designed to meet you where you are and elevate your capabilities.

Find articles, guides, information and community news

Causal Inference for Data Science in Microsoft Fabric

Mastering Advanced Regression for Data Science in Microsoft Fabric

Level Up Your Forecasting with Temporal Fusion Transformers for Data Science in Microsoft Fabric

Mastering Advanced Regex Techniques for Data Science in Microsoft Fabric

Semantic Intelligence using Word2Vec and GloVe for Data Science in Microsoft Fabric

Exploring Text Intelligence through TF-IDF for Data Science in Microsoft Fabric

Advanced Anomaly Detection for Data Science in Microsoft Fabric

The Art of Mastering Principal Component Analysis (PCA) for Data Science in Microsoft Fabric

Integrate MLflow in Microsoft Fabric for Effective ML Management

The Art of XGBoost and LightGBM for Data Science in Microsoft Fabric

Mastering Time Intelligence for Data Science with Microsoft Fabric

SHAP & LIME for Data Science in Microsoft Fabric

Advancing Data Science with Ensemble Methods in Microsoft Fabric

Mastering Regularization for Data Science in Microsoft Fabric

Autoencoder Dimensionality Reduction for Data Science in Microsoft Fabric

Designing Intelligent Recommendation Systems for Data Science in Microsoft Fabric

Advanced Pipelines & Transformers for Data Science in Microsoft Fabric

Hyperparameter Tuning with Optuna for Data Science in Microsoft Fabric

Clustering with KMeans, DBSCAN, and UMAP for Data Science in Microsoft Fabric

How to Validate Machine Learning Models for Data Science in Microsoft Fabric

The Data Scientist’s Guide to Model Metrics in Microsoft Fabric

Feature Engineering at Scale for Data Science with Microsoft Fabric

The Ultimate Guide to Data Imputation for Data Science in Microsoft Fabric

Outlier Detection & Handling Workflow for Data Science in Microsoft Fabric

Mastering Advanced Pandas for Data Science in Microsoft Fabric

Getting Started with MLflow in Microsoft Fabric

Your Road to Power BI Certification Starts Here! A free PL-300 livestream series.

How Microsoft Fabric Helps You Build Smarter Insights with Semantic Models

How to transform your Data Science workflows with SynapseML and Azure OpenAI using Microsoft Fabric

SemPy in Microsoft Fabric: From SQL Scripts to Semantic Models

