Data Science Community Blog

Sahir_Maharaj

What you will learn: In this edition, we’re exploring how to fill in those gaps in your dataset without losing its integrity. By the time you’re through, you’ll know exactly how to handle missing values using sklearn.impute for quick, reliable fixes, fancyimpute for more advanced, context-aware approaches, and KNNImputer when similarity-based estimates make the most sense. You’ll learn when each technique shines, when it’s best to avoid them, and how to put them into action in Python using a Microsoft Fabric notebook.
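As a rough illustration of two of the approaches named above, here is a minimal sketch using scikit-learn's SimpleImputer and KNNImputer on a tiny made-up array. The data and the choice of `n_neighbors=2` are assumptions for demonstration only, and fancyimpute is not shown.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Tiny made-up dataset: the second row is missing its second feature.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [10.0, 100.0],
])

# Quick fix: replace each missing value with its column mean.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Similarity-based fix: average the values of the 2 most similar rows.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_filled[1, 1])
print(knn_filled[1, 1])
```

The mean is pulled upward by the large last row, while the KNN estimate only looks at the rows closest to the incomplete one, which is why the two results can differ noticeably.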

Rufyda

Goals:
Understand what Notebooks are in Fabric (simplified overview)
Learn how to integrate AI services such as OpenAI and SynapseML
Explore fun examples like text classification and summarization

What Are Notebooks in Microsoft Fabric?
Microsoft Fabric offers a unified analytics platform where data professionals can explore, prepare, and model data using familiar tools. Notebooks in Fabric are interactive coding environments (based on Apache Spark) where you can use languages like Python, SQL, or SparkR to work with data directly.

You can create and run Notebooks under the Data Science experience in Fabric.
These Notebooks are tightly integrated with Fabric’s Lakehouse, Spark runtime, and ML tools like SynapseML.
You can build machine learning pipelines, visualize results, and even connect with external AI services.

Integrating AI: OpenAI and SynapseML
A. Using OpenAI via SynapseML

You can connect to Azure OpenAI and run distributed prompts using SynapseML. This is ideal for batch processing — e.g., summarizing many documents or classifying large datasets.

Example:

from synapse.ml.services.openai import OpenAICompletion
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ("Tell me a joke about data",),
    ("Summarize the benefits of cloud computing",)
]).toDF("prompt")

openai_completion = OpenAICompletion() \
    .setDeploymentName("gpt-35-turbo") \
    .setPromptCol("prompt") \
    .setOutputCol("completion")

results = openai_completion.transform(df)
display(results)

B. Using OpenAI via Python SDK

You can also directly call OpenAI from your Notebook using the Python SDK. Fabric Notebooks support this seamlessly.

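import openai

# Legacy openai<1.0 interface, configured for Azure OpenAI.
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_key = "<your-key>"
openai.api_version = "2023-05-15"

response = openai.ChatCompletion.create(
    deployment_id="gpt-4o",  # name of your Azure OpenAI deployment
    messages=[{"role": "user", "content": "Summarize the importance of clean data."}],
    temperature=0.5
)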

print(response.choices[0].message.content)

C. Calling OpenAI via REST API

Fabric also supports REST-based integration with Azure OpenAI services, which can be helpful for secure or scalable deployments.
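A minimal sketch of what a REST call could look like, using only the Python standard library. It builds the request without sending it; the resource name `my-resource`, the deployment name, and the helper `build_chat_request` are hypothetical placeholders, and the API version is simply reused from the SDK example.

```python
import json
import urllib.request


def build_chat_request(resource, deployment, api_key, prompt,
                       api_version="2023-05-15"):
    """Build (but do not send) an Azure OpenAI chat-completions request."""
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values; replace with your own resource, deployment, and key.
request = build_chat_request(
    resource="my-resource",
    deployment="gpt-4o",
    api_key="<your-key>",
    prompt="Summarize the importance of clean data.",
)
print(request.full_url)

# To send it (requires a real endpoint and key):
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["choices"][0]["message"]["content"])
```

Keeping the request construction separate from the network call makes it easier to inspect the URL and payload before any key leaves the notebook.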

Fun Examples You Can Try
A. Sentiment Analysis with SynapseML

from synapse.ml.cognitive import TextSentiment
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ("I love working with Fabric!",),
    ("This is so frustrating...",)
]).toDF("text")

analyzer = TextSentiment() \
    .setTextCol("text") \
    .setOutputCol("sentiment") \
    .setSubscriptionKey("<your-text-analytics-key>") \
    .setLocation("eastus")

result = analyzer.transform(df)
display(result.select("text", "sentiment.document.sentiment"))

B. Text Summarization with OpenAI

response = openai.ChatCompletion.create(
    deployment_id="gpt-4",
    messages=[{"role": "user", "content": "Summarize this text: Microsoft Fabric is an end-to-end platform..."}]
)
print(response.choices[0].message.content)

Steps to Build AI Notebooks in Fabric
Go to Microsoft Fabric → Data Science → New Notebook
Install needed libraries (if not already included):
%pip install openai synapseml
Link your Azure AI resources (OpenAI or Cognitive Services)
Write and test your AI code (use examples above)
Visualize, export, or embed results in Fabric reports or dashboards

Fabric Notebooks are powerful tools for data exploration and AI experimentation.

You can integrate AI with OpenAI (GPT) using SynapseML, the Python SDK, or REST APIs.

Common use cases include text summarization, sentiment analysis, and data enrichment at scale.

Microsoft Fabric enables all this inside a single unified analytics platform.

Sahir_Maharaj

In this edition, you will learn how to detect outliers and handle them using NumPy, SciPy, and Seaborn inside Microsoft Fabric’s Python environment. By the time you’re done, you’ll know how to spot unusual data points using statistical methods, confirm them visually with clear, informative plots, and decide whether to remove, transform, or cap them based on context. And because finding outliers is only half the story, you’ll also learn how to build the instinct to know when those “odd” values are actually your most valuable insights.
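To make the "spot, then cap" idea concrete, here is a small sketch using NumPy only. The 1.5 × IQR rule is one common statistical method picked here as an assumption, and the sample values are made up.

```python
import numpy as np

# Made-up measurements with one obviously unusual value.
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag points outside the fences.
is_outlier = (values < lower) | (values > upper)

# One handling option: cap (winsorize) values at the fences instead of dropping them.
capped = np.clip(values, lower, upper)

print(values[is_outlier])
print(capped)
```

Whether to cap, transform, or remove a flagged point still depends on context; the mask only tells you where to look.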

Sahir_Maharaj · ‎08-14-2025

In this edition, I will take you into a deeper layer of Pandas using Microsoft Fabric. By the time you’re done, you’ll know how to reshape your data with precision, using tools like melt() and pivot_table() to get it into exactly the structure you need. You’ll learn how to go beyond basic .groupby() operations, building complex aggregations and transformations that give richer insights without losing important detail. And because clean, maintainable code matters just as much as correct results, we’ll wrap it all together with method chaining and .pipe() so your transformations read like a clear story from start to finish.
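A compact sketch of the pieces mentioned above (melt(), pivot_table(), a grouped transform, and .pipe()) on a made-up sales table. The column names and the helper `add_share` are hypothetical.

```python
import pandas as pd

# Made-up wide table: one column per month.
wide = pd.DataFrame({
    "store": ["A", "B"],
    "jan": [100, 80],
    "feb": [120, 90],
})


def add_share(df):
    """Add each row's share of its store's total sales."""
    store_total = df.groupby("store")["sales"].transform("sum")
    return df.assign(share=df["sales"] / store_total)


# Method chain: reshape wide -> long, then apply a custom step with .pipe().
tidy = (
    wide
    .melt(id_vars="store", var_name="month", value_name="sales")
    .pipe(add_share)
)

# Reshape back into a store-by-month summary.
summary = tidy.pivot_table(index="store", columns="month",
                           values="sales", aggfunc="sum")

print(tidy)
print(summary)
```

Using transform("sum") keeps one row per original record, so the per-store total sits next to each detail row instead of collapsing it away.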

Rufyda · ‎06-12-2025

MLflow is a powerful tool that helps you manage your machine learning (ML) projects.

In Microsoft Fabric, MLflow makes it easier to train, track, and use your models to make predictions on new data.

What is MLflow?

MLflow is an open-source platform to manage the ML lifecycle, including:

Tracking experiments

Logging model parameters and metrics

Saving and versioning models

Reusing models for predictions

Using MLflow in Microsoft Fabric helps you organize and reproduce your work easily.

Steps to Use MLflow in Microsoft Fabric:

1. Create an Experiment

Start by creating an experiment. Every time you train a model, it will be saved as a run under that experiment. This helps keep track of each version of your model.

2. Log Parameters and Metrics

During training, use MLflow to log:

Model parameters (like learning rate or depth)

Metrics (like accuracy or RMSE)

This helps you compare different models later.

3. Save the Model
After training, save the model in Microsoft Fabric. MLflow stores it along with:

The model file (like a .pkl file)

A metadata file called MLmodel

The environment settings to run the model

What is the MLmodel File?
The MLmodel file includes:

Path to the model (where it’s saved)

Flavors (which ML library was used, like scikit-learn)

Signature (what kind of input the model expects and what output it gives)

Customizing Model Behavior

Sometimes your model may need to be adjusted to work with new data. You can customize the input and output schema using MLflow:

Define input columns (e.g., age, gender, BMI)

Define output (e.g., prediction result)

This is important when applying the model to different datasets.

Using the Model for Batch Predictions

After saving the model, you can use it to make batch predictions in Microsoft Fabric:

1. Prepare the New Data
Make sure your data is in the correct format. The column names and types should match what the model expects.

2. Store Data in Delta Tables
Microsoft Fabric uses Delta Tables to store data in the lakehouse. To save or load data:

# Save data
df.write.format("delta").save("Tables/new_table")

# Read data
df = spark.read.format("delta").load("Tables/new_table")

3. Generate Predictions
Once your data is ready, apply the saved model to make predictions. Then, save the results for further use, like showing them in Power BI.

Important: Match Data Types
Make sure the data types in your new dataset match the model’s input schema:

Use String for text

Use Integer or Float for numbers

Use Datetime for dates and times

If the types don’t match, the model will not work correctly.
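A quick pre-flight check along these lines can catch mismatches before scoring. This is a plain-Python sketch rather than an MLflow or Fabric feature; the schema, the `visit_time` column, and the helper `find_type_mismatches` are hypothetical.

```python
from datetime import datetime

# Hypothetical input schema: column name -> expected Python type.
EXPECTED_SCHEMA = {
    "age": int,
    "gender": str,
    "BMI": float,
    "visit_time": datetime,
}


def find_type_mismatches(rows, schema):
    """Return (row_index, column, expected, actual) for every bad value."""
    problems = []
    for index, row in enumerate(rows):
        for column, expected in schema.items():
            if column not in row:
                problems.append((index, column, expected.__name__, "missing"))
            elif not isinstance(row[column], expected):
                actual = type(row[column]).__name__
                problems.append((index, column, expected.__name__, actual))
    return problems


rows = [
    {"age": 42, "gender": "F", "BMI": 23.4, "visit_time": datetime(2025, 6, 1)},
    # Second row: age arrives as text and BMI as an integer.
    {"age": "42", "gender": "M", "BMI": 27, "visit_time": datetime(2025, 6, 2)},
]

mismatches = find_type_mismatches(rows, EXPECTED_SCHEMA)
for problem in mismatches:
    print(problem)
```

On a Spark DataFrame the same idea applies to `df.dtypes`, comparing each column's type against what the model signature expects.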

Conclusion:

MLflow in Microsoft Fabric helps you manage your machine learning process from start to finish. It makes it easy to:

Track your training process

Save and reuse models

Apply models to new data

Store and share predictions

This helps you build better models and make better decisions using your data.
Let’s connect on LinkedIn: https://www.linkedin.com/in/rufyda-abdelhadirahma/

Sahir_Maharaj · ‎06-02-2025

I was chatting with a colleague earlier this week and they mentioned that Power BI intimidates them. Dashboards, DAX, data models… I get it. It feels like you need a translator just to get started. But that’s exactly why the PL-300 livestream series is so useful.

Rufyda · ‎05-22-2025

Why Data Scientists Should Start Using Microsoft Fabric Now

Sahir_Maharaj · ‎05-01-2025

Creating a Power BI report is only half the story. How you tell the story behind your data (your design choices, your analytical thinking, and how users interact with your report) matters just as much.

Sahir_Maharaj · ‎02-03-2025

In this edition, we’re exploring data relationships and how to make sense of them using Microsoft Fabric and the SemPy library. By the time you’re done with this, you’ll have a clear approach to mapping out your data, visualizing those connections, and making sure everything checks out. And because no dataset is perfect, we’ll also dive into validation - making sure your data is as solid as you need it to be.

Sahir_Maharaj · ‎01-21-2025

In this edition, you'll learn how to transform your approach to big data using the powerful integration of Azure OpenAI, SynapseML, and Microsoft Fabric. You'll explore how Azure OpenAI acts as the brain for natural language understanding and generation, while SynapseML serves as the computational muscle for scalable machine learning. By the end, you'll be equipped with the knowledge and confidence to create AI-driven workflows that deliver actionable insights and drive impactful decisions.

Sahir_Maharaj · ‎01-08-2025

In this edition, you’ll understand what Data Wrangler is and how it fits into your data preparation toolkit. Whether you’re an experienced analyst or someone stepping into the world of data, this read will equip you with practical knowledge to handle data preparation like a pro.

Sahir_Maharaj · ‎12-16-2024

In this edition, you’ll gain an understanding of SemPy and its transformative role within Microsoft Fabric. Whether you're taking your first steps into semantic modeling or you’re a seasoned pro looking to streamline your workflow, this read is designed to meet you where you are and elevate your capabilities.

Sahir_Maharaj · ‎11-21-2024

In this edition, I'm going to guide you through how to explore and visualize data using Microsoft Fabric notebooks. We'll start by understanding why this tool is so helpful for your workflow, then move on to how you can make use of its unique capabilities. By the end of this post, you'll know how to effectively use Microsoft Fabric notebooks to explore your data, visualize key insights, and streamline your data analysis process. Stick with me, and let's explore what makes this tool a game changer for data professionals.

Sahir_Maharaj · ‎11-18-2024

In this edition, I’ll walk you through why ingesting data into a Microsoft Fabric lakehouse matters, how Apache Spark plays a pivotal role in this process, and ultimately, how you can do it yourself. Whether you’re an experienced data scientist, data engineer or a data analyst wanting to expand your toolkit - this guide is for you.

Sahir_Maharaj · ‎11-18-2024

In this edition, I aim to guide you through what exactly data science in Microsoft Fabric is, why it’s transformative, and how it could reshape the way you interact with data. Whether you're just starting out or you're a seasoned expert in the data field, there's something for everyone here.

Sahir_Maharaj · ‎11-18-2024

This edition will help you unlock the potential of Microsoft Fabric by walking through the key components of a data science workflow. By the end, you'll not only understand the core elements of this platform, but also how to use it effectively.

Find articles, guides, information and community news

The Ultimate Guide to Data Imputation for Data Science in Microsoft Fabric

AI & Notebooks in Microsoft Fabric

Outlier Detection & Handling Workflow for Data Science in Microsoft Fabric

Mastering Advanced Pandas for Data Science in Microsoft Fabric

Getting Started with MLflow in Microsoft Fabric

Your Road to Power BI Certification Starts Here! A free PL-300 livestream series.

Why Data Scientists Should Start Using Microsoft Fabric Now

A Guide to Writing an amazing Power BI Contest Submission Blog Post

How Microsoft Fabric Helps You Build Smarter Insights with Semantic Models

How to transform your Data Science workflows with SynapseML and Azure OpenAI using Microsoft Fabric

Simplify, Transform, and Scale with the Data Wrangler in Microsoft Fabric

SemPy in Microsoft Fabric: From SQL Scripts to Semantic Models

All in One Place - How Fabric Notebooks Simplify Data Science

How to Use Apache Spark for Data Lakehouse Ingestion with Microsoft Fabric

Why Microsoft Fabric is a Game-Changer for Data Science

An Introduction to the Key Components of Data Science in Microsoft Fabric

Helpful resources

Join us at FabCon Vienna from September 15-18, 2025
