
Rufyda

Working with Data in Microsoft Fabric

Ingesting Data into Microsoft Fabric


Before any analysis or transformation can take place, data must first be ingested into the Fabric environment.

Fabric supports a wide range of data sources, allowing for seamless integration of both local and cloud-based data.

 

Supported Data Sources:

  • Local sources: CSV files or Excel sheets stored on your computer.
  • Cloud sources: Azure Data Lake Storage Gen2 and other cloud-based repositories.

 

Once data is ingested, it is stored in a Lakehouse—a central, unified storage layer designed to handle structured, semi-structured, and unstructured data.

This architecture enables easy access for exploration and transformation, all while maintaining scalability and flexibility.
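As a minimal sketch of local-file ingestion, the snippet below reads a CSV with pandas, using an in-memory string as a stand-in for an uploaded file (the column names are made up for illustration; in a Fabric notebook you would point `read_csv` at your file in the Lakehouse Files area instead):

```python
import io
import pandas as pd

# In-memory CSV standing in for a file uploaded to the Lakehouse Files area;
# order_id/amount are hypothetical columns for illustration.
csv_data = io.StringIO("order_id,amount\n1,19.99\n2,5.50\n")
df = pd.read_csv(csv_data)
print(df.shape)  # (2, 2)
```

Once loaded, the DataFrame can be explored, transformed, and written back to the Lakehouse as a table.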

 

Exploring and Transforming Data with Notebooks:
Microsoft Fabric provides a familiar and powerful environment for data exploration through Notebooks, which are powered by Apache Spark—an open-source, distributed computing framework designed for large-scale data processing.

 

🔹 How Spark Works in Fabric:


  • Automatic Spark Compute: each Notebook is automatically connected to a Spark runtime environment.
  • Session Management: when you run the first cell in a Notebook, a Spark session is started. The session remains active while you work and is automatically paused during inactivity to optimize resource usage and control costs.

 

🔹 Supported Languages:

  • PySpark (Python)
  • Spark (Scala)
  • Spark SQL
  • SparkR (R)

 

Notebooks in Fabric allow you to explore your data using rich analysis libraries, custom scripts, and built-in visualization tools.

This interactive environment makes it easy to uncover insights, build models, and prepare data for downstream workflows.
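A typical first exploration cell combines a statistical summary with a quick aggregation. The dataset below is hypothetical; the same `describe`/`groupby` pattern applies equally to pandas and Spark DataFrames.

```python
import pandas as pd

# Hypothetical dataset standing in for a Lakehouse table.
df = pd.DataFrame({"region": ["N", "S", "N", "S"],
                   "sales": [100, 80, 120, 60]})

print(df.describe())  # statistical summary of numeric columns

# Aggregate sales per region as a starting point for insight.
totals = df.groupby("region")["sales"].sum()
print(totals)
```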

 

Preparing Data with Data Wrangler
For users looking for a more guided, code-assisted transformation experience, Data Wrangler in Microsoft Fabric is an invaluable tool. It offers a user-friendly interface to quickly clean, shape, and transform datasets without writing complex code.

 

Key Features of Data Wrangler:

  • Data Overview: Get a statistical summary of your dataset to understand distributions, types, and patterns.
  • Issue Detection: Automatically identify missing, inconsistent, or invalid data.
  • Cleaning Operations:
      - Remove or impute missing values
      - Convert data types
      - Detect and handle outliers
  • Auto-generated Code: every transformation you apply through the UI automatically generates reusable code, making it easy to reproduce or adjust later.
  • Data Output: once transformed, your clean dataset can be saved directly back into the Lakehouse for analysis or sharing.
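To illustrate the auto-generated-code idea, the function below is the kind of pandas code such a guided transformation produces after a few UI steps; the column name, sample data, and exact operations are hypothetical, not Data Wrangler's literal output.

```python
import pandas as pd

def clean_data(df):
    # Drop rows with a missing price (UI step: remove missing values).
    df = df.dropna(subset=["price"])
    # Cast price from text to float (UI step: convert data type).
    df = df.assign(price=df["price"].astype(float))
    # Filter values outside 1.5 * IQR (UI step: handle outliers).
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return df.reset_index(drop=True)

raw = pd.DataFrame({"price": ["10", "12", None, "1000", "11"]})
clean = clean_data(raw)
print(clean["price"].tolist())  # the missing and extreme rows are gone
```

Because the steps live in an ordinary function, they can be re-run on new data or tweaked by hand before the result is written back to the Lakehouse.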


Microsoft Fabric streamlines the entire data workflow—from ingestion to transformation—into a single, cohesive platform. Whether you're a data scientist working with Spark notebooks or a business analyst using Data Wrangler, Fabric provides the tools and flexibility to work with data efficiently and at scale. By centralizing data storage and offering robust transformation tools, Microsoft Fabric enables faster, smarter decision-making for modern data teams.