When organizations work with Microsoft Fabric, one of the most attractive features is the ability to create shortcuts to external storage systems such as AWS S3.
A shortcut gives you the convenience of accessing external data as if it were already part of OneLake, without the need to copy or duplicate files.
But here’s the catch: while shortcuts simplify connectivity, they don’t eliminate one of the biggest hidden costs in cloud analytics — data transfer fees.
How Shortcuts Work
A Fabric shortcut is essentially a pointer to the data. When you query parquet files in S3 through Fabric, the compute engine (running in Azure) must fetch the bytes from AWS. This means the data is leaving AWS, and every gigabyte transferred counts as egress traffic.
So even though the files aren’t duplicated inside Fabric storage, AWS still charges you egress for every gigabyte that crosses into Azure.
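For instance, inside a Fabric notebook the shortcut looks like an ordinary folder under the lakehouse, but every read still pulls the bytes from S3. A minimal PySpark sketch, assuming a hypothetical shortcut named s3_sales in the lakehouse Files area:

```python
# Minimal sketch: reading an S3 shortcut from a Fabric notebook (PySpark).
# "s3_sales" is a hypothetical shortcut name; it shows up under the attached
# lakehouse's Files area like any other folder.
df = spark.read.parquet("Files/s3_sales/orders/")  # bytes are fetched from S3 at read time

# Every action that materializes data (count, display, semantic model refresh)
# moves the required bytes across the AWS -> Azure boundary and is billed as egress.
df.count()
```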
The Cost of Reading 200 GB Daily
Let’s consider a realistic example:
Your S3 bucket contains about 200 GB of parquet files.
These files are refreshed daily, and your Fabric semantic model needs a daily refresh.
That means 200 GB per day × 30 days = ~6 TB per month.
Based on typical AWS S3 data transfer rates (around $0.09 per GB for the first 10 TB), you’re looking at:
6,000 GB × $0.09 ≈ $540 per month in AWS egress charges.
That’s before considering Fabric compute costs.
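If you want to plug in your own numbers, the arithmetic is easy to script. The figures below simply mirror the scenario above:

```python
# Back-of-the-envelope egress estimate for the scenario above.
daily_read_gb = 200          # full dataset read per daily refresh
days_per_month = 30
egress_rate_per_gb = 0.09    # typical S3 data-transfer-out rate for the first 10 TB/month

monthly_gb = daily_read_gb * days_per_month        # 6,000 GB ≈ 6 TB
monthly_cost = monthly_gb * egress_rate_per_gb     # ≈ $540
print(f"~{monthly_gb:,} GB/month -> ~${monthly_cost:,.0f}/month in egress")
```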
Why Shortcuts Don’t Reduce Egress Fees
It’s important to understand that shortcuts don’t magically reduce data transfer charges. They prevent duplication of storage, but the actual bytes must still move from AWS to Azure every time you run a query or refresh your model.
So, if you’re reading the full 200 GB daily, you’ll pay egress fees as if you were downloading the data each day.
Strategies to Optimize Costs
The good news is that you don’t have to treat those fees as a fixed cost. There are practical ways to bring them down:
Initial Full Copy + Incremental Loads
Do one large migration of your dataset into OneLake (or Azure Data Lake). After that, only copy the new or updated files each day. This reduces daily transfers to just the delta, which is usually far smaller than the entire dataset.
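As a rough sketch, the daily job can list only the objects modified since the last run and transfer just those. The bucket name, prefix, and watermark handling below are placeholders:

```python
"""Sketch of a daily incremental copy: only objects modified since the last
run are transferred. Bucket, prefix, and watermark handling are placeholders."""
from datetime import datetime, timezone

import boto3

BUCKET = "my-data-bucket"            # hypothetical bucket name
PREFIX = "sales/parquet/"            # hypothetical prefix
last_run = datetime(2024, 6, 1, tzinfo=timezone.utc)  # load this from your own state store

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

new_keys = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["LastModified"] > last_run:   # keep only the daily delta
            new_keys.append(obj["Key"])

# Copy just the delta into OneLake / ADLS with whatever tool you prefer
# (Data Factory, AzCopy, or the Azure SDK); the rest of the dataset stays put.
print(f"{len(new_keys)} new or updated files to transfer")
```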
Partitioning and Predicate Pushdown
Structure your parquet files by date or partition keys. Ensure your queries are selective so that Fabric only reads what’s necessary instead of scanning all 200 GB.
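As an illustration, assuming the same hypothetical s3_sales shortcut and a dataset partitioned by order_date, a filtered read lets the engine prune partitions instead of scanning everything:

```python
# Sketch: date-partitioned parquet (e.g. .../orders/order_date=2024-06-01/...)
# lets the engine skip whole partitions instead of scanning all 200 GB.
df = (
    spark.read.parquet("Files/s3_sales/orders/")   # hypothetical shortcut path
    .where("order_date = '2024-06-01'")            # partition filter -> partition pruning
)
df.count()
```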
Push Changes from AWS
Instead of letting Fabric pull data every day, configure S3 event triggers (with Lambda or DataSync) to push only the new files into Azure as they arrive.
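One possible shape for this, sketched below with a hypothetical ADLS Gen2 account and filesystem, is a Lambda function that fires on s3:ObjectCreated events and writes each new file straight into Azure:

```python
"""Sketch of an S3-triggered Lambda that pushes each new file to ADLS Gen2.
Account name, filesystem, and credential handling are placeholders."""
import os
import urllib.parse

import boto3
from azure.storage.filedatalake import DataLakeServiceClient

s3 = boto3.client("s3")
adls = DataLakeServiceClient(
    account_url=f"https://{os.environ['ADLS_ACCOUNT']}.dfs.core.windows.net",
    credential=os.environ["ADLS_SAS_TOKEN"],       # or a service principal credential
)
fs = adls.get_file_system_client("landing")        # hypothetical filesystem name

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Write the new object to the same relative path on the Azure side
        fs.get_file_client(key).upload_data(body, overwrite=True)
```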
Compression and Column Pruning
Since parquet is columnar, make sure your reports only pull the columns that are actually needed. This reduces the amount of data read — and the egress bill.
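In PySpark terms, that simply means selecting columns explicitly rather than reading everything (again using the hypothetical shortcut path):

```python
# Sketch: select only the columns the report needs; with columnar parquet,
# untouched columns are never read, so fewer bytes leave AWS.
df = (
    spark.read.parquet("Files/s3_sales/orders/")    # hypothetical shortcut path
    .select("order_id", "order_date", "amount")     # column pruning
)
```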
Evaluate Long-Term Data Residency
If your workload is permanent and heavy, it may be more cost-effective to migrate the dataset fully into Azure and avoid continuous cross-cloud transfers.
Fabric shortcuts offer a great way to connect to S3 without moving data right away, but they don’t avoid AWS data transfer charges. If you access large volumes of S3 data every day, costs can add up quickly.
The most effective approach is usually to copy once, then refresh incrementally, while designing your data to minimize unnecessary reads. That way, you get the best of both worlds: the convenience of Fabric integration and a controlled cloud bill.