
Most Recent
FabricPam
Administrator

Meet the Power BI World Champion from Mexico and Panama! There will be live sessions, competitions, and plenty of other opportunities. Read on to learn all about it.

Read more...

FabricPam
Administrator

Meet the Power BI World Champion from Brazil! Live sessions, competitions, and other opportunities. Read on for more details.

Read more...

FabricPam
Administrator

 

 Live sessions, contests and other opportunities for students and those early in their career. Read on for more details.

Read more...

techies
Super User

While your Moodle LMS data provides a complete snapshot of daily users, course enrollments, and activity statuses, it obscures the vital story educators, administrators, and analysts really care about: what changes occurred since yesterday?

Read more...

tharunkumarRTK
Super User

In my previous blog, I explained Bootstrapped data loading; in this post, I will explain how you can automate partition refreshes using a tool developed in Python. Please read that blog before continuing here.

 

Automating Partition Refreshes with Python
To simplify the process, I developed a Python-based tool that automates partition refreshes.

The download link is at the bottom of this post.

It comes in two versions:

 

Version 1: Runs from a local machine

Version 2: Runs from a Microsoft Fabric Notebook using the Semantic Link library

 

Both versions use the Enhanced Refresh API to refresh partitions efficiently.
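To make that concrete, here is a rough sketch of how a partition-targeted Enhanced Refresh request could be assembled in Python. The body shape follows the public Enhanced Refresh API documentation; the IDs, table, and partition names are placeholders, and this is not necessarily how the tool itself builds the request.

```python
# Sketch only: builds an Enhanced Refresh API request body that targets
# specific partitions. Table and partition names are placeholders.
def build_refresh_body(table: str, partitions: list[str],
                       max_parallelism: int = 2) -> dict:
    """Request body refreshing only the listed partitions of one table."""
    return {
        "type": "full",
        "commitMode": "transactional",
        "maxParallelism": max_parallelism,
        # Without this, the service would apply the incremental refresh
        # policy instead of refreshing exactly the partitions we list.
        "applyRefreshPolicy": False,
        "objects": [{"table": table, "partition": p} for p in partitions],
    }

# The body is POSTed, with a bearer token, to:
# https://api.powerbi.com/v1.0/myorg/groups/{workspaceId}/datasets/{datasetId}/refreshes
```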

Version 1: Local Python Tool
This version includes three key files:

  • config.py
  • utils.py
  • main.py

Requirements:

  • Service Principal ID, Secret, and Tenant ID
  • Dataset ID and Workspace ID

Configuration Steps:

1. Open config.py and provide the required values. Adjust parameters such as delay, batch size, maximum parallelism, retry count, and timeout based on your environment.

tharunkumarRTK_0-1760955980031.png
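As an illustration, a minimal config.py might look like the following. The constant names and default values here are my own assumptions, not necessarily the tool's actual ones, and the GUIDs are placeholders:

```python
# Illustrative config.py; constant names and defaults are assumptions,
# not the tool's actual ones.
TENANT_ID = "<tenant-guid>"
CLIENT_ID = "<service-principal-app-id>"
CLIENT_SECRET = "<service-principal-secret>"   # store securely, not in source
WORKSPACE_ID = "<workspace-guid>"
DATASET_ID = "<dataset-guid>"

BATCH_SIZE = 4          # partitions refreshed per batch
MAX_PARALLELISM = 2     # parallel operations within one refresh request
DELAY_SECONDS = 30      # wait between refresh-status polls
RETRY_COUNT = 2         # retries for a failed batch
TIMEOUT_MINUTES = 120   # give up on a batch after this long
```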

 

2. In main.py, specify the table name and incremental refresh policy details.

tharunkumarRTK_1-1760956035822.png

 

3. Execute the main.py script.

 

How It Works:

  • The tool automatically calculates partition names based on the policy.
  • It divides partitions into batches as per the defined batch size.
  • It checks if any refresh is in progress, then triggers batch refreshes sequentially.
  • Each batch executes only after the previous one completes successfully.
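The partition-name calculation and batching described above can be sketched roughly as follows. This assumes monthly granularity and the common YYYYMM partition naming; as noted later, always verify the generated names against your model in SSMS.

```python
# Simplified sketch of partition-name generation and batching; the tool's
# actual logic derives the range from the incremental refresh policy.
from datetime import date

def monthly_partition_names(start: date, end: date) -> list[str]:
    """Generate YYYYMM partition names from start to end, inclusive."""
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"{y}{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return names

def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split the partition list into sequential batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Each batch would then be submitted to the Enhanced Refresh API, and the next batch starts only after the previous one reports completion.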

tharunkumarRTK_2-1760956090289.png
You can monitor progress through the Power BI Service refresh history

tharunkumarRTK_3-1760956123981.png

 

or verify partition data in SSMS by expanding the table’s Partitions node.

tharunkumarRTK_4-1760956148949.png

 

tharunkumarRTK_5-1760956173301.png

 

After completing all batches, the script stops execution, as shown below:

tharunkumarRTK_6-1760956194316.png

 

All partitions were loaded successfully:

tharunkumarRTK_7-1760956212618.png

 

You might have noticed that the first batch started at 8:26 AM and the last batch completed at 11:00 AM. The whole process took about 2.5 hours, and the script did the entire job on its own. 🙂

Version 2: Microsoft Fabric Notebook
The second version is a Microsoft Fabric Notebook (Bootstrapped Data Load.ipynb) and leverages the Semantic Link library.

tharunkumarRTK_8-1760956236415.png

 

It does not require a Service Principal since it uses the current user’s identity. The user running the notebook must have appropriate permissions in the workspace hosting the semantic model.

tharunkumarRTK_9-1760956266202.png

 

Provide the necessary configuration values in the designated notebook cell and execute it. Similar to Version 1, it refreshes partitions sequentially in batches.
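A cell doing the batch submission might look roughly like the sketch below. It assumes the Semantic Link refresh_dataset function with an objects parameter; the model, workspace, and table names are placeholders, so check the sempy documentation for the exact signature.

```python
# Sketch for a Fabric notebook cell using the Semantic Link library (sempy).
# Names below are illustrative placeholders, not the tool's actual values.

def partition_objects(table: str, partitions: list[str]) -> list[dict]:
    """Build the `objects` payload that targets specific partitions."""
    return [{"table": table, "partition": p} for p in partitions]

# In the notebook, each batch would be submitted roughly like this:
# import sempy.fabric as fabric
# for batch in batches:
#     fabric.refresh_dataset(
#         dataset="Sales Model",      # semantic model name (placeholder)
#         workspace="Analytics WS",   # workspace name (placeholder)
#         objects=partition_objects("Sales", batch),
#         max_parallelism=2,
#         retry_count=2,
#     )
```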

tharunkumarRTK_10-1760956291844.png

 

Points to Remember

  • The automation tool supports Task 2 only: refreshing partitions in batches.
  • Parameters such as maxParallelism, batchSize, delay, timeout, and retryCount should be tuned according to your environment.
  • The tool refreshes one table at a time. For models with multiple fact tables using incremental refresh, run the process separately for each table.
  • In Version 1, verify that the automatically generated partition names match your model configuration before triggering the refresh.
  • In Version 1, bearer token expiry is handled: if the token expires during batch processing, the tool regenerates one on its own.
  • In Version 2, note that the Fabric notebook remains active during execution, which can increase compute consumption. If Fabric workloads are restricted in your organization, use Version 1 instead.
  • You might notice that intermediate log messages show the current refresh status as ‘Unknown’; this is expected behavior with the Enhanced Refresh API.
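The token-expiry handling mentioned for Version 1 boils down to a check like the one below before each batch. This is a simplified sketch, not the tool's actual code:

```python
import time

def token_expired(expires_at: float, skew_seconds: int = 300) -> bool:
    """True when the cached bearer token is within `skew_seconds` of expiry,
    signalling that a fresh token should be requested before the next batch."""
    return time.time() >= expires_at - skew_seconds
```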

Conclusion
The Bootstrapped Initial Refresh technique is an effective way to overcome the limitations of the initial full load in Power BI incremental refresh. By first creating an empty table and then refreshing partitions in controlled batches, you can establish your model structure without encountering timeout, memory, or workload management issues.

This method ensures a smooth onboarding of large datasets into Power BI while maintaining optimal resource usage and performance.

You can download the tool from my Git repository. I am not an expert in Python, so please feel free to correct my code or suggest enhancements.

I hope you learned something new from this blog; do share your thoughts in the comments section.

Happy Learning!!!

ajaybabuinturi
Memorable Member

If you work with Power BI and Fabric, you have probably come across items like Workspace and App many times. The two seem similar: you publish a report, then publish it again as an App. So what is the real difference between them?
Blog Cover.jpg

Read more...

FataiSanni
Advocate II

 

If you've ever struggled with bulk-editing Power BI models, wished you could version control your semantic models properly, or wanted to share reusable model components with your team, you're in for a treat. The Tabular Model Definition Language (TMDL) view in Power BI Desktop is transforming how developers work with semantic models, and it might just become your new favorite feature.

In this post, I'll walk you through the practical benefits of TMDL view with real-world examples that demonstrate why this feature deserves a spot in every Power BI developer's toolkit.

 

FataiSanni_3-1760995772319.png

 

 

Read more...

uzuntasgokberk
Super User

Set up Power BI email subscriptions with Copilot to send AI summaries. Learn scheduling, subject/message setup, testing, and key limitations.

Read more...

audreygerred
Super User

Why I’m a Broken Record About Star Schema in Power BI

 

The Repetition Is Intentional

I talk about star schema again and again because it’s foundational. It’s not just a best practice — it’s the difference between scalable, performant, governable models and ones that become brittle, slow, and hard to trust. Power BI may not require it, but that doesn’t mean it’s optional.

 

Why Star Schema Is So Important

  • Performance: Power BI’s VertiPaq engine uses columnar storage, which means data is stored and compressed by columns rather than rows. This allows for highly efficient scanning, filtering, and aggregation — especially when fact tables are narrow and well-structured. Star schema supports this by keeping fact tables slim and focused, while dimension tables tend to be much wider, which maximizes compression and speeds up query execution.
  • Simplicity: Star schema simplifies the model by clearly separating facts from dimensions. But simplicity doesn’t mean fewer tables. If your model represents ten distinct business entities, it’s not simpler to put them into one ‘dimension’ table — it’s confusing.
  • Semantic Clarity: Dimensions in a star schema represent real-world business concepts/entities (think customer information, product information, associate information, etc.). This makes it easier for users to understand the model, write DAX, and build reports. It also improves discoverability and trust — users can see what each table represents without guessing. As Mr. Kimball taught us, ‘the data warehouse is only as good as the dimension attributes’. Each entity (e.g., customer, product, profit center) deserves its own dimension to preserve semantic clarity and avoid tangled logic. For example, we shouldn’t find the name of a product that was bought in the same dimension table as the name of who it was sold to and the name of the sales-person that gets commission – these are three distinct business entities.
  • Copilot & AI-readiness: Tools like Copilot rely on semantic structure to interpret natural language queries. A well-formed star schema gives Copilot the context it needs to generate accurate measures, filters, and insights. Without it, AI features become less reliable and harder to govern.

Why You Shouldn’t Ignore It — Even If Power BI Lets You

Power BI is flexible — but that flexibility can be dangerous. You can build snowflake schemas, flat tables, or spaghetti joins. But here’s what you risk:

  • Slow performance due to inefficient joins and poor compression.
  • Confusing relationships that make DAX harder to write and debug.
  • Poor AI experiences — Copilot struggles with ambiguous or overly complex models.
  • Certification risk — Models that don’t follow star schema are harder to endorse confidently.
  • Semantic ambiguity — Mixing different business entities (e.g., customer name and profit center) into a single dimension table creates confusion. These entities often belong to different domains, have different grain, and serve different analytical purposes. When they’re lumped together, it becomes unclear what the table represents, which leads to modeling errors, misleading visuals, and broken trust.
  • Storage inefficiency — Star schema supports columnar compression, which is most effective when columns contain repetitive, low-cardinality values. When models deviate from star schema — for example, by embedding dimension attributes in fact tables or creating wide, flat structures — compression suffers. This leads to larger memory footprints, slower refreshes, and degraded performance.

Is It Still a Star Schema If Only Part of the Model Follows It?

Not really. A partial star schema is like a half-built bridge — it might work for now, but it’s not structurally sound. If some dimensions are snowflaked, if facts are mixed with dimensions, if many business entities are mixed in one dimension table or if relationships are ambiguous, you lose the benefits of clarity, performance, and semantic alignment.

 

Should a Model Be Certified If It Doesn’t Follow Star Schema?

This is where governance comes in. A certified model should meet a minimum standard of:

  • Semantic clarity
  • Performance
  • Governability
  • AI-readiness

If a model doesn’t follow star schema, it should be reviewed carefully. Exceptions might exist, but they should be rare and well-documented.

 

Should Copilot Be Used on Models That Don’t Follow Star Schema?

Technically, it can be used — but practically, it’s risky. Copilot relies on semantic structure to interpret user queries. Without a star schema:

  • It may misinterpret relationships.
  • It may suggest incorrect measures or filters.
  • It may frustrate users with inconsistent results.

If you want Copilot to shine, give it a model that’s built to be understood.

 

Final Thought: Star Schema Is Not Just an Option — It’s a Mindset

Being a broken record about star schema means you’re advocating for clarity, performance, and trust. It’s not dogma — it’s discipline. And in a world of AI-powered analytics, that discipline matters more than ever.

tharunkumarRTK
Super User

Building an insightful Power BI dashboard for large volumes of transactional data often requires balancing performance and efficiency. Loading the complete dataset into the semantic model each time new records are added to the data warehouse can be time-consuming and resource-intensive.

 

In most transactional systems, historical data remains unchanged, meaning there is little value in reloading it repeatedly. To address this, Power BI provides an Incremental Refresh policy, which allows only recent data to be refreshed while preserving the historical portion. You can learn more about this feature in the official Microsoft documentation.

 

Once the policy is configured and the model is published to the Power BI Service, the first refresh performs a full data load. Subsequent refreshes are incremental, appending new data to the existing historical data. While this approach works well in most cases, certain scenarios make the initial full load difficult or cause it to fail.

 

Scenarios That Cause Initial Full Load Failures
Power BI Refresh Time Limits

Power BI imposes refresh time limits. In Pro workspaces, the limit is two hours, while Premium capacities allow up to five hours. If your database cannot return the entire dataset within this window, the refresh will fail with a timeout error. Reference: Power BI Refresh Time Limits

 

Power BI Command Memory Limit

Power BI enforces a maximum amount of memory that a semantic model can use during refresh. If the refresh process exceeds this limit, it will fail. Reference: Command Memory Limit

 

Power BI Command Timeout Limit

Each Power Query (M) expression has a 10-minute command timeout limit. If the underlying data source cannot return data within this period, the refresh will fail. Reference: Power Query Timeouts

 

Database Workload Management (WLM) Rules

Database administrators often enforce workload management policies to balance resource usage across teams. Two common rules that affect Power BI incremental refresh are:

1. Concurrent Query Limit: If Power BI sends more queries than the allowed threshold, the database may terminate them automatically.

2. Query Scan Volume Limit: This rule restricts the amount of data a single query or user can scan at a time. It often affects refresh operations when the RollingWindowGranularity in the incremental refresh policy is large.

During the initial full load, Power BI sends multiple queries to the data source. Some of these may scan a significant amount of data, triggering one or more of the limits above and causing the load to fail.

Modifying these limits may not be possible for model developers. This raises an important question: can we load all the data incrementally instead of performing one large initial load? The answer is yes, using a technique called Bootstrapped Initial Refresh.

What is Bootstrapped Initial Refresh?
Bootstrapped Initial Refresh allows you to create all necessary partitions in your semantic model without performing a full data load. Instead, you load an empty dataset during the initial refresh, avoiding timeout and memory issues.

Let’s look at how this can be implemented.

 

Step 1: Prepare the Semantic Model
In this example, we connect to a SQL Server database containing a table of sales transactions.

Base Query

SELECT [profit_margin],
[quantity_sold],
[customer_name],
[customer_email],
[purchase_date],
[payment_method],
[shipping_address],
[product_name],
[product_category],
[product_price]
FROM [dbo].[sales_rawtransactions]


We then create two parameters, RangeStart and RangeEnd, required for the incremental refresh policy, and add them to the WHERE clause.

WHERE [purchase_date] >= 'RangeStart'
AND [purchase_date] < 'RangeEnd'


Step 2: Bootstrap the Table
To bootstrap the table, we introduce a filter condition that always evaluates to false, such as 1 = 2. This ensures the query returns no rows during the initial load.

WHERE 1 = 2
AND [purchase_date] >= 'RangeStart'
AND [purchase_date] < 'RangeEnd'


After applying this condition, the table becomes empty.

tharunkumarRTK_1-1760954936537.png


You can now load it into Power BI and configure the incremental refresh policy.

 

tharunkumarRTK_2-1760954963394.png
When the model is published to the Power BI Service and the initial refresh is triggered, it completes quickly because no rows are returned.

tharunkumarRTK_3-1760955001827.png


However, Power BI still creates all necessary partitions based on the defined refresh policy.

 

tharunkumarRTK_6-1760955221747.png

 


This approach ensures the model structure is created without consuming significant resources or triggering the limits mentioned earlier.

 

Step 3: Remove the Filter Using the XMLA Endpoint
Once the initial load completes successfully, the next step is to remove the false condition (1 = 2). You cannot do this from Power BI Desktop because republishing the model would overwrite it and re-trigger the full load. Instead, you can use SQL Server Management Studio (SSMS) and connect to the XMLA Endpoint of the workspace.

1. In SSMS, choose Analysis Services as the connection type and authenticate using Microsoft Entra credentials.

tharunkumarRTK_7-1760955263604.png
2. In Object Explorer, right-click your semantic model and select:

tharunkumarRTK_8-1760955287602.png

3. Locate the M expression containing your query, remove the 1=2 condition,

tharunkumarRTK_9-1760955312960.png


and execute the script.

tharunkumarRTK_10-1760955359710.png

 



This updates the dataset definition without replacing the model.

Step 4: Load Data into All Partitions
Since the initial load produced an empty table, none of the partitions contain data. Triggering a standard refresh now will only refresh the most recent partition, as defined by the incremental policy. To populate the historical partitions, you must refresh them in batches.

Several options are available for this, but refreshing all partitions simultaneously may again exceed resource limits; therefore, batching the refresh operations is essential.

 

In my next blog, I will explain how you can automate the process of refreshing all partitions sequentially using a Python script.

kushanNa
Super User

Tired of switching between Power BI reports and manually syncing filters?
In this guide, you’ll learn how to connect multiple embedded Power BI reports so that changing a slicer in one report instantly updates another — in real time!

Using the Power BI JavaScript API, we’ll walk through how to:

  • Detect slicer changes as they happen

  • Capture live filter values

  • Apply them across reports automatically

No refresh delays — just seamless, interactive filtering across your dashboards.

Read more...

jenbeiser
Microsoft Employee

As Copilot becomes a central part of the Microsoft Fabric experience, ensuring your semantic models are optimized is key to unlocking its full potential. A well-prepared model helps Copilot generate accurate, insightful, and context-aware responses. Here’s a checklist of best practices—along with the why behind each one.

Read more...

uzuntasgokberk
Super User

Learn how to use Power BI Copilot on mobile and tablet to generate AI-powered report summaries with visual citations, and discover its features and limitations on those devices.

Read more...

AnkitaaMishra
Super User

Struggling with long Power BI refresh times? Discover how Incremental Refresh can cut them down dramatically. Follow this real-world demo with sample data and see how to configure, monitor, and optimize refreshes like a pro.

Read more...

govind_021
Super User

Have you ever felt like your Power BI model is a jungle of measures? One for YTD, another for QTD, one more for MTD, and suddenly you have Measure Overload Syndrome. Don’t worry—you’re not alone. Luckily, Calculation Groups are here to save your model (and your sanity).

Read more...
