
tharunkumarRTK

Overcome Initial Full Load Failures in Power BI Incremental Refresh with Bootstrapped Load (2/2)

In my previous blog, I explained bootstrapped data loading. In this blog, I explain how you can automate partition refreshes using a tool built with Python. Please read that blog before this one.

 

Automating Partition Refreshes with Python
To simplify the process, I developed a Python-based tool that automates partition refreshes.

The download link is at the bottom of this post.

It comes in two versions:

 

Version 1: Runs from a local machine

Version 2: Runs from a Microsoft Fabric Notebook using the Semantic Link library

 

Both versions use the Enhanced Refresh API to refresh partitions efficiently.
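To make the Enhanced Refresh API call more concrete, here is a minimal sketch of how a request for a handful of partitions could be assembled. It only builds the URL and JSON body and does not send anything. The function name, the default values, the table name `Sales`, and the choice of `"type": "full"` with `applyRefreshPolicy` set to false are my assumptions for illustration, not necessarily what the tool does.

```python
import json

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(workspace_id, dataset_id, table, partitions,
                          max_parallelism=4, retry_count=1, timeout="02:00:00"):
    """Return (url, body) for one enhanced refresh call covering the given partitions.

    Hypothetical helper: the defaults here are placeholders, not recommendations.
    """
    url = f"{API_ROOT}/groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    body = {
        "type": "full",
        "commitMode": "transactional",
        "maxParallelism": max_parallelism,
        "retryCount": retry_count,
        "timeout": timeout,
        # Assumption: refresh exactly the named partitions rather than
        # letting the service re-apply the incremental refresh policy.
        "applyRefreshPolicy": False,
        "objects": [{"table": table, "partition": name} for name in partitions],
    }
    return url, body


url, body = build_refresh_request(
    "<workspace-id>", "<dataset-id>", "Sales", ["2023Q101", "2023Q102"]
)
print(url)
print(json.dumps(body, indent=2))
```

The body would be sent as a POST with a bearer token in the `Authorization` header. The service accepts the request asynchronously, so the caller has to poll for completion afterwards.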

Version 1: Local Python Tool
This version includes three key files:

  • config.py
  • utils.py
  • main.py

Requirements:

  • Service Principal ID, Secret, and Tenant ID
  • Dataset ID and Workspace ID
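
As a rough illustration of how the Service Principal values are used, the sketch below builds a client-credentials token request for the Power BI API and wraps token retrieval in a small cache that fetches a new token shortly before the old one expires. `build_token_request` and `TokenCache` are hypothetical names, and the HTTP call itself is left to a function you pass in, so nothing here contacts the network.

```python
import time
import urllib.parse

POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_bytes) for an OAuth2 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWER_BI_SCOPE,
    })
    return url, form.encode("utf-8")


class TokenCache:
    """Hypothetical helper: reuse a token until it is close to expiry."""

    def __init__(self, fetch_token, margin_seconds=300, clock=time.monotonic):
        # fetch_token is a callable returning (access_token, expires_in_seconds).
        self._fetch = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Fetch on first use, or when less than margin_seconds of lifetime remain.
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Calling `cache.get()` before every API request keeps long-running batch loops from failing when a token expires partway through.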

Configuration Steps:

1. Open config.py and provide the required values. Adjust parameters such as delay, batch size, maximum parallelism, retry count, and timeout based on your environment.

tharunkumarRTK_0-1760955980031.png

 

2. In main.py, specify the table name and incremental refresh policy details.

tharunkumarRTK_1-1760956035822.png

 

3. Run the main.py script.
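
The real config.py is only visible in the screenshot, so the following is a hypothetical layout that groups the same kinds of settings and rejects obviously invalid values. Every field name and default here is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshConfig:
    """Hypothetical settings container; names do not come from the actual tool."""
    tenant_id: str
    client_id: str
    client_secret: str
    workspace_id: str
    dataset_id: str
    batch_size: int = 3          # partitions per refresh request
    max_parallelism: int = 4     # passed through to the refresh API
    retry_count: int = 1         # retries the service performs per request
    delay_seconds: int = 60      # wait between status checks
    timeout: str = "02:00:00"    # per-request timeout, hh:mm:ss

    def __post_init__(self):
        # Fail early on values that can never work.
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.max_parallelism < 1:
            raise ValueError("max_parallelism must be at least 1")
        if self.retry_count < 0:
            raise ValueError("retry_count cannot be negative")
        if self.delay_seconds < 0:
            raise ValueError("delay_seconds cannot be negative")


config = RefreshConfig(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
    workspace_id="<workspace-id>",
    dataset_id="<dataset-id>",
    batch_size=2,
)
print(config.batch_size, config.timeout)
```

In practice it is safer to read the secret from an environment variable or a vault than to store it in the file.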

 

How It Works:

  • The tool automatically calculates partition names based on the policy.
  • It divides partitions into batches as per the defined batch size.
  • It checks if any refresh is in progress, then triggers batch refreshes sequentially.
  • Each batch executes only after the previous one completes successfully.
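
The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's code: it assumes month-grain partitions named in the `YYYYQqMM` pattern (for example `2023Q101`), and `refresh_batch` is a hypothetical callable that triggers one batch and returns True once it has completed successfully. A real model may mix year, quarter, and month partitions, so compare the generated names with what SSMS shows.

```python
from datetime import date


def monthly_partition_names(start, end):
    """Names such as 2023Q101 for each month from start through end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        quarter = (month - 1) // 3 + 1
        names.append(f"{year}Q{quarter}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def make_batches(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_batches(batches, refresh_batch):
    """Run batches one after another and stop at the first failure.

    Returns the number of batches that completed successfully.
    """
    completed = 0
    for batch in batches:
        if not refresh_batch(batch):
            break
        completed += 1
    return completed


names = monthly_partition_names(date(2023, 11, 1), date(2024, 2, 1))
batches = make_batches(names, 3)
print(batches)
```

Stopping at the first failed batch keeps the run easy to resume: the remaining partition names can be passed in again once the cause is fixed.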

tharunkumarRTK_2-1760956090289.png
You can monitor progress through the refresh history in the Power BI service.

tharunkumarRTK_3-1760956123981.png

 

You can also verify partition data in SSMS by expanding the table’s Partitions node.

tharunkumarRTK_4-1760956148949.png

 

tharunkumarRTK_5-1760956173301.png

 

After all batches complete, the script stops, as shown below.

tharunkumarRTK_6-1760956194316.png

 

All the partitions were loaded successfully.

tharunkumarRTK_7-1760956212618.png

 

As you can see, the first batch started at 8:26 AM and the last batch completed at 11:00 AM. The whole process took about 2.5 hours, and the script did the whole job on its own. 🙂

Version 2: Microsoft Fabric Notebook
The second version is a Microsoft Fabric Notebook (Bootstrapped Data Load.ipynb) and leverages the Semantic Link library.

tharunkumarRTK_8-1760956236415.png

 

It does not require a Service Principal since it uses the current user’s identity. The user running the notebook must have appropriate permissions in the workspace hosting the semantic model.

tharunkumarRTK_9-1760956266202.png

 

Provide the necessary configuration values in the designated notebook cell and execute it. Similar to Version 1, it refreshes partitions sequentially in batches.
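
For readers who have not used Semantic Link, a partition refresh from a notebook could look roughly like the sketch below. It assumes the `sempy.fabric.refresh_dataset` function from the Semantic Link library with the keyword arguments shown. The wrapper name, the defaults, and the choice of a full refresh without the refresh policy are my assumptions, and the notebook in the download may be written differently.

```python
def partition_objects(table, partitions):
    """Build the objects list that targets specific partitions of one table."""
    return [{"table": table, "partition": name} for name in partitions]


def refresh_partitions_with_semantic_link(dataset, workspace, table, partitions,
                                          max_parallelism=4, retry_count=1):
    """Hypothetical wrapper: trigger one enhanced refresh and return its request ID."""
    # Imported here because sempy is only available where Semantic Link is
    # installed, such as a Microsoft Fabric notebook.
    import sempy.fabric as fabric

    return fabric.refresh_dataset(
        dataset=dataset,
        workspace=workspace,
        refresh_type="full",
        max_parallelism=max_parallelism,
        commit_mode="transactional",
        retry_count=retry_count,
        objects=partition_objects(table, partitions),
        # Assumption: target the named partitions instead of re-applying the policy.
        apply_refresh_policy=False,
    )


print(partition_objects("Sales", ["2023Q101", "2023Q102"]))
```

Inside a Fabric notebook, `refresh_partitions_with_semantic_link("<dataset>", "<workspace>", "Sales", ["2023Q101"])` would run under the identity of the user executing the notebook.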

tharunkumarRTK_10-1760956291844.png

 

Points to Remember

  • The automation tool supports Task 2 only: refreshing partitions in batches.
  • Parameters such as maxParallelism, batchSize, delay, timeout, and retryCount should be tuned according to your environment.
  • The tool refreshes one table at a time. For models with multiple fact tables using incremental refresh, run the process separately for each table.
  • In Version 1, verify that the automatically generated partition names match your model configuration before triggering the refresh.
  • In Version 1, bearer token expiry is handled: if the token expires during batch processing, the tool regenerates it automatically.
  • In Version 2, note that the Fabric notebook remains active during execution, which can increase compute consumption. If Fabric workloads are restricted in your organization, use Version 1 instead.
  • The intermediate log messages may show the current refresh status as ‘Unknown’. This is expected behavior with the Enhanced Refresh API.
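
Because an in-progress refresh can report ‘Unknown’, a polling loop has to treat that value as "still running" and not as an error. The sketch below shows one way to do that. `wait_for_refresh` and `get_status` are hypothetical names, and the set of in-progress status strings is my assumption, so check it against the responses you actually receive.

```python
import time

# Assumption: statuses that mean the refresh has not finished yet.
IN_PROGRESS = {"Unknown", "NotStarted", "InProgress"}


def wait_for_refresh(get_status, delay_seconds=60, max_polls=240, sleep=time.sleep):
    """Poll get_status() until it returns a status outside IN_PROGRESS.

    get_status is a callable returning the latest status string.
    Raises TimeoutError if the refresh is still running after max_polls checks.
    """
    for _ in range(max_polls):
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        sleep(delay_seconds)
    raise TimeoutError("refresh still in progress after %d polls" % max_polls)


# Usage with a canned sequence of statuses standing in for the API.
statuses = iter(["Unknown", "Unknown", "Completed"])
waits = []
result = wait_for_refresh(lambda: next(statuses), delay_seconds=5, sleep=waits.append)
print(result, waits)
```

The next batch would be triggered only when the returned status is "Completed". Any other final status is a reason to stop and investigate.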

Conclusion
The Bootstrapped Initial Refresh technique is an effective way to overcome the limitations of the initial full load in Power BI incremental refresh. By first creating an empty table and then refreshing partitions in controlled batches, you can establish your model structure without encountering timeout, memory, or workload management issues.

This method ensures a smooth onboarding of large datasets into Power BI while maintaining optimal resource usage and performance.

You can download the tool from my Git repository. I am not an expert in Python, so please feel free to correct my code or suggest enhancements.

I hope you learned something new from this blog. Do share your thoughts in the comments section.

Happy Learning!!!