I have a Desktop (.pbix) file that is connected to some Databricks Spark Delta tables using the Spark connector. One of the tables contains 600 million rows, so I access it via a view that limits it to the most recent 9 million rows while working with the Desktop file. I'm using Import mode and I've set up incremental refresh for this table. After publishing to the Power BI Premium workspace, I connect via the XMLA endpoint and there is only one partition in this table. How can I split it into multiple partitions so I don't have to refresh the full dataset all at once?
I know this is an old thread - but for others who come here looking for an answer, this is the best way we have come up with:
As you state - once you've set up your query/table for incremental refresh and published the dataset to the Power BI service, you will only see one partition until you have run an initial refresh on the dataset in the service. That initial refresh might fail with connection timeouts or memory errors, depending on the amount of data you're trying to fetch and the data source you connect to. But without partitions it's hard to get around that initial refresh pain.
To create all the partitions - just
After that your incremental refresh policy will run as intended.
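For reference, a minimal sketch of one way to generate the policy partitions over the XMLA endpoint, assuming the dataset is in a Premium workspace with the XMLA read/write endpoint enabled and using placeholder names (YourDataset, YourLargeTable). Running a TMSL refresh of the table, e.g. from an XMLA query window in SSMS, with the refresh policy applied makes the service replace the single published partition with the partitions defined by the incremental refresh policy and then load them:

{
  "refresh": {
    "type": "full",
    "applyRefreshPolicy": true,
    "objects": [
      {
        "database": "YourDataset",
        "table": "YourLargeTable"
      }
    ]
  }
}

If that first full refresh is still too heavy for memory, one option (as described later in this thread) is to temporarily restrict the source view so the initial refresh succeeds, and then process the remaining historical partitions one at a time.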
Hi @mjohannesson,
You can try to use a query that folds (i.e., one that supports query folding); please refer to this article: Notes on Power BI Incremental Refresh
You can also refer to the following articles for more details on incremental refresh:
Best Regards,
Yingjie Li
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Hi,
Thanks for the advice. I enabled the "Enhanced metadata format" option, uploaded the PBIX with incremental refresh, and refreshed the dataset.
As I mentioned, the source table is a Databricks Spark Delta table with 600 million rows accessed via a view. When I did the first dataset refresh, I tried importing the full table, but the refresh failed with an out-of-memory error. I then limited the view to data from 2019 and 2020, which is approximately 200 million rows, and the refresh succeeded. I now have a partitioned dataset, but the partitions for the years before 2019 are empty. How can I process them? When I tried to process them in SQL Server Management Studio, I got an error:
An error occurred while parsing EntityName. Line 4, position 29. (System.Xml)
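In case it helps others who hit the same wall: once the policy partitions exist, the empty historical ones can be processed individually over the XMLA endpoint instead of refreshing the whole table at once. A hedged sketch, assuming yearly partitions carry the default names generated by the refresh policy (the exact names are listed under the table's Partitions node in SSMS) and using the same placeholder database/table names as above:

{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "YourDataset",
        "table": "YourLargeTable",
        "partition": "2017"
      },
      {
        "database": "YourDataset",
        "table": "YourLargeTable",
        "partition": "2018"
      }
    ]
  }
}

As for the "An error occurred while parsing EntityName" message: it comes from the XML parser and usually indicates a character such as an unescaped & somewhere in the generated command or in an object name, so running the refresh as a TMSL script like the one above, rather than through the SSMS processing dialog, is one way to sidestep it.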