Hello,
To work around the issues we had importing our large fact table (1.6 billion rows) into PBI, I decided to try my own ADL Gen2 account and use a dataflow.
My plan was to break the fact table down into smaller CSV files, put them in my ADL Gen2 folder, create a CDM folder with one main entity (the fact table) and the partitions below, create a dataflow on top of the CDM folder (an external dataflow), and use PBI to report off the dataflow. Here are my partitions, for example, for data up to March 3, 2019:
1) 2018 data
2) 2019-01
3) 2019-02
4) 2019-03-01
5) 2019-03-02
6) 2019-03-03
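For illustration, the partition scheme above could be generated with a short script. This is only a sketch under assumptions: the entity name `FactTable`, the attribute names, the model name, and the `base_url` are hypothetical placeholders, and the `model.json` layout shown is a minimal subset of the CDM folder metadata rather than the exact file used in this post.

```python
import json
from datetime import date


def partition_names(first_year, as_of):
    """Yearly partitions for past years, monthly for past months of the
    current year, and daily for the current month up to `as_of`."""
    names = [str(y) for y in range(first_year, as_of.year)]
    names += [f"{as_of.year}-{m:02d}" for m in range(1, as_of.month)]
    names += [
        f"{as_of.year}-{as_of.month:02d}-{d:02d}"
        for d in range(1, as_of.day + 1)
    ]
    return names


def build_model(base_url, names):
    """Build a minimal model.json-style dict with one entity.

    Entity, attribute, and model names are hypothetical placeholders.
    """
    return {
        "name": "ExampleModel",
        "version": "1.0",
        "entities": [
            {
                "$type": "LocalEntity",
                "name": "FactTable",
                "attributes": [
                    {"name": "DimensionId", "dataType": "int64"},
                    {"name": "Amount", "dataType": "double"},
                ],
                # One partition entry per CSV file in the folder.
                "partitions": [
                    {"name": n, "location": f"{base_url}/{n}.csv"}
                    for n in names
                ],
            }
        ],
    }


if __name__ == "__main__":
    names = partition_names(2018, date(2019, 3, 3))
    model = build_model("https://example.invalid/cdm/FactTable", names)
    print(json.dumps(model, indent=2))
```

Called with a first year of 2018 and March 3, 2019 as the cut-off date, `partition_names` returns the six partition names listed above.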
I was able to copy the CSV files to my ADL Gen2 folder, create the CDM folder, link it to my dataflow, and create the PBI report off of it. It took more than 4 hours to import the data into PBI Desktop, and the rendering performance is satisfactory. Here are my questions:
1- I would like to schedule the PBI service to refresh my report every night. How can I tell the dataflow/PBI not to refresh any partition except the very last one? I don't want to refresh all the data, only to append yesterday's data.
2- According to the MS link for the CDM structure, my only choice for an integer data type in the CDM JSON file is int64. Given that I have many 4-byte dimension ids in my fact table, I am probably wasting a lot of space by using int64. Is there any way to specify int32 as a data type in CDM?
Thanks
Hi @amalekshahi ,
For your first requirement, you should be able to set a scheduled refresh for the dataflow under its settings in the Power BI service.
In addition, if you are using Power BI Premium capacity, you could enable incremental refresh for the dataflow entities so that only yesterday's data is refreshed.
For more details, please refer to this blog, which should be helpful.
For your second requirement, I'm afraid that, based on my research, there is no option to specify int32 as a data type in CDM.
Best Regards,
Cherry
Hi Cherry,
A PBI service scheduled refresh at the dataset level will refresh all partitions in the CDM folder. I just need the last partition to be refreshed. Also, the incremental refresh feature at the dataflow level is disabled for "external" dataflows (dataflows on ADL Gen2).
Thanks
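On the CDM side, registering only yesterday's file without touching the existing partition entries might look like the sketch below. It assumes the CDM metadata has already been loaded into a Python dict with an `entities` list; the function name, entity name, and URL are hypothetical. It only updates the partition list in the metadata and does not, by itself, change which partitions a Power BI refresh reloads.

```python
from datetime import date


def append_daily_partition(model, entity_name, base_url, day):
    """Add one daily partition entry to the named entity.

    Returns True if a new entry was added, False if a partition with the
    same name was already present. Existing entries are left untouched.
    """
    name = day.isoformat()  # e.g. "2019-03-03"
    entity = next(e for e in model["entities"] if e["name"] == entity_name)
    partitions = entity.setdefault("partitions", [])
    if any(p["name"] == name for p in partitions):
        return False
    partitions.append({"name": name, "location": f"{base_url}/{name}.csv"})
    return True


if __name__ == "__main__":
    # Hypothetical in-memory metadata with one existing daily partition.
    model = {
        "entities": [
            {
                "name": "FactTable",
                "partitions": [
                    {
                        "name": "2019-03-02",
                        "location": "https://example.invalid/cdm/FactTable/2019-03-02.csv",
                    }
                ],
            }
        ]
    }
    added = append_daily_partition(
        model, "FactTable", "https://example.invalid/cdm/FactTable", date(2019, 3, 3)
    )
    print(added, [p["name"] for p in model["entities"][0]["partitions"]])
```

The duplicate check makes the call safe to repeat, so a nightly job could run it again for the same day without creating a second entry.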