I have a dimensional model in a star schema with 5 dimension tables and 2 large fact tables holding about 100 million rows.
Unfortunately, I cannot aggregate these facts, because the required metrics use DISTINCTCOUNT and must be displayed from different perspectives according to the dimension tables selected by the end users.
Since DISTINCTCOUNT measures are non-additive and cannot simply be summed, the only solution was to load all 100 million rows into Power BI using import mode.
This led to relatively poor performance, and I'd like to explore possible improvements for large fact tables like these.
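For context, a minimal T-SQL sketch of why a distinct count cannot be rolled up (FactSales and its columns are hypothetical stand-ins for my model):

```sql
-- Distinct customers per day.
SELECT OrderDate, COUNT(DISTINCT CustomerKey) AS DailyDistinctCustomers
FROM dbo.FactSales
GROUP BY OrderDate;

-- Distinct customers for a whole month: a customer active on several days
-- is counted once, so this is NOT the sum of the daily values above. The
-- measure is non-additive and cannot be pre-summed from daily aggregates.
SELECT COUNT(DISTINCT CustomerKey) AS MonthlyDistinctCustomers
FROM dbo.FactSales
WHERE OrderDate >= '2023-01-01' AND OrderDate < '2023-02-01';
```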
Some things I have already done:
- Designed the model as a dimensional star schema
- Created surrogate keys (SKs) as BIGINT in all tables
- Eliminated unnecessary columns
- Assigned appropriate data types to all columns
For each day of the month, I have approximately 1 million records.
Based on these needs, are there more improvements I could apply?
Hi @afonsofeliciano ,
Here are a few additional suggestions that may help:
1. Use partitioning: Partitioning your fact table lets the database engine access only the relevant partitions when executing a query. Partitioning on a date column would align well with your volume of approximately 1 million records per day (see the sketch after this list).
2. Use columnstore indexes: Columnstore indexes store data in a column-wise format, which is more efficient for aggregations over large fact tables. A clustered columnstore index on the fact table takes advantage of this (also shown in the sketch below).
3. Use DirectQuery mode: If your fact table is too large to import into Power BI, consider DirectQuery mode instead, which queries the data source directly without importing the data into Power BI. It comes with trade-offs, such as slower per-query performance and less flexibility in terms of visualizations.
4. Optimize your queries: Use filters to limit the amount of data being queried, pre-aggregate data before it reaches the visuals, and use query profiling tools to identify and tune slow-performing queries. For distinct counts specifically, only certain pre-aggregations remain valid; a safe variant is sketched after this list.
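As a concrete illustration of points 1 and 2, here is a minimal T-SQL sketch. All names (FactSales, OrderDate, PF_Date, PS_Date, CCI_FactSales) are hypothetical and the boundary list is abbreviated; adapt them to your own schema and date range:

```sql
-- Monthly partition function and scheme keyed on the fact table's date column.
-- RANGE RIGHT means each boundary value starts a new partition.
CREATE PARTITION FUNCTION PF_Date (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01');

CREATE PARTITION SCHEME PS_Date
AS PARTITION PF_Date ALL TO ([PRIMARY]);

-- Clustered columnstore index built on the partition scheme: queries that
-- filter on OrderDate read only the matching partitions, and the columnar
-- storage compresses well and speeds up large aggregations.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales
ON PS_Date (OrderDate);
```

With roughly 1 million rows per day, monthly partitions hold around 30 million rows each, comfortably above the ~1 million-row rowgroup size that columnstore compression works with.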
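On point 4: most pre-aggregations break DISTINCTCOUNT, but one that stays exact is collapsing the fact to one row per combination of the keys you actually report on, since removing duplicate rows never changes a distinct count. A sketch, again with hypothetical table and column names:

```sql
-- Reduce the fact to one row per (date, dimension keys, CustomerKey).
-- COUNT(DISTINCT CustomerKey) over this table matches the raw fact for any
-- slice by these keys, because duplicate rows never affect a distinct count.
SELECT
    OrderDate,
    ProductKey,
    StoreKey,
    CustomerKey,
    SUM(SalesAmount) AS SalesAmount  -- additive measures can still be summed
INTO dbo.FactSales_Reduced
FROM dbo.FactSales
GROUP BY OrderDate, ProductKey, StoreKey, CustomerKey;
```

This only shrinks the import if the raw fact contains many rows per key combination; if every row is already unique at that grain, it buys nothing.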
Best regards,
Community Support Team_yanjiang
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
@afonsofeliciano , I hope you are on Premium; you could run a POC on Microsoft Fabric.
On Premium you can enable it for users (Pro/PPU users can use the 60-day trial).
Microsoft Fabric capacities are available for purchase: https://blog.fabric.microsoft.com/en-us/blog/announcing-microsoft-fabric-capacities-are-available-fo...
What is Microsoft Fabric and How it Works
https://www.youtube.com/watch?v=p-v0I5S-ybs&list=PLPaNVDMhUXGYU97pdqwoaociLdwyDRn39
But is there something that helps with millions of rows when importing data into Power BI? I cannot sum distinct-count metrics because they are non-additive, and the volume is too large for Power BI Premium, even using incremental refresh.