I’m currently facing a challenge on a project. I’m working on a Power BI dashboard for some SQL tables that have several million rows. This is causing a very long refresh time and I want to bring that refresh time down.
I know many will say to filter and trim down the data at the source (in the SQL statement used to import) so that a much smaller table is loaded into Power BI.
However, let’s say I want to perform several aggregations and transformations on the tables and then filter out unwanted data. Will it affect my results if I take a filter condition and apply it at the source instead of importing the whole table and applying it after all my aggregations and transformations?
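To illustrate what I mean (the table and column names below are just placeholders, not my real schema), my understanding is that pushing a filter to the source is harmless when the filter column is also a grouping key of the later aggregation, but not when the filter removes rows the aggregation would otherwise have summed:

```sql
-- Same result either way: OrderYear is also a grouping key, so filtering
-- before or after the GROUP BY excludes exactly the same groups.
SELECT OrderYear, CustAccount, SUM(LineAmount) AS TotalAmount
FROM dbo.SalesLine
WHERE OrderYear >= 2023
GROUP BY OrderYear, CustAccount;

-- Potentially different result: this filter drops individual rows from the
-- sums, which is not the same as summing everything first and filtering the
-- aggregated output afterwards.
SELECT CustAccount, SUM(LineAmount) AS TotalAmount
FROM dbo.SalesLine
WHERE ItemGroup = 'Retail'
GROUP BY CustAccount;
```

Is that the right way to think about it, or am I missing something?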
If I need to provide more clarity let me know.
Hi @mannriv33 ,
I can share some input based on my experience working with data visualization over the last several years. I am passionate about shaping data to get the best possible insights into business processes, and I have put a lot of time into cleaning, transforming and combining data.
To start with, consider what kind of tables we are bringing into the reporting tool, i.e. Power BI in our case. I have worked with ERP-related tables (AX2012/D365). As you may know, with such ERPs you encounter thousands of tables; you cannot realistically import all of them, or even a few hundred, and at least a few dozen are likely to be multimillion-row tables.
Tables are large but tidy and clean
In deciding what to do in such a case, or even in a simpler one, I go by the “process” definition from Kimball’s methodology. Take any process, say sales ordering. We expect one sales header table containing the header information for the sales orders received, and another table, say sales line, holding the item-wise detail for the related orders. If these two tables are well designed, we can take them directly into Power BI. I would at least not aggregate them at the source, because aggregation and summarization can be done in Power BI (in Power Query, or dynamically in visuals). Since we are bringing in raw data, we can summarize the results in as many ways as needed and do not have to go back to the source frequently.
But what if the tables run to several million rows? Think about how much data is practically needed in the usual case: a few months, a few years, or everything together? Depending on the answer, filter at the source by date or by some key of the table (a sketch follows below). Otherwise, consider incremental refresh, so that only new or changed data enters Power BI; the rest stays as it is, and refresh time drops considerably.
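As a rough sketch of that source-side filter, using placeholder names (SalesLine, SalesQty, LineAmount, CreatedDateTime are illustrative, not your actual schema):

```sql
-- Import only the slice of raw sales lines the reports actually need,
-- instead of pulling the full multimillion-row table into Power BI.
SELECT
    sl.SalesId,            -- key back to the sales header
    sl.ItemId,
    sl.SalesQty,
    sl.LineAmount,
    sl.CreatedDateTime
FROM dbo.SalesLine AS sl
WHERE sl.CreatedDateTime >= '2023-01-01';  -- keep only the period of interest
```

The data is still at row grain, so any aggregation can be done later in Power BI; only the date window is fixed at the source.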
Another approach is to check whether some of your data is static. Say you have 10 years of sales data, but you expect that the oldest nine years are not going to change: load that history once in Power Query and disable refresh for that query, so only the recent data is reloaded.
Resulting table is large and must be built by combining and transforming many scattered tables
Now, as another scenario, an ERP-like system may need tens of tables combined, filtered and cleaned just to produce one tidy sales order table, and that is just one header table; the same may hold for several such business processes. If you take all those sub-tables into Power BI and transform them there, you may do well initially, but sooner or later you will face challenges: large storage needs, refresh time limits, a complex structure and poor traceability.
In such a case, it is wiser to combine the tables first at the source (say, a SQL query or view joining the sub-tables), or, if there are a lot of inconsistencies, to use an SSIS package or something similar to create clean base tables; a sketch of such a join is below.
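For illustration, a minimal sketch of a source-side view joining hypothetical sub-tables (SalesHeader, SalesLine, Customer and their columns are placeholders, not a real ERP schema):

```sql
-- Combine scattered sub-tables into one tidy sales order table at the source,
-- so Power BI imports a single, already-shaped table.
CREATE VIEW dbo.vw_SalesOrder AS
SELECT
    h.SalesId,
    h.CustAccount,
    c.CustomerName,
    h.OrderDate,
    l.ItemId,
    l.SalesQty,
    l.LineAmount
FROM dbo.SalesHeader   AS h
JOIN dbo.SalesLine     AS l ON l.SalesId     = h.SalesId
LEFT JOIN dbo.Customer AS c ON c.CustAccount = h.CustAccount;
```

The cleaning and joining cost is paid once in the database, and the traceability stays in one place instead of being spread across many Power Query steps.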
In both cases above, it is not that you will need no transformation at all in Power BI, but it will be of a basic type that differs from report to report and is not very resource-intensive.
So do we never use or need aggregate tables from the source?
We do, but mainly for cross-process reporting. Say you want to see two or more processes together, such as sales and production. In that case we may build an aggregated view over the sales base tables, link it with a matching production aggregate view, and bring those into Power BI; a sketch of such a view is below.
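As an illustration, a minimal aggregated view at an item/month grain so it can be matched against a production view built at the same grain (names are hypothetical, continuing the placeholder schema above):

```sql
-- Summarize sales to item/month; a production view at the same grain can be
-- linked on ItemId and OrderMonth before both are imported into Power BI.
CREATE VIEW dbo.vw_SalesByItemMonth AS
SELECT
    l.ItemId,
    DATEFROMPARTS(YEAR(h.OrderDate), MONTH(h.OrderDate), 1) AS OrderMonth,
    SUM(l.SalesQty)   AS TotalQty,
    SUM(l.LineAmount) AS TotalAmount
FROM dbo.SalesHeader AS h
JOIN dbo.SalesLine   AS l ON l.SalesId = h.SalesId
GROUP BY
    l.ItemId,
    DATEFROMPARTS(YEAR(h.OrderDate), MONTH(h.OrderDate), 1);
```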
These are some thoughts based on what I have experienced so far. I strongly believe in Kimball’s approach, and it certainly helps keep things simple. It is difficult to summarize everything in a single post, and much depends on the use case; there is no one answer that fits all. Review and decide.
I was often confused when I started working in this area, always asking which approach is right. I found many answers online and in books, but I learnt the most while applying that knowledge, and I faced a lot of rework when the approach I had initially chosen ran into its limitations.
If you decide to import only summarized data, you are likely building limitations into your reporting from the start. As a report consumer you may be satisfied with a particular summarized report, but as a report developer you will find that users come up with new requirements every now and then, and you may need to go back to the source data for alternative aggregations.
Hope it helps.
Hi @mannriv33 ,
It is better to create the aggregations in the backend itself. Doing multiple joins and aggregations in the import query is very expensive and definitely impacts refresh time; for example, see the sketch below.
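A minimal sketch, assuming the report only needs summarized data and using placeholder names (dbo.vw_SalesOrder and its columns are illustrative):

```sql
-- Push the grouping work to the database so Power BI imports only the
-- summarized rows during refresh.
SELECT
    CustAccount,
    CAST(OrderDate AS date) AS OrderDay,
    SUM(LineAmount)         AS DailyAmount
FROM dbo.vw_SalesOrder
GROUP BY CustAccount, CAST(OrderDate AS date);
```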
Thanks,
Pallavi
"Will it affect my results?" - impossible to say without details.
--
Are you happy with your existing transforms? Are your queries folding? Can you use incremental refresh?