Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Anonymous
Not applicable

Reducing large dataset based on values in related table

Hi,

I'd like to understand best-practice for building this dataset as I've not worked with data on this scale before.

I'm not the owner of the data I'm using, it's SQL views created by our finance team. Our business is quite new to Power BI so I don't have much help internally in terms of best-practices and optimising - your feedback would be great!

 

I have a star schema like this, all with numerical index keys:

Customer_Channel 1:1 Customer

Customer 1:* Transactions

Location 1:* Transactions

Invoice Channel 1:* Transactions

Product 1:* Transactions

Calendar 1:* Transactions

 

In power query I've dropped unnecessary columns, and filtered the transaction table based on a date value to the period I wanted.

In this state there are still >24 million rows to load from the transaction table.

 

There are many transactions which I do not require in the dataset, but the attribute by which I would filter them lies in the related tables. e.g. keep transactions where the related Customer_Channel[Sales office] IN ('name1','name2') etc.

 

From what I know, it would be suitable to INNER JOIN the tables to be left with the rows that I want.

  1. Using a merge in power query hangs desktop, it doesn't feel very efficient.
    1. I understand this is 'query folding', and if I wanted to filter both directions this could fall over? e.g. filter Customer to only customers with Transactions in the period. Filter transactions to only customers with [sales office] x.
    2. In power query editor, if I merge with inner join, then have a column I don't need which is then deleted, is there a more efficient way to do this? This method feels like why it's quite resource intensive in desktop.
  2. I could use a native SQL query for each table, each query having a CTE for the other tables I want to filter by and INNER JOIN them. This seems like a weird duplication of effort on the SQL side.

Is there a method I've not thought of? What's the recommended method?

 

Another consideration is that this data is already in the Power BI service, serving reports my colleagues have built. I understand this is where dataflows come into play? Should I be trying to piggy-back off their dataset somehow? If so an overview would be really helpful.

 

Thanks for your advice.

1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @Anonymous ,

 

In brief:

You should be able to filter your Transactions table based on the customer codes that relate back up to Customer Channel, but I'd need to understand the values in both Customer Channel and Customer, how they relate to one another, and how Customer relates to Transactions.

Ideal method in my opinion, if there's no direct lineage from Transactions to Customer Channel would be to do this at the SQL server side by creating a view filtered according to your precise needs.

If you can't do the above, then 2nd best method would be to connect to your Transactions table using Direct Query. This will avoid you having to load 24m rows into the model and Power BI will just query the server for exactly what it needs for each visual. There can be performance downsides to this depending on how complicated you make your report, and also restricytions on the transformations you can perform, but likely faster at this volume of rows than using an import model.

Merges in Power Query are generally very expensive. You may be able to do an inner join between your tables if they're from the same server/db and get this to fold to the server, but I've not tested.

Dataflows - these are just online versions of Power Queries. If someone has already made a Dataflow with the correct filters applied to the data then, by all means, use this. However, it will not give you a quicker or easier route to your desired outcome if it's not already been made.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

1 REPLY 1
BA_Pete
Super User
Super User

Hi @Anonymous ,

 

In brief:

You should be able to filter your Transactions table based on the customer codes that relate back up to Customer Channel, but I'd need to understand the values in both Customer Channel and Customer, how they relate to one another, and how Customer relates to Transactions.

Ideal method in my opinion, if there's no direct lineage from Transactions to Customer Channel would be to do this at the SQL server side by creating a view filtered according to your precise needs.

If you can't do the above, then 2nd best method would be to connect to your Transactions table using Direct Query. This will avoid you having to load 24m rows into the model and Power BI will just query the server for exactly what it needs for each visual. There can be performance downsides to this depending on how complicated you make your report, and also restricytions on the transformations you can perform, but likely faster at this volume of rows than using an import model.

Merges in Power Query are generally very expensive. You may be able to do an inner join between your tables if they're from the same server/db and get this to fold to the server, but I've not tested.

Dataflows - these are just online versions of Power Queries. If someone has already made a Dataflow with the correct filters applied to the data then, by all means, use this. However, it will not give you a quicker or easier route to your desired outcome if it's not already been made.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.