Anonymous
Not applicable

Direct Query to Multiple Datasets - Performance

Hi,

 

I'm looking to rebuild my data infrastructure in Power BI, moving away from dataflows that feed into multiple datasets. Instead, I'm considering replacing the dataflows with modular, relational datasets and linking those datasets together via DirectQuery in another dataset.

 

So, for example, I will have datasets called masterdata, AR, AP, and GL, and then use a single additional dataset to link these modular datasets into one.

 

The idea is that this will give me much greater flexibility, but I'm not certain what impact it will have on performance. Initial tests haven't given me a clear picture, and I fear I won't know until I've built out the whole thing, which will be time consuming. Does anyone know whether this approach will negatively impact performance?

1 ACCEPTED SOLUTION
lbendlin
Super User

I applaud you for your decision to cut out unnecessary steps in your process.

 

The single biggest consideration you need to have is cardinality, specifically the cardinality of the fields that you use to link the datasets together. That cardinality must be kept as low as possible: low hundreds preferred, 10K still somewhat possible. As soon as you hit 1M you are doomed. Truly doomed. If you want a really nice shock, look at the generated DAX query in DAX Studio or a similar tool. I have seen DAX queries of 6 GB (yes, gigabytes) generated every time a user interacted with the report, just because I hadn't checked the cardinality (which turned out to be 1.3M).
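
A quick way to check this up front is to run a cardinality query against each source dataset in DAX Studio. This is only a minimal sketch; MasterData and CompanyKey are placeholder names for whatever table and column you plan to link on.

// Minimal cardinality check, run in DAX Studio against the source dataset.
// MasterData[CompanyKey] is a placeholder for the linking column.
EVALUATE
ROW ( "Link column cardinality", DISTINCTCOUNT ( MasterData[CompanyKey] ) )

If that number comes back in the hundreds you are in good shape; in the millions, expect trouble.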


5 REPLIES

Anonymous
Not applicable

Thanks for the reply!

 

Often the datasets will be linked via a core masterdata table of roughly 4,500 rows, which is the one side of the one-to-many relationship. However, the many side will go into the millions of rows depending on the dataset - is that going to prove to be a barrier to this approach?

It will be, as soon as you have a measure that filters the many side. I recommend you watch the generated queries very closely, especially for slow-loading visuals. The issue will be obvious once you see ginormous enumerations.
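
For illustration, a measure along the following lines is the kind that forces the engine to enumerate the many side across the DirectQuery boundary; the table and column names are hypothetical, and the point is the filter on the remote fact table.

// Hypothetical measure in the composite model; AR is a remote (DirectQuery) fact table.
// The filter on AR[Status] acts on the many side of the relationship.
Open AR Amount :=
CALCULATE (
    SUM ( AR[Amount] ),
    AR[Status] = "Open"
)

Performance Analyzer in Power BI Desktop (or an All Queries trace in DAX Studio) will show the query each visual generates, which is where oversized enumerations become visible.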

Anonymous
Not applicable

Thanks for your help. Would a bridge table potentially be a solution? I could bridge the many side with the same masterdata (~4,500 rows), turning it into a one-to-many relationship that ultimately filters the million-plus rows, but within its own dataset?

I generally don't like bridge tables, but in that scenario one might be helpful to shield the "other" data model from the excessive cardinality. Try it out.
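
As a sketch of that idea (again with hypothetical names): keep a low-cardinality bridge holding only the ~4,500 masterdata keys, relate masterdata to the bridge and the bridge to the fact table, so only the small key set crosses the dataset boundary while the million-row filtering happens inside the fact's own dataset. A virtual relationship via TREATAS achieves something similar without a physical bridge.

// Hypothetical calculated table acting as a bridge: one row per masterdata key (~4,500 rows).
Bridge = DISTINCT ( MasterData[CompanyKey] )

// Hypothetical measure using a virtual relationship instead of a physical bridge;
// only the distinct masterdata keys are propagated to the remote AR table.
AR Amount (bridged) :=
CALCULATE (
    SUM ( AR[Amount] ),
    TREATAS ( VALUES ( MasterData[CompanyKey] ), AR[CompanyKey] )
)

Either way, it is worth confirming in the query traces that only the small key set, and not the full many-side enumeration, is being sent to the remote dataset.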
