Hi,
I'm looking to rebuild my data infrastructure in Power BI, moving away from dataflows that feed into multiple datasets. Instead, I'm considering replacing the dataflows with modular, relational datasets and then linking those datasets together via DirectQuery in another dataset.
So, for example, I would have datasets called masterdata, AR, AP, and GL, and then use a single additional dataset to link these modular datasets into one.
The idea is that this will give me much greater flexibility, but I'm not certain what impact it will have on performance. Initial tests haven't given me a clear picture, and I fear I won't know until I've built out the whole thing, which will be time consuming. Does anyone have any idea whether this approach will negatively impact performance?
I applaud you for your decision to cut out unnecessary steps in your process.
The single biggest consideration is cardinality, specifically the cardinality of the fields you use to link the datasets together. That cardinality must be kept as low as possible: low hundreds are preferred, 10K is still somewhat workable, but as soon as you hit 1M you are doomed. Truly doomed. If you want a real shock, look at the generated DAX query in DAX Studio or a similar tool. I have seen 6 GB (yes, gigabytes) DAX queries generated every single time a user interacted with the report, just because I hadn't checked the cardinality (which turned out to be 1.3M).
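If you want to sanity-check this before wiring the models together, one quick option is to measure the distinct count of each prospective linking column in DAX Studio. A minimal sketch, assuming hypothetical table and column names (MasterData, AR, MasterKey) that you would swap for your own:

```
// Run against the source dataset from DAX Studio.
// MasterData, AR and MasterKey are placeholders for your own model.
EVALUATE
UNION (
    ROW ( "Table", "MasterData", "Key cardinality", COUNTROWS ( DISTINCT ( MasterData[MasterKey] ) ) ),
    ROW ( "Table", "AR",         "Key cardinality", COUNTROWS ( DISTINCT ( AR[MasterKey] ) ) )
)
```

As a rough rule of thumb from the reply above: hundreds is comfortable, tens of thousands is borderline, and millions is where the cross-dataset queries explode.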
Thanks for the reply!
So, often the datasets will be linked via a core masterdata table, which would be roughly 4,500 rows and sits on the one side of the one-to-many. However, the many side will go into the millions depending on the dataset. Is this going to prove a barrier to this approach?
It will be as soon as you have a measure that filters the many side. I recommend you watch the queries very closely, especially for slow-loading visuals. The issue will be obvious once you see ginormous enumerations.
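To make the failure mode concrete, here is a hedged sketch of the kind of measure that tends to trigger it. The table and column names (AR, Amount, Status) are hypothetical; the point is that the filter lands on the remote, high-cardinality side of the relationship, so the filter context has to be carried across the DirectQuery boundary:

```
// Hypothetical measure in the composite dataset.
// AR lives in a remote dataset connected via DirectQuery;
// masterdata is the ~4,500-row dimension on the one side.
Open AR Amount =
CALCULATE (
    SUM ( AR[Amount] ),
    AR[Status] = "Open"    // filter applied on the many (remote) side
)
```

When a visual slices a measure like this by masterdata attributes, watch the generated query in DAX Studio; the size of the enumeration shipped to the remote dataset grows with the cardinality of the linking column.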
Thanks for your help. Would a bridge table potentially be a solution? I could bridge the many side with the same masterdata keys (~4,500 rows), turning it into a one-to-many that ultimately filters the million-plus rows, but within its own dataset?
I generally don't like bridge tables, but in that scenario it might be helpful to shield the "other" data model from the excessive cardinality. Try it out.
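If you do try it, one way to sketch the bridge is as a calculated table of just the distinct linking keys inside the high-cardinality dataset, so at most ~4,500 values are ever exposed to the other model. This is only a sketch with placeholder names (AR, MasterKey, MasterKeyBridge), not a confirmed pattern for your model:

```
// Hypothetical calculated table inside the AR dataset:
// one row per distinct master key, related one-to-many to the AR fact table.
MasterKeyBridge =
DISTINCT ( AR[MasterKey] )
```

In the composite model you would then relate masterdata to MasterKeyBridge on the key and let the bridge fan out to the millions of AR rows locally, rather than relating directly to the large fact table.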