lemaribdb
Helper II

STAR Schema filter DIM table by FACT Table

Hello guys!

I have two dataflows: one with all DIM tables and one with all FACT tables.
One of the DIM tables is so large that I hit my workspace limits when it's loaded, so I can't keep it as it is.
It's a regular star schema; the DIM table shares a key with the FACT table.
My idea is to filter the rows in this DIM table so that only keys present in my FACT table are kept.

1. I was thinking of doing it in the main data model that uses those two dataflows, but I'm not sure it will reduce the size, since the dataflow itself will still hold this big DIM table.
2. My other idea is to move the load of this big DIM table into the dataflow with the FACT tables.
Would the first approach be just as good as the second at reducing my data model size?

Whichever step I do this in, there is still the actual filtering.
How can I write a simple statement in Power Query that filters out those unnecessary keys? I was thinking of a right outer join and dropping the unnecessary columns, but surely there is a simpler way to emulate a WHERE clause?

My idea is to add a step like this:

Table.SelectRows(#"Navigation 2", each List.Contains(FACT_Table[HDR_KEY], [DIM_HDR_KEY]))

Here "Navigation 2" is my DIM table.
But this step takes more than 10 minutes to compute and then fails.
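
Two rewrites of the step above that are often tried in this situation, shown only as a sketch. In the original step, FACT_Table[HDR_KEY] may be re-evaluated for each DIM row, so option A buffers a distinct key list once, and option B replaces the row filter with an inner join against the distinct keys. The inline sample tables, the ContractDate and Amount columns, and the step names are assumptions for illustration; only HDR_KEY and DIM_HDR_KEY come from the post. Neither option is guaranteed to finish on a very large on-premises source.

```powerquery
let
    // Hypothetical stand-ins for the real FACT and DIM queries.
    FACT_Table = #table(
        type table [HDR_KEY = Int64.Type, Amount = number],
        {{1, 10.0}, {1, 5.5}, {3, 7.25}}
    ),
    DimSource = #table(
        type table [DIM_HDR_KEY = Int64.Type, ContractDate = date],
        {{1, #date(2024, 1, 1)}, {2, #date(2024, 2, 1)}, {3, #date(2024, 3, 1)}}
    ),

    // Option A: build the distinct key list once, hold it in memory,
    // and keep the original Table.SelectRows / List.Contains filter.
    FactKeys = List.Buffer(List.Distinct(FACT_Table[HDR_KEY])),
    FilteredByList = Table.SelectRows(DimSource, each List.Contains(FactKeys, [DIM_HDR_KEY])),

    // Option B: inner join against a one-column table of distinct keys.
    // Distinct keys prevent the join from duplicating DIM rows.
    FactKeyTable = Table.Distinct(Table.SelectColumns(FACT_Table, {"HDR_KEY"})),
    Joined = Table.NestedJoin(
        DimSource, {"DIM_HDR_KEY"},
        FactKeyTable, {"HDR_KEY"},
        "FactMatch", JoinKind.Inner
    ),
    FilteredByJoin = Table.RemoveColumns(Joined, {"FactMatch"})
in
    // Both FilteredByList and FilteredByJoin keep only DIM keys 1 and 3 here.
    FilteredByJoin
```

Option B has a chance of being folded into a single source query when both tables come from the same foldable source; with dataflow outputs as inputs that is unlikely, which is one reason to do the filtering as early as possible.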

1 ACCEPTED SOLUTION

It holds contract dates, meaning it actually maps to every row in all FACT tables.
I found a solution: I did some SQL INNER JOINs against a view of my FACT table. Unfortunately, I wasn't able to replicate that in Power Query (merge queries and filter rows all failed, either because of working with on-prem data or because they timed out after a few hours).
It's a shame it has to be done via SQL in the source, not in PQ.


4 REPLIES
lbendlin
Super User

Please define "super huge" - how many rows (distinct values in [DIM_HDR_KEY])?  Is it a true dimension table or can it be further normalized? Have you considered using incremental refresh?

According to DAX Studio, this DIM table alone takes 40% of my model size: about 5 GB and 11 million rows. I don't think incremental refresh helps me, as I need to see the preview to do further transformations in the PQ view.

What is the purpose of that table? What kind of data does it hold?

It holds contract dates, meaning it actually maps to every row in all FACT tables.
I found a solution: I did some SQL INNER JOINs against a view of my FACT table. Unfortunately, I wasn't able to replicate that in Power Query (merge queries and filter rows all failed, either because of working with on-prem data or because they timed out after a few hours).
It's a shame it has to be done via SQL in the source, not in PQ.
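
For readers who want to keep the SQL filter but issue it from the dataflow query itself, a native query is one possible route. This is a sketch under assumptions: the thread does not name the source system, so SQL Server is assumed, and the server, database, table, and view names below are hypothetical placeholders. Only HDR_KEY and DIM_HDR_KEY come from the post.

```powerquery
let
    // Hypothetical server and database names; SQL Server is an assumption.
    Source = Sql.Database("my-sql-server", "MyWarehouse"),

    // The inner join runs at the source, so only matching DIM rows
    // travel through the gateway. dbo.DIM_Contract and dbo.vw_FACT are
    // placeholder names for the DIM table and the FACT view.
    FilteredDim = Value.NativeQuery(
        Source,
        "SELECT d.*
         FROM dbo.DIM_Contract AS d
         INNER JOIN (SELECT DISTINCT HDR_KEY FROM dbo.vw_FACT) AS f
             ON f.HDR_KEY = d.DIM_HDR_KEY"
    )
in
    FilteredDim
```

Joining against the distinct FACT keys, not the FACT rows themselves, keeps one row per DIM key in the result.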
