Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.
Hi there
--Edited everything into this for readability--
I have a SCD2 history table of people which include their Birthdate, Adresses over time etc. So to determine the amount of people that we have in the population at any given point in time i can use the PIT calculation like
NumberofPersons_Simple :=
VAR MinDate =
MIN ( 'Period'[Date] )
VAR Calc =
CALCULATE (
COUNTROWS ( 'Person' ),
'Person'[DW_ValidFrom] <= MinDate,
'Person'[DW_ValidTo] >= MinDate,
REMOVEFILTERS ( 'Period' )
)
RETURN
Calcl
But now I want to be able to filter out those people who at over a certain age at that point in time. For that i'll use the "Age" table and since there is not active relationship between those tables i'll have to lookup up the calculated age and then filter.
Below is my current calculation that solves that - however it's quiet slow once the dataset becomes larger and performance somehow also depends on wich visual that are in use.
Number of People :=
VAR MinDate =
MIN ( 'Period'[Date] )
VAR Calc =
CALCULATE (
COUNTROWS ( 'Person' ),
'Person'[DW_ValidFrom] <= MinDate,
'Person'[DW_ValidTo] >= MinDate,
FILTER (
VALUES ( 'Person' ),
VAR AgeCalculated =
IF (
'Person'[BirthDate] <= MinDate,
TRUNC ( YEARFRAC ( 'Person'[BirthDate], MinDate ) )
)
RETURN
CONTAINS ( VALUES ( 'Age'[Age] ), 'Age'[Age], AgeCalculated )
)
)
RETURN
Calc
I should somehow be able to first narrow down the table to only those rows within the valid SCD2 timeperiod, then Summarize by year and then filter out/relate those years to the Age table.
The performance depends on the visual, so if I for example add two visuals on the canvas - there is a factor 1:5 in calculation time. The one to the left has the Month in the columns - which makes it slow, while the other visual to the right doesn't have the attribute on the columns, but is filtered on the page instead
When I look at the server timings - fast one to the right has the following:
Note:
"Borger Oplysninger" is "People"
"Alder" is "Age"
"Foedselsdato" is "BirthDate"
Here it seems that the filtering is as intended. Rows are reduced to be within the ValidFrom/To range and it super fast over those approx 8 million filtered rows.
When the Month attribute is dragged into the column in the matrix (the one to the left), the Server timings gives the following.
So now it's suddently doing an "additional" select Birthdate, DW_ValidFrom, DWValidTo before returning the same result. So to me it seems that it's somehow "re-treating" the Month filter an additional time, but handling that column filter different than the filter on Month thats already selected in the Filter pane...
With Maaned År on columns in the Matrix visual the code is
// DAX Query DEFINE VAR __DS0FilterTable = TREATAS({"sep 2019"}, 'Periode'[Måned År]) VAR __DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"), 'Periode'[Måned År ], 'Periode'[MaanedID], __DS0FilterTable, "AntalBorgere", '# Measures'[AntalBorgere], "AntalBorgere_FormatString", IGNORE('# Measures'[_AntalBorgere FormatString]) ) VAR __DS0PrimaryWindowed = TOPN( 102, SUMMARIZE(__DS0Core, 'Alder'[Alder], [IsGrandTotalRowTotal]), [IsGrandTotalRowTotal], 0, 'Alder'[Alder], 1 ) VAR __DS0SecondaryBase = SUMMARIZE(__DS0Core, 'Periode'[Måned År ], 'Periode'[MaanedID]) VAR __DS0Secondary = TOPN(102, __DS0SecondaryBase, 'Periode'[MaanedID], 1, 'Periode'[Måned År ], 1) VAR __DS0BodyLimited = NATURALLEFTOUTERJOIN( __DS0PrimaryWindowed, SUBSTITUTEWITHINDEX( __DS0Core, "ColumnIndex", __DS0Secondary, 'Periode'[MaanedID], ASC, 'Periode'[Måned År ], ASC ) ) EVALUATE __DS0Secondary ORDER BY 'Periode'[MaanedID], 'Periode'[Måned År ] EVALUATE __DS0BodyLimited ORDER BY [IsGrandTotalRowTotal] DESC, 'Alder'[Alder], [ColumnIndex]
If I remove the "Maaned År" from the Column visual i get this simpler query
// DAX Query DEFINE VAR __DS0FilterTable = TREATAS({"aug 2023"}, 'Periode'[Måned År ]) VAR __DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"), __DS0FilterTable, "VAntalBorgere", '# Measures'[AntalBorgere], "v_AntalBorgere", IGNORE('# Measures'[_AntalBorgere FormatString]) ) VAR __DS0PrimaryWindowed = TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'Alder'[Alder], 1) EVALUATE __DS0PrimaryWindowed ORDER BY [IsGrandTotalRowTotal] DESC, 'Alder'[Alder]
The 'Person' Table i a SCD Type 2 with the DW_ValidFrom, DW_ValidTo Date columns. So there is no direct active relationship to those two columns in the 'People' table. Each Person Age a the selected point in time depends on the seletion that the user makes, so there is also not an active relationship from the 'Age' table to the 'People' table
Number of People :=
VAR MinDate =
MIN ( 'Period'[Date] )
VAR Calc =
CALCULATE (
COUNTROWS ( 'Person' ),
'Person'[DW_ValidFrom] <= MinDate,
'Person'[DW_ValidTo] >= MinDate,
FILTER (
VALUES ( 'Person' ),
VAR AgeCalculated =
IF (
'Person'[BirthDate] <= MinDate,
TRUNC ( YEARFRAC ( 'Person'[BirthDate], MinDate ) )
)
RETURN
CONTAINS ( VALUES ( 'Age'[Age] ), 'Age'[Age], AgeCalculated )
)
)
RETURN
Calc
I add two visuals on the canvas. The one to the left has the Month in the columns - which makes it slow, while the other visual to the right doesn't have the attribute on the columns, but is filtered on the page instead
When i look at the server timings - fast one to the right has the following:
"Borger Oplysninger" is "People"
"Alder" is "Age"
"Foedselsdato" is "BirthDate"
Here it seems that the filtering is as intended. Rows are reduced to be within the ValidFrom/To range and it super fast over those approx 8 million filtered rows.
When the Month attribute is dragged into the column in the matrix (the one to the left), the Server timings gives the following.
So now it's suddently doing an "additional" select Birthdate, DW_ValidFrom, DWValidTo before returning the same result. So to me it seems that it's somehow "re-treating" the Month filter an additional time, but handling that column filter different than the filter on Month thats already selected in the Filter pane...
What are the queries generated by the 2 tables? Also, what is the performance like if you use the code I posted?
With Maaned År on columns in the Matrix visual the code is
// DAX Query
DEFINE
VAR __DS0FilterTable =
TREATAS({"sep 2019"}, 'Periode'[Måned År])
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"),
'Periode'[Måned År ],
'Periode'[MaanedID],
__DS0FilterTable,
"AntalBorgere", '# Measures'[AntalBorgere],
"AntalBorgere_FormatString", IGNORE('# Measures'[_AntalBorgere FormatString])
)
VAR __DS0PrimaryWindowed =
TOPN(
102,
SUMMARIZE(__DS0Core, 'Alder'[Alder], [IsGrandTotalRowTotal]),
[IsGrandTotalRowTotal],
0,
'Alder'[Alder],
1
)
VAR __DS0SecondaryBase =
SUMMARIZE(__DS0Core, 'Periode'[Måned År ], 'Periode'[MaanedID])
VAR __DS0Secondary =
TOPN(102, __DS0SecondaryBase, 'Periode'[MaanedID], 1, 'Periode'[Måned År ], 1)
VAR __DS0BodyLimited =
NATURALLEFTOUTERJOIN(
__DS0PrimaryWindowed,
SUBSTITUTEWITHINDEX(
__DS0Core,
"ColumnIndex",
__DS0Secondary,
'Periode'[MaanedID],
ASC,
'Periode'[Måned År ],
ASC
)
)
EVALUATE
__DS0Secondary
ORDER BY
'Periode'[MaanedID], 'Periode'[Måned År ]
EVALUATE
__DS0BodyLimited
ORDER BY
[IsGrandTotalRowTotal] DESC, 'Alder'[Alder], [ColumnIndex]
If I remove the "Maaned År" from the Column visual i get this simpler query
// DAX Query
DEFINE
VAR __DS0FilterTable =
TREATAS({"aug 2023"}, 'Periode'[Måned År ])
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"),
__DS0FilterTable,
"VAntalBorgere", '# Measures'[AntalBorgere],
"v_AntalBorgere", IGNORE('# Measures'[_AntalBorgere FormatString])
)
VAR __DS0PrimaryWindowed =
TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'Alder'[Alder], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, 'Alder'[Alder]
That doesn't make much sense to me. As far as I can see, __DS0Core contains all the values you want to see in the final table, but it is then splitting it into 2 tables, independently grouped by year and by age, and then joining them back together again. No idea why it would do it like that.
I agree - it's weird that it doing that calculation as it alread has the result.
I'm thinking that these two below end up somehow been completely independent queries in the left visual while the engine thinks smarter in the left - although same results and filters.
Rather than calculating each person's age you could convert the age into a date and then use that as a direct filter, e.g.
Number of People :=
VAR MinDate =
MIN ( 'Period'[Date] )
VAR MaxBirthDate =
DATEADD ( { MinDate }, MIN ( 'Age'[Age] ), YEAR )
VAR Calc =
CALCULATE (
COUNTROWS ( 'Person' ),
'Person'[DW_ValidFrom] <= MinDate,
'Person'[DW_ValidTo] >= MinDate,
'Person'[BirthDate] > MaxBirthDate
)
RETURN
Calc
The Calculation gives the following error:
Function 'DATEADD' cannot be used with columns added by ADDCOLUMNS or SUMMARIZE functions
The end user can choose to see only those that are at Age 5,10 or 23 in the 'Age'[Age] filter.
So the filtering should filter out the rows from 'People' where the calculated Age at the selected Min ('Periode'[Date]) is not 5,10 or 23. But first it need to find the initial valid rows from SCD2 'People' table for the calculation by assuring that the Min ('Periode'[Date]) is between the DW_ValidFrom and DW_ValidTo and that the initial
CALCULATE (
COUNTROWS ( 'Borger Oplysninger' ),
'Borger Oplysninger'[DW_ValidFrom] <= MindsteDato,
'Borger Oplysninger'[DW_ValidTo] >= MindsteDato)
OK, try
Number of People :=
VAR MinDate =
MIN ( 'Period'[Date] )
VAR MinAge =
MIN ( 'Age'[Age] )
VAR MaxBirthDate =
DATE ( YEAR ( MinDate ) - MinAge, MONTH ( MinDate ), DAY ( MinDate ) )
VAR Calc =
CALCULATE (
COUNTROWS ( 'Person' ),
'Person'[DW_ValidFrom] <= MinDate,
'Person'[DW_ValidTo] >= MinDate,
'Person'[BirthDate] > MaxBirthDate
)
RETURN
Calc
Hmm
It doesn't seem to be solving the issue.
The end user can choose any Age that they want to filter/filter out. I have tried to meassure performance in DaxStudio and the problem with performance comes when i add "Month" attribute to the Visual.
Below are same Matrix with the same "Number of People" Measure. Only difference is that the left visual has the Month on the Column showing data for "Jan 2019" since i've selected "Jan 2019" in the filter pane.
Difference here is the time to render the visual. The Visual to the right shows same figures since the filter pane applies the "Jan 2019" to both visuals - but the Visual to the right is about 5-6 times faster in rendering the same numbers.
How can this be?
can you post a picture of the relationships between tables? and also say which table the month column in the visual, and in the filter pane, is coming from.
Check out the November 2023 Power BI update to learn about new features.
Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.