March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hello,
I have a LogTable listing all mutation events on some objects.
The row format is:
Object_ID: the object on which mutations are carried on
Mutation_Date: Timestamp of the mutation
Mutation_Parameter: the object parameter on which the mutation is done, e.g. Status, Urgency, Other
Old_Value: the value of the object parameter prior to the mutation
New_Value: the value of the object parameter after the mutation.
I would like to create a new column "Status_before_Mutation" showing the value of the status parameter prior to each mutation. In case of a status mutation, this is logically equal to the existing value in column Old_Value, but in case of other mutations, I need to consider the latest change of status prior to the time of the mutation being considered.
For each mutation, the logic I am pursuing is to select all rows with the same Object_ID, then filter rows with MutationParameter = "Status", then filter rows with Mutation_Date prior to Mutation_Date of the mutation being considered, sort the remaining rows in descending order of Date, pick the first one and extract the column vale "Old_Value" from this row.
Unfortunately, I get a circular dependency error. I guess I need to exclude the new column "Status_before_Mutation" from the CALCULATETABLE row context, which I thought I did with the ALLEXCEPT...
Here is the code leading to this circular dependency:
Status_before_Mutation = CALCULATE( VALUES(LogTable[Old_Value]); TOPN( 1; CALCULATETABLE( LogTable; ALLEXCEPT(LogTable; LogTable[Object_ID]);
LogTable[Mutation_Parameter] = "Status"; LogTable[Mutation_Date] <= EARLIER(LogTable[Mutation_Date]) ); LogTable[Mutation_Date]; DESC ) )
Another potential issue is maybe the use of EARLIER, which I have read is not recommended for large tables.
The LogTable is quite big, up to several hundred thousands rows. I have several thousands different objects each with a number of mutations in the range [1...50].
Any idea on solving this, possible avoiding the use of EARLIER?
Thanks for any suggestion.
Karim
Solved! Go to Solution.
Hi Jimmy,
Thanks for your proposition. I have never used RANKX before....
In the meantime, I could get rid of the circular dependency problem I had by using DISTINCT instead of VALUES in my CALCULATE new column:
Status_after_Mutation = CALCULATE( DISTINCT(LogTable[New_Value]); TOPN( 1; CALCULATETABLE( LogTable; ALLEXCEPT(LogTable; LogTable[Object_ID]); LogTable[Mutation_Parameter] = "Status"; LogTable[Mutation_Date] <= EARLIER(LogTable[Mutation_Date]) ); LogTable[Mutation_Date]; DESC ) )
Also, I corrected the column to be "Status_after_Mutation" using the LogTable[New_Value].
Now I am getting what I want (see marked column below):
In the last column, I have also included what I got with your proposition:
Status_before_Mutation = VAR Ranking = RANKX(FILTER(LogTable, LogTable[Object_ID] = EARLIER(LogTable[Object_ID]) && LogTable[Mutation_Date] <= EARLIER(LogTable[Mutation_Date])), RANKX(ALL(LogTable), LogTable[Mutation_Date]), , DESC, Dense) RETURN IF(Ranking = 1, LogTable[New_Value])
I have two questions now:
- Your solution only provides the status for a status mutation. I tried to modify it without success, as I don't quite understand the use of RANKX within RANKX. How should I modify it to get the current status for all mutations on the same object?
- With my solution using TOPN on a big mutation table (> 200,000 rows), I get the following error from PowerBI
"There is not enough memory to complete this operation. Please try again later when there may be more memory available".
Is RANKX or another approach more memory efficient on big tables? Of course I could first split the table in smaller tables before computing the new columns but if I could avoid it with a better method, that would be interesting.
Thanks
Karim
Addendum:
Based on the LogTable I gave as an example, here is the new column "Status_before_Mutation" I would like to get:
Object_ID | Mutation_Date | Mutation_Parameter | Old_Value | New_Value | Status_before_Mutation |
Object_A | 25.01.2018 17:38 | Status | open | ||
Object_A | 25.01.2018 17:40 | Urgency | 1 | open | |
Object_A | 26.01.2018 04:15 | Other | yy | open | |
Object_A | 26.01.2018 07:30 | Urgency | 1 | 2 | open |
Object_A | 29.01.2018 08:47 | Status | open | work | open |
Object_A | 29.01.2018 17:54 | Urgency | 2 | 1 | work |
Object_A | 30.01.2018 06:08 | Other | yy | zz | work |
Object_A | 30.01.2018 06:53 | Status | work | review | work |
Object_A | 31.01.2018 13:30 | Status | review | closed | review |
Object_B | 24.01.2018 08:47 | Status | open | ||
Object_B | 24.01.2018 08:47 | Urgency | 3 | open | |
Object_B | 26.01.2018 14:43 | Urgency | 3 | 2 | open |
Object_B | 26.01.2018 15:02 | Other | xx | open | |
Object_B | 02.02.2018 15:38 | Status | open | work | open |
Object_B | 07.02.2018 10:06 | Urgency | 2 | 1 | work |
Object_B | 07.02.2018 10:06 | Status | work | closed | work |
Hi Karim11,
Modify your calculate column using DAX below and check if it can meet your requirement:
Status_before_Mutation = VAR Ranking = RANKX(FILTER(LogTable, LogTable[Object_ID] = EARLIER(LogTable[Object_ID]) && LogTable[Mutation_Date] <= EARLIER(LogTable[Mutation_Date])), RANKX(ALL(LogTable), LogTable[Mutation_Date]), , DESC, Dense) RETURN IF(Ranking = 1, LogTable[New_Value])
Regards,
Jimmy Tao
Hi Jimmy,
Thanks for your proposition. I have never used RANKX before....
In the meantime, I could get rid of the circular dependency problem I had by using DISTINCT instead of VALUES in my CALCULATE new column:
Status_after_Mutation = CALCULATE( DISTINCT(LogTable[New_Value]); TOPN( 1; CALCULATETABLE( LogTable; ALLEXCEPT(LogTable; LogTable[Object_ID]); LogTable[Mutation_Parameter] = "Status"; LogTable[Mutation_Date] <= EARLIER(LogTable[Mutation_Date]) ); LogTable[Mutation_Date]; DESC ) )
Also, I corrected the column to be "Status_after_Mutation" using the LogTable[New_Value].
Now I am getting what I want (see marked column below):
In the last column, I have also included what I got with your proposition:
Status_before_Mutation = VAR Ranking = RANKX(FILTER(LogTable, LogTable[Object_ID] = EARLIER(LogTable[Object_ID]) && LogTable[Mutation_Date] <= EARLIER(LogTable[Mutation_Date])), RANKX(ALL(LogTable), LogTable[Mutation_Date]), , DESC, Dense) RETURN IF(Ranking = 1, LogTable[New_Value])
I have two questions now:
- Your solution only provides the status for a status mutation. I tried to modify it without success, as I don't quite understand the use of RANKX within RANKX. How should I modify it to get the current status for all mutations on the same object?
- With my solution using TOPN on a big mutation table (> 200,000 rows), I get the following error from PowerBI
"There is not enough memory to complete this operation. Please try again later when there may be more memory available".
Is RANKX or another approach more memory efficient on big tables? Of course I could first split the table in smaller tables before computing the new columns but if I could avoid it with a better method, that would be interesting.
Thanks
Karim
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
122 | |
89 | |
78 | |
67 | |
52 |
User | Count |
---|---|
199 | |
140 | |
96 | |
77 | |
68 |