March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi,
I recently ran into the following problem and I'd like to know how FILTER behaves. I needed to compute a calculated column which would give me the DATEDIFF between the time in the current row and another row. I therefore had to apply a set of filter to my table to perform the correct calculation. This is the calc column I initially created:
Ciao @marcorusso
you're right, it *should* not behave like this with a mere 100k rows table. I've used similar functions in tables with million rows without issues.
However, there is something very suprising (to me) in your answer:
Overall, creating a calculated column over a large table is never a good idea.
This is very strange. Many people and even yourself said "the more you precalculate in advance, the faster the model will be" and it definitely makes sense. It's much better to let PBI to spend a couple of minutes more when loading the model than waiting seconds every refresh of filters.
So what do you mean by this?
In a large table I would store columns computed outside of Power BI (in SQL or in Power Query / M) and not with a calculated column in DAX, because the cost is high and the compression not optimal (especially when the column produces a small number of unique values).
Calculated columns are very useful when you have to compute data in small tables using data of other tables, maybe aggregating large volume of data in other tables - something that is more expensive in SQL rather than in DAX.
The calculated column described in this thread is so easy (and efficient if you have proper indexes) to calculate in SQL...
ok thanks @marcorusso
that makes sense.
honestly, I was not thinking about SQL as this looked like a fairly simple scenario in DAX (and I've used similar methods countless times - with success).
Regarding M, I never used PowerQuery to refer other rows: I always used PQ with a "single row" mindset (I've done complex transformations, but never "lookups" like this). I'll investigate.
I agree - PQ is not ideal for this kind of query (from a performance perspective). SQL should work pretty well with good indexes.
DAX is an option, but I wouldn't use it for a few million rows. But for 100k it's strange it raises the memory issue.
The performance is not ideal using the FILTER that way and different query plans might produce different performances. You should try the code below.
Time not working = IF ( 'Operator Audit'[Login or Logout] = 0, VAR currentTime = 'Operator Audit'[Time] VAR nextTime = CALCULATE ( MIN ( 'Operator Audit'[Time] ), ALLEXCEPT ( 'Operator Audit', 'Operator Audit'[Operator key], 'Operator Audit'[Date] ), 'Operator Audit'[Login or Logout] = 1, 'OperatorAudit'[Time] > currentTime ) VAR timenotworking = DATEDIFF ( currentTime, nextTime, MINUTE ) RETURN timenotworking )
I tried this code but it doesn't work, I keep getting the run out of memory error. As of now, the only piece of code that works is the second one from the original post, with that specific order of filter arguments. If I switch them, it doesn't work, because it goes out of memory.
I really don't get what is happening, tomorrow I'll try to check the model and the data again to see if it's some kind of mistake unrelated or only partially related with the code.
Hi Darlove,
thanks for the explanations. Yes, when possible I create my calc columns in Power Query, but I'm not an expert in the M language and also in PQ you can't access other rows as far as I know (which is very little so I might be wrong) therefore the kind of calc column I'm trying to compute wouldn't be possible. Am I correct in assuming this or is there a way to perform that kind of calculation in PQ? I'm not asking for the code, just "conceptually speaking"
It's definitely possible that the CALCULATE solution have the same problem, I was just guessing. The problem is that we are discussing about an optimization issue. Which means that every month you might have a different behavior of the engine. Overall, creating a calculated column over a large table is never a good idea.
However, it is a little bit surprising having this kind of problem with just 100k rows. The FILTER solution doesn't have a context transition, but the engine could end up in a query plan that is similar to the one with CALCULATE. Maybe there is something else in the model we're not considering and that has some other side effect.
You should study the problem comparing the different query plans. In order to simulate the calculated column, use EVALUATE ADDCOLUMNS ( table, "new column", <expression> )
in DAX Studio and see what is the impact of changing the order of the FILTER operators. I would also test the query plan using CALCULATE. As I said, it's a small table to generate these problems, so there should be something else.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
User | Count |
---|---|
26 | |
21 | |
20 | |
13 | |
12 |
User | Count |
---|---|
40 | |
27 | |
27 | |
21 | |
19 |