Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
Hello,
I have a performance issue with cohort analysis in Power BI. I have a log of sessions for a web site in table ga_sessions. It contains 30 million records and has the following columns
(and also other fields as "ad campaign", "region", etc)
I want to select a cohort of users in past and look how they behave up to present. In T-SQL I write
SELECT s.medium, SUM(s.cost) cost, sum(s.revenue) revenue FROM ( SELECT gid, MIN(time) time FROM ga_sessions WHERE medium = 'email' AND time >= '2016-03-01' AND time < '2016-04-01' GROUP BY gid ) c JOIN ga_sessions s ON c.gid = s.gid AND s.time >= c.time GROUP BY s.medium ORDER BY cost DESC
This query finishes in 300 milliseconds. But I need the same in Power BI:
EVALUATE CALCULATETABLE ( SUMMARIZE ( GENERATE ( SUMMARIZE ( 'Cohort Sessions', 'Cohort Sessions'[gid], "c_time", MIN ( 'Cohort Sessions'[time] ) ), FILTER ( Sessions, Sessions[gid] = 'Cohort Sessions'[gid] && Sessions[time] >= [c_time] ) ), Sessions[medium], "cost", SUM ( Sessions[cost] ), "revenue", SUM ( Sessions[revenue] ), ), 'Cohort Sessions'[cohort medium] = "email", 'Cohort Sessions'[time] >= DATE ( 2016, 03, 01 ), 'Cohort Sessions'[time] < DATE ( 2016, 04, 01 ) ) ORDER BY [cost] desc
DAX query finishes in 7 seconds (9 seconds on cold cache)!
If I comment part Sessions[time] >= [c_time] the query finishes in 1.7 sec, but this condition is very important for me.
I created 2 instance of ga_sessions table ('Sessions' and 'Cohort Sessions') in Power BI since I need to provide user with distinct filters for cohort definition and futher sessions.
How I can optimize DAX query ?
Any thought around doing the join and such in your query ("M") versus in DAX?
User | Count |
---|---|
64 | |
59 | |
47 | |
33 | |
32 |
User | Count |
---|---|
84 | |
75 | |
56 | |
50 | |
44 |