I thought this had been asked before, but I didn't find a thread via search.
A GA connection has a query pattern like so:
let
    Source = GoogleAnalytics.Accounts(),
    account = Source{[Id="654321"]}[Data],
    webprop = account{[Id="UA-654321-1"]}[Data],
    profile = webprop{[Id="123456"]}[Data],
    gadata = Cube.Transform(profile,
        {
            {Cube.AddAndExpandDimensionColumn, "ga:date", {"ga:date"}, {"Date"}},
            {Cube.AddMeasureColumn, "Sessions", "ga:sessions"}
        })
in
    gadata
That seems to provide coverage for the GA API except for the start date and end date parameters. Are those parameters exposed in some way? If not, how does one specify a query like "users last month"? Would it be possible to add them as optional parameters to the GoogleAnalytics.Accounts() or Cube.Transform() function?
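For concreteness, the closest thing I can see is adding a filter step on the expanded Date column and hoping it gets folded into start/end dates. A minimal sketch, using the step names from the query above and assuming such a filter actually folds:

    // Sketch only: extra steps appended before "in", assuming the filter on the
    // expanded Date column folds into GA's start-date/end-date parameters.
    lastMonthStart = Date.StartOfMonth(Date.AddMonths(Date.From(DateTime.LocalNow()), -1)),
    lastMonthEnd   = Date.EndOfMonth(lastMonthStart),
    lastMonthData  = Table.SelectRows(gadata, each [Date] >= lastMonthStart and [Date] <= lastMonthEnd)

Is that the intended mechanism, or is there a more direct way to set the date range?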
Our solution to un-sampling is to break a month/year query into many daily queries, and aggregate the data later on the client side.
Here is the code:
let Source = GoogleAnalytics.Accounts(), #"1" = Source{[Id="999"]}[Data], #"2" = #"1"{[Id="999"]}[Data], #"3" = #"2"{[Id="999"]}[Data], #"newCube" = Cube.Transform(#"3", { {Cube.AddAndExpandDimensionColumn, "ga:date", {"ga:date"}, {"Date"}}, {Cube.AddAndExpandDimensionColumn, "ga:deviceCategory", {"ga:deviceCategory"}, {"Device Category"}}, {Cube.AddAndExpandDimensionColumn, "ga:userAgeBracket", {"ga:userAgeBracket"}, {"Age"}}, {Cube.AddMeasureColumn, "Sessions", "ga:sessions"} }), combinedData = Table.Combine({ Table.SelectRows(#"newCube", each ([Date] = #date(2017, 1, 1))), Table.SelectRows(#"newCube", each ([Date] = #date(2017, 1, 2))), ... {days in between} ... Table.SelectRows(#"newCube", each ([Date] = #date(2018, 3, 4))), Table.SelectRows(#"newCube", each ([Date] = #date(2018, 3, 5))) }) in combinedData
When we run that code, Power BI sends 429 GA requests, one request per day.
So Power BI does in fact call GA with the proper start-date and end-date parameters.
One problem still remains: when you combine the data on the client side, it becomes a bit inaccurate in terms of unique counters (such as users).
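By the way, if typing out one Table.SelectRows per day gets tedious, the same list can be generated. This is a sketch only; it assumes #"newCube" is defined as in the code above and that each generated SelectRows still folds into its own GA request the way the hand-written ones do:

    // Sketch: generate the per-day SelectRows calls from a date range instead of
    // writing each day by hand. Assumes one GA request is still issued per day.
    dayCount = Duration.Days(#date(2018, 3, 5) - #date(2017, 1, 1)) + 1,
    days = List.Dates(#date(2017, 1, 1), dayCount, #duration(1, 0, 0, 0)),
    combinedData = Table.Combine(
        List.Transform(days, (d) => Table.SelectRows(#"newCube", each [Date] = d)))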
Thank you for the above code. This works to circumvent the sampling issue!
I use code like this:
let Source = GoogleAnalytics.Accounts(), #"1" = Source{[Id="999"]}[Data], #"2" = #"1"{[Id="999"]}[Data], #"3" = #"2"{[Id="999"]}[Data], #"Added Items" = Cube.Transform(#"3", { {Cube.AddAndExpandDimensionColumn, "ga:channelGrouping", {"ga:channelGrouping"}, {"Default Channel Grouping"}}, {Cube.AddAndExpandDimensionColumn, "ga:customfield", {"ga:customfield"}, {"Customfield"}}, {Cube.AddAndExpandDimensionColumn, "ga:yearMonth", {"ga:yearMonth"}, {"Month of Year"}}, {Cube.AddMeasureColumn, "Revenue", "ga:transactionRevenue"}, {Cube.AddMeasureColumn, "Transactions", "ga:transactions"} }), #"combinedData" = Table.Combine({ Table.SelectRows(#"Added Items", each ([Month of Year] = "201803")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201804")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201805")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201806")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201807")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201808")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201809")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201810")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201811")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201812")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201801")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201802")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201903")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201904")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201905")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201906")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201907")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201908")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201909")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201910")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201911")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201912")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201901")), Table.SelectRows(#"Added Items", each ([Month of Year] = "201902")), Table.SelectRows(#"Added Items", each Text.Contains([Month of Year], "2017")) }) in combinedData
This works, but for performance and scalability reasons I'm looking for an improvement, as I currently query the historical data again and again. Ideally I would like to create a combined data source that contains all data up to and including last month, have a separate connection for the current month's data, and combine the two. That way I no longer have to send all the separate monthly requests for the historic data, which saves on GA requests. That historic data source should of course update as soon as a new month starts; for instance, when August starts, my historic dataset should automatically contain data up to and including July 2018 (as I aim to avoid manual tasks).
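One direction I've been considering, offered only as an untested sketch: generate the list of months up to and including last month instead of hard-coding it, so the historic query maintains itself. This assumes #"Added Items" is defined as in the query above and that each generated SelectRows still folds into its own monthly GA request:

    // Sketch: build the "historic" table from a generated list of yyyyMM keys that
    // always ends at last month, so nothing needs editing when a month rolls over.
    firstMonth = #date(2017, 1, 1),
    lastMonth  = Date.AddMonths(Date.StartOfMonth(Date.From(DateTime.LocalNow())), -1),
    monthCount = (Date.Year(lastMonth) - Date.Year(firstMonth)) * 12
                 + (Date.Month(lastMonth) - Date.Month(firstMonth)) + 1,
    monthKeys  = List.Transform({0..monthCount - 1},
                     (i) => Date.ToText(Date.AddMonths(firstMonth, i), "yyyyMM")),
    historic   = Table.Combine(
                     List.Transform(monthKeys,
                         (m) => Table.SelectRows(#"Added Items", each [Month of Year] = m)))

The current month would then be a single SelectRows on the current yyyyMM key, combined with the historic table via Table.Combine. I realise, though, that Power Query would presumably still re-run the historic requests on every refresh; truly keeping history without re-querying it would need something like incremental refresh or a separately stored copy.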
Is this possible in the Query Editor? Does anybody have any ideas?
Hi all,
I just wanted to see if there has been any updates on this topic or if anyone found a way around this?
I am getting the correct figures by applying the above-mentioned steps for the date and using collapse. However, I have to filter my data, which contains a model name in all the page paths. This works really well in the GA query builder, but when I add the "contains" filter in Power BI I don't get the correct visitor numbers. It also won't let me collapse and remove, as it says that it is not a cube.
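For clarity, this is the kind of filter I mean (a sketch only; the table and column names here are placeholders, not taken from the queries above):

    // Sketch: a "contains" filter on the expanded page path dimension.
    // "gaCube" and "Page" are placeholder names for my cube step and page path column.
    filtered = Table.SelectRows(gaCube, each Text.Contains([Page], "model-name"))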
Hi, all
It seems you should first check the actual GA requests that Power BI sends.
You can see them in the log files.
Turn logging on -> https://docs.microsoft.com/en-us/power-bi/desktop-troubleshooting-sign-in -> refresh your report -> navigate to the Traces folder on your local computer. Search for text like "https://www.googleapis.com/analytics/v3/data/ga"; it will be somewhere in the log files.
Compare the request URIs with what you expect from the GA query builder. If they are the same, your issue is not related to the GA connector.
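For reference, the request you are looking for in the traces should look roughly like this (the account ID, dates, metrics, and dimensions below are placeholders):

    https://www.googleapis.com/analytics/v3/data/ga?ids=ga:123456&start-date=2017-01-01&end-date=2017-01-01&metrics=ga:sessions&dimensions=ga:date

The start-date and end-date query parameters are the ones to compare against what the GA query builder generates.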
@intrasight, interesting. I'm not very familiar with GA's API; I can talk to the dev who worked on this. In particular, I'm a bit surprised that the inclusion of the Date dimension breaks the aggregation on GA.
Power Query's multidimensional model has had limited support for additional parameters to the cube; we've only begun adding them for SAP and AS. So for the GA connector, the start/end date parameters are automatically populated via the filter on the date dimension. I think there could be a bug in the Power Query folding logic: when the date dimension is collapsed, we should not have sent &dimension=ga:date.
I'll follow up on this and get back to you.
It's the same reason that it would be wrong in MDX - you can't aggregate client-side. Have someone who knows MDX look at the GA integration and I'm sure they can straighten it out. Or have them contact me.
I did see the additional parameters in SAP and AS, and thought "that's the solution".
Thanks for investigating.
Hi,
I agree that parameters would probably have been a better solution for this. We didn't have such a mechanism at the time, and alas we went with filtering on the date dimension. Nonetheless, the approach we have today should work, and before we go ahead and revise the design, I would like to understand what the problem is.
We don't do any aggregations in Power Query for cubes. All the data that is shown comes back directly from the server. So this isn't the problem. Unless you import the data into Power BI for reporting, in which case measures are always aggregated locally, at the granularity of the selected dimension attributes, because that's just what the thing does.
When you say you are seeing the wrong number for the user metric, are you seeing it in the (Power) Query Editor or on the Power BI report canvas?
The two requests are likely for computing the previews on refresh for each step, not for computing a single result. If you look at both requests, the only difference is that the date dimension is removed in the second one. That's because the first request is the one used to refresh the preview of the step before the last.
Thanks,
Tristan
All the details of the problem are in this thread. Again, in summary: client-side aggregation is being done because of buggy query folding, and because GA is effectively a cube, you are thus giving wrong numbers. This is all just straight-up "M" at this point, since obviously I wouldn't move to the next step if the numbers are wrong. I'd be happy to do a web meeting with you next week to show the details. But you can run the same queries that were presented in this discussion thread.
We don't do aggregations locally. All cube operations (expand, filters, collapse) always result in a query that is 100% sent to and run on the service. When you collapse a dimension, we simply don't request it in the query. This is why the date is gone in the second request you've posted. This is equivalent to not selecting a dimension attribute in MDX:
select {[Measure].[Users]} on 0, filter([Date].[Date].[Date], <date filter>) on 1 from [GA]
Then when the date is collapsed, we issue:
select {[Measure].[Users]} on 0 from (select filter([Date].[Date].[Date], <date filter>) on 1 from [GA])
I tried the same query on our GA account and the issue doesn't reproduce. I get the same results when I aggregate locally using List.Sum over the column as when I collapse the date dimension attribute.
That being said, you seem savvy and I'm sure you've identified something. I'm very interested in getting to the bottom of this and I'll gladly set up a call.
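Roughly, the comparison looks like this (a sketch only; the step and column names are placeholders, and it assumes Cube.CollapseAndRemoveColumns is the function the Query Editor's collapse-and-remove action generates):

    // Sketch: compare the server-side total (date collapsed, aggregation done by GA)
    // with a local List.Sum over the detailed rows. "gadata" and the column names
    // are placeholders for the cube query being tested.
    collapsed    = Cube.CollapseAndRemoveColumns(gadata, {"Date"}),
    serviceTotal = List.First(Table.Column(collapsed, "Users")),
    localTotal   = List.Sum(Table.Column(gadata, "Users")),
    check        = [ServiceTotal = serviceTotal, LocalTotal = localTotal]

If the date dimension were incorrectly still being sent to GA after collapsing, the two totals would differ for a non-additive metric such as users.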
Hi Tristan,
"send a frown" tries to open Outlook, which I don't have. What's the alternative?
If you look at my network traces, you'll observe the flaw in the GA connector. It issues two calls and one of them does include the ga:date dimension, which will obviously result in the wrong answer.
> I tried the same query on our GA account and the issue doesn't reproduce. I get the same results if I aggregate locally using List.Sum over the column than if I collapse the date dimension attribute.
If you get the same answer aggregating locally, then you are reproducing the problem. I am frustrated that you aren't taking the time to read this full discussion thread.
Can you use the "Send-A-Frown" feature in Power BI to send feedback. It doesn't seem like these forums have direct message and I'd rather not post my email here. You can ask for "Tristan". 🙂
Hi,
Since I can't reach you, how about you contact me.
chris AT activeinterface DOT com
Thanks,
Chris
@intrasight, we narrowed the problem down. In summary, lifting the start/end date to a cube level parameter will address this problem.
Cool. A cube-level parameter is the only generic solution. I'm meeting with the client on Monday. Is there any guidance you can give on the timeframe of that update?
Just to add some details. The issue is that Google Analytics does sampling. The more precise the query, the less accurate it becomes. See this page. Our use of the date dimension for filtering date ranges exacerbates the issue. So while we will likely fix this by using parameters in the future, keep in mind that the problem will always be there to a degree.
Hi Tristan!
Any news on this? My problem, besides proper aggregation, is that I get sampled data instead of actual values, and since I'm reporting sales, my data is not accurate.
If I use a date range filter, the original query (confirmed with Fiddler) covers the whole range of the database (going all the way back to 2013) and only then is the filter applied, so I only get the filtered results from the sampled data... so no, it's not useful.
Any news on this? Is it going to be fixed/addressed soon?
The only current way to use the GA connector is to allow it to query back to the beginning of time and then to filter. So sad.
Yes, it's sad, because it's useless in a lot of scenarios; for you it's users, for me it's sales (I just can't report them from sampled data). I guess implementing this shouldn't be that hard!
Hold on. Sampled data is a different matter - not an issue with the connector semantics. If you want unsampled data, you need GA Premium.