The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I thought this had been asked before, but I didn't find a thread via search.
A GA connection has a query pattern like so:
let Source = GoogleAnalytics.Accounts(), account = Source{[Id="654321"]}[Data], webprop = account{[Id="UA-654321-1"]}[Data], profile = webprop{[Id="123456"]}[Data], gadata = Cube.Transform( profile, { { Cube.AddAndExpandDimensionColumn, "ga:date", {"ga:date"}, {"Date"} }, { Cube.AddMeasureColumn, "Sessions", "ga:sessions" } } ), in gadata
That seems to provide coverage for the GA API except for the start date and end date parameters. Are those parameters exposed in some way? If not, how does one specify a query like "users last month". Would it be possible to add them as optional parameters to the GoogleAnalytics.Accounts() or Cube.Transform() function?
not really, if the query is small enough (lets say less than a year) GA won't sample data (in their API) premium or not! If you catch the request with fiddler you can see that PBI is requesting for sampled data! if we could run smaller queries like you ask, i would have what i need. (BTW tableau handles this part just fine... just that im on trial)
Can you share the Fidder request that shows it requesting sampled data?
Or are you saying that because there is no timeframe specified that PBI is requesting the entire timeframe and that is implicitly causing sampling.
so there it is, but i guess this is the response, so as you can see we recieve sample data, since the start date is really far off. The same report using google api but with a date range of 1 year, and i get unsampled data!
Ok, so you are getting sampling because of the implicit date range. So I'll extend my "so sad" to cover that also, as there is currently no way to avoid it.
Yes, that is true. But sampling isn't an issue when querying one or two months, as my reports will do, but for longer timeframes (the start-date, end-date parameters you'll be adding to the cube) the sampling increases. My reported issue is, I assume you now agree, that you were doing client-side aggregation.
If you really want to get clever with your connector, then you have to have the query folding logic send multiple queries back to GA to reduce sampling. Google documents that approach in their blog, etc.
For example, if I query ga:yearMonth with start-date = jan-2014 and end-date = dec-2015, then for a busy non-premium GA site I'll likely get numbers that are slightly or way off. But if I split this into 24 separate queries with start-dates and end-dates bracketing single months, then I'll avoid sampling errors - at the price of making more queries.
Hi,
It's a long shot but has anyone found a solution to this without using a third party service as an middle man?
I agree with the many other posts without being able to pass a date filter it is forcing sampling and redering the connector mostly useless. A number of posts across the Power BI forums confuse the issue and suggest post filtering.
So far my many attempts to work round the issue have failed. 😞
Thanks
Thomas
Hi Thomas,
Microsoft did indicate in this tread that they were considering/planning to ad the start/end date parameters. Perhaps someone from Microsoft can chime in. I ended up proxying my requests through a middle-tier service to get what I wanted.
- Chris
Hey, Chris!
I have also encountered the data sampling in Power BI.
It even occurred within small projects because Power BI required data for the whole period.
I have found the solution in creating our own connector that would divide queries by days and import them in the SQL Server Database. This way, we managed to solve the problem.
Eventually, our solution escalated into a separate product. It is more convenient to work with one database which contains not only Google Analytics data, but other data as well. Such as data from Google AdWords and CRM systems.
You can try to set up integration on our website.
I am always there and happy to help if you have any questions.
Hi @intrasight
Thanks for the post, somehow I started to have the same issue just a few days ago. Do you have any news about the fix or the workaround?
I have the same issue just with dates and E-Commerce data. Only that if I want the correct data as my Analytics UI I have to filter by weeks. 😞
I don't know why the folding is so bad for the data for some properties and not for others.
Let me know if there is an Idea or a Bug issue to vote for.
Best,
J.
I've not heard any updates on this. I assume it just not a high-priority item 😞
Thanks for reporting the issue. We are now tracking this issue and will address it soon
PowerQuery models this with filters:
#"Filtered Rows" = Table.SelectRows(gadata, each Date.IsInPreviousWeek([Date]))
Hi,
For some reason the discussion forum won't let me respond to your most recent post, so I'm responding to this one.
By client-side, I mean that the sum is being calculated by the client - Power BI in this case - as opposed to server-side (in GA). Query folding to GA cannot work if the GA query includes the date dimension, which it still does after adding the filter. You'll see this if you look at the GA API call being made.
The correct answer from GA for my query "users last month" is 893. Power BI is reporting 1034, which is the sum of the users on each of the 31 days. "users" and perhaps half of the other metrics cannot be aggregated client-side, and so Power BI is reporting incorrect values. For query folding to work against GA, it would have to remove the date dimension from the API call - which is clearly not happening.
@intrasight - have you found a solution to client vs. server side calculation yet?
If not, take a look at Scitylana
Then you will be able to import un-aggregated Google Analytics data into PBI or even choose to have it in a database and run direct query.
You can try it here, www.scitylana.com
One last work-around I thought to try
let Source = GoogleAnalytics.Accounts(), GaAccount = Source{[Id="xxx"]}[Data], GaWebProp = GaAccount{[Id="UA-xxx-1"]}[Data], GaProfile = GaWebProp{[Id="13844713"]}[Data], #"Added Items" = Cube.Transform(GaProfile, {{Cube.AddAndExpandDimensionColumn, "ga:date", {"ga:date"}, {"Date"}}, {Cube.AddAndExpandDimensionColumn, "ga:yearMonth", {"ga:yearMonth"}, {"Month of Year"}}, {Cube.AddMeasureColumn, "Users", "ga:users"}}), #"Filtered Rows" = Table.SelectRows(#"Added Items", each [Date] >= #date(2016, 1, 1) and [Date] <= #date(2016, 1, 31)), #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(#"Filtered Rows", {"Date"}) in #"Collapsed and Removed Columns"
Which results in these two calls to GA
https://www.googleapis.com/analytics/v3/data/ga?ids=ga:13844713&filters=ga:date==20160101,ga:date==20160102,ga:date==20160103,ga:date==20160104,ga:date==20160105,ga:date==20160106,ga:date==20160107,ga:date==20160108,ga:date==20160109,ga:date==20160110,ga:date==20160111,ga:date==20160112,ga:date==20160113,ga:date==20160114,ga:date==20160115,ga:date==20160116,ga:date==20160117,ga:date==20160118,ga:date==20160119,ga:date==20160120,ga:date==20160121,ga:date==20160122,ga:date==20160123,ga:date==20160124,ga:date==20160125,ga:date==20160126,ga:date==20160127,ga:date==20160128,ga:date==20160129,ga:date==20160130,ga:date==20160131&metrics=ga:users&dimensions=ga:yearMonth&start-date=2016-01-01&end-date=2016-01-31 https://www.googleapis.com/analytics/v3/data/ga?ids=ga:13844713&filters=ga:date==20160101,ga:date==20160102,ga:date==20160103,ga:date==20160104,ga:date==20160105,ga:date==20160106,ga:date==20160107,ga:date==20160108,ga:date==20160109,ga:date==20160110,ga:date==20160111,ga:date==20160112,ga:date==20160113,ga:date==20160114,ga:date==20160115,ga:date==20160116,ga:date==20160117,ga:date==20160118,ga:date==20160119,ga:date==20160120,ga:date==20160121,ga:date==20160122,ga:date==20160123,ga:date==20160124,ga:date==20160125,ga:date==20160126,ga:date==20160127,ga:date==20160128,ga:date==20160129,ga:date==20160130,ga:date==20160131&metrics=ga:users&dimensions=ga:date,ga:yearMonth&start-date=2016-01-01&end-date=2016-01-31
PowerBI still gives the wrong number of 1034, which means that it is unsing the first API call and then doing client-side aggregation. Again, so close but no cigar. If it had used the first API call, it would have returned the correct result. Why is the connector making that second incorrect call to GA? Does MSFT realize that this bug makes the connector near useless?
This post SHOULD be chronologically after the "Here's the specifics. " post. But the discussion software is buggy and won't allow a response to the last post. So I'm responding out of order. Hopefully it will sort display chronologically.
I thought of a potential workaround. It sort of works but with caveats that I'll describe. Rather than use ga:date, which doesn't fold properly, I did this:
let Source = GoogleAnalytics.Accounts(), GaAccount = Source{[Id="xxx"]}[Data], GaWebProp = GaAccount{[Id="UA-xxx-1"]}[Data], GaProfile = GaWebProp{[Id="13844713"]}[Data], GaData = Cube.Transform(GaProfile, {{Cube.AddAndExpandDimensionColumn, "ga:yearMonth", {"ga:yearMonth"}, {"Month of Year"}}, {Cube.AddMeasureColumn, "Users", "ga:users"}}), #"Filtered Rows" = Table.SelectRows(GaData, each ([Month of Year] = "201601")), #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(#"Filtered Rows", {"Month of Year"}) in #"Collapsed and Removed Columns"
It give the corrent value users=893. But it doesn't send a start-date or end-date back to GA. Here are the GA calls:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3a13844713&filters=ga%3ayearMonth%3d%3d201601&metrics=ga%3ausers&start-date=2009-01-01&end-date=2016-02-05 https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3a13844713&filters=ga%3ayearMonth%3d%3d201601&metrics=ga%3ausers&dimensions=ga%3ayearMonth&start-date=2009-01-01&end-date=2016-02-05 https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3a13844713&metrics=ga%3ausers&dimensions=ga%3ayearMonth&start-date=2009-01-01&end-date=2016-02-05
So GA has to deliver data for every month since the site was created (2009). Not very smart or efficient. I want data for just one month. But at least it gives the right value.
Here's the specifics. Here's the query:
let Source = GoogleAnalytics.Accounts(), GaAccount = Source{[Id="xxxx"]}[Data], GaWebProp = GaAccount{[Id="UA-xxxx-1"]}[Data], Profile = GaWebProp{[Id="13844713"]}[Data], #"Added Items" = Cube.Transform(Profile, {{Cube.AddAndExpandDimensionColumn, "ga:date", {"ga:date"}, {"Date"}}, {Cube.AddMeasureColumn, "Users", "ga:users"}}), #"Filtered Rows" = Table.SelectRows(#"Added Items", each [Date] >= #date(2016, 1, 1) and [Date] <= #date(2016, 1, 31)), #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(#"Filtered Rows", {"Date"}) in #"Collapsed and Removed Columns"
And here's the three GA API queries made:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga:13844713&filters=ga:date==20160101,ga:date==20160102,ga:date==20160103,ga:date==20160104,ga:date==20160105,ga:date==20160106,ga:date==20160107,ga:date==20160108,ga:date==20160109,ga:date==20160110,ga:date==20160111,ga:date==20160112,ga:date==20160113,ga:date==20160114,ga:date==20160115,ga:date==20160116,ga:date==20160117,ga:date==20160118,ga:date==20160119,ga:date==20160120,ga:date==20160121,ga:date==20160122,ga:date==20160123,ga:date==20160124,ga:date==20160125,ga:date==20160126,ga:date==20160127,ga:date==20160128,ga:date==20160129,ga:date==20160130,ga:date==20160131&metrics=ga:users&start-date=2016-01-01&end-date=2016-01-31&start-index=1&max-results=1000 https://www.googleapis.com/analytics/v3/data/ga?ids=ga:13844713&filters=ga:date==20160101,ga:date==20160102,ga:date==20160103,ga:date==20160104,ga:date==20160105,ga:date==20160106,ga:date==20160107,ga:date==20160108,ga:date==20160109,ga:date==20160110,ga:date==20160111,ga:date==20160112,ga:date==20160113,ga:date==20160114,ga:date==20160115,ga:date==20160116,ga:date==20160117,ga:date==20160118,ga:date==20160119,ga:date==20160120,ga:date==20160121,ga:date==20160122,ga:date==20160123,ga:date==20160124,ga:date==20160125,ga:date==20160126,ga:date==20160127,ga:date==20160128,ga:date==20160129,ga:date==20160130,ga:date==20160131&metrics=ga:users&dimensions=ga:date&start-date=2016-01-01&end-date=2016-01-31&start-index=1&max-results=1000 https://www.googleapis.com/analytics/v3/data/ga?ids=ga:13844713&metrics=ga:users&dimensions=ga:date&start-date=2009-01-01&end-date=2016-02-04&start-index=1&max-results=1000
The first filters in all the dates (which is redundant due to start-date and end-date, and has no date dimension. This query if run in GA returns users=893
The second again lists all dates but also has date dimension. This query if run in GA returns users=1034
The third doesn't list dates but has current date as end-date. I don't know why it runs this query - I didn't check GA directly.
In Power BI, the value returned is users=1034. The correct value is users=893.
Please correct me if I'm wrong, but I don't think this can be done with filter semantics.
Let me give a more realistic example
let Source = GoogleAnalytics.Accounts(), account = Source{[Id="654321"]}[Data], webprop = account{[Id="UA-654321-1"]}[Data], profile = webprop{[Id="123456"]}[Data], gadata = Cube.Transform( profile, { { Cube.AddMeasureColumn, "Users", "ga:users" } } ), in gadata
As you see, there is no date dimension in the query. Because I want the value of the metrics over a date range. I need metrics (such as ga:users) which cannot be aggregated client-side. So I need the connector to support the missing start and end date parameters.
I'm not sure what you mean by Cannot aggregate client-side. If the date filter is in the expected range, we will fold that into the start-date and end-date parameter of the request URL. For example, this query:
#"Added Items" = Cube.Transform(cube, {{Cube.AddAndExpandDimensionColumn, "ga:date", {"ga:date"}, {"Date"}}, {Cube.AddMeasureColumn, "Sessions", "ga:sessions"}, {Cube.AddMeasureColumn, "Users", "ga:users"}}), #"Filtered Rows" = Table.SelectRows(#"Added Items", each [Date] >= #date(2016, 1, 27) and [Date] <= #date(2016, 2, 4))
will be folded into
?...&metrics=ga:sessions,ga:users&start-date=2016-01-27&end-date=2016-02-04&start-index=1
If you don't want to see the date dimension, you can collapse it later. The filter stays active.
= Cube.CollapseAndRemoveColumns(#"Filtered Rows", {"Date"})
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
122 | |
85 | |
77 | |
55 | |
48 |
User | Count |
---|---|
136 | |
128 | |
78 | |
64 | |
63 |