The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I'm trying to figure out why there are such large discrepancies in data between Google Analytics and Power BI. I know sampling can be an issue but that be the cause of such large differences? Here's part of a table from GA and the same data in PBI. Some values are close (e.g. facebook/remarketing) but others are completely off. Any ideas? I have a deadline to meet that won't be possible if I can't figure this out TIA!
Solved! Go to Solution.
I've got response from the product team.
This is expected behavior. The API we're using for Google Analytics is the one available to free users, and that API is known to return less accurate values for high volume sites due to the way it samples data. Unfortunately, the API for Google Analytics Premium customers is significantly different than that for the free users, so we'd need to implement a new connector specifically for Premium in order to get the more accurate values. This is not currently on our roadmap. Customers who would like to see this feature should request it or upvote it (if already there) at ideas.powerbi.com.
Best Regards,
Herbert
You need access to non sampled raw/hit data and do the aggregations yourself. This is generally a very good idea working with Power BI - since it's aggregated nature. When applying aggregates to an aggregation tool, bad things can easily happen. Users (even pro analysts - yes I have seen it 🙂 will be "seduced" to sum, avg, etc. already distinctly counted metrics or averages, max's, min's etc.
You need to sort of data governance and stay in the taxonomy of Google Analytics (or consciously move out of it), you need either to purchase the pricy GA 360 or use Scitylana's Google Analytics services.
With these 2 options you can move the raw data to a BigQuery table and to a combination of DirectQuery and In-memory import. This works really great!
I've got response from the product team.
This is expected behavior. The API we're using for Google Analytics is the one available to free users, and that API is known to return less accurate values for high volume sites due to the way it samples data. Unfortunately, the API for Google Analytics Premium customers is significantly different than that for the free users, so we'd need to implement a new connector specifically for Premium in order to get the more accurate values. This is not currently on our roadmap. Customers who would like to see this feature should request it or upvote it (if already there) at ideas.powerbi.com.
Best Regards,
Herbert
Any clues as to what threshold defines “high volume sites” ?
thanks!
Proud to be a Super User!
Paul on Linkedin.
@PaulDBrown @v-haibl-msft @cowork16 @
I have spent a lot of time working with GA from the API side.
High volume, in this context, would be a site that when queried via the API, allways/very often hits the sampling threshold of 500K visits within the time range queried.
I represent a company that solves this challenge with the standard API. Please come try us out.
Thank you for the input. A question if I may...
I am working on a client with an average of 100 K monthly visitors (1.2 million a year). So if I wanted to include their GA data in a report, with historical trends etc, would that cause data discrepancies?
I understand the implications of aggregate data and the risks involved in using it in reports. But can't we simply replicate data (by avoiding re-aggregating in sums etc in PBI) as seen on GA? It defeats the prupose of PBI in terms of slicing etc..., but on the other hand allows to centralise reporting from different sources in PBI to make PBI the single reporting tool.
I guess what I find a bit disconcerting is the original poster's comment that the data as seen in the query editor in PBI si the same as seen on GA, but discrepancies occur in reports...
(BTW, I have looked at your product and it does sound very impressive, albeit I am not sure if this particular client wants to spend on extra tools when he can access data on GA for free).
Please excuse my obvious ignorance regaring GA, but I haven't needed to use GA data in PBI before. (btw, does the same problem happen with data extracted from Facebook, Instagram etc?)
Thanks for your help,
Paul.
EDIT: as an afterthought, does Google Data Studio show the same discrepancies as we might find in PBI?
Proud to be a Super User!
Paul on Linkedin.
DataStudio has GA natively built-in.
All queries are sent to GA backend and evaluated.
There is no local aggregation. But sampling still kicks in.
Yes, you'll get historical discrepancies.
When you cross the limit of 500K sessions, which will happen hitting the 5 months mark using your data set, the whole data set is be sampled.
What can you do?
You can make a M script that takes one day of data at the time. Build a joined table on e.g. 365 or have many days you need.
But you would still be limited by the number of filter dimensions and reporting dimensions. You would easily have multiple of these scripts pulling 1 day at a time. Which in turn will be really hard to maintain.
I see more and more running into this problem. All very confused by the amount of time they need to invest in the "free" GA data source. This is just one data source out of many of a reporting suite.
I agree with you - Checking data in GA is not fine - when you already have realized how hard it is to adopt employees/users of any given BI repoting platform. The platform should be where the employees build a better company.
I understand price matters.
Are you talking about the facebook and instagram connectors found on www.scitylana.com/connectors?
They have same issues.
Thank you for the explanation and warning regarding my particular scenario.
When I mentioned other conectors, I was referring to the connectors within Power BI. I take it from your comment that we run into the same issues using the inbuilt PBI connectors then.
Do the connectors on www.scitylana.com/connectors solve the issues?
Which kind of makes the PBI connectors pretty much pointless for medium to large traffic websites/social media sites. Unless I guess we keep the reporting to the very basic and avoiding aggregations? Or does sampling also have a big impact? I'm not sure what the impact sampling actually has (apologies again, I'm new to sources such as GA, Facebook, Instagram etc...but I will be facing these issues in the next two weeks)
It seems therefore that Power BI is not the tool to monitor web activity, unless you are prepared to fork out a relevant amount of €€€, which I guess is out of the question for most medium and small businesses (with free alternatives out there).
BTW, how should I contact you or whoever to understand the licensing for Scitylana? I have checked the page and I have a couple of doubts.
Thanks.
Proud to be a Super User!
Paul on Linkedin.
The connectors have their week spots.
This is nothing new, really. Classic ETL into a DW is still preferable in my opinion.
Power BI + connectors is entry level BI. But it's really great for entry level BI setups
Web activity just pushes it to the limit faster that the other data sources. But these kind of web activity data sources gets more and more normal. Power BI raises peoples expectations faster than other products out there.
MS has a whole suite of products you can buy when you grow out of the Pro licensing.
My free connectors has the same limits.
But Scitylana's Google Analytics solution is solid with built-in elasticity for growth.
As opposed to the connectors and PBI Import mode - DirectQuery in Power BI is much more relevant than one would initially think.
I think a combination of import and direct will be the future. To balance cost and performance.
That's good for the BI Analysts consultants. New types of complexity sees the light. 🙂
Sampling is mostly an issue when you want to
- integrate data sources (joining keys will be missing)
- calculate conversion rates (conversion rate of 5% could in reality be 7.5% or 2.5%)
- drill into user behavior
- rely on the real values
Write on our chat - or send mail to support at scitylana.com
In my experience, Data Studio accurately reflects GA. I even embedded a Data Studio report in a Power BI dashboard as a workaround. But we are still struggling with needing to consume some GA data pieces with Power BI data.
Can you share how you embed a Data Studio report in a POwer Bi Dashboard (or help me find the right resources)?
Thanks!
Proud to be a Super User!
Paul on Linkedin.
You simply
Thank you @v-haibl-msft, this makes sense. We'll have to figure out a plan B for our report.
Any help on this?
It seems like a data retrieval delay in build-in Google Analytic Connector, you may try some other open-sourced GA connectors, like PQGoogleAnalytics
Regards,
Thanks for the reply @v-sihou-msft 🙂 I set up the connector that you mentioned in your reply and I'm getting the exact same numbers as with the built-in connector. But when I queried directly with PQGoogleAnalytics, the numbers in the Query Editor matched with the GA numbers. So I don't know why, when I put the fields in a report, the numbers are so off.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
80 | |
66 | |
53 | |
52 |
User | Count |
---|---|
121 | |
116 | |
77 | |
64 | |
63 |