Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Shape the future of the Fabric Community! Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions. Take survey.

Reply
cowork16
Frequent Visitor

Data from Google Analytics Doesn't Match!

I'm trying to figure out why there are such large discrepancies in data between Google Analytics and Power BI. I know sampling can be an issue but that be the cause of such large differences? Here's part of a table from GA and the same data in PBI. Some values are close (e.g. facebook/remarketing) but others are completely off. Any ideas? I have a deadline to meet that won't be possible if I can't figure this out Smiley Sad TIA!

 

Tables

1 ACCEPTED SOLUTION
v-haibl-msft
Microsoft Employee
Microsoft Employee

@cowork16

 

I've got response from the product team.

 

This is expected behavior. The API we're using for Google Analytics is the one available to free users, and that API is known to return less accurate values for high volume sites due to the way it samples data. Unfortunately, the API for Google Analytics Premium customers is significantly different than that for the free users, so we'd need to implement a new connector specifically for Premium in order to get the more accurate values. This is not currently on our roadmap. Customers who would like to see this feature should request it or upvote it (if already there) at ideas.powerbi.com.

 

Best Regards,

Herbert

View solution in original post

15 REPLIES 15
michaelbilling
Helper II
Helper II

You need access to non sampled raw/hit data and do the aggregations yourself. This is generally a very good idea working with Power BI - since it's aggregated nature. When applying aggregates to an aggregation tool, bad things can easily happen. Users (even pro analysts - yes I have seen it 🙂 will be "seduced" to sum, avg, etc. already distinctly counted metrics or averages, max's, min's etc.

 

You need to sort of data governance and stay in the taxonomy of Google Analytics (or consciously move out of it), you need either to purchase the pricy GA 360 or use Scitylana's Google Analytics services.

With these 2 options you can move the raw data to a BigQuery table and to a combination of DirectQuery and In-memory import. This works really great!

 

v-haibl-msft
Microsoft Employee
Microsoft Employee

@cowork16

 

I've got response from the product team.

 

This is expected behavior. The API we're using for Google Analytics is the one available to free users, and that API is known to return less accurate values for high volume sites due to the way it samples data. Unfortunately, the API for Google Analytics Premium customers is significantly different than that for the free users, so we'd need to implement a new connector specifically for Premium in order to get the more accurate values. This is not currently on our roadmap. Customers who would like to see this feature should request it or upvote it (if already there) at ideas.powerbi.com.

 

Best Regards,

Herbert

@v-haibl-msft 

 

Any clues as to what threshold defines “high volume sites” ?

thanks!





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






@PaulDBrown @v-haibl-msft @cowork16 @

 

I have spent a lot of time working with GA from the API side.

 

High volume, in this context, would be a site that when queried via the API, allways/very often hits the sampling threshold of 500K visits within the time range queried.

 

I represent a company that solves this challenge with the standard API. Please come try us out.

https://www.scitylana.com

@michaelbilling 

 

Thank you for the input. A question if I may...

I am working on a client with an average of 100 K monthly visitors (1.2 million a year). So if I wanted to include their GA data in a report, with historical trends etc, would that cause data discrepancies?

I understand the implications of aggregate data and the risks involved in using it in reports. But can't we simply replicate data (by avoiding re-aggregating in sums etc in PBI) as seen on GA? It defeats the prupose of PBI in terms of slicing etc..., but on the other hand allows to centralise reporting from different sources in PBI to make PBI the single reporting tool.

I guess what I find a bit disconcerting is the original poster's comment that the data as seen in the query editor in PBI si the same as seen on GA, but discrepancies occur in reports...

 

(BTW, I have looked at your product and it does sound very impressive, albeit I am not sure if this particular client wants to spend on extra tools when he can access data on GA for free). 

 

Please excuse my obvious ignorance regaring GA, but I haven't needed to use GA data in PBI before. (btw, does the same problem happen with data extracted from Facebook, Instagram etc?)

 

Thanks for your help,

Paul.

 

EDIT: as an afterthought, does Google Data Studio show the same discrepancies as we might find in PBI?





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






@PaulDBrown @melany  

 

DataStudio has GA natively built-in.

All queries are sent to GA backend and evaluated.

There is no local aggregation. But sampling still kicks in.

 

Yes, you'll get historical discrepancies.

When you cross the limit of 500K sessions, which will happen hitting the 5 months mark using your data set, the whole data set is be sampled.

 

What can you do?

You can make a M script that takes one day of data at the time. Build a joined table on e.g. 365 or have many days you need.

But you would still be limited by the number of filter dimensions and reporting dimensions. You would easily have multiple of these scripts pulling 1 day at a time. Which in turn will be really hard to maintain.

 

Scitylana

I see more and more running into this problem. All very confused by the amount of time they need to invest in the "free" GA data source. This is just one data source out of many of a reporting suite.

I agree with you - Checking data in GA is not fine - when you already have realized how hard it is to adopt employees/users of any given BI repoting platform. The platform should be where the employees build a better company.

 

I understand price matters.

 

Are you talking about the facebook and instagram connectors found on www.scitylana.com/connectors?

They have same issues.

 

 

 

@michaelbilling 

 

Thank you for the explanation and warning regarding my particular scenario.

When I mentioned other conectors, I was referring to the connectors within Power BI. I take it from your comment that we run into the same issues using the inbuilt PBI connectors then.

Do the connectors on  www.scitylana.com/connectors solve the issues? 

 

Which kind of makes the PBI connectors pretty much pointless for medium to large traffic websites/social media sites. Unless I guess we keep the reporting to the very basic and avoiding aggregations? Or does sampling also have a big impact? I'm not sure what the impact sampling actually has (apologies again, I'm new to sources such as GA, Facebook, Instagram etc...but I will be facing these issues in the next two weeks)

 

It seems therefore that Power BI is not the tool to monitor web activity, unless you are prepared to fork out a relevant amount of €€€, which I guess is out of the question for most medium and small businesses (with free alternatives out there).

BTW, how should I contact you or whoever to understand the licensing for Scitylana? I have checked the page and I have a couple of doubts. 

Thanks.





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






@PaulDBrown 

The connectors have their week spots.

This is nothing new, really. Classic ETL into a DW is still preferable in my opinion.

 

Power BI + connectors is entry level BI. But it's really great for entry level BI setups

Web activity just pushes it to the limit faster that the other data sources. But these kind of web activity data sources gets more and more normal. Power BI raises peoples expectations faster than other products out there.

MS has a whole suite of products you can buy when you grow out of the Pro licensing.

 

My free connectors has the same limits.

But Scitylana's Google Analytics solution is solid with built-in elasticity for growth.

 

As opposed to the connectors and PBI Import mode - DirectQuery in Power BI is much more relevant than one would initially think.

I think a combination of import and direct will be the future. To balance cost and performance.

That's good for the BI Analysts consultants. New types of complexity sees the light. 🙂

 

Sampling is mostly an issue when you want to

- integrate data sources (joining keys will be missing)

- calculate conversion rates (conversion rate of 5% could in reality be 7.5% or 2.5%)

- drill into user behavior

- rely on the real values

 

Write on our chat -  or send mail to support at scitylana.com

https://www.scitylana.com

In my experience, Data Studio accurately reflects GA.  I even embedded a Data Studio report in a Power BI dashboard as a workaround.  But we are still struggling with needing to consume some GA data pieces with Power BI data.

@melany 

 

Can you share how you embed a Data Studio report in a POwer Bi Dashboard (or help me find the right resources)?

Thanks!





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






@PaulDBrown 

You simply

  1. use Data Studio to connect to your Google Analytics account
  2. create the report in Data Studio
  3. enable embedding and copy the link
  4. open PowerBI.com and select the workspace dashboard you want the report to appear 
  5. Add Tile
  6. Select Web Content
  7. Paste the Data Studio report lin
  8. There you go!2019-06-04_13-44-08.png

     

 

Thank you @v-haibl-msft, this makes sense. We'll have to figure out a plan B for our report.

cowork16
Frequent Visitor

Any help on this?

@cowork16

 

It seems like a data retrieval delay in build-in Google Analytic Connector, you may try some other open-sourced GA connectors, like PQGoogleAnalytics

 

Regards,

Thanks for the reply @v-sihou-msft 🙂 I set up the connector that you mentioned in your reply and I'm getting the exact same numbers as with the built-in connector. But when I queried directly with PQGoogleAnalytics, the numbers in the Query Editor matched with the GA numbers. So I don't know why, when I put the fields in a report, the numbers are so off.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.