Hi all,
I’m building a Power BI dashboard using GA4 data exported to BigQuery, and I’m struggling to accurately calculate the number of unique users over different time periods.
Since I’m working with a large dataset (over 200 million rows if I don’t aggregate user IDs), my approach was to aggregate users at daily granularity to significantly reduce the dataset size. However, this leads to a major issue: daily unique-user counts can’t simply be summed into weekly or monthly totals, because the same user can be active on multiple days, so any rollup of the daily aggregates double-counts users.
The only solution I’ve found so far is to keep the user ID in the final table, but this causes performance issues, slow queries, and frequent dataset refresh failures in Power BI.
Is there a way to aggregate users without carrying user IDs in the final table, while still maintaining accurate unique user counts over different time periods?
Any suggestions would be greatly appreciated! Thanks!
Hi @rbozz ,
Thanks to Sahir_Maharaj for the reply!
And @rbozz, you can also try partitioning and clustering your BigQuery tables to speed up queries.
For example, you can partition your table by date:
CREATE TABLE `your_project.your_dataset.your_table`
PARTITION BY date  -- the partitioning column must be DATE, TIMESTAMP, or DATETIME
CLUSTER BY user_id
AS
SELECT
  date,
  user_id,
  -- other columns
FROM
  `your_project.your_dataset.your_source_table`;
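With that layout, any query that filters on the partition column only scans the matching daily partitions instead of the whole table. As an illustration (a minimal sketch, reusing the placeholder names from the DDL above):
-- The date filter prunes the scan to the January 2024 partitions;
-- clustering by user_id keeps each user's rows co-located within them.
SELECT
  date,
  COUNT(DISTINCT user_id) AS daily_unique_users
FROM `your_project.your_dataset.your_table`
WHERE date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY date;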
And if your dataset is too large to refresh entirely each time, consider using incremental refresh in Power BI. This way, only new data is loaded, reducing the refresh time and resource usage.
https://learn.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-overview
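Note that incremental refresh needs the RangeStart/RangeEnd filter to fold back to BigQuery, and those parameters must be of type Date/Time. One way to make that easy (just a sketch with a hypothetical view name, not verified against your model) is to expose a DATETIME column at the source:
-- Hypothetical helper view: DATETIME(date) gives Power BI's
-- RangeStart/RangeEnd parameters (Date/Time type) a column to filter on.
CREATE OR REPLACE VIEW `your_project.your_dataset.your_table_v` AS
SELECT
  date,
  DATETIME(date) AS date_dt,
  user_id,
  -- other columns
FROM
  `your_project.your_dataset.your_table`;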
Best Regards,
Dino Tao
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Hi @v-junyant-msft,
Thanks for the input, but I already clustered and partitioned the table in BigQuery, and there are no issues with query execution in BQ. The issue is on the Power BI side: I tried to set up incremental refresh on my semantic model using the Google BigQuery connector, but it doesn't seem to work, because the refresh times are very high and the refresh fails from time to time.
I also tried using a dataflow with incremental refresh. The incremental refresh on the dataflow side works great, but the refresh of the semantic model connected to the dataflow has the same issue as before: huge incremental refresh times.
Hope you can give me some other inputs.
Best,
Riccardo
Hello @rbozz,
Can you please try this approach, using pre-aggregation in BigQuery? Note that APPROX_COUNT_DISTINCT returns an HLL++-based estimate rather than an exact count, but it avoids keeping user IDs in the final table:
WITH DailyUsers AS (
  SELECT
    'daily' AS period_type,  -- label each grain so the UNION ALL stays unambiguous
    event_date AS period,  -- the GA4 export stores event_date as a 'YYYYMMDD' STRING
    APPROX_COUNT_DISTINCT(user_pseudo_id) AS unique_users
  FROM `your_project.analytics_XXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
  GROUP BY period
),
WeeklyUsers AS (
  SELECT
    'weekly' AS period_type,
    FORMAT_DATE('%Y-%W', PARSE_DATE('%Y%m%d', event_date)) AS period,
    APPROX_COUNT_DISTINCT(user_pseudo_id) AS unique_users
  FROM `your_project.analytics_XXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
  GROUP BY period
),
MonthlyUsers AS (
  SELECT
    'monthly' AS period_type,
    FORMAT_DATE('%Y-%m', PARSE_DATE('%Y%m%d', event_date)) AS period,
    APPROX_COUNT_DISTINCT(user_pseudo_id) AS unique_users
  FROM `your_project.analytics_XXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
  GROUP BY period
)
SELECT * FROM DailyUsers
UNION ALL
SELECT * FROM WeeklyUsers
UNION ALL
SELECT * FROM MonthlyUsers;
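Since APPROX_COUNT_DISTINCT uses HLL++ under the hood, a related option, if approximate counts are acceptable, is to store one HLL sketch per day and merge the sketches for any period. The aggregated table then stays tiny and carries no user IDs at all. A minimal sketch of the idea, with placeholder table names and untested against your schema:
-- Step 1: keep one compact HLL++ sketch per day instead of raw user IDs.
CREATE OR REPLACE TABLE `your_project.your_dataset.daily_user_sketches` AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  HLL_COUNT.INIT(user_pseudo_id) AS users_sketch
FROM `your_project.analytics_XXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
GROUP BY date;

-- Step 2: merge the daily sketches into any period, e.g. monthly uniques,
-- without ever touching user IDs again.
SELECT
  FORMAT_DATE('%Y-%m', date) AS month,
  HLL_COUNT.MERGE(users_sketch) AS monthly_unique_users
FROM `your_project.your_dataset.daily_user_sketches`
GROUP BY month;
The daily sketch table should be small enough for Power BI to refresh quickly, and you can materialize the weekly/monthly rollups from it on a schedule instead of rescanning the raw events.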