hselfe_rsa
Regular Visitor

Optimizing Performance in a Star Schema with BigQuery - Need Help with 12 Joins

Hello everyone,

I've been working on a project where I'm trying to create a star schema of tables from a 5m row dataset pulled in from BigQuery. The main goal is to make various aspects of candidate data (e.g., Address, Details, Demographics) available for others to connect to and model against. Here's the approach I've taken:

 

I bring in a 5m row table from BigQuery containing only the CandidateID.
I then copy, reference, or clone (I've tried every approach I can think of) this exact same table to create around 12 related tables, all linked to the Candidate table via a star schema using the CandidateID as the common key. I've also tried calling BQ 12 times, just with the fields I need. This updates faster but exhibits the same response issues in a table that mixes data across them.
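For illustration, one of those 12 narrow pulls looks something like this (the table and column names here are hypothetical, just to show the shape):

-- One of the ~12 narrow extracts: pull only the key plus the columns
-- this aspect needs (all names are hypothetical).
SELECT
  CandidateID,
  Gender,
  DateOfBirth,
  Ethnicity
FROM `my-project.my_dataset.candidates`;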


This approach works well for structuring the data, but I'm facing a significant performance issue:

 

When I publish and try to view a table that shows all the data across these tables, it's extremely unresponsive, taking ages to update. If the user applies a filter (say, all Female), it again takes ages to refresh. I suspect that the 12 joins between the Candidate table and the other related tables are causing this slowdown.

I would greatly appreciate any advice or insights on how to optimize the performance in such a scenario. Are there any best practices for handling a star schema with a large number of joins in BigQuery? Is there a way to make this query more responsive?

Thank you in advance for your help!

1 ACCEPTED SOLUTION
v-stephen-msft
Community Support

Hi @hselfe_rsa ,

 

To optimize the performance in a star schema with a large number of joins in BigQuery, you can follow these best practices:

1. Use denormalization: Consider denormalizing your data by combining related tables into a single table. This can reduce the number of joins required and improve query performance. However, keep in mind that denormalization may increase storage costs and data redundancy. (A sketch covering points 1-3 follows this list.)

2. Optimize table design: Ensure that your tables are properly partitioned and clustered. BigQuery does not use traditional indexes; clustering serves a similar role by allowing BigQuery to quickly locate the relevant data, and partitioning can improve query performance by reducing the amount of data that needs to be scanned.

3. Use query optimization techniques: When writing queries, consider using query optimization techniques such as filtering and aggregating data as early as possible in the query execution process. This can help reduce the amount of data that needs to be processed and improve query performance.

4. Monitor query performance: Regularly monitor the performance of your queries using BigQuery's query history and query plan features. This can help identify any performance bottlenecks and guide further optimization efforts.
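Putting points 1-3 together, a rough sketch in BigQuery SQL might look like this (all table and column names are hypothetical, just to illustrate the pattern):

-- 1. Denormalize: materialize one wide table instead of joining 12 tables
--    at query time. 2. Partition and cluster it so queries scan less data.
--    (All names below are hypothetical.)
CREATE OR REPLACE TABLE `my-project.my_dataset.candidate_wide`
PARTITION BY DATE_TRUNC(CreatedDate, MONTH)
CLUSTER BY Gender, Region
AS
SELECT
  c.CandidateID,
  c.CreatedDate,  -- assumed to be a DATE column
  d.Gender,
  d.DateOfBirth,
  a.Region,
  a.PostalCode
FROM `my-project.my_dataset.candidates` AS c
LEFT JOIN `my-project.my_dataset.demographics` AS d USING (CandidateID)
LEFT JOIN `my-project.my_dataset.addresses` AS a USING (CandidateID);

-- 3. Filter early: restricting on a clustering column lets BigQuery
--    prune blocks instead of scanning the whole table.
SELECT CandidateID, Region
FROM `my-project.my_dataset.candidate_wide`
WHERE Gender = 'Female';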

 

I hope these suggestions help you optimize the performance of your star schema in BigQuery. If you have any further questions or need additional assistance, please let me know.

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.


8 REPLIES
Brunner_BI
Super User

What you are describing, you are doing this in Power Query, right? Don't do it in the frontend using calculated tables; that will be a disaster for performance.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Not calculated tables, but relationships in model view


All useful points, thanks Stephen. We are now doing all the heavy lifting in BQ, and that is helping a great deal, so no transforms in PBI at all.
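In case it helps anyone later, the heavy lifting amounts to views along these lines on the BQ side (names here are hypothetical), so PBI just imports the finished shape:

-- Hypothetical sketch of one per-aspect view: BigQuery does the trimming
-- and filtering, and Power BI simply imports the result.
CREATE OR REPLACE VIEW `my-project.my_dataset.v_candidate_address` AS
SELECT
  CandidateID,
  AddressLine1,
  City,
  Region,
  PostalCode
FROM `my-project.my_dataset.addresses`
WHERE CandidateID IS NOT NULL;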

One more thing I would suggest is to use tools like Tabular Editor or Measure Killer to optimize the data model and kick out any columns you might not actually need, assuming you have some reports built already.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Thanks for the help!

Brunner_BI
Super User

Are you sure you are actually using import mode or are you doing any kind of DirectQuery or composite model?

Always use import mode if you can; this is where you will get the full performance of the Power BI engine.

 

In import mode, once the data has been loaded it no longer matters where it came from; Power BI will use the Analysis Services engine and you will get the full performance.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Thanks Brunner, yes it is Import Mode in all cases
