Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM. Register now.

Reply
hselfe_rsa
Regular Visitor

Optimizing Performance in a Star Schema with BigQuery - Need Help with 12 Joins

Hello everyone,

I've been working on a project where I'm trying to create a star schema of tables from a 5m row dataset pulled in from BigQuery. The main goal is to make various aspects of candidate data (e.g., Address, Details, Demographics) available for others to connect to and model against. Here's the approach I've taken:

 

I bring in a 5m row table from BigQuery containing only the CandidateID.
I then copy, reference, or clone (I've tried every approach I can think of) this exact same table to create around 12 related tables, all linked to the Candidate table via a star schema using the CandidateID as the common key. I've also tried calling BQ 12 times, just with the fields I need. This updates faster but exhibits the same respons eissues in a mixed table of data.


This approach works well for structuring the data, but I'm facing a significant performance issue:

 

When I publish and try to view a table that shows all the data across these tables, it's extremely unresponsive, taking ages to update. If the user applies a filter (all Female say) it agains takes ages to refresh. I suspect that the 12 joins between the Candidate table and the other related tables are causing this slowdown.

I would greatly appreciate any advice or insights on how to optimize the performance in such a scenario. Are there any best practices for handling a star schema with a large number of joins in BigQuery? Is there a way to make this query more responsive?

Thank you in advance for your help!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @hselfe_rsa ,

 

To optimize the performance in a star schema with a large number of joins in BigQuery, you can follow these best practices:

1. Use denormalization: Consider denormalizing your data by combining related tables into a single table. This can reduce the number of joins required and improve query performance. However, keep in mind that denormalization may increase storage costs and data redundancy.

2. Optimize table design: Ensure that your tables are properly indexed and partitioned. Indexing can speed up query execution by allowing BigQuery to quickly locate the relevant data. Partitioning can help improve query performance by reducing the amount of data that needs to be scanned.

3. Use query optimization techniques: When writing queries, consider using query optimization techniques such as filtering and aggregating data as early as possible in the query execution process. This can help reduce the amount of data that needs to be processed and improve query performance.

4. Monitor query performance: Regularly monitor the performance of your queries using BigQuery's query history and query plan features. This can help identify any performance bottlenecks and guide further optimization efforts.

 

I hope these suggestions help you optimize the performance of your star schema in BigQuery. If you have any further questions or need additional assistance, please let me know.

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.           

View solution in original post

8 REPLIES 8
Brunner_BI
Super User
Super User

What you are describing, you are doing this in PowerQuery, right? Dont do it in the frontend using calculated tables that will be a disaster for performance

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Not calculated tables, but relationships in model view

Anonymous
Not applicable

Hi @hselfe_rsa ,

 

To optimize the performance in a star schema with a large number of joins in BigQuery, you can follow these best practices:

1. Use denormalization: Consider denormalizing your data by combining related tables into a single table. This can reduce the number of joins required and improve query performance. However, keep in mind that denormalization may increase storage costs and data redundancy.

2. Optimize table design: Ensure that your tables are properly indexed and partitioned. Indexing can speed up query execution by allowing BigQuery to quickly locate the relevant data. Partitioning can help improve query performance by reducing the amount of data that needs to be scanned.

3. Use query optimization techniques: When writing queries, consider using query optimization techniques such as filtering and aggregating data as early as possible in the query execution process. This can help reduce the amount of data that needs to be processed and improve query performance.

4. Monitor query performance: Regularly monitor the performance of your queries using BigQuery's query history and query plan features. This can help identify any performance bottlenecks and guide further optimization efforts.

 

I hope these suggestions help you optimize the performance of your star schema in BigQuery. If you have any further questions or need additional assistance, please let me know.

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.           

All useful points, thanks Stephen, we are now doing all the heavy lifing in BQ, and that is helping a great deal - so no transforms in PBI at all

One more thing I would suggest is to use tools like Tabular Editor or Measure Killer to optimize the data model and kick out any columns you actually might not need. Assuming you have some reports obviously.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Thanks for the help!

Brunner_BI
Super User
Super User

Are you sure you are actually using import mode or are you doing any kind of DirectQuery or composite model?

Always use import mode if you can, this is where you will get the full performance of the Power BI engine.

 

In import mode, when the data has been loaded it does not matter anymore where the data came from, it will use the analysis service engine and you will get the full performance.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Thanks Brunner, yes it is Import Mode in all cases

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors