hselfe_rsa
Regular Visitor

Optimizing Performance in a Star Schema with BigQuery - Need Help with 12 Joins

Hello everyone,

I've been working on a project where I'm trying to create a star schema of tables from a 5m row dataset pulled in from BigQuery. The main goal is to make various aspects of candidate data (e.g., Address, Details, Demographics) available for others to connect to and model against. Here's the approach I've taken:

 

I bring in a 5m row table from BigQuery containing only the CandidateID.
I then copy, reference, or clone (I've tried every approach I can think of) this exact same table to create around 12 related tables, all linked to the Candidate table via a star schema using the CandidateID as the common key. I've also tried calling BQ 12 times, just with the fields I need. This updates faster but exhibits the same response issues in a table that mixes data across them.
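For illustration, one of those 12 narrow pulls looks something like this (the table and column names here are hypothetical, just to show the shape):

-- One of the ~12 narrow extracts: pull only the key plus the columns
-- this aspect needs (all names are hypothetical).
SELECT
  CandidateID,
  Gender,
  DateOfBirth,
  Ethnicity
FROM `my-project.my_dataset.candidates`;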


This approach works well for structuring the data, but I'm facing a significant performance issue:

 

When I publish and try to view a table that shows all the data across these tables, it's extremely unresponsive, taking ages to update. If the user applies a filter (say, all Female), it again takes ages to refresh. I suspect that the 12 joins between the Candidate table and the other related tables are causing this slowdown.

I would greatly appreciate any advice or insights on how to optimize the performance in such a scenario. Are there any best practices for handling a star schema with a large number of joins in BigQuery? Is there a way to make this query more responsive?

Thank you in advance for your help!

1 ACCEPTED SOLUTION
v-stephen-msft
Community Support

Hi @hselfe_rsa ,

 

To optimize the performance in a star schema with a large number of joins in BigQuery, you can follow these best practices:

1. Use denormalization: Consider denormalizing your data by combining related tables into a single table. This can reduce the number of joins required and improve query performance. However, keep in mind that denormalization may increase storage costs and data redundancy. (A sketch covering points 1-3 follows this list.)

2. Optimize table design: Ensure that your tables are properly partitioned and clustered. BigQuery does not use traditional indexes; clustering serves a similar role by allowing BigQuery to quickly locate the relevant data, and partitioning can improve query performance by reducing the amount of data that needs to be scanned.

3. Use query optimization techniques: When writing queries, consider using query optimization techniques such as filtering and aggregating data as early as possible in the query execution process. This can help reduce the amount of data that needs to be processed and improve query performance.

4. Monitor query performance: Regularly monitor the performance of your queries using BigQuery's query history and query plan features. This can help identify any performance bottlenecks and guide further optimization efforts.
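Putting points 1-3 together, a rough sketch in BigQuery SQL might look like this (all table and column names are hypothetical, just to illustrate the pattern):

-- 1. Denormalize: materialize one wide table instead of joining 12 tables
--    at query time. 2. Partition and cluster it so queries scan less data.
--    (All names below are hypothetical.)
CREATE OR REPLACE TABLE `my-project.my_dataset.candidate_wide`
PARTITION BY DATE_TRUNC(CreatedDate, MONTH)
CLUSTER BY Gender, Region
AS
SELECT
  c.CandidateID,
  c.CreatedDate,  -- assumed to be a DATE column
  d.Gender,
  d.DateOfBirth,
  a.Region,
  a.PostalCode
FROM `my-project.my_dataset.candidates` AS c
LEFT JOIN `my-project.my_dataset.demographics` AS d USING (CandidateID)
LEFT JOIN `my-project.my_dataset.addresses` AS a USING (CandidateID);

-- 3. Filter early: restricting on a clustering column lets BigQuery
--    prune blocks instead of scanning the whole table.
SELECT CandidateID, Region
FROM `my-project.my_dataset.candidate_wide`
WHERE Gender = 'Female';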

 

I hope these suggestions help you optimize the performance of your star schema in BigQuery. If you have any further questions or need additional assistance, please let me know.

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.


8 REPLIES
Brunner_BI
Super User

What you are describing, you are doing this in Power Query, right? Don't do it in the frontend using calculated tables; that will be a disaster for performance.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Not calculated tables, but relationships in model view


All useful points, thanks Stephen. We are now doing all the heavy lifting in BQ, and that is helping a great deal, so no transforms in PBI at all.
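In case it helps anyone later, the heavy lifting amounts to views along these lines on the BQ side (names here are hypothetical), so PBI just imports the finished shape:

-- Hypothetical sketch of one per-aspect view: BigQuery does the trimming
-- and filtering, and Power BI simply imports the result.
CREATE OR REPLACE VIEW `my-project.my_dataset.v_candidate_address` AS
SELECT
  CandidateID,
  AddressLine1,
  City,
  Region,
  PostalCode
FROM `my-project.my_dataset.addresses`
WHERE CandidateID IS NOT NULL;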

One more thing I would suggest is to use tools like Tabular Editor or Measure Killer to optimize the data model and kick out any columns you might not actually need, assuming you have some reports built already.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Thanks for the help!

Brunner_BI
Super User

Are you sure you are actually using import mode or are you doing any kind of DirectQuery or composite model?

Always use import mode if you can; this is where you will get the full performance of the Power BI engine.

 

In import mode, once the data has been loaded it no longer matters where it came from; Power BI will use the Analysis Services engine and you will get the full performance.

------------------------------------
Brunner BI focusing on Microsoft Power BI development and consulting
Developers of external tool "Measure Killer"
My blog

Thanks Brunner, yes it is Import Mode in all cases
