Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Calling all Data Engineers! Fabric Data Engineer (Exam DP-700) live sessions are back! Starting October 16th. Sign up.

Reply
crmorgan7
New Member

Is it necessary to use a Select statement when choosing a table for my data flow?

I am fairly new to Power BI dataflow building, so bare with me.  

 

One of our vendors told me that when I am in the workspace and I go to create a new ODBC dataflow on the "Connect to Data Source" page I should click on "Advanced Options" and use the Select * "table name" because it is more efficient than using the folder tree.  Instead - on the "Connect to Data Source" page I wrote in my DSN and then clicked Next and used the folder tree to select the table.  I was able to filter/transform the data and the dashboard I'm working on is working as expected.  

 

Is there really a difference between using the Select statement and just using the folder tree to select the table?  

 

I tried to add the Select statement after the fact but it throws all kinds of errors, hence my question of why I should even be concerned about this if I have a working dashboard.  I appreciate the insight!  

 

1 ACCEPTED SOLUTION
DataNinja777
Super User
Super User

Hi @crmorgan7 ,

 

No, it's not strictly necessary to use a SELECT statement when creating your dataflow. As you've discovered, using the folder tree in the Navigator gets the job done and results in a working dashboard. The primary difference between the two methods isn't about the final outcome, but rather about performance and efficiency, especially when you're dealing with large amounts of data. Your vendor's advice points towards a best practice for optimization.

 

Think of it like a trip to the supermarket. Using the Navigator is like going to the store, grabbing an entire aisle's worth of products, bringing them all home, and then sorting through them in your kitchen to find what you need. It works, but it's inefficient. Using a SELECT statement, on the other hand, is like giving a detailed shopping list to a store employee at the entrance. They retrieve only the specific items you requested and bring them to you. This approach is far more efficient because the heavy lifting is done at the source (the database), and only the necessary data is transported. This concept of pushing the workload back to the database is known as Query Folding.

 

The method you choose often depends on the situation. The Navigator is perfectly fine when you're exploring a new dataset or when the tables are small and performance isn't a concern. However, for very large tables with millions of rows, using a SELECT statement from the start is significantly better. It reduces network traffic and the processing load on Power BI, leading to much faster data refresh times.

 

The errors you encountered when trying to add the statement later are quite common. Power Query records each transformation as a separate, sequential step. When you tried to edit the initial "Source" step, you changed its fundamental output, which broke the subsequent steps that were dependent on the original structure. The SELECT statement is meant to be part of the initial connection setup, not added after the fact.

 

Ultimately, if your current dashboard is performing well, there's no pressing need to rebuild it. You've found a solution that works. Just keep your vendor's advice in mind for future projects, as starting with a SELECT statement can be a lifesaver when you're working with massive datasets.

 

Best regards,

View solution in original post

4 REPLIES 4
v-priyankata
Community Support
Community Support

Hi @crmorgan7 

Thank you for reaching out to the Microsoft Fabric Forum Community.

@Royel @lbendlin @DataNinja777  Thanks for the inputs

I hope the information provided by users was helpful. If you still have questions, please don't hesitate to reach out to the community.

 

DataNinja777
Super User
Super User

Hi @crmorgan7 ,

 

No, it's not strictly necessary to use a SELECT statement when creating your dataflow. As you've discovered, using the folder tree in the Navigator gets the job done and results in a working dashboard. The primary difference between the two methods isn't about the final outcome, but rather about performance and efficiency, especially when you're dealing with large amounts of data. Your vendor's advice points towards a best practice for optimization.

 

Think of it like a trip to the supermarket. Using the Navigator is like going to the store, grabbing an entire aisle's worth of products, bringing them all home, and then sorting through them in your kitchen to find what you need. It works, but it's inefficient. Using a SELECT statement, on the other hand, is like giving a detailed shopping list to a store employee at the entrance. They retrieve only the specific items you requested and bring them to you. This approach is far more efficient because the heavy lifting is done at the source (the database), and only the necessary data is transported. This concept of pushing the workload back to the database is known as Query Folding.

 

The method you choose often depends on the situation. The Navigator is perfectly fine when you're exploring a new dataset or when the tables are small and performance isn't a concern. However, for very large tables with millions of rows, using a SELECT statement from the start is significantly better. It reduces network traffic and the processing load on Power BI, leading to much faster data refresh times.

 

The errors you encountered when trying to add the statement later are quite common. Power Query records each transformation as a separate, sequential step. When you tried to edit the initial "Source" step, you changed its fundamental output, which broke the subsequent steps that were dependent on the original structure. The SELECT statement is meant to be part of the initial connection setup, not added after the fact.

 

Ultimately, if your current dashboard is performing well, there's no pressing need to rebuild it. You've found a solution that works. Just keep your vendor's advice in mind for future projects, as starting with a SELECT statement can be a lifesaver when you're working with massive datasets.

 

Best regards,

Royel
Solution Sage
Solution Sage

Hi @crmorgan7,

When it comes to results, there’s no difference between the two methods  you can use either one. However, each has its own advantages depending on your situation.

 

Use Folder Tree when:

  • You’re new to Power BI and want a simpler setup

  • Your data source supports automatic query folding

  • You don’t need complex filtering at the start

  • Your dataset is moderately sized

 

Use SELECT Statement when:

  • You’re working with large datasets and need better performance

  • You want to apply specific SQL optimizations

  • You need early filtering at the database level

  • You want to leverage special features of your ODBC driver

 

Note:
Using a SELECT statement is generally more efficient, especially for performance tuning.
However, the Folder Tree method also supports query folding  so with proper setup, it can achieve similar performance. You can learn more about query folding from here 

 

Find this helpful? ✔ Give a Kudo • Mark as Solution – help others too!

lbendlin
Super User
Super User

If in doubt consult the query diagnostics. It's possible that both options result in the same level of query folding.

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

September Power BI Update Carousel

Power BI Monthly Update - September 2025

Check out the September 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Kudoed Authors