Skip to main content
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.


Performance ETL powerQuery or previous ETL ?

Good afternoon community.

I have some doubts regarding the performance with 3 custiones.

1- What is the best performance with respect to the ETL part?
I understand that to do data cleaning in powerBi it is convenient to do it from powerQuery and not with DAX functions.
But suppose I have a SQL server database and a table of 50 million records, of which 10 million have to be filtered because they are not useful.
It is convenient to make a view in SQL, clean all the data (do the ETL process) and then from PowerBi I take that view with the data that is useful to me?
Or do I take the table directly from powerBi and do ETL from powerQuery?

2- Is it convenient to put together a "massive" data flow from which you can then connect from different reports?
Assuming this flow connected to a SQL server database and took 1 hour to upgrade. Assuming a super consultation.
Or is it better to query the database directly in each report?
One way I see that it is better not to stick to the database many times for the same information, but in the other way I think it is more efficient.
In my opinion it is more effective to throw 100 small queries to the database, than a single query that contains the 100.

3- What would be the best way to "relate" or solve this problem.
I have 3 tables. Form A / Form B / Users.
Form A and B are related. And both have a user code that relates to the users table.
BUT, the user of Form A and B are not necessarily the same.
Should you duplicate the User table and relate each one to each Form? Or should you have the table loose and search from both forms with a calculated column your user?
Or if there is another alternative it would help me.
This example also appears to me in case in the same table I have more than 1 way of relating to another.
Example if Form had a cod_usuario_creador / cod_usuario_aprobador / cod_usuario_modifica.

Thanks a lot!

Super User
Super User

Great questions. The answer to all of them is "It depends".  There are many factors that can impact ETL performance. You will need to test both options in your specific scenario and then make a decision. Generally you want to do the data destruction as late as possible. Same for the data modeling.  Try both options and see which one works better for you. Generally you want to let the data model do the work for you as much as possible.

Helpful resources

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.


Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.


Fabric certifications survey

Certification feedback opportunity for the community.