First of all, apologies if this topic is not for this forum (I couldn't find a more appropriate one).
I have a Power BI report published to the Power BI Service, with a connection to SQL Server and two queries inside the report. The report is not big: one table has 7 columns and 100 rows, the other has 9 columns and 75K rows, about 700 KB in total — nothing large. In fact, this is the smallest report in our whole BI suite. It refreshes daily in 20-30 seconds, sometimes as little as 15. So far so good.
Now the issue which puzzles me.
We use Microsoft Fabric and have a roadmap for fully automating various processes. Before switching to Fabric we used the regular Power BI web service; we moved to Fabric because of the many good ideas and capabilities the product offers or promises. However, not surprisingly, we have discovered it has plenty of bugs and glitches, which we would expect to be fixed within a reasonable time. Some of these problems, though, concern very basic functionality or basic performance, and frankly they are disappointing.
For this report, we moved the two queries out of the report and into a stored procedure.
This is the flow (all of this happens in Fabric pipeline):
All of the above 3 steps together take between 5 and 6 minutes, which is ridiculous. We use the same SQL Server and the same tables in both cases, yet there is a huge performance difference: 20-30 seconds when the query runs from inside the report, versus about 6 minutes when the same query runs as a separate process outside the report, on the same infrastructure with the same resources!?
Any ideas, suggestions, explanations etc are more than welcome.
Thanks,
EM
It's indeed puzzling that you're experiencing such a significant difference in performance between running the queries inside Power BI and running them through your stored procedure. Here are some potential reasons and suggestions to explore:
1. **Data Transfer Overhead**: When running the queries directly inside Power BI, the data transfer between the SQL Server and Power BI Service is optimized and potentially more efficient compared to running the stored procedure separately. Check if there are any bottlenecks or limitations in the data transfer process when using the stored procedure.
2. **Stored Procedure Optimization**: Review the stored procedure code to ensure it's optimized for performance. Are there any inefficient queries, unnecessary joins, or resource-intensive operations that could be optimized? Consider using SQL Server Profiler to analyze the execution plan and identify any areas for improvement.
3. **Concurrency and Resource Allocation**: Check if there are any concurrency or resource allocation issues affecting the performance of the stored procedure. Are there other processes or queries running simultaneously that could be impacting performance? Consider adjusting resource allocation or scheduling to minimize contention.
4. **Data Transformation and Aggregation**: Are there any additional data transformation or aggregation steps performed in the stored procedure that could be contributing to the longer runtime? Evaluate if these steps are necessary and if they can be optimized or moved to a different stage in the pipeline.
5. **Network Latency and Bandwidth**: Consider if network latency or bandwidth limitations could be affecting the performance of the data transfer process. Are there any network bottlenecks or constraints that could be impacting the speed of data transmission between the SQL Server and your data warehouse?
6. **Power BI Service Configuration**: Review the configuration and settings of your Power BI Service environment. Are there any limitations or constraints in terms of resource allocation, concurrency, or data transfer bandwidth that could be impacting performance?
7. **Monitoring and Logging**: Implement monitoring and logging mechanisms to track the performance of the data transfer process and identify any specific areas of concern or slowdowns. This could help pinpoint the root cause of the performance difference and guide optimization efforts.
By investigating these potential factors and exploring optimization opportunities, you may be able to improve the performance of the data transfer process when using the stored procedure. Additionally, consider reaching out to Microsoft support or community forums for further assistance and insights specific to your environment and setup.
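The monitoring suggestion above can be sketched in a few lines. This is a minimal, hypothetical Python timing harness — the `extract`/`transform`/`load` step names and the `time.sleep` bodies are placeholders for the real pipeline calls — that records and logs how long each stage takes, so you can see which of the steps accounts for most of the 6 minutes:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pipeline")

@contextmanager
def timed(step: str, durations: dict):
    """Record and log how long a pipeline step takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[step] = time.perf_counter() - start
        log.info("%s took %.2f s", step, durations[step])

durations = {}
# Placeholder stage bodies -- replace with the real extract / transform / load calls.
with timed("extract", durations):
    time.sleep(0.01)
with timed("transform", durations):
    time.sleep(0.01)
with timed("load", durations):
    time.sleep(0.01)

# The stage with the largest recorded duration is the first optimization target.
slowest = max(durations, key=durations.get)
```

With real stage bodies in place, a single run tells you whether the time is spent reading from SQL Server, transforming, or writing to the destination — which is usually enough to decide where to dig deeper.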
Hi johnbasha,
Thanks for your input, I appreciate it.
Here are my comments:
I still cannot find a logical explanation for why the report refreshes on my local machine in about a minute (despite aggravating factors such as downloading the data locally through a VPN), while in the Microsoft Fabric environment the same load via Dataflow Gen2 takes about 6 minutes in total — even though all the data transfer happens in the cloud (our SQL Server is cloud-based), which should be the fastest path.
We have a version of the report published to the Power BI service in the same Fabric account, with the SQL code embedded in the report, and it refreshes daily in about 20-30 seconds. The other (problematic test) version pulls the data via Dataflow Gen2 — same code, same environment, same Fabric account — and takes ages to refresh.
Everything we read about Dataflow Gen2 says it is fast, seamless, reliable, etc., and that it uses the same engine as Power BI. If it uses the same engine, why do we see such a performance gap?
“… Dataflow Gen2 is built using the familiar Power Query experience that’s available today across several Microsoft products and services such as Excel, Power BI, Power Platform, Dynamics 365 Insights applications, and more…”
“..Dataflow Gen2 in Fabric is the evolution of Dataflow, redesigned from the ground up with more powerful compute engines, data movement capabilities to help you load data to several Fabric and Azure data destinations…”
“… Dataflows Gen2 are authored using Power Query. Once you publish a dataflow, the dataflow definition is generated – this is the program ….”
I'm seeing that many users on this forum and elsewhere have similar complaints, and one suggested workaround is to use Python (PySpark) for the data load, which is reportedly much faster than Dataflow Gen2.
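For anyone considering that workaround, here is a minimal, hypothetical sketch of what a PySpark load from SQL Server into a Lakehouse table might look like in a Fabric notebook. The server, database, table, and target names are all placeholders, and the Spark read/write itself is wrapped in a function because it requires a Spark runtime to execute:

```python
# Hypothetical sketch: replacing a Dataflow Gen2 load with a PySpark JDBC read
# in a Fabric notebook. All server/database/table names are placeholders.

def build_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL for Spark's JDBC reader."""
    return f"jdbc:sqlserver://{server};databaseName={database};encrypt=true"

def load_table_to_lakehouse(server: str, database: str, table: str, target: str) -> None:
    """Read a SQL Server table over JDBC and write it as a Lakehouse table.

    Requires a Spark runtime (e.g. a Fabric notebook session); not executed here.
    Authentication options are omitted and depend on your environment.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = (
        spark.read.format("jdbc")
        .option("url", build_jdbc_url(server, database))
        .option("dbtable", table)
        .load()
    )
    # Overwrite keeps the target table in sync with the source on each run.
    df.write.mode("overwrite").saveAsTable(target)
```

Whether this actually beats Dataflow Gen2 will depend on table size, capacity, and staging settings, so it is worth timing both paths on the same table before committing to either.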
PS:
It seems that the "staging" option of Dataflow Gen2 worsens performance, based on some articles I just read. It turns out staging is enabled by default, but disabling it prevents you from writing the dataflow output to the warehouse... simply fabulous.