"Parallel loading of tables" is not working correctly.
When disable "Parallel loading of tables" in section Data Load in Options, it is working correctly in Power BI Desktop (tables are loaded sequentially in correct order). But after Save as to Power BI Report Server (both in Jan 2021 release), and run scheduled refresh, this behavior not working correctly, tables are still loading parallelly.
This option is importat when is there many reports refreshing many tables, which can overload datasource, and in cases when is important order of loading.
This issue is also present in October 2020 relese (all three builds). Last worked correctly in May 2020 release.
I hate being the bearer of bad news, but I come with some today. This feature was never something that was explictly supported in PBIRS. It worked as sort of a byproduct of how models and refresh worked in the product. However, with the move to models with enhanced model metadata this feature no longer seems to work "for free" and would require some unknown (and more importantly unplanned) amount of dev and test work.
I have added it as a task in the backlog, but your best shot would be to add this to the https://ideas.powerbi.com/ideas/ list as we use this to help rank items in our backlog. Once again sorry to be the one to come with the bad news, but I wanted to be transparent with our findings.
I'm also raising hand, I think it need to be solved. It is very important issue, because case of current massive parallel load, it making crucial issues on source side (e.g. Oracle DB) and also on PBIRS server during refresh. We must rework some reports to another technology, and must hold some another project. Please what is status of this issue in backlog?
This is something that we investigated and did not make the cut over the last few releases. I will again bring this up with our PM team and use this thread as evidence of demand. I understand the frustration and will investigate the cost once again. Though this will not be part of the September release.
Hi @Petebro ,
Thank you very much. It is quite understandable that it will not be in this September release. But please bring back this functionality in future releases of PBI Report Server. It doesn't matter if in the same way (serial/parallel setting of processing on report level), or in any other way, such as conditional/ordered processing.
The main reasons are:
1) In reports with a large number of large DB tables, it can cause serious problems with the performance and availability of source systems (due to a large number of parallel table scan operations). Or if the administrator of the source system limits the number of parallel connections, it can lead to the failure of the entire refresh report.
2) In the same cases, it can also cause serious problems with the performance and availability of the Power BI Report Server itself (due to the large number of parallel processes/compress vertipaq tables).
3) In some specific cases, it is necessary to process the sources in a specific order, as the sources are sorted in the ordered list in the power query editor (which now allows you to change the order).
4) The possibility of serial and ordered processing was in the past and is still functional today in Power BI Desktop RS, but after save-as to Power BI Report Server it worked correctly in the past, but stopped working in one of the past versions.
Has anything been done with this parallel loading in the report server over the past 18 months?
Sorry for picking up the old thread but this could have a huge impact on some work I am currently completing.
I'm sad of your findings. I understand that due to new format, it needs some additional work and testing for this functionality. It is not priority issue for me, now, because we make some workarounds on source data servers (but it is limiting our performance), to override produced issues of this functional gap. But I hope that it will be solved in some future release, and we can return to performance of pre-october release.
Additionally, I don't think, that it worked byproduct of old model. "Parallel loading of tables" setting, which can be turned on or off, is officially presented in Power BI Desktop, and it works here in current same as pre-October versions. Data refresh on PBI Report Server (pre-October releases) works same way as on Desktop, with respect of "Parallel loading of tables" toggle. PBI Report Server data refresh have some limitations which are documented here https://docs.microsoft.com/en-us/power-bi/report-server/scheduled-refresh , and there is nothing about that "Parallel loading of tables" setting is not reflected on server. But in October release (and later) of PBI Report Server it stopped work, and in this release is not documented anything about removing this feature.
What's more, this feature is important especially on server side. On PBI Desktop, I can easy override, and run refresh selective manually in order as I want, and additionally on PBI Desktop is typically running on some laptop on speed limited connection. There is minimal risk of overload sources. But on PBI RS, there was only one way how to refresh in correct order, and how avoid datasource overload by massive parallel load, by "Parallel loading of tables" setting.
Please consider what next, if return this feature to PBI RS (I will be happy :)) . Or not, but in this case will be better (officially) remove this also from PBI Desktop (there no reason to maintain functionality which is not supported on both PBI RS and PBI Service). Or bring another alternative, e.g. PBI RS specific settings on report Data source or report schedule level.
BTW, Power BI Ideas is good for some public cool features with wow effect like "Option to use either light or dark interface.", and also is it only for mass market Power BI Desktop and Power BI Service. For anything else, especially for technical related features in Power BI RS in enterprise environment, it is black hole.
Maybe one truly "idea", instead of this non-fully functional feature, will be much better bring another better feature.
E.g. possibility to define (if you want) dependency settings between data sources in PowerQuery editor (like to dependency setting between data flows in SSIS). This would enable much more control and power over data refresh proces, include run in correct order, run sequentially or paralely, run conditionally (on success/completed predecessor or on variable) and much more.
Yes, it need little more develompment, but I think, it is good "idea".
@josef78 That is an important issue. If you have a Pro license please submit a Pro support ticket to Microsoft.
Support | Microsoft Power BI . If you cannot do that I'll see if I can reproduce your finding and will raise a ticket myself.
How are you proving that queries run in parallel? Do you use the gateway logs?
I have Power BI Pro, but it is not related to cloud Power BI Service or PBI Gateway. It is related to on-premise Power BI Report Server. Please can you raise ticket internally?
Simple repro steps:
-Use Power BI Desktop RS (eg. January 2021)
-Check "Enable parallel loading of tables" in Options in Data Load section is "Enabled"
-Get data from SQL Server database, and use SQL statement to load data which is loading for some time, you can use statement "waitfor delay '00:00:30';select getdate() timestamp;" to simulate query running for 30second, and also provide some timestamp
-Add second separated data source from same server, you can use same SQL statement
-To report layout add two independent visualization to show timestamp values from both sources
-Run Refresh in Power BI Desktop, tables are loaded parallelly and difference of both timestamps is less then 30s (typically about 1s) = It is OK
-Under Options change "Enable parallel loading of tables" to be "unchecked"
-Run Refresh again, tables are loaded sequentially and difference of both timestamps is greater or equal to 30s (depends on how long is first query running), also timestamp from second source is greater than from first (running in correct order) = it is OK
-Save this report to Power BI Report Server - January 2021 release (PBIRS)
-On report in PBIRS setup data source and scheduled refresh
-Run scheduled refresh, and after finish refresh open report in PBIRS
-Report shows time difference between both timestamp less than 30s, and it is mean it run parallelly, and it is wrong