We have a Power BI connector that receives JSON data page by page. We see that performance, in terms of rows of data loaded per second, lags very significantly behind that of an ODBC driver.
In particular, there were a couple of performance bottlenecks that I identified:
You could consider running Python or R scripts to parse the JSON, but I suspect they wouldn't be any faster.
Binary.Buffer might speed up some of the network transfers, but probably won't help with parsing.
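For example (URL hypothetical), buffering the whole response before handing it to the parser:

    let
        Source = Json.Document(Binary.Buffer(Web.Contents("https://example.com/api/data")))
    in
        Source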
As for the sample code: your main tools are List.Generate and List.Accumulate. Both have their strengths and weaknesses, so you'd have to test how each behaves in your scenario. Then use Table.ExpandTableColumn rather than Table.Combine.
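Here's a minimal sketch of the List.Generate approach, assuming a hypothetical endpoint whose response is a record with "rows" and "nextPage" fields (adjust the names to match your API):

    let
        BaseUrl = "https://example.com/api/data",                // hypothetical endpoint
        GetPage = (url as text) => Json.Document(Web.Contents(url)),
        // Fetch pages until the response no longer carries a next-page link
        Pages = List.Generate(
            () => GetPage(BaseUrl),
            each _ <> null,
            each if _[nextPage]? <> null then GetPage(_[nextPage]) else null
        ),
        // One table per page, held in a single column...
        PageTables = Table.FromList(
            List.Transform(Pages, each Table.FromRecords(_[rows])),
            Splitter.SplitByNothing(),
            {"Page"}
        ),
        // ...then expanded in a single pass instead of Table.Combine
        Expanded = Table.ExpandTableColumn(PageTables, "Page", {"id", "value"})  // hypothetical columns
    in
        Expanded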
Thank you for the recommendations. We don't think we can use Python/R scripts in our connector itself as I think these would create external dependencies for our clients.
I have tried using Binary.Buffer on the response, and it seems that it's slightly slower (measured by total processing time) than directly passing the result of Web.Contents to Json.Document.
I think we do need to always use List.Generate, as we don't have a pre-determined number of pages and only learn whether there is a next page after receiving each page. I have also tried List.Combine on the raw record lists (rather than lists of tables) instead of Table.ExpandTableColumn and Table.Combine, but the performance seems to be similar.
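For reference, this is roughly the variant I tried, with a hypothetical endpoint and field names:

    let
        BaseUrl = "https://example.com/api/data",                // hypothetical endpoint
        GetPage = (url as text) => Json.Document(Web.Contents(url)),
        Pages = List.Generate(
            () => GetPage(BaseUrl),
            each _ <> null,
            each if _[nextPage]? <> null then GetPage(_[nextPage]) else null
        ),
        // Combine the raw record lists and build a single table at the end
        AllRows = List.Combine(List.Transform(Pages, each _[rows])),
        Result = Table.FromRecords(AllRows)
    in
        Result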
Do you think there might be anything else that could affect the performance?
The size of the JSON payload plays a big role. See if you can get a less chatty endpoint that returns only the fields you need.
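For instance, if the API happens to expose a field-selection parameter (the "fields" name here is hypothetical, so check your API's documentation), the Query option of Web.Contents is a tidy way to pass it:

    let
        Slim = Json.Document(
            Web.Contents(
                "https://example.com/api/data",      // hypothetical endpoint
                [Query = [fields = "id,value"]]      // hypothetical parameter
            )
        )
    in
        Slim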
Here's an interesting article, slightly off topic, but it may help: Effective Strategies for Storing and Parsing JSON in SQL Server - Simple Talk (red-gate.com)