We have a Power BI connector that receives JSON data page by page. We see that performance, in terms of rows of data loaded per second, lags very significantly behind that of an ODBC driver.
In particular, there were a couple of performance bottlenecks that I identified:
You could consider running Python or R scripts to parse the JSON, but I suspect they wouldn't be any faster.
Binary.Buffer might speed up some of the network transfers, but probably won't help with parsing.
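For example (URL hypothetical), buffering the whole response before handing it to the parser:

    let
        Source = Json.Document(Binary.Buffer(Web.Contents("https://example.com/api/data")))
    in
        Source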
As for the sample code: your main tools are List.Generate and List.Accumulate. Both have their strengths and weaknesses, so you'd have to test how each behaves in your scenario. Then use Table.ExpandTableColumn rather than Table.Combine.
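Here's a minimal sketch of the List.Generate approach, assuming a hypothetical endpoint whose response is a record with "rows" and "nextPage" fields (adjust the names to match your API):

    let
        BaseUrl = "https://example.com/api/data",                // hypothetical endpoint
        GetPage = (url as text) => Json.Document(Web.Contents(url)),
        // Fetch pages until the response no longer carries a next-page link
        Pages = List.Generate(
            () => GetPage(BaseUrl),
            each _ <> null,
            each if _[nextPage]? <> null then GetPage(_[nextPage]) else null
        ),
        // One table per page, held in a single column...
        PageTables = Table.FromList(
            List.Transform(Pages, each Table.FromRecords(_[rows])),
            Splitter.SplitByNothing(),
            {"Page"}
        ),
        // ...then expanded in a single pass instead of Table.Combine
        Expanded = Table.ExpandTableColumn(PageTables, "Page", {"id", "value"})  // hypothetical columns
    in
        Expanded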
Thank you for the recommendations. We don't think we can use Python/R scripts in our connector itself as I think these would create external dependencies for our clients.
I have tried using Binary.Buffer on the response, and it seems that it's slightly slower (measured by total processing time) than directly passing the result of Web.Contents to Json.Document.
I think we do need to always use List.Generate, as we don't have a pre-determined number of pages and only learn whether there is a next page after receiving each page. I have also tried List.Combine on the raw record lists (rather than lists of tables) instead of Table.ExpandTableColumn and Table.Combine, but the performance seems to be similar.
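For reference, this is roughly the variant I tried, with a hypothetical endpoint and field names:

    let
        BaseUrl = "https://example.com/api/data",                // hypothetical endpoint
        GetPage = (url as text) => Json.Document(Web.Contents(url)),
        Pages = List.Generate(
            () => GetPage(BaseUrl),
            each _ <> null,
            each if _[nextPage]? <> null then GetPage(_[nextPage]) else null
        ),
        // Combine the raw record lists and build a single table at the end
        AllRows = List.Combine(List.Transform(Pages, each _[rows])),
        Result = Table.FromRecords(AllRows)
    in
        Result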
Do you think there might be anything else that could affect the performance?
The size of the JSON payload plays a big role. See if you can get a less chatty endpoint that returns only the fields you need.
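For instance, if the API happens to expose a field-selection parameter (the "fields" name here is hypothetical, so check your API's documentation), the Query option of Web.Contents is a tidy way to pass it:

    let
        Slim = Json.Document(
            Web.Contents(
                "https://example.com/api/data",      // hypothetical endpoint
                [Query = [fields = "id,value"]]      // hypothetical parameter
            )
        )
    in
        Slim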
Here's an interesting article, slightly off topic, but it may help: Effective Strategies for Storing and Parsing JSON in SQL Server - Simple Talk (red-gate.com)