Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
xli629b
Helper II
Helper II

Json.Document and Table.GenerateByPage slow performance/alternatives

We have a Power BI connector that receives JSON data page-by-page. We see that performance in terms of rows of data loaded per second lags behind that of an ODBC driver very significantly. 

In particular, there were a couple of performance bottlenecks that I indentified:

  • It takes about 1.4 seconds to parse a 30MB JSON object with Json.Document. By contrast, it only takes a small fraction of a second to do the same in Node.js.
    • Is there any faster way to parse JSON (maybe we're calling it the wrong way?) or can the implementation of Json.Document be potentially improved if it's an inherent limitation?

  • Furthermore, we used the Table.GenerateByPage sample code to combine pages of data together. However, this seems to take about 2.4 seconds to add each ~30MB page of data to the combined table, which seems very slow.
    • Could there be any other way to combine tables together in Power Query that still works for very large datasets? We can also use Table.Combine, but it seems to slow down greatly/fail for 2-3 GB of data.
3 REPLIES 3
lbendlin
Super User
Super User

You could consider running Python or R scripts to parse the JSON but I'd think these may not be faster.

 

Binary.Buffer might speed up some of the network transfers but probably won't help with parsing.

 

As for the sample code - your main tools are List.Generate and List.Accumulate . Both have their strengths and weaknesses, you'd have to test how each behaves in your scenario. Then use Table.ExpandTableColumn rather than Table.Combine.

Thank you for the recommendations. We don't think we can use Python/R scripts in our connector itself as I think these would create external dependencies for our clients.

I have tried using Binary.Buffer on the response, and it seem that it's slightly slower (measured by total processing time) than directly passing the result of Web.Contents to Json.Document.

I think we do need to always use List.Generate as we don't have a pre-determined number of pages, but only learn whether there is a next page or not after receiving each page. I have also tried List.Combine on raw lists (rather than of tables) instead of Table.ExpandTableColumn and Table.Combine, but the performance seems to be similar.

Do you think there might be anything else that could affect the performance?

The size of the JSON payload plays a big role. See if you can get a less chatty endpoint that only has the fields you need.

 

Here's an interesting article, slightly off topic but may help Effective Strategies for Storing and Parsing JSON in SQL Server - Simple Talk (red-gate.com)

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.