WhenWe
Regular Visitor

Dataflow refresh failing - is this a limitation of PPU?

We use PPU. We have a dataflow which connects to our MS Azure Blob storage containing JSON files. New JSON files are added constantly as activities happen; we see around ±1,000 new JSON files per hour, ranging from a few hundred KiB to several MiB per file. We configured incremental refresh on the dataflow, which works well, and a scheduled refresh every 4 hours.
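For reference, the dataflow query follows roughly this pattern (a simplified sketch rather than our exact query - the storage account, container name and the use of Date modified as the incremental refresh column are placeholders):

let
    // storage account and container names are placeholders
    Source = AzureStorage.Blobs("https://ourstorageaccount.blob.core.windows.net"),
    Container = Source{[Name = "activity-json"]}[Data],
    // incremental refresh filters on a datetime column via the RangeStart / RangeEnd
    // parameters that the incremental refresh settings populate
    NewFiles = Table.SelectRows(Container, each [#"Date modified"] >= RangeStart and [#"Date modified"] < RangeEnd),
    // each remaining blob is parsed as a JSON document
    Parsed = Table.AddColumn(NewFiles, "Json", each Json.Document([Content]))
in
    Parsed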

 

[Screenshot: WhenWe_2-1674973408448.png]

 

 

All works well for several days, and then it all stops with the following error message:

 

There was a problem refreshing your dataflow

Your xxxxxxxxxx dataflow couldn’t be refreshed because there was a problem with one or more entities, or because dataflow capabilities were unavailable.

 

 

You will see from the refresh history below that it was working well, with refreshes taking up to 2:17 hrs, and then it stopped. We have not made any changes to the data layout, formats, etc.

 

[Screenshot: refresh history - WhenWe_1-1674972656845.png]

 

We have tried a manual refresh, but it gives the same result. We let the refresh run for several occurrences, thinking there might be a problem with the Power BI service (linked to the recent global Microsoft outage), but the problem still persists.

 

Is this an issue of too much data requiring refresh, and does PPU have limited capacity?

- We have noticed that when we delete JSON files from Azure and leave only 1 to 2 days of transactions, the refresh is successful. This reduces the amount of data processed, which leads us to suspect the processing/storage capacity of PPU as a possible issue.

- If this is a limitation of the PPU license in terms of capacity, is there a way we can increase the capacity, or do we need to purchase Premium capacity?

- BTW, we do have other similar connections to MS Azure Blob storage (not the same blob) and these run fine, but with far fewer JSON files.

 

Or can someone please advise / point us in the direction of a recommended solution?

6 REPLIES
R1k91
Super User

Maybe you're facing resource saturation during the Mashup operations happening in the dataflow.

I would go with the approach from your second message:

- Use ADF or Synapse pipelines to load the JSON into a SQL DB (CDC could help: Change data capture - Azure Data Factory & Azure Synapse | Microsoft Learn).

- Configure incremental refresh on a dataset that points to the SQL DB; you'll be able to load data quickly using RangeStart and RangeEnd, which fold on SQL sources (see the sketch below).
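
Roughly like this minimal sketch (server, database, table and column names are placeholders, not your actual schema):

let
    // placeholder server / database / table names
    Source = Sql.Database("yourserver.database.windows.net", "ActivityDb"),
    Events = Source{[Schema = "dbo", Item = "ActivityEvents"]}[Data],
    // against a SQL source this filter folds into a WHERE clause, so each
    // incremental refresh partition reads only its own time slice
    Filtered = Table.SelectRows(Events, each [EventTimestamp] >= RangeStart and [EventTimestamp] < RangeEnd)
in
    Filtered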


--
Riccardo Perico
BI Architect @ Lucient Italia | Microsoft MVP

Blog | GitHub

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Thank you, R1k91.

 

This seems to be the next step we need to try. Thank you for your response, much appreciated.

WhenWe
Regular Visitor

In addition to the previous post:

Would it be recommended to instead upload the JSON files into an Azure SQL DB and then use the SQL DB in Power BI DirectQuery mode?

The volume grows by ±1,000 new JSON files every hour, resulting in massive data volumes.

 

Thank you for any assistance 

Hi,

 

Further to the above, I deleted several days of storage and left only 1 day. The refresh now runs smoothly and completes successfully, with incremental refresh working well again.

 

But I expect to hit the same problem a few days down the road.

 

My workspace is only using 8 MB of storage, so I'm far from the 10 GB workspace limitation.

 

I also checked the comparative table of limitations between PPU and Premium capacity for storage - max storage for both is 100 TB, so that's not the problem either.

 

Does anyone have any other suggestions we could try before moving to the SQL DB approach?

 

 

[Screenshot: WhenWe_1-1675057410963.png]

 

[Screenshot: WhenWe_0-1675057332652.png]

 

While waiting for other suggestions: I don't think your problem has anything to do with storage.

You're in a dataflow and it fails before it completes. My assumption is that you're hitting a limit in the Mashup Engine machine that works behind the scenes of a dataflow.

When dealing with so many text/JSON files, I always try to load them into an RDBMS or at least convert them to Parquet.
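
For example, once the files are Parquet, a dataflow can read them with Parquet.Document instead of parsing thousands of small JSON documents (the account and container names below are placeholders):

let
    // placeholder storage account and container names
    Source = AzureStorage.Blobs("https://yourstorageaccount.blob.core.windows.net"),
    Container = Source{[Name = "activity-parquet"]}[Data],
    // Parquet is columnar and compressed, so this is much lighter on the
    // Mashup engine than Json.Document over many small text files
    Parsed = Table.AddColumn(Container, "Data", each Parquet.Document([Content]))
in
    Parsed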

 


--
Riccardo Perico
BI Architect @ Lucient Italia | Microsoft MVP

Blog | GitHub

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

OK - thank you for your reply. We are investigating Synapse as per your original suggestion.

 

Really appreciate your replies. Thank you.
