shuhn1229
Resolver I

Understanding Incremental Refresh

Hi all,

 

Need some advice / help with this.

 

I have a query that calls a REST API on an intranet site via POST. The output is JSON and requires transformation. To enable scheduled refresh, I set up a data gateway, and this works well for most of my queries using that data source. Recently, I built a new query that retrieves about 1.2 million rows and about 100 MB worth of data (using the same code and data source as above). This query returns a data mashup error when I attempt scheduled refresh, but works perfectly via Desktop.
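For reference, a POST call of this kind in Power Query (M) generally has a shape like the sketch below; the endpoint, payload, and the "rows" field are placeholders, not the actual code from this thread:

    let
        // Hypothetical endpoint and payload; the real query passes a query ID
        url = "https://intranet.example.com/api/query",
        payload = Json.FromValue([queryId = 12345]),
        // Supplying Content makes Web.Contents issue a POST instead of a GET
        response = Web.Contents(url, [
            Content = payload,
            Headers = [#"Content-Type" = "application/json"]
        ]),
        json = Json.Document(response),
        // Assumes the API returns a list of records under a "rows" field
        result = Table.FromRecords(json[rows])
    in
        result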

 

{"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","parameters":{},"details":[{"code":"DM_ErrorDetailNameCode_UnderlyingErrorCode","detail":{"type":1,"value":"-2147467259"}},{"code":"DM_ErrorDetailNameCode_UnderlyingErrorMessage","detail":{"type":1,"value":"<ccon>The underlying connection was closed: The connection was closed unexpectedly.</ccon>"}},{"code":"DM_ErrorDetailNameCode_UnderlyingHResult","detail":{"type":1,"value":"-2147467259"}},{"code":"Microsoft.Data.Mashup.ValueError.Detail","detail":{"type":1,"value":"<ccon>MYURL</ccon>"}},{"code":"Microsoft.Data.Mashup.ValueError.Reason","detail":{"type":1,"value":"DataSource.Error"}}],"exceptionCulprit":1}}} Table: Custom1
 

 

 

I think this is due to the size of the query and the request timing out, because all queries using very similar code (changed only by a query ID) are much smaller in size. I was advised to try incremental refresh to circumvent this issue. However, I'm wondering if this will actually solve my problem. As I understand it, incremental refresh partitions the data by date and only subjects partitions likely to contain data changes to the refresh, cutting down on the total amount of data refreshed. However, this will not be useful to me, as any of the data can change. So with that being said:

 

  • Any thoughts on what might be causing this mashup error? As I stated, it's the same code and data source; only the size of the query has changed. My understanding is that the data size limit for scheduled refresh is 2 GB, which I am not close to at all.
  • What I am looking to do is refresh data only where a value in a column has changed, to cut down on the size of the output handled by scheduled refresh. Would incremental refresh help with this?
  • If not, how might we go about this? I was thinking of doing a separate query filtered to only dates after X, refreshing only this data daily, and joining it to the historical source (see the sketch after this list). Seems circuitous.
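A rough shape for that third idea in M, assuming hypothetical names (SourceQuery, HistoricalSnapshot, ModifiedDate) and that the date filter can be pushed to the source:

    let
        // Hypothetical cutoff: only rows modified after this date get re-fetched
        CutoffDate = #date(2024, 1, 1),
        // Small daily slice, filtered so scheduled refresh handles less data
        Recent = Table.SelectRows(SourceQuery, each [ModifiedDate] > CutoffDate),
        // Combine with a static snapshot of the older data
        Combined = Table.Combine({HistoricalSnapshot, Recent})
    in
        Combined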

thanks!

Steve

1 ACCEPTED SOLUTION
lbendlin
Super User

I think this is due to the size of the query and the request timing out

I don't think so; that would have resulted in a different error message. It might have been a transient network connectivity issue, or someone restarting the service on the other end.

 

What I am looking to do is refresh data only where a value in a column has changed, to cut down on the size of the output handled by scheduled refresh. Would incremental refresh help with this?

Yes and no. While incremental refresh offers to check for data changes ("detect data changes"), this comes at a horrible cost: a canary copy of the dataset will be created that then does the delta tracking.

 

Fundamentally, incremental refresh expects immutable data (like IoT sensor readings). It is very bad with real-world business data (which would require differential refresh). You can mitigate that by keeping your partitions small, refreshing individual partitions as needed, and doing a full refresh every now and then.
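To make the partitioning concrete: incremental refresh requires the query to filter on the reserved RangeStart/RangeEnd datetime parameters, so each partition loads only its own date slice. A minimal sketch, with SourceQuery and LoadDate as hypothetical names:

    let
        Source = SourceQuery,
        // RangeStart/RangeEnd are reserved parameters that the service binds
        // per partition; the >= / < pairing avoids double-counting boundary rows
        Filtered = Table.SelectRows(
            Source,
            each [LoadDate] >= RangeStart and [LoadDate] < RangeEnd
        )
    in
        Filtered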


2 REPLIES 2

Hi, 

 

Thanks for your help. Figured out the issue... sort of. We ended up caching the results of the query and then trying a cloud refresh... and voilà, it works. So this error had to do with the lag time between getting the results of the query and sending the request.
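"Caching the results" can be done on the API side; if it were done inside the query instead, the usual M idiom is Table.Buffer, which materializes the table once so downstream steps don't re-issue the slow POST. This is purely illustrative, not necessarily what was done here:

    let
        Source = SourceQuery,  // hypothetical upstream POST query
        // Pin the result in memory so later steps reuse it rather than
        // hitting the REST endpoint again
        Buffered = Table.Buffer(Source)
    in
        Buffered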

 

TY
