Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
shuhn1229
Resolver I
Resolver I

Understanding Incremental Refresh

Hi all,

 

Need some advice / help with this.

 

I have a query that works with an intranet site via a REST API to call POST The output is JSON and requires transformation. In order to do scheduled refresh, I setup a data gateway and this works well for most of my queries using that datasource. Recently, I built a new query that retrieves about 1.2 million rows and about 100 megs worth of data (using the same code above and datasource). This query returns a datamashup error when I attempt to use schedule refresh, but works perfectly via desktop.

 

{"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","parameters":{},"details":[{"code":"DM_ErrorDetailNameCode_UnderlyingErrorCode","detail":{"type":1,"value":"-2147467259"}},{"code":"DM_ErrorDetailNameCode_UnderlyingErrorMessage","detail":{"type":1,"value":"<ccon>The underlying connection was closed: The connection was closed unexpectedly.</ccon>"}},{"code":"DM_ErrorDetailNameCode_UnderlyingHResult","detail":{"type":1,"value":"-2147467259"}},{"code":"Microsoft.Data.Mashup.ValueError.Detail","detail":{"type":1,"value":"<ccon>MYURL</ccon>"}},{"code":"Microsoft.Data.Mashup.ValueError.Reason","detail":{"type":1,"value":"DataSource.Error"}}],"exceptionCulprit":1}}} Table: Custom1
 

 

 

I think this is due to the size of the query and the request timing out. This is because all queries using that very similiar code (only changed by a query ID) are much smaller in size. I was advised to try incremental refresh to circumvent this issue. However, I'm wondering if this will actually solve my problem. As I understand via incremental refresh we partician the data by date and only subject particians which likely contain data changes to the refresh, cutting down on the amount of total data to be refreshed. However, this will not be useful to me as any of the data is subjected to change. So with that being said:

 

  • Any thoughts on what be resulting in this mashup error? As I stated same code and datasource, only the size of the query. My understanding is that the limit for data size for scheduled refresh is 2GB, which I am not close to at all.
  • What I am looking to do is to refresh data only where a value in a column has changed to cut down on the size of the output handled by scheduled refresh. would incremental refresh help with this? 
  • If not, how might we go about this? I was thinking of doing a separate query filtered by only after date X, and essentially only refreshing this data daily and joining it to the historical source. Seems circuitious. 

thanks!

Steve

1 ACCEPTED SOLUTION
lbendlin
Super User
Super User

I think this is due to the size of the query and the request timing out

I don't think so. That would have resulted in a different error message.  Might have been a transient network connectivity issue, or someone restarting the service on the other end.

 

What I am looking to do is to refresh data only where a value in a column has changed to cut down on the size of the output handled by scheduled refresh. would incremental refresh help with this? 

yes and no. While incremental refresh offers to check for data changes this comes at a horrible cost - a canary dataset copy will be created that then does the delta tracking.

 

Fundamentally incremental refresh expects immutable data (like IoT sensor readings).  It is very bad at real world business data (which would require differential refresh).  You can mitigate that by keeping your partitions small and by refreshing individual partitions as needed, and doing a full refresh every now and then.

View solution in original post

2 REPLIES 2
lbendlin
Super User
Super User

I think this is due to the size of the query and the request timing out

I don't think so. That would have resulted in a different error message.  Might have been a transient network connectivity issue, or someone restarting the service on the other end.

 

What I am looking to do is to refresh data only where a value in a column has changed to cut down on the size of the output handled by scheduled refresh. would incremental refresh help with this? 

yes and no. While incremental refresh offers to check for data changes this comes at a horrible cost - a canary dataset copy will be created that then does the delta tracking.

 

Fundamentally incremental refresh expects immutable data (like IoT sensor readings).  It is very bad at real world business data (which would require differential refresh).  You can mitigate that by keeping your partitions small and by refreshing individual partitions as needed, and doing a full refresh every now and then.

Hi, 

 

Thanks for your help. Figured out the issue... sort of. We ended up caching the results of the query and then trying cloud refresh... and viola, it works. So this error had to do with the lag time between getting the results of the query and sending the request.

 

TY

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.