Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
shuhn1229
Helper III
Helper III

Understanding Incremental Refresh

Hi all,

 

Need some advice / help with this.

 

I have a query that works with an intranet site via a REST API to call POST The output is JSON and requires transformation. In order to do scheduled refresh, I setup a data gateway and this works well for most of my queries using that datasource. Recently, I built a new query that retrieves about 1.2 million rows and about 100 megs worth of data (using the same code above and datasource). This query returns a datamashup error when I attempt to use schedule refresh, but works perfectly via desktop.

 

{"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","parameters":{},"details":[{"code":"DM_ErrorDetailNameCode_UnderlyingErrorCode","detail":{"type":1,"value":"-2147467259"}},{"code":"DM_ErrorDetailNameCode_UnderlyingErrorMessage","detail":{"type":1,"value":"<ccon>The underlying connection was closed: The connection was closed unexpectedly.</ccon>"}},{"code":"DM_ErrorDetailNameCode_UnderlyingHResult","detail":{"type":1,"value":"-2147467259"}},{"code":"Microsoft.Data.Mashup.ValueError.Detail","detail":{"type":1,"value":"<ccon>MYURL</ccon>"}},{"code":"Microsoft.Data.Mashup.ValueError.Reason","detail":{"type":1,"value":"DataSource.Error"}}],"exceptionCulprit":1}}} Table: Custom1
 

 

 

I think this is due to the size of the query and the request timing out. This is because all queries using that very similiar code (only changed by a query ID) are much smaller in size. I was advised to try incremental refresh to circumvent this issue. However, I'm wondering if this will actually solve my problem. As I understand via incremental refresh we partician the data by date and only subject particians which likely contain data changes to the refresh, cutting down on the amount of total data to be refreshed. However, this will not be useful to me as any of the data is subjected to change. So with that being said:

 

  • Any thoughts on what be resulting in this mashup error? As I stated same code and datasource, only the size of the query. My understanding is that the limit for data size for scheduled refresh is 2GB, which I am not close to at all.
  • What I am looking to do is to refresh data only where a value in a column has changed to cut down on the size of the output handled by scheduled refresh. would incremental refresh help with this? 
  • If not, how might we go about this? I was thinking of doing a separate query filtered by only after date X, and essentially only refreshing this data daily and joining it to the historical source. Seems circuitious. 

thanks!

Steve

1 ACCEPTED SOLUTION
lbendlin
Super User
Super User

I think this is due to the size of the query and the request timing out

I don't think so. That would have resulted in a different error message.  Might have been a transient network connectivity issue, or someone restarting the service on the other end.

 

What I am looking to do is to refresh data only where a value in a column has changed to cut down on the size of the output handled by scheduled refresh. would incremental refresh help with this? 

yes and no. While incremental refresh offers to check for data changes this comes at a horrible cost - a canary dataset copy will be created that then does the delta tracking.

 

Fundamentally incremental refresh expects immutable data (like IoT sensor readings).  It is very bad at real world business data (which would require differential refresh).  You can mitigate that by keeping your partitions small and by refreshing individual partitions as needed, and doing a full refresh every now and then.

View solution in original post

2 REPLIES 2
lbendlin
Super User
Super User

I think this is due to the size of the query and the request timing out

I don't think so. That would have resulted in a different error message.  Might have been a transient network connectivity issue, or someone restarting the service on the other end.

 

What I am looking to do is to refresh data only where a value in a column has changed to cut down on the size of the output handled by scheduled refresh. would incremental refresh help with this? 

yes and no. While incremental refresh offers to check for data changes this comes at a horrible cost - a canary dataset copy will be created that then does the delta tracking.

 

Fundamentally incremental refresh expects immutable data (like IoT sensor readings).  It is very bad at real world business data (which would require differential refresh).  You can mitigate that by keeping your partitions small and by refreshing individual partitions as needed, and doing a full refresh every now and then.

Hi, 

 

Thanks for your help. Figured out the issue... sort of. We ended up caching the results of the query and then trying cloud refresh... and viola, it works. So this error had to do with the lag time between getting the results of the query and sending the request.

 

TY

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.