gp10
Advocate III

Redshift and Gateway issue

Hi community,
I'm facing issues when trying to create a dataflow with Amazon Redshift data. The data that need to be imported are over 1 billion rows.
I have tried to connect both with the Amazon Redshift connector and ODBC.
The evaluation in Power Query is completed.
But the dataflow refreshes only when I select a really low number of days in the "Store rows from the past" field of the incremental refresh settings.
Whenever I try to store the entire date range, I get errors. This appears to be a gateway issue; are there any settings in the gateway that can help?
I have also noticed that when I try to refresh, the CPU of the VM where the gateway is installed instantly goes to 100% and is consumed by multiple Microsoft Mashup Evaluation Container processes.

I'm getting various errors; among others, this one:

;Parameter name: source [GatewayPipelineErrorCode]=DM_GWPipeline_Gateway_MashupDataAccessError [ErrorShortName]=GatewayClientErrorResponseException[GatewayId=1379774]/MashupDataAccessException[ErrorCode=-2147467259 HResult=-2147467259]/Wrapped(InternalMashupException)[ErrorCode=-2147467259 HResult=-2147467259]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088]/Wrapped(ErrorException)[HResult=-2146233088] [ExceptionErrorShortName]=GatewayClientErrorResponseException[GatewayId=.....'. RootActivityId =........Param1 = <ccon>We're sorry an error occurred during evaluation.</ccon> [DM_ErrorDetailNameCode_UnderlyingErrorCode]=-2147467259 [DM_ErrorDetailNameCode_UnderlyingErrorMessage]=<ccon>We're sorry an error occurred during evaluation.</ccon> [DM_ErrorDetailNameCode_UnderlyingHResult]=-2147467259 [InnerType]=ErrorException Value cannot be null.

Error: Encountered user gateway exception: '[GatewayPipelineErrorCode]=DM_GWPipeline_Gateway_TimeoutError [ErrorShortName]=GatewayTimeoutException [ExceptionErrorShortName]=GatewayTimeoutException'. RootActivityId = e6cc17f9-ee3b-4697-b466-e2bfc290bd40.Param1 = [GatewayPipelineErrorCode]=DM_GWPipeline_Gateway_TimeoutError [ErrorShortName]=GatewayTimeoutException [ExceptionErrorShortName]=GatewayTimeoutException

Any ideas?


13 REPLIES
lbendlin
Super User

Data duplication issues are usually caused by incorrect application of the RangeStart and RangeEnd filters. One of them must be inclusive, the other must be exclusive, and they cannot be overlapping.

 

Also keep in mind that incremental refresh handles after-the-fact data changes really badly. If your data is transactional in nature, then you will need to mix the incremental refresh with occasional full refreshes.

Thanks for the info. My data does not change after the fact.
As for the RangeStart and RangeEnd filters I'm using the following:
RangeStart
1/1/0001 12:00:00 AM
RangeEnd
12/31/9999 11:59:59 PM
and I add a between filter. The pbix file has correct data. After publishing a filtered version, following the process with ALM Toolkit and SSMS, and removing the filter, the refreshed partitions, especially those of the last month, return wrong data. If it were a parameter issue, shouldn't it affect the entire date range?

Thanks again for your time, patience and quality support.

 

As for the RangeStart and RangeEnd filters I'm using the following:
RangeStart
1/1/0001 12:00:00 AM
RangeEnd
12/31/9999 11:59:59 PM
and I add a between filter.

Please don't do that. Do not use fake dates, and do not use a between filter: a between filter makes both endpoints inclusive, so a row whose timestamp falls exactly on a partition boundary would be loaded into two adjacent partitions and duplicated.

 

Instead, use a subset of your dates (just enough for development):

 

RangeStart
1/1/2022 12:00:00 AM
RangeEnd
1/1/2023 12:00:00 AM

 

and then in your Power Query use a filter

 

[Date]>=RangeStart and [Date]<RangeEnd
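Put together, the filter step in the query would look something like this (a minimal M sketch; the step name Source and the column name Date are placeholders for your own query):

// RangeStart is inclusive and RangeEnd is exclusive, so adjacent
// partitions line up back-to-back and no row is loaded twice.
FilteredRows = Table.SelectRows(
    Source,
    each [Date] >= RangeStart and [Date] < RangeEnd
)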

That was it.
Thanks so much @lbendlin, amazing support and guidance.

Ashrafsleem2020
Frequent Visitor

Same Issue

The operation is timed out.. The exception was raised by the IDataReader interface. Please review the error message and provider documentation for further information and corrective action. Table: beginning_balance.

Data source error:{"error":{"code":"DM_GWPipeline_Gateway_TimeoutError","pbi.error":{"code":"DM_GWPipeline_Gateway_TimeoutError","parameters":{},"details":[],"exceptionCulprit":1}}} 
Cluster URI: 
Activity ID: 
Request ID: 
Time:2023-03-30 10:37:34Z

lbendlin
Super User

ErrorException Value cannot be null

gaps in primary key columns?

 

What are your incremental refresh settings? Which datetime column are you using for the partitioning?

Hi @lbendlin,
no gaps in any primary key column. I have tried Remove Empty in Power Query but still get the same error.
Also, in other attempts I deleted the primary key column to reduce size, since it's not needed, but still no luck.

I have tried various incremental refresh settings, from trying to pull the entire dataset (for example, store data from the last 5 years and refresh the last 10 days) down to, say, the last 3 months with the last 5 days refreshed.

I'm trying to load a single table, and I'm using its datetime column.

Instead of a dataflow try using a dataset with incremental refresh.

The issue with a dataset, @lbendlin, is that incremental refresh doesn't seem to work with Amazon Redshift. Then the pbix file is so big that it can't be published. And if it is a publishable size, the first refresh again fails with similar errors from the gateway.

Then the pbix file is so big that it can't be published.

You can bootstrap the incremental refresh partitions to keep the original size under 10 GB. Then use XMLA refresh to fill each partition separately. This will then allow you to load datasets that are half the size of your capacity, and supposedly even beyond.
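For reference, once the dataset is published with empty (bootstrapped) partitions, each partition can be refreshed on its own over the XMLA endpoint, e.g. from an SSMS XMLA query window with a TMSL script along these lines (the database, table, and partition names below are placeholders for your own model):

{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "MyDataset",
        "table": "MyLargeTable",
        "partition": "2022Q1"
      }
    ]
  }
}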

Thanks for this suggestion @lbendlin, I have managed to do that with SSMS and ALM Toolkit.

The big issue now is that the data is not accurate. For some reason it's as if it duplicates the data of the last month, and there may be small differences elsewhere.

Any ideas on how to deal with this? I have refreshed several times and the issue persists.

Use DAX Studio or SSMS to check the partitions and their content. Refresh individual partitions as needed.
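For a quick check, the partition metadata (names, state, last refresh time) is exposed through the TMSCHEMA DMVs, which you can query from DAX Studio or an SSMS MDX query window connected to the dataset's XMLA endpoint:

SELECT * FROM $SYSTEM.TMSCHEMA_PARTITIONS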

I did that several times but the data is still wrong, especially in the last month, where it creates daily instead of monthly partitions. It looks like it returns double or more the actual rows.
Could this have to do with the Amazon Redshift Power BI connector?
