Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
motoray
Helper V
Helper V

Gen2 Dataflow failures when loading medium sized datasets

I've been trying to get a medium-sized (100,000 records for now) dataset to load from a SQL database (on prem and Azure) to a Fabric Warehouse from a Gen2 dataflow. I can get it to load 1000 or 10000 records, but once I step up to higher numbers, I get the error below. And yes--port 1433 is open. Also--I can load all the data no problem with a native Power BI dataset or a Gen1 dataflow. 

 

It's almost like the connection to the source database times out after a certain period of time.

 

Mashup Exception Data Source Error Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: DataSource.Error: Microsoft SQL: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) Details: DataSourceKind = Lakehouse;DataSourcePath = Lakehouse;Message = A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.);ErrorCode = -2146232060;Number = 10054;Class = 20

26 REPLIES 26
motoray
Helper V
Helper V

So have been doing some more testing (things magically started working again last week) and am starting to think there might be a max # of times per [time period] that I can load data to the warehouse. I tried with a dataset with 85000 rows and got a failure--host closed the connection--but then tried an hour later and it worked. When I hit the 85000 mark, I'd been gradually ramping up from 1000, so 4-5 runs in the space of under an hour. Still testing to see if this is the real issue, but would be great if someone from Microsoft could let me know whether there is indeed some kind of throttling going on that prevents repreated loads into the same table...

Things seem to have stabilized for me now and I'm able to successfully load full datasets from onprem SQL directly to a Warehouse or Lakehouse. The one thing I'm still seeing--and just noticed this yesterday--is that if I have a brand new warehouse/lakehouse and try do a load to a new table with a Gen2 dataflow, it will fail if I attempt to load the full dataset the first time. If I load 100 rows, it succeeds and then I'm able to pull the filter and load everything.

SidJay
Microsoft Employee
Microsoft Employee

Hi motoray,

 

This is unexpected behavior with the SQL endpoint and we are still actively debugging the issue. The issue (connection forcibly closed by remote host) is sporadic and there isn't a specific limit or throttle that you're running into.

 

Thanks

Thanks for working with me on this. It was really good to hear from you and have the issue confirmed yesterday.

SidJay
Microsoft Employee
Microsoft Employee

Hi motoray,

 

Thanks for confirming. We are continuing the investigation.

SidJay
Microsoft Employee
Microsoft Employee

Hi motoray & ShunOshidari,

 

A question to help us with the investigation: Have you modified/deleted the automatically generated staging artifacts that were generated in the workspace? These are the automatically generated Lakehouse, Warehouse, etc. with "Staging" in the name.

 

Thanks

Yes, I have previously deleted staging artifacts. After that, when I created a new dataflow item, the artifacts were recreated, so I thought it was fine. However, following your advice, I created a new workspace to test it out and noticed that only DataflowsStagingLakehouse1 was being recreated in the original workspace, while DataflowsStagingWarehouse_xxx remained deleted.

Moving forward, I plan to use the newly created workspace.

However, the issue continues to occur even in the new workspace where no artifacts have been deleted. Like the previous workspace, it sometimes completes successfully and sometimes fails.

These are the Request IDs for an error that occurred in the new workspace. The error message is the same.
ac6411df-22d3-42eb-8f95-5154bab3a21b
5b273482-69b2-4372-9901-bbaa45d71255

What's interesting is that even in cases where the process completes successfully, it takes about 12 to 14 minutes. This seems excessively long for an import process of a 6000-row, 3MB Excel file. When it fails due to an error, it usually fails within about 5 to 7 minutes.

Following miguel's advice, I will also report the same content to https://support.fabric.microsoft.com/support.

Thanks for your support.

I have not deleted the staging artifacts. Learned early on that everything breaks when you delete those, so I deleted those workspaces and started fresh and left the staging lakehouses/warehouses alone. Also--I don't put data in the staging areas or use dataflows to pull data from them either. At most, I open and run a query to check and see if the data is loaded to staging.

SidJay
Microsoft Employee
Microsoft Employee

Hi motoray & ShunOshidari,

 

Thank you for all the information you've provided. The request ids have helped us better identify the symptoms and we are now investigating the root cause. This is not a widespread issue, but some users loading to Lakehouses and Warehouses appear to be getting errors from the SQL endpoint. We are investigating.

 

Thanks

ShunOshidari
Helper I
Helper I

I'm glad I came across your post. We're experiencing a similar issue in our environment. We've recently begun to evaluate Fabric, and we were unsure if this issue had been occurring from the start.

Here's our situation:

Dataflow: Gen2
Source: SharePoint folder (Excel file, 3MB, 6000 rows)
Destination: Warehouse
Capacity region: Japan East

With this Excel file, we managed to successfully import up to approximately 4600 rows. I believe the actual limit might also depend on the number of columns.

As Gen2 uses Lakehouse for Staging, the actual data flow should be as follows:
Excel (SharePoint) -> Staging Lakehouse -> Warehouse

The error message we received is identical to the one pasted in the original post. It appears that the data successfully reaches the Staging Lakehouse, but the Staging Lakehouse is timing out when moving it to the Warehouse.

Interestingly, when I retried the Gen2 job a few hours after encountering the error, it completed successfully. All the data seemed to be present in the Warehouse. However, when I attempted the job again the next day, I encountered the same error.

Our environment is in the Japan East region. Judging by the time of your post, I assume you're likely in a different region. Therefore, I believe this issue isn't confined to a specific region.

I'm currently seeking the right contact within our organization to reach out to Microsoft.

If others viewing this thread are encountering the same situation, it would be greatly appreciated if you could share the circumstances under which it occurred here.

Hi ShunOshiadari,

 

Could you please share the error message you're receving in Refresh History? If you're able to share the Request ID, that would also be helpful to better understand the failure.

 

Thanks

Ah I see. If I include the actual error message in my post, my post gets deleted. (I posted it as "My original post (1/2)")

Please refer to the original post of motoray on the top of this page, which contains exactly the same error message I got.

My original post (2/2)
---

Some of the Request IDs are:
1f446887-bd7a-48e3-8965-c79217c74965
6cd137a6-19cb-47e6-abe2-21053a7a3038 (different dataflow)
7115718e-662b-472a-873b-9ee100e57f50 (another different dataflow)

Hope these help to resolve the issue!

My original post (1/2)
---

Thanks SidJay!

The error message is identical to what motoray posted originally, which is:

Mashup Exception Data Source Error Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: DataSource.Error: Microsoft SQL: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) Details: DataSourceKind = Lakehouse;DataSourcePath = Lakehouse;Message = A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.);ErrorCode = -2146232060;Number = 10054;Class = 20

I am conducting a post test because my responses keep disappearing within seconds no matter how many times I post them. If this post does not disappear, I will then proceed with my original post.

thank you for sharing all of this info! We're investigating internally, but please do reach out to our support team so that they can create an incident case and get the support directly from our team.

 

Use this link to reach the support team: https://support.fabric.microsoft.com/support

 

I'll update this thread once we have more information to share.

Thanks miguel! I will do it.

Already there--case # 2307240010003563

Meister_Kaio
Frequent Visitor

Hi!

It appears that I'm encountering a similar problem, which relates to the number of rows in a file. I can successfully load files with over 100K rows from Azure ALDS Gen2 Storage to a Lakehouse. However, when attempting to transfer files between two Lakehouses, only the files with more than 100K rows encounter failure, while files with fewer rows transfer without any issues.




Now none of my Gen2 dataflows will complete so I'm not sure what's going on anymore. Time to call in Microsoft.

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

AugFabric_Carousel

Fabric Monthly Update - August 2024

Check out the August 2024 Fabric update to learn about new features.

August Carousel

Fabric Community Update - August 2024

Find out what's new and trending in the Fabric Community.