kirah2128
Helper I

One Large Text File Parsing 200MB Error 500

Dear Community,

Have you encountered this problem? I'm processing a text file using the code below. The script runs successfully on my laptop, but it fails in Fabric. Do you have any other solutions for this? I have tried increasing the workspace to the largest nodes available, and the same issue still arises.

 

 

import pandas as pd

# `amm` holds the path to the text file
with open(amm, 'rt') as myfile:  # Open the file for reading text
    contents = myfile.read()     # Read the entire file into a string
print(contents)
# Split the text body on the delimiter <TASK
split1 = contents.split('<TASK')

# Create a DataFrame with one row per split segment
update_df = pd.DataFrame(split1, columns=['row_text'])

# Drop the first row and retain only the AMM tasks
df = update_df.drop(0)

 

 

LivyHttpRequestFailure: Something went wrong while processing your request. Please try again later. HTTP status code: 500. Trace ID: f606057c-c7bd-46de-8c3f-c215cc076b03.
 
Regards,
King
1 ACCEPTED SOLUTION

Dear Everyone,

 

The problem is solved =D by removing the print() call.

# print(contents)   

For some reason, Fabric can't handle a large concatenated string.
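
If some output is still useful for checking that the file was read, one option is to print only a short preview rather than the whole string. The sketch below assumes the failure comes from sending the entire ~200 MB string to the notebook output, which is what the fix above suggests but has not been confirmed. The `preview` helper and the `sample` variable are hypothetical names used only for illustration; in the notebook, `contents` would take the place of `sample`.

```python
def preview(text: str, limit: int = 500) -> str:
    """Return at most `limit` characters of `text`, noting how much was omitted."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return f"{text[:limit]}\n... [{omitted} more characters not shown]"


# Stand-in for the large file contents (hypothetical data, not the real file).
sample = "<TASK" + "x" * 1000

# Print a short preview and the total size instead of the full string.
print(preview(sample, limit=20))
print(f"Total length: {len(sample)} characters")
```

This keeps the notebook output small regardless of the file size, while the full string stays available in memory for the split and DataFrame steps.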


10 REPLIES
kirah2128
Helper I

To all, this is the latest finding. They will investigate this further, as we found the following warning while running a simple read and print of the 200 MB text file:

 

2024-01-03 05:09:38,976 WARN ClientCnxn [Thread-68-SendThread(vm-fdb01374:XXXX)]: Session 0x10000008c0c0002 for server vm-fdb01374/XX.X.XXX.X:XXX, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.

org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 73538ms for session id 0x10000008c0c0002

 

 

To all following this issue: they asked us to increase the workspace setting, but we had already increased it to an XXLarge node before contacting them, and the same issue persists.

kirah2128
Helper I

We will have a support session tomorrow. I will update everyone here. 


Hi @kirah2128 ,

Glad to know your query got resolved. Please continue using Fabric Community for your further queries.

HimanshuS-msft
Community Support

Hi @kirah2128 

Since small files work fine and only the big files fail, I suggest you work with the Microsoft support team on this, because they will have the details of the real error under the hood. At this time, I do not think cluster resources are the issue, as Spark is designed to handle big loads. Once you have created a support ticket, please share it here so that we can also keep an eye on it.

Thanks 
Himanshu 

Hi @kirah2128 ,

We haven’t heard from you since the last response and are just checking back to see whether you have a resolution yet. Did you get a chance to create a support ticket? If you have created one, please share it here so that we can also keep an eye on it.

HimanshuS-msft
Community Support

Hello @kirah2128 
Are you sure that it's the size of the file that is causing the issue? I just want to confirm whether you have tried processing a smaller file and whether that works.
I am also assuming that the path of the file (maybe the file is in a Lakehouse) is correct.
I will wait to hear back from you on this.
Thanks 
Himanshu 

I tried with a smaller file of around 2 MB and it works just fine. Both files are located in the same folder. But the one I raised is still not working.

v-gchenna-msft
Community Support

Hello @kirah2128 ,

At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.
