csana23
Regular Visitor

PDF file size increase after upload to Fabric Lakehouse via Power Automate - unusable file

Hello everybody, I would like to get some insight on the following issue.

 

I have a simple Power Automate/Logic App flow that is supposed to upload a PDF file (coming in as an attachment of an email) to a Fabric lakehouse. Here is the entire flow: [Screenshot: Entire flow]

 

I use the two-step process described in the Azure Data Lake Storage Gen2 REST API documentation. First, I create an empty file in the lakehouse folder by adding the ?resource=file parameter to the HTTP request (the request is built with a Compose component): [Screenshot: Compose component setup]

 

[Screenshot: First HTTP request with PUT method]

 

Then I assemble a second HTTP request (this time with the PATCH method, using the append and flush actions together and adding the byte contents of the file to the request body): [Screenshot: Second HTTP request with PATCH]

For the HTTP requests, I'm using the HTTP with Microsoft Entra ID (preauthorized) connectors.

The result is seemingly correct, because something is uploaded to the lakehouse. However, the uploaded file is bigger than the original (3MB instead of 2MB). This might not seem like a big problem, but when I try to process the file (in my case, splitting it up into its individual pages), the process fails. Therefore something in the file is being corrupted.

 

Now the question is: how to upload this binary file to the lakehouse without any bloat added to it by Power Automate?

 

Here is some documentation I've followed:
https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update?view=rest...
https://www.linkedin.com/pulse/how-call-onelake-api-from-power-automate-enterprise-app-nigel-smith-4...

 

So I've read up quite a bit on this and here are the possible answers so far:

  • Power Automate might add some JSON information to the content bytes, thus increasing the file size
  • Power Automate base64-encodes the attachment which in turn increases the file size, compared to the raw binary representation
  • When running the append and flush methods together, for some reason that can increase the file size
  • However, when I separate them into two HTTP requests, the flow fails (error message: Action 'Invoke_an_HTTP_request_1-copy' failed: The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data.), because the position parameter of the URI "doesn't have the correct size of the file". I'm getting the attachment size from the Outlook connector as a parameter. This is interesting because it suggests that the bytes actually appended differ in length from the attachment size reported by the connector.
  • I tried the decodeBase64, base64ToBinary functions applied to the attachment content bytes to no avail
  • Another interesting phenomenon is when I use the file upload functionality directly in the lakehouse, the uploaded file size is correct and the file can be processed. Again, a clue that Power Automate is adding something to the content bytes that it really shouldn't.
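
One way to narrow down the hypotheses above is to look at the first bytes of the file that landed in the lakehouse. The sketch below is Python run outside the flow, not a Power Automate expression. `classify_upload` is a hypothetical helper name, and the three categories are assumptions taken from the list above: a real PDF starts with `%PDF-`, base64 text of a PDF starts with the base64 encoding of that signature, and a JSON wrapper starts with `{` or `[`.

```python
import base64

PDF_MAGIC = b"%PDF-"

def classify_upload(head: bytes) -> str:
    """Guess what was written to the lakehouse from the first bytes of the file."""
    stripped = head.lstrip()
    if stripped.startswith(PDF_MAGIC):
        return "raw-pdf"
    # The first 6 base64 characters depend only on "%PDF" and part of "-",
    # so they are the same for every PDF version. Computed, not hard-coded.
    b64_prefix = base64.b64encode(PDF_MAGIC)[:6]
    if stripped.lstrip(b'"').startswith(b64_prefix):
        return "base64-text"
    if stripped.startswith((b"{", b"[")):
        return "json-wrapper"
    return "unknown"
```

Reading the first few dozen bytes of the downloaded file and passing them to this function is enough. A result of `base64-text` would point at the encoding hypothesis, `json-wrapper` at the added-JSON hypothesis.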

Here is the link to the StackOverflow question: https://stackoverflow.com/staging-ground/79425706

1 ACCEPTED SOLUTION

Hey @v-saisrao-msft , 

Thank you for following up on this!

 

Well, this specific issue is still not solved but I have found a workaround (which is an understatement): We load the email attachments to an Azure Blob Storage via Logic App, and then we initiate a data pipeline in Fabric that reads all files in this given Blob Storage Container and copies them to the lakehouse's Files section. From this copy, the files arrive without any corruption and can be further utilized and processed (meaning pagination and processing by Document Intelligence service).

 

I have also created a ticket with Microsoft Support and we are still in discussion about how to resolve the issue of corrupting the files in Power Automate.

 

So if it is possible, please don't close this topic yet, because if we find a solution, I would like to post it, I'm sure it will be helpful for others as well.

 

Thank you! 


12 REPLIES
v-saisrao-msft
Community Support

Hi @csana23,

Could you please confirm if the issue was resolved after raising a support case? If so, we’d appreciate it if you could share the solution to help others in the community. As we haven’t heard back, we’re closing this thread. For any further issues, please raise a new thread in the Microsoft Fabric Community Forum — we’ll be happy to assist.

Thank you for being part of the Microsoft Fabric Community.

v-saisrao-msft
Community Support

Hi @csana23,

 

We haven’t heard back from you regarding your issue. If it has been resolved, please mark the helpful response as the solution and give a ‘Kudos’ to assist others. If you still need support, let us know.

 

Thank you.


Hi @csana23,

I would like to confirm if you were able to resolve the issue after raising the support ticket.

If so, please provide the insights and accept it as a solution.

Hi @csana23,

Thank you for providing this update. It's great to hear that you've opened a ticket with Microsoft Support on this matter. We will keep this topic open, as your insights and any resolution could be valuable to others facing similar challenges.

 

Thank you.

csana23
Regular Visitor

Hey @v-saisrao-msft

Thank you very much for the quick help!

I will implement these steps and get back to you with my findings!

Hello, I am using the same flow. My issue is that the PDF file is created but is blank. The flow works for text files but not for PDFs or images. Can someone help with this?

v-saisrao-msft
Community Support

Hi  @csana23,

Thank you for reaching out to the Microsoft Forum Community. 

 

It appears that Power Automate might be altering the binary content of your PDF file during the upload, which is causing the file size to increase and leading to corruption issues. To ensure the file is transferred correctly, please follow these key steps: 

  • When extracting the attachment from the email, verify if the content is base64-encoded by inspecting the output of the email trigger. If it is base64-encoded, use the base64ToBinary() function in Power Automate to convert it back to binary format. 
  • Before uploading, ensure that the extracted attachment's file size matches the original. If the file size is larger at this point, Power Automate might be adding metadata or encoding the file differently.  
  • Use a PUT request to create an empty file in the lakehouse. Ensure the URL includes the ?resource=file parameter.
  • Submit a PATCH request to append the binary content to the empty file. The URL should include ?action=append&position=0. Set the Content-Length header to the size of the binary data, calculated with the length() function. Use the output of the base64ToBinary() conversion or the raw binary data as the request body.
  • Send another PATCH request to flush the content and finalize the file. The URL should include ?action=flush&position=, where the position reflects the total size of the file after appending the data. 
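
The steps above can be sketched outside Power Automate to make the byte accounting explicit. This is a minimal Python sketch, not the flow itself: `Step`, `plan_upload` and `file_url` are hypothetical names, no request is actually sent, and it assumes the attachment arrives as a standard base64 string. The point it illustrates is that the append body is the decoded bytes and the flush position is the length of those same bytes.

```python
import base64
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    method: str
    url: str
    body: bytes

def plan_upload(file_url: str, content_bytes_b64: str) -> List[Step]:
    """Build the create/append/flush requests for one file (nothing is sent)."""
    raw = base64.b64decode(content_bytes_b64)  # decode once, reuse everywhere
    return [
        Step("PUT", f"{file_url}?resource=file", b""),
        Step("PATCH", f"{file_url}?action=append&position=0", raw),
        # Flush position must equal the number of bytes appended above.
        Step("PATCH", f"{file_url}?action=flush&position={len(raw)}", b""),
    ]
```

If the body that actually goes over the wire is anything other than `raw` (for example the base64 text), the flush position computed here would no longer match the appended length.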

Since uploading manually to the Lakehouse results in the correct file size, this suggests that Power Automate is modifying the content in some way. Comparing the manually uploaded file with the one from Power Automate can help identify differences.
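
A comparison like this can be done on the two downloaded files with a few lines of Python. The sketch below is only an illustration; `first_difference` is a hypothetical helper and it assumes both files fit in memory.

```python
from typing import Optional

def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None
```

A difference at offset 0 would suggest the whole payload was transformed (for example encoded), while a difference only at the end would suggest trailing bytes were added.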

Utilize the "Run history" feature in Power Automate to review the precise output of your email trigger. This will assist in verifying whether the attachment content is base64-encoded and ensure the correct path is followed. 

 

If this post helps you, please mark it as the solution and give a kudo so that other members of the community can easily find it helpful. 

 

Thank you. 

 

 

Hey @v-saisrao-msft,

 

Thank you for the tips, I implemented the process you suggested, here are my findings!

 

Initialized a variable in the flow to capture the attachment output of the Outlook trigger and it really is a base64-encoded string sequence:


 

The size of the attachment content in bytes after retrieving it with the Attachment Content parameter is 2071820 bytes. However, the raw file on my disk is 2071518 bytes. Now this is expected as base64 encoding increases the file size. So no conclusion can be drawn about Power Automate adding bytes to the original file, as of yet.
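
As a sanity check on these numbers, the size that base64 would produce can be computed directly. The sketch below is Python and assumes standard padded base64 without line breaks, which emits 4 characters for every 3 input bytes; `base64_length` is a hypothetical helper name.

```python
def base64_length(raw_size: int) -> int:
    """Length of padded base64 text for raw_size input bytes."""
    return 4 * ((raw_size + 2) // 3)

raw_size = 2_071_518       # file on disk, from the post
reported_size = 2_071_820  # size reported for the attachment, from the post

print(base64_length(raw_size))   # 2762024
print(reported_size - raw_size)  # 302
```

Under that assumption, a base64 copy of the file would be about 33% larger than the original, whereas the reported size is only 302 bytes larger. So the reported size may not be the length of the base64 string, and the two numbers should probably not be used interchangeably.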
 
But when trying to determine the size of the file that has been converted back to binary, the length() function fails. I'm using this formula:
 length(base64ToBinary(item()?['contentBytes']))
And the error message: 
Action 'Set_variable_1' failed: Unable to process template language expressions in action 'Set_variable_1' inputs at line '0' and column '0': 'The template language function 'length' expects its parameter to be an array or a string. The provided value is of type 'Object'. 
 
I tried a work-around, calculating the binary file size in this manner:
sub(div(mul(item()?['size'], 3), 4), if(endsWith(item()?['contentBytes'], '=='), 2, if(endsWith(item()?['contentBytes'], '='), 1, 0)))
But this is only an approximation, therefore not usable... 
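
The padding-based calculation can be made exact when it is fed the length of the base64 string itself rather than a separately reported size. The sketch below is Python, not a Power Automate expression. `decoded_size` is a hypothetical helper, and it assumes standard padded base64 with no line breaks. Whether item()?['size'] equals the length of contentBytes is not established in this thread.

```python
def decoded_size(b64_text: str) -> int:
    """Exact number of bytes that padded base64 text decodes to."""
    # Each trailing '=' stands for one byte missing from the last 3-byte group.
    padding = len(b64_text) - len(b64_text.rstrip("="))
    return (len(b64_text) // 4) * 3 - padding
```

In the flow, the equivalent input would be the length of the contentBytes string. The error message above indicates that length() accepts a string, so this may be an option there.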
 
So I tried another approach: forget all base64ToBinary conversions and let's just use whatever is coming out of the Outlook connector. So building the three steps:
1. PUT request with ?resource=file
2. PATCH request with ?action=append&position=0, specifying Content-Length header as Attachment Size parameter (item()?['size']), adding item()?['contentBytes'] to Body of the request
3. PATCH request with ?action=flush&position=item()?['size']
 
And the result is: the flow never finishes. I think the reason is that at step 3 the position argument is not correct. The attachment size itself is fine, but it is not the real size of the body that the HTTP connector is sending. So there must be a mismatch between the Content-Length specified in step 2 and the actual payload size. 
 
I've also seen that it is not recommended to set the Content-Length in most Power Automate scenarios for this exact reason. 
 
At this point I strongly suspect that Power Automate is adding bytes "behind the scenes" and that this messes up the end result in the lakehouse. What could these extra bytes be? A theory I came across is that the HTTP connector chunks the request body and adds padding to the chunks, so the end result is basically base64 encoding plus chunking, which results in an almost 2MB file size increase.
 
An interesting finding: I followed the exact steps outlined in this tutorial, so just uploading text ("Test") to a text file in the lakehouse, specifying the position parameter as 4 (4 characters = 4 bytes) in the flush step and that worked perfectly. So if we could get the position parameter in the flush step correctly, I think that would solve this case!
 
Thank you very much for the continued support!

Hi @csana23,

 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.

 

Thank you.

Hi @csana23,

 

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.

 

Thank you.

Hi @csana23, 

 

Sorry for the delayed response and thank you for your patience. Thank you for providing detailed information about your findings and the steps you've taken thus far.  

  • Since the length() function is not working, consider using the string() function to convert the binary data to a string before applying the length() function. This approach may help in obtaining the correct size. 
  • Please ensure that the Content-Length header in the PATCH request accurately reflects the size of the binary data being transmitted. If Power Automate is appending extra bytes, you may need to manually adjust this value. 
  • If Power Automate is performing chunked uploads, it is essential to manage each chunk individually and ensure the position parameter is accurately updated after appending each chunk. 
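
The position bookkeeping for chunked appends can be sketched as follows. This is a Python illustration only: `append_plan` and `flush_position` are hypothetical helper names, the chunk size is arbitrary, and nothing is sent over the network. Each append uses the running byte offset as its position, and the final flush uses the total length.

```python
from typing import Iterator, Tuple

def append_plan(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (position, chunk) pairs for successive append requests."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for position in range(0, len(data), chunk_size):
        yield position, data[position:position + chunk_size]

def flush_position(data: bytes) -> int:
    """Position for the final flush: the total number of bytes appended."""
    return len(data)
```

For example, 10 bytes with a chunk size of 4 give appends at positions 0, 4 and 8, followed by a flush at position 10.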
  • Utilize the "Run history" feature in Power Automate to review the precise content sent in each request. This will assist in identifying any extra bytes or metadata being appended. 

If this post helps you, please mark it as the solution and give a kudo so that other members of the community can easily find it helpful.  

  

Thank you. 

 

 
