PVO3
Impactful Individual

Dataflows not able to get data from PDF

Hello,

 

I posted a reply in this topic. I urgently need some help so I opened a new topic.

 

The error "Error: PipelineException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt" is returned when I use "Invoke custom function" or Pdf.Tables in a dataflow. The sources are PDF files in a SharePoint folder.

 

I think the error is caused by the extra "Table.TransformColumnTypes" step that dataflows add automatically, and I'm not sure how to prevent this. The step is added because the 'Table' data type is not supported by dataflows, even though my end result is just a single text column. However, when I run the same query over CSV files, the 'Table' data type is involved as well, yet no error is returned and it works fine. On the other hand, the preview is correct, even with these added steps.

 

- I have a dataflow and a PBI dataset with the exact same query (which gathers data from PDFs in a SharePoint folder).

- The PBI dataset is refreshing and working as expected.

- Both the dataset and the dataflow are in the same Premium workspace, which I am the owner of.

- The dataflow uses the advanced engine for calculations

- The preview of the dataflow returns the desired result

- I have similar dataflows running that extract data from txt and xlms files, these run without issues.

- The error occurs exactly at the point where I use the Pdf.Tables function or, alternatively, "Invoke custom function". Without these steps, it runs fine.

 

Help would be greatly appreciated. Thanks a lot!


let
  // List all files on the SharePoint site
  Source = SharePoint.Files("https://SharepointSite", [ApiVersion = 15]),
  // Keep only the files in the folder that holds the PDFs
  Path = Table.SelectRows(Source, each [Folder Path] = "https://SharepointFolderWithPDF"),
  SelectContent = Table.SelectColumns(Path, {"Content"}),
  // Parse each PDF binary; the error is triggered from this step onward
  ExtractContent = Table.AddColumn(SelectContent, "Custom", each Pdf.Tables([Content])),
  // "Kolommen transformeren" = "Transform columns"
  #"Kolommen transformeren" = Table.TransformColumnTypes(ExtractContent, {{"Custom", type text}}),
  // "Fouten vervangen" = "Replace errors"
  #"Fouten vervangen" = Table.ReplaceErrorValues(#"Kolommen transformeren", {{"Custom", null}}),
  // "Kolommen verwijderen" = "Remove columns"
  #"Kolommen verwijderen" = Table.RemoveColumns(#"Fouten vervangen", Table.ColumnsOfType(#"Fouten vervangen", {type table, type record, type list, type nullable binary, type binary, type function}))
in
  #"Kolommen verwijderen"

*Last 3 steps automatically added by Dataflows.
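
One way to check whether the automatically added steps are really the trigger is to never expose a table-typed column at all, so that dataflows have nothing to convert or remove. The sketch below is a diagnostic under stated assumptions, not a confirmed fix: it reuses the placeholder URLs from the query above, keeps the "Name" column that SharePoint.Files returns, and introduces a hypothetical column name "PdfTableCount". If the refresh still fails with this query, Pdf.Tables itself is the failing call and the type conversion is not the cause.

```powerquery
let
  // Same placeholder URLs as in the original query
  Source = SharePoint.Files("https://SharepointSite", [ApiVersion = 15]),
  Path = Table.SelectRows(Source, each [Folder Path] = "https://SharepointFolderWithPDF"),
  SelectContent = Table.SelectColumns(Path, {"Name", "Content"}),
  // Reduce the Pdf.Tables result to a scalar (number of detected items),
  // so no column of type table ever appears in the output.
  // "PdfTableCount" is a hypothetical column name used for illustration.
  CountTables = Table.AddColumn(
    SelectContent,
    "PdfTableCount",
    each Table.RowCount(Pdf.Tables([Content])),
    Int64.Type
  ),
  // Drop the binary column explicitly instead of leaving it to dataflows
  Result = Table.RemoveColumns(CountTables, {"Content"})
in
  Result
```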

1 ACCEPTED SOLUTION
PVO3
Impactful Individual

MS confirmed this is currently a known limitation in Premium workspaces.

https://learn.microsoft.com/en-us/power-query/connectors/pdf#power-bi-dataflows-in-a-premium-capacit...

 

I am currently trying to work around this by using a gateway.

3 REPLIES
lbendlin
Super User

You cannot just transform the column type from table to text.  You need to expand the table to new columns first.

PVO3
Impactful Individual

Thanks @lbendlin. I know. The code is just to indicate at what point the error triggers.

Whatever I do from this point on, the error "Error: PipelineException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt" is returned. This also happens when I expand the table.
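
For reference, the expansion discussed in this thread might look like the sketch below. It assumes that Pdf.Tables returns a navigation table with the columns "Id", "Kind" and "Data", and it introduces hypothetical names ("PdfId", "PdfKind", "PdfData", "Text", and the helper RowToText) purely for illustration. It shows the query shape that ends in a single text column per detected table; it is not a fix for the refresh error described here, which occurs with or without the expansion.

```powerquery
let
  // Same placeholder URLs as in the original query
  Source = SharePoint.Files("https://SharepointSite", [ApiVersion = 15]),
  Path = Table.SelectRows(Source, each [Folder Path] = "https://SharepointFolderWithPDF"),
  SelectContent = Table.SelectColumns(Path, {"Name", "Content"}),
  ExtractContent = Table.AddColumn(SelectContent, "Custom", each Pdf.Tables([Content])),
  // Expand the nested table first; the column names Id/Kind/Data are an assumption
  // about what Pdf.Tables exposes, and they are renamed to avoid clashing with "Name"
  ExpandTables = Table.ExpandTableColumn(
    ExtractContent,
    "Custom",
    {"Id", "Kind", "Data"},
    {"PdfId", "PdfKind", "PdfData"}
  ),
  // Keep only detected tables (as opposed to whole pages)
  OnlyTables = Table.SelectRows(ExpandTables, each [PdfKind] = "Table"),
  // Hypothetical helper: join the cells of one row into a single line of text
  RowToText = (row as list) as text =>
    Text.Combine(List.Transform(row, each if _ = null then "" else Text.From(_)), " "),
  // Flatten each nested data table into one text value, one line per row
  AddText = Table.AddColumn(
    OnlyTables,
    "Text",
    each Text.Combine(List.Transform(Table.ToRows([PdfData]), RowToText), "#(lf)"),
    type text
  ),
  // Only scalar columns remain, so nothing is left for dataflows to convert or drop
  Result = Table.SelectColumns(AddText, {"Name", "PdfId", "Text"})
in
  Result
```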
