Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Amos_Lim
Resolver I
Resolver I

Need Recommendations on Pdf Data Set

Hello,

I'm using Power BI to import data from a pdf file.

There are three columns: UserName, ID, and the keyTags that each user can choose from.

Each key description is preceded by a Bullet.

Not every user will have the same level of access and the same number of keys.

I'd like some feedback on how do I reorganise them in Power BI to see who will have access to e.g. E1 LCP?

The photo shown are all saved in one cell/column for one user. Thanks.

 

HCLIM_3-1692613190478.png

 

 

 

1 ACCEPTED SOLUTION

Hi,

This M code works

let
    Source = Folder.Files("C:\Users\mathu\Desktop\Data"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Pdf.Tables([Content])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Id", "Name", "Kind", "Data"}, {"Id", "Name", "Kind", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each ([Kind] = "Table")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom", each Table.PromoteHeaders([Data])),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Data"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}, {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Expanded Custom1", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keyTags", type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type", each ([keyTags] <> ""))
in
    #"Filtered Rows1"

Just place the PDF file in a folder.  In the Source step, give the patch of the folder.

Hope this helps.

Ashish_Mathur_0-1692921708128.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

11 REPLIES 11
Ashish_Mathur
Super User
Super User

Hi,

This M code splits the entries in the Keytag column into multiple rows based on the bullet delimiter

let
    Source = Pdf.Tables(File.Contents("C:\Users\mathu\Desktop\Sample.pdf"), [Implementation="1.3"]),
    Table001 = Source{[Id="Table001"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Table001,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each ([keyTags] <> ""))
in
    #"Filtered Rows"

Hope this helps.

Ashish_Mathur_0-1692841781434.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks, Ashish_Mathur.

As my Pdf consists of 19 pages, and my source will be

 

Source = Table.Combine({Page001, Page002, Page003, Page004, Page005, Page006, Page007, Page008, Page009, Page010, Page011, Page012, Page013, Page014, Page015, Page016, Page017, Page018, Page019}),

 

How can I make changes to your original M code to suit my source?

Hi,

Share the download link of that 19 page PDF file.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

please see attached file.

https://we.tl/t-QB248XDi5j 

 

Capture.JPG

Hi,

This M code works

let
    Source = Folder.Files("C:\Users\mathu\Desktop\Data"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Pdf.Tables([Content])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Id", "Name", "Kind", "Data"}, {"Id", "Name", "Kind", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each ([Kind] = "Table")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom", each Table.PromoteHeaders([Data])),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Data"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}, {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Expanded Custom1", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keyTags", type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type", each ([keyTags] <> ""))
in
    #"Filtered Rows1"

Just place the PDF file in a folder.  In the Source step, give the patch of the folder.

Hope this helps.

Ashish_Mathur_0-1692921708128.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks so much.

You are welcome.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
lbendlin
Super User
Super User

I'm using Power BI to import data from a pdf file.

Sorry to hear that. Is there absolutely no other way to get the data in a more usable format?

The pdf was generated by software, and it basically won't let me do much.
Is it possible to use a single command to organize all data in a single row, regardless of how many lines the cell has? I.e., arrange all of the bullets and text in a single row one after the other?

can you post a sanitized sample pdf?

Please find attached file one page of sample.

Pdf Sample File 

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.