Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
Amos_Lim
Resolver I
Resolver I

Need Recommendations on Pdf Data Set

Hello,

I'm using Power BI to import data from a pdf file.

There are three columns: UserName, ID, and the keyTags that each user can choose from.

Each key description is preceded by a Bullet.

Not every user will have the same level of access and the same number of keys.

I'd like some feedback on how do I reorganise them in Power BI to see who will have access to e.g. E1 LCP?

The photo shown are all saved in one cell/column for one user. Thanks.

 

HCLIM_3-1692613190478.png

 

 

 

1 ACCEPTED SOLUTION

Hi,

This M code works

let
    Source = Folder.Files("C:\Users\mathu\Desktop\Data"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Pdf.Tables([Content])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Id", "Name", "Kind", "Data"}, {"Id", "Name", "Kind", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each ([Kind] = "Table")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom", each Table.PromoteHeaders([Data])),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Data"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}, {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Expanded Custom1", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keyTags", type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type", each ([keyTags] <> ""))
in
    #"Filtered Rows1"

Just place the PDF file in a folder.  In the Source step, give the patch of the folder.

Hope this helps.

Ashish_Mathur_0-1692921708128.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

11 REPLIES 11
Ashish_Mathur
Super User
Super User

Hi,

This M code splits the entries in the Keytag column into multiple rows based on the bullet delimiter

let
    Source = Pdf.Tables(File.Contents("C:\Users\mathu\Desktop\Sample.pdf"), [Implementation="1.3"]),
    Table001 = Source{[Id="Table001"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Table001,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each ([keyTags] <> ""))
in
    #"Filtered Rows"

Hope this helps.

Ashish_Mathur_0-1692841781434.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks, Ashish_Mathur.

As my Pdf consists of 19 pages, and my source will be

 

Source = Table.Combine({Page001, Page002, Page003, Page004, Page005, Page006, Page007, Page008, Page009, Page010, Page011, Page012, Page013, Page014, Page015, Page016, Page017, Page018, Page019}),

 

How can I make changes to your original M code to suit my source?

Hi,

Share the download link of that 19 page PDF file.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

please see attached file.

https://we.tl/t-QB248XDi5j 

 

Capture.JPG

Hi,

This M code works

let
    Source = Folder.Files("C:\Users\mathu\Desktop\Data"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Pdf.Tables([Content])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Id", "Name", "Kind", "Data"}, {"Id", "Name", "Kind", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each ([Kind] = "Table")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom", each Table.PromoteHeaders([Data])),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Data"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}, {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Expanded Custom1", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keyTags", type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type", each ([keyTags] <> ""))
in
    #"Filtered Rows1"

Just place the PDF file in a folder.  In the Source step, give the patch of the folder.

Hope this helps.

Ashish_Mathur_0-1692921708128.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks so much.

You are welcome.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
lbendlin
Super User
Super User

I'm using Power BI to import data from a pdf file.

Sorry to hear that. Is there absolutely no other way to get the data in a more usable format?

The pdf was generated by software, and it basically won't let me do much.
Is it possible to use a single command to organize all data in a single row, regardless of how many lines the cell has? I.e., arrange all of the bullets and text in a single row one after the other?

can you post a sanitized sample pdf?

Please find attached file one page of sample.

Pdf Sample File 

Helpful resources

Announcements
November Power BI Update Carousel

Power BI Monthly Update - November 2025

Check out the November 2025 Power BI update to learn about new features.

Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors
Top Kudoed Authors