Solved: Re: Need Recommendations on Pdf Data Set

Amos_Lim · ‎08-21-2023

Hello,

I'm using Power BI to import data from a pdf file.

There are three columns: UserName, ID, and the keyTags that each user can choose from.

Each key description is preceded by a Bullet.

Not every user will have the same level of access and the same number of keys.

I'd like some feedback on how do I reorganise them in Power BI to see who will have access to e.g. E1 LCP?

The photo shown are all saved in one cell/column for one user. Thanks.

Ashish_Mathur · ‎08-24-2023

Hi,

This M code works

let
    Source = Folder.Files("C:\Users\mathu\Desktop\Data"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Pdf.Tables([Content])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Id", "Name", "Kind", "Data"}, {"Id", "Name", "Kind", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each ([Kind] = "Table")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom", each Table.PromoteHeaders([Data])),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Data"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}, {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Expanded Custom1", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keyTags", type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type", each ([keyTags] <> ""))
in
    #"Filtered Rows1"

Just place the PDF file in a folder. In the Source step, give the patch of the folder.

Hope this helps.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

Ashish_Mathur · ‎08-23-2023

Hi,

This M code splits the entries in the Keytag column into multiple rows based on the bullet delimiter

let
    Source = Pdf.Tables(File.Contents("C:\Users\mathu\Desktop\Sample.pdf"), [Implementation="1.3"]),
    Table001 = Source{[Id="Table001"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Table001,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each ([keyTags] <> ""))
in
    #"Filtered Rows"

Hope this helps.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Amos_Lim · ‎08-23-2023

Thanks, Ashish_Mathur.

As my Pdf consists of 19 pages, and my source will be

Source = Table.Combine({Page001, Page002, Page003, Page004, Page005, Page006, Page007, Page008, Page009, Page010, Page011, Page012, Page013, Page014, Page015, Page016, Page017, Page018, Page019}),

How can I make changes to your original M code to suit my source?

Ashish_Mathur · ‎08-23-2023

Hi,

Share the download link of that 19 page PDF file.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Amos_Lim · ‎08-24-2023

please see attached file.

https://we.tl/t-QB248XDi5j

Ashish_Mathur · ‎08-24-2023

Hi,

This M code works

let
    Source = Folder.Files("C:\Users\mathu\Desktop\Data"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Pdf.Tables([Content])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Id", "Name", "Kind", "Data"}, {"Id", "Name", "Kind", "Data"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each ([Kind] = "Table")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom", each Table.PromoteHeaders([Data])),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Data"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}, {"User name", "User information", "Credentials", "keyTags", "keyTag groups", "keyTag time profiles"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Expanded Custom1", {{"keyTags", Splitter.SplitTextByDelimiter("•", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keyTags"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keyTags", type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type", each ([keyTags] <> ""))
in
    #"Filtered Rows1"

Just place the PDF file in a folder. In the Source step, give the patch of the folder.

Hope this helps.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Amos_Lim · ‎08-24-2023

Thanks so much.

Ashish_Mathur · ‎08-24-2023

You are welcome.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

lbendlin · ‎08-22-2023

I'm using Power BI to import data from a pdf file.

Sorry to hear that. Is there absolutely no other way to get the data in a more usable format?

Amos_Lim · ‎08-23-2023

The pdf was generated by software, and it basically won't let me do much.
Is it possible to use a single command to organize all data in a single row, regardless of how many lines the cell has? I.e., arrange all of the bullets and text in a single row one after the other?