Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.

Reply
pickslides
Helper I
Helper I

Compare a folder of PDF files to a list of key words.

Hi all, 

 

I have a table of keywords and want to identify how many times these words appear in a folder full of PDF files.

 

Does BI / Power Query have this function?

 

Q

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @pickslides ,

 

Are these keywords in the PDF file name or in the PDF file? If it's in the file, I doubt it's possible. 

If the keywords appear in the PDF file name, then you also need to consider whether a file name will appear in more than two keywords, whether there is a separator (space) between the keywords, which will also affect the steps involved, could you further clarify the question in this regard?

 

please check the follow thread if helps:

Solved: Counting the number of occurrences of a string in

Solved: Counting same contact per row when mixed in with m...

 

If the keyword is a single word, you can use the Folder connector and refer to the steps below.

let
    Source = Folder.Files("X:\XXX\XXX"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name"}),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns",".pdf","",Replacer.ReplaceText,{"Name"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Name", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Name.1", "Name.2", "Name.3"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Name.1", type text}, {"Name.2", type text}, {"Name.3", type text}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value"),
    #"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Value"}, Keywords, {"Words"}, "Keywords", JoinKind.LeftOuter),
    #"Expanded Keywords" = Table.ExpandTableColumn(#"Merged Queries", "Keywords", {"Words"}, {"Keywords.Words"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Keywords", each ([Keywords.Words] <> null)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Keywords.Words"})
in
    #"Removed Columns"

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly -- How to provide sample data

View solution in original post

1 REPLY 1
Anonymous
Not applicable

Hi @pickslides ,

 

Are these keywords in the PDF file name or in the PDF file? If it's in the file, I doubt it's possible. 

If the keywords appear in the PDF file name, then you also need to consider whether a file name will appear in more than two keywords, whether there is a separator (space) between the keywords, which will also affect the steps involved, could you further clarify the question in this regard?

 

please check the follow thread if helps:

Solved: Counting the number of occurrences of a string in

Solved: Counting same contact per row when mixed in with m...

 

If the keyword is a single word, you can use the Folder connector and refer to the steps below.

let
    Source = Folder.Files("X:\XXX\XXX"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name"}),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns",".pdf","",Replacer.ReplaceText,{"Name"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Name", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Name.1", "Name.2", "Name.3"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Name.1", type text}, {"Name.2", type text}, {"Name.3", type text}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value"),
    #"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Value"}, Keywords, {"Words"}, "Keywords", JoinKind.LeftOuter),
    #"Expanded Keywords" = Table.ExpandTableColumn(#"Merged Queries", "Keywords", {"Words"}, {"Keywords.Words"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Keywords", each ([Keywords.Words] <> null)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Keywords.Words"})
in
    #"Removed Columns"

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly -- How to provide sample data

Helpful resources

Announcements
Top Solution Authors
Top Kudoed Authors