To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
Hi all,
I have a table of keywords and want to identify how many times these words appear in a folder full of PDF files.
Does BI / Power Query have this function?
Q
Solved! Go to Solution.
Hi @pickslides ,
Are these keywords in the PDF file name or in the PDF file? If it's in the file, I doubt it's possible.
If the keywords appear in the PDF file name, then you also need to consider whether a file name will appear in more than two keywords, whether there is a separator (space) between the keywords, which will also affect the steps involved, could you further clarify the question in this regard?
please check the follow thread if helps:
Solved: Counting the number of occurrences of a string in
Solved: Counting same contact per row when mixed in with m...
If the keyword is a single word, you can use the Folder connector and refer to the steps below.
let
Source = Folder.Files("X:\XXX\XXX"),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Name"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns",".pdf","",Replacer.ReplaceText,{"Name"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Name", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Name.1", "Name.2", "Name.3"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Name.1", type text}, {"Name.2", type text}, {"Name.3", type text}}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Value"}, Keywords, {"Words"}, "Keywords", JoinKind.LeftOuter),
#"Expanded Keywords" = Table.ExpandTableColumn(#"Merged Queries", "Keywords", {"Words"}, {"Keywords.Words"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Keywords", each ([Keywords.Words] <> null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Keywords.Words"})
in
#"Removed Columns"
Best Regards,
Gao
Community Support Team
If there is any post helps, then please consider Accept it as the solution to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!
How to get your questions answered quickly -- How to provide sample data
Hi @pickslides ,
Are these keywords in the PDF file name or in the PDF file? If it's in the file, I doubt it's possible.
If the keywords appear in the PDF file name, then you also need to consider whether a file name will appear in more than two keywords, whether there is a separator (space) between the keywords, which will also affect the steps involved, could you further clarify the question in this regard?
please check the follow thread if helps:
Solved: Counting the number of occurrences of a string in
Solved: Counting same contact per row when mixed in with m...
If the keyword is a single word, you can use the Folder connector and refer to the steps below.
let
Source = Folder.Files("X:\XXX\XXX"),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Name"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns",".pdf","",Replacer.ReplaceText,{"Name"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Name", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Name.1", "Name.2", "Name.3"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Name.1", type text}, {"Name.2", type text}, {"Name.3", type text}}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Value"}, Keywords, {"Words"}, "Keywords", JoinKind.LeftOuter),
#"Expanded Keywords" = Table.ExpandTableColumn(#"Merged Queries", "Keywords", {"Words"}, {"Keywords.Words"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Keywords", each ([Keywords.Words] <> null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Keywords.Words"})
in
#"Removed Columns"
Best Regards,
Gao
Community Support Team
If there is any post helps, then please consider Accept it as the solution to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!
How to get your questions answered quickly -- How to provide sample data