The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Good day,
I am quite new to Power Query and a little unsure of how everything works. I am looking to extract some data from pdf invoices that all have the same format. Currently I have set up a folder where all the invoices are stored and added to Power Query but I would like to retain only certain information by looking for certain Keywords such as "Date", "Invoice Number", "Total", etc. My idea was to create a new column that would look for each of these key words, but should return the value in the cell below or to the right of the cell that contains the Matched keyword. I have seen some tutorials on how to look for and retain certain keywords, but how do I set it up so that it gives me the value in the cell that actually contains the information that I am looking for?
An example would be like this:
GIVEN DATA: | || | RETAINED INFORMATION: | ||||
(Invoice 1) | || | Date | Invoice number | Total | (Keywords to look for) | |
Date | Invoice number | || | 07/01/2022 | INV0001 | $99.00 | (Information gathered) |
07/01/2022 | INV0001 | || | 08/01/2022 | INV0002 | $150.00 | |
|| | ||||||
Total | $99.00 | || | ||||
--------------------- | ---------------- | || | ||||
(Invoice 2) | || | |||||
Date | Invoice number | || | ||||
08/01/2022 | INV0002 | || | ||||
|| | ||||||
Total | $150.00 | || |
Solved! Go to Solution.
One way to do it is to
Read the comments and also examine the Applied Steps window to better understand
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type any}, {"Column2", type any}}),
//add shifted columns
shiftedCols = Table.FromColumns(Table.ToColumns(#"Changed Type") &
{List.RemoveFirstN(#"Changed Type"[Column1]) & {null}} &
{List.RemoveFirstN(#"Changed Type"[Column2]) & {null}}),
//Add columns for Date, Invoice Number and Total with appropriate logic
date = Table.AddColumn(shiftedCols, "Date", each if [Column1] ="Date" then [Column3] else null),
invNum = Table.AddColumn(date,"Invoice Number", each if [Column2] = "Invoice number" then [Column4] else null),
total = Table.AddColumn(invNum,"Total", each if [Column1] = "Total" then [Column2] else null),
//Remove the original and the shifted columns
#"Removed Columns" = Table.RemoveColumns(total,{"Column1", "Column2", "Column3", "Column4"}),
//remove rows where all entries are null
remAllNulls = Table.SelectRows(#"Removed Columns", each not ([Date]=null and [Invoice Number]=null and [Total]=null)),
//Fill up the Total column since that will be on a different row than the others
#"Filled Up" = Table.FillUp(remAllNulls,{"Total"}),
//Remove the unneeded rows, filtering on Date<>null
#"Filtered Rows" = Table.SelectRows(#"Filled Up", each ([Date] <> null)),
//Set the data types
#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{
{"Date", type date},
{"Invoice Number", type text},
{"Total", Int64.Type}})
in
#"Changed Type1"
One way to do it is to
Read the comments and also examine the Applied Steps window to better understand
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type any}, {"Column2", type any}}),
//add shifted columns
shiftedCols = Table.FromColumns(Table.ToColumns(#"Changed Type") &
{List.RemoveFirstN(#"Changed Type"[Column1]) & {null}} &
{List.RemoveFirstN(#"Changed Type"[Column2]) & {null}}),
//Add columns for Date, Invoice Number and Total with appropriate logic
date = Table.AddColumn(shiftedCols, "Date", each if [Column1] ="Date" then [Column3] else null),
invNum = Table.AddColumn(date,"Invoice Number", each if [Column2] = "Invoice number" then [Column4] else null),
total = Table.AddColumn(invNum,"Total", each if [Column1] = "Total" then [Column2] else null),
//Remove the original and the shifted columns
#"Removed Columns" = Table.RemoveColumns(total,{"Column1", "Column2", "Column3", "Column4"}),
//remove rows where all entries are null
remAllNulls = Table.SelectRows(#"Removed Columns", each not ([Date]=null and [Invoice Number]=null and [Total]=null)),
//Fill up the Total column since that will be on a different row than the others
#"Filled Up" = Table.FillUp(remAllNulls,{"Total"}),
//Remove the unneeded rows, filtering on Date<>null
#"Filtered Rows" = Table.SelectRows(#"Filled Up", each ([Date] <> null)),
//Set the data types
#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{
{"Date", type date},
{"Invoice Number", type text},
{"Total", Int64.Type}})
in
#"Changed Type1"
Thank you. That worked perfectly!