Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I am trying to transform rows to columns from report output file(s).
[Original post is updated after @BA_Pete reply, to cover duplicate medicine scenario]
I categorize as
Report data
001 Pharmacy Report Date: 01-MAR-2022 Page: 1
Patient ID: 100
Name: Brenda, Chad T
Address: 123, East 12th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist: Nyugen Tony
Prescription Items:
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food
Medicine: Oflaxicin 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
Items Count: 2
Prescription Date: 10-Feb-2022 10:30AM
Old Balance: $0.00
Bill Amount: $145.00
Bill Paid: $145.00
Total Due: $0.00
002 Pharmacy Report Date: 01-MAR-2022 Page: 1
Patient ID: 101
Name: Joleen, Wright T
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist: Nyugen Tony
Prescription Items:
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food
Medicine: Bactrim DS 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
Medicine: Thyorixn 50mcg
Dosage: 1.0
Usage: Morning 1.0
Medicine: Zyrtec
Dosage: 2.0
Usage: Morning 1 and night 1
003 Pharmacy Report Date: 01-MAR-2022 Page: 2
Patient ID: 101
Name: Joleen, Wright T
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist: Nyugen Tony
Prescription Items:
Medicine: Hemarrhoid cream
Dosage: N/A
Usage: Apply after each bowel movement
Items Count: 5
Prescription Date: 11-Feb-2022 10:30AM
Old Balance: $0.00
Bill Amount: $245.00
Bill Paid: $0.00
Total Due: $245.00
004 Pharmacy Report Date: 01-MAR-2022 Page: 1
Patient ID: 102
Name: Paul, Peter Chan
Address: 123, East 12th Avenue, City
Pharmacy: Emergency pharmacy
Pharmacist: Alan Sheppard
Prescription Items:
Medicine: Tylenol 250mg
Dosage: 1.0
Usage: Morning and Daily 1 tablet after food
Medicine: Amox 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
Items Count: 2
Prescription Date: 14-Feb-2022 02:30PM
Old Balance: $0.00
Bill Amount: $110.00
Bill Paid: $110.00
Total Due: $0.00
005 Pharmacy Report Date: 01-MAR-2022 Page: 1
Patient ID: 103
Name: Cathy, Peter Chan
Address: 123, East 12th Avenue, City
Pharmacy: Emergency pharmacy
Pharmacist: Alan Sheppard
Prescription Items:
Medicine: Tylenol 250mg
Dosage: 1.0
Usage: Morning and Daily 1 tablet after food
Medicine: Amox 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
Medicine: Amox 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
Items Count: 3
Prescription Date: 18-Feb-2022 02:30PM
Old Balance: $0.00
Bill Amount: $160.00
Bill Paid: $160.00
Total Due: $0.00
006 Pharmacy Report Date: 01-MAR-2022 Page: 1
Patient ID: 109
Name: Jack, Musk E
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist: Nyugen Tony
Prescription Items:
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food
Medicine: Bactrim DS 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
Medicine: Thyorixn 50mcg
Dosage: 1.0
Usage: Morning 1.0
Medicine: Bactrim DS 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5
007 Pharmacy Report Date: 01-MAR-2022 Page: 2
Patient ID: 109
Name: Jack, Musk E
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist: Nyugen Tony
Prescription Items:
Medicine: Hemarrhoid cream
Dosage: N/A
Usage: Apply after each bowel movement
Medicine: Zyrtec
Dosage: 2.0
Usage: Morning 1 and night 1
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food
Items Count: 7
Prescription Date: 11-Feb-2022 10:30AM
Old Balance: $0.00
Bill Amount: $345.00
Bill Paid: $0.00
Total Due: $345.00
Rules:
* If Dosage is blank or N/A then chnage it to 1
Output expected: (or all data elements)
Seq Number | Report Name | Date | Page | Patient ID | Name | Pharmacy | Medicine | Dosage | Items Count | Prescription Date | Bill Amount |
001 | Pharmacy Report | 01-MAR-2022 | 1 | 100 | Brenda, Chad T | Ground floor pharmacy | Tylenol 500mg | 2 | 2 | 10-Feb-2022 10:30AM | $145.00 |
001 | Pharmacy Report | 01-MAR-2022 | 1 | 100 | Brenda, Chad T | Ground floor pharmacy | Oflaxicin 250mg | 1.5 | 2 | 10-Feb-2022 10:30AM | $145.00 |
002 | Pharmacy Report | 01-MAR-2022 | 1 | 101 | Joleen, Wright T | Ground floor pharmacy | Tylenol 500mg | 2 | 5 | 11-Feb-2022 10:30AM | $245.00 |
002 | Pharmacy Report | 01-MAR-2022 | 1 | 101 | Joleen, Wright T | Ground floor pharmacy | Bactrim DS 250mg | 1.5 | 5 | 11-Feb-2022 10:30AM | $245.00 |
002 | Pharmacy Report | 01-MAR-2022 | 1 | 101 | Joleen, Wright T | Ground floor pharmacy | Thyorixn 50mcg | 1 | 5 | 11-Feb-2022 10:30AM | $245.00 |
002 | Pharmacy Report | 01-MAR-2022 | 1 | 101 | Joleen, Wright T | Ground floor pharmacy | Zyrtec | 2 | 5 | 11-Feb-2022 10:30AM | $245.00 |
003 | Pharmacy Report | 01-MAR-2022 | 2 | 101 | Joleen, Wright T | Ground floor pharmacy | Hemarrhoid cream | 1 | 5 | 11-Feb-2022 10:30AM | $245.00 |
004 | Pharmacy Report | 01-MAR-2022 | 1 | 102 | Paul, Peter Chan | Emergency pharmacy | Tylenol 250mg | 1 | 2 | 14-Feb-2022 02:30PM | $110.00 |
004 | Pharmacy Report | 01-MAR-2022 | 1 | 102 | Paul, Peter Chan | Emergency pharmacy | Amox 250mg | 1.5 | 2 | 14-Feb-2022 02:30PM | $110.00 |
005 | Pharmacy Report | 01-MAR-2022 | 1 | 103 | Cathy, Peter Chan | Emergency pharmacy | Tylenol 250mg | 1 | 3 | 18-Feb-2022 02:30PM | $160.00 |
005 | Pharmacy Report | 01-MAR-2022 | 1 | 103 | Cathy, Peter Chan | Emergency pharmacy | Amox 250mg | 1.5 | 3 | 18-Feb-2022 02:30PM | $160.00 |
005 | Pharmacy Report | 01-MAR-2022 | 1 | 103 | Cathy, Peter Chan | Emergency pharmacy | Amox 250mg | 1.5 | 3 | 18-Feb-2022 02:30PM | $160.00 |
006 | Pharmacy Report | 01-MAR-2022 | 1 | 109 | Jack, Musk E | Ground floor pharmacy | Tylenol 500mg | 2 | 7 | 11-Feb-2022 10:30AM | $345.00 |
006 | Pharmacy Report | 01-MAR-2022 | 1 | 109 | Jack, Musk E | Ground floor pharmacy | Bactrim DS 250mg | 1.5 | 7 | 11-Feb-2022 10:30AM | $345.00 |
006 | Pharmacy Report | 01-MAR-2022 | 1 | 109 | Jack, Musk E | Ground floor pharmacy | Thyorixn 50mcg | 1 | 7 | 11-Feb-2022 10:30AM | $345.00 |
006 | Pharmacy Report | 01-MAR-2022 | 1 | 109 | Jack, Musk E | Ground floor pharmacy | Bactrim DS 250mg | 1.5 | 7 | 11-Feb-2022 10:30AM | $345.00 |
007 | Pharmacy Report | 01-MAR-2022 | 2 | 109 | Jack, Musk E | Ground floor pharmacy | Hemarrhoid cream | 1 | 7 | 11-Feb-2022 10:30AM | $345.00 |
007 | Pharmacy Report | 01-MAR-2022 | 2 | 109 | Jack, Musk E | Ground floor pharmacy | Zyrtec | 2 | 7 | 11-Feb-2022 10:30AM | $345.00 |
007 | Pharmacy Report | 01-MAR-2022 | 2 | 109 | Jack, Musk E | Ground floor pharmacy | Tylenol 500mg | 2 | 7 | 11-Feb-2022 10:30AM | $345.00 |
19 rows
I started the power query as below:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
#"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID")
then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Prescription Date:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
#"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
#"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
#"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),
#"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
#"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
#"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID",
each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null , type nullable text),
#"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
#"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"})
in
#"Replaced Value"
from here, not sure how to take further ...
trying to think to use group by using one or two columns and split ...
Any thoughts?
Solved! Go to Solution.
Hi @sevenhills ,
I've gone about it a slightly different way from you i.e. I've built up all the columns from the original table, then filled them around the [medicine] and [dosage] columns (as these are the only ones that can be distinct within a group).
I think the addition of the STOP row could be slicker, maybe using a record function, but couldn't get my head in gear for it.
The only difference I got to your example output was on poor old Joleen's special cream, where your output stated a dosage of 1, but the data had N/A. Probably easy enough for you to add a step into my code to fix this if required.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3ZXdb9owEMD/lRPaIzDbwAtvoem2ToJGLdWkoT6Y5CCW/BE5piP//YwDIyCyrSuapuUhSu4u9+Vf7haLDiEUTq8k51bxtIIHLIx1QRZzh2MgtDeNHnqMMLaz42svo53n7qKTcCdQO7iLvYSQIJtx5fUTizrjXbjJeQbzoIiyzGJZeks26MItL51/cjlEL6g36E2Fq2qv+0zG8NGajc5gJY2xUOzFTRtRujHMqs0aNcyNrnW1gY+VWlE4YTTcOVTlOMinmIlUaJ/ivJKojYQRIWoddLEpQ3GsX5fyVL9OjdVCr4H7VGIuZAUUHF9KdMBXDi2sjMmOkY8R7leSb3fPwEbnMWh/dCkGDVG0WOcOyN4k3EIJcOP74Stm0FJofWCU9D7gsj4wSsYDEk2D6b3MYMIl16k38tc70t8f2kRICZGqve80dDg60SVcZOMalKZubhyXEG/woPvhMdwIYVenjDYo+2wkou7CFxsa1soZG1yDs38XtAlPnRUK4scrkNZIPK+MFVvtM1fpudOLmR/EZ46+VtZh+julN7OiTZAGbwKJ/RcgfULFrc2NyCC1yNVJQ2fvI2h2NCoKD1DNDfI0h6X5hhKUeUHlm9AyWUa/mCz0zZOF/WSykNa50vhqD8Tw6pOFNYBI+EZ2IcFd9/wO03+6wW4VWn/UPq1WGiLfNnjMsSi4zV6Dw2GuXPrlrzNX/Llt//buGh4JI8wTlrx+d1HSvrtoO2O15vk7", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
addSeqID = Table.AddColumn(Source, "seqID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Page:") then Text.BeforeDelimiter([Line], " ") else null, type text),
addReportName = Table.AddColumn(addSeqID, "reportName", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], [seqID], "Date:") else null, type text),
addDate = Table.AddColumn(addReportName, "date", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], "Date:", "Page:") else null),
addPage = Table.AddColumn(addDate, "page", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.AfterDelimiter([Line], "Page:") else null, type text),
addPatientID = Table.AddColumn(addPage, "patientID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Patient ID:") then Text.AfterDelimiter([Line], "Patient ID:") else null, type text),
addName = Table.AddColumn(addPatientID, "name", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Name:") then Text.AfterDelimiter([Line], "Name:") else null, type text),
addPharmacy = Table.AddColumn(addName, "pharmacy", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Pharmacy:") then Text.AfterDelimiter([Line], "Pharmacy:") else null, type text),
addMedicine = Table.AddColumn(addPharmacy, "medicine", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Medicine:") then Text.AfterDelimiter([Line], "Medicine:") else null, type text),
addDosage = Table.AddColumn(addMedicine, "dosage", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Dosage:") then Text.AfterDelimiter([Line], "Dosage:") else null),
addItemsCount = Table.AddColumn(addDosage, "itemCount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Items Count:") then Text.AfterDelimiter([Line], "Items Count:") else null),
addPrescDate = Table.AddColumn(addItemsCount, "prescriptionDate", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Prescription Date:") then Text.AfterDelimiter([Line], "Prescription Date:") else null),
addBillAmount = Table.AddColumn(addPrescDate, "billAmount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Bill Amount:") then Text.AfterDelimiter([Line], "Bill Amount:") else null),
trimTextTypeCols = Table.TransformColumns(addBillAmount,{{"seqID", Text.Trim, type text}, {"reportName", Text.Trim, type text}, {"date", Text.Trim, type text}, {"page", Text.Trim, type text}, {"patientID", Text.Trim, type text}, {"name", Text.Trim, type text}, {"pharmacy", Text.Trim, type text}, {"medicine", Text.Trim, type text}}),
remLineCol = Table.RemoveColumns(trimTextTypeCols,{"Line"}),
fillDownToMedicine = Table.FillDown(remLineCol,{"seqID", "reportName", "date", "page", "patientID", "name", "pharmacy", "medicine"}),
fillUpFromDosage = Table.FillUp(fillDownToMedicine,{"dosage", "itemCount", "prescriptionDate", "billAmount"}),
filterMedStopNull = Table.SelectRows(fillUpFromDosage, each ([medicine] <> null and [medicine] <> "STOP")),
tableDistinct = Table.Distinct(filterMedStopNull)
in
tableDistinct
Pete
Proud to be a Datanaut!
try change at those points in the way you see in the pictures
regarding your questions about using the table.group function it takes some time to make a detailed explanation.
You can find somethink here
Hi @BA_Pete
Final code:
Raw Data
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
#"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID")
then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Items Count:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
#"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
#"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
#"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),
#"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
#"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
#"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID",
each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null , type nullable text),
#"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
#"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"}),
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Section", "Patient ID"}, {{"Rows Count", each Table.RowCount(_), Int64.Type}, {"First Row", each List.Min([Index]), type number}, {"End Row", each List.Max([Index]), type number}, {"TempTable100", each _, type table [Index=number, Line=nullable text, Section=text, #"Section - Begin"=text, Seq Number=nullable text, Patient ID=nullable text]}})
in
#"Grouped Rows"
Data Prep - Page Row
let
Source = #"Raw Data",
#"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Page Row")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
#"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
#"Split Column by Positions" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByPositions({0, 18, 44, 58, 64}), {"Line.1", "Line.2", "Line.3", "Line.4", "Line.5"}),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Positions"," Date: ","",Replacer.ReplaceText,{"Line.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Line.4", "Section", "Section - Begin", "Patient ID", "Line.1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Line.5", "Page"}, {"Line.3", "Date"}, {"Line.2", "Title"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Seq Number", "Index", "Title", "Date", "Page"})
in
#"Reordered Columns"
Data Prep - HF Rows
et
Source = #"Raw Data",
#"Filtered Rows" = Table.SelectRows(Source, each ([Section] <> "Page Row" and [Section] <> "Items" )),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
#"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Index", "Section", "Section - Begin", "Seq Number"}),
#"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
#"Filtered Rows1" = Table.SelectRows(#"Trimmed Text", each ([Line.1] <> "Patient ID")),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows1"),
#"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Line.1]), "Line.1", "Line.2")
in
#"Pivoted Column"
Data Prep - Items
let
Source = #"Raw Data",
#"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Items" )),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
#"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded TempTable100", each ([Line] <> "Prescription Items:") and not Text.StartsWith([Line], "Usage")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Section", "Section - Begin"}),
#"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
#"Added Conditional Column" = Table.AddColumn(#"Trimmed Text", "G1",
each
if [Line.1] = "Medicine" then "G1_L_" & Number.ToText([Index])
else if [Line.1] = "Items Count" then "G2_" & [Patient ID]
else null),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
#"Filled Down" = Table.FillDown(#"Sorted Rows",{"G1"}),
#"Removed Columns1" = Table.RemoveColumns(#"Filled Down",{"Index"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Line.1]), "Line.1", "Line.2"),
#"Replaced Value" = Table.ReplaceValue(#"Pivoted Column","N/A","1.0",Replacer.ReplaceText,{"Dosage"})
in
#"Replaced Value"
Data
let
Source = Table.NestedJoin(#"Data Prep - HF Rows", {"Patient ID"}, #"Data Prep - Items", {"Patient ID"}, "Raw Data - Items", JoinKind.LeftOuter),
#"Expanded Raw Data - Items" = Table.ExpandTableColumn(Source, "Raw Data - Items", {"Seq Number", "Medicine", "Dosage"}, {"Seq Number", "Medicine", "Dosage"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded Raw Data - Items", {"Seq Number"}, #"Data Prep - Page Row", {"Seq Number"}, "Raw Data - Page Row", JoinKind.LeftOuter),
#"Expanded Raw Data - Page Row" = Table.ExpandTableColumn(#"Merged Queries", "Raw Data - Page Row", {"Title", "Date", "Page"}, {"Title", "Date", "Page"}),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded Raw Data - Page Row",{"Patient ID", "Name", "Items Count", "Prescription Date", "Medicine", "Dosage", "Old Balance", "Bill Amount", "Bill Paid", "Total Due", "Seq Number", "Title", "Date", "Page", "Address", "Pharmacy", "Pharmacist"})
in
#"Reordered Columns"
I will be happy to hear comments/updates/suggestions.
Converted as functions, used to process each file in the folder.
I am able to process the data and got what I needed.
In a given file, I classified as sections and processed as three segments.
a) For a given Patient ID, identify the static rows & sections
i.e., header, footer, end row. Use split by ": " and get the info needed
b) For a given Patient ID, identify the dynamic rows
i.e., items. Use sort, split by ": ", fill down and fill up.
c) For a given Patient ID, identify the extended info,
i.e., rows with Seq Number, PAGE: , spanned across multiple pages and get info needed.
Join ((a) + (b)) + (c)
Question I have to figure out is the best way to add these steps in order to process all files in the folder.
Thank you @Anonymous , @BA_Pete
Thank you.
I have been using the same for long time in many projects. As I used mockup data for this post, actual files has lot more combinations. Working on it for nested functions / transformation at each file level and clubbing them. 🙂
Hi @Anonymous
I am closing the thread and marking as solution.
Couple of issues:
I have been using the link approach i.e., files list, combine and then do transformation in many projects. No issues. Here, since each file is huge, I am getting into very slowness during processing in the dataflow. Better is to do each file separate processing and then do the join of each file later.
Other thing I noticed is Text.Combine works different sort order in the dataflow vs. dataset/pbix code on the Power BI service, which is outside the post context.
Kudos to your lambda expressions experience
Hi @sevenhills ,
Any chance you can drop your final M with a JSON source in here please?
I'm interested to see/learn how you completed this in the end.
Ta,
Pete
Proud to be a Datanaut!
Hi @BA_Pete
Final code:
Raw Data
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
#"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID")
then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Items Count:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
#"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
#"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
#"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),
#"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
#"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
#"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID",
each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null , type nullable text),
#"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
#"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"}),
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Section", "Patient ID"}, {{"Rows Count", each Table.RowCount(_), Int64.Type}, {"First Row", each List.Min([Index]), type number}, {"End Row", each List.Max([Index]), type number}, {"TempTable100", each _, type table [Index=number, Line=nullable text, Section=text, #"Section - Begin"=text, Seq Number=nullable text, Patient ID=nullable text]}})
in
#"Grouped Rows"
Data Prep - Page Row
let
Source = #"Raw Data",
#"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Page Row")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
#"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
#"Split Column by Positions" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByPositions({0, 18, 44, 58, 64}), {"Line.1", "Line.2", "Line.3", "Line.4", "Line.5"}),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Positions"," Date: ","",Replacer.ReplaceText,{"Line.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Line.4", "Section", "Section - Begin", "Patient ID", "Line.1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Line.5", "Page"}, {"Line.3", "Date"}, {"Line.2", "Title"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Seq Number", "Index", "Title", "Date", "Page"})
in
#"Reordered Columns"
Data Prep - HF Rows
et
Source = #"Raw Data",
#"Filtered Rows" = Table.SelectRows(Source, each ([Section] <> "Page Row" and [Section] <> "Items" )),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
#"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Index", "Section", "Section - Begin", "Seq Number"}),
#"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
#"Filtered Rows1" = Table.SelectRows(#"Trimmed Text", each ([Line.1] <> "Patient ID")),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows1"),
#"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Line.1]), "Line.1", "Line.2")
in
#"Pivoted Column"
Data Prep - Items
let
Source = #"Raw Data",
#"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Items" )),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
#"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded TempTable100", each ([Line] <> "Prescription Items:") and not Text.StartsWith([Line], "Usage")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Section", "Section - Begin"}),
#"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
#"Added Conditional Column" = Table.AddColumn(#"Trimmed Text", "G1",
each
if [Line.1] = "Medicine" then "G1_L_" & Number.ToText([Index])
else if [Line.1] = "Items Count" then "G2_" & [Patient ID]
else null),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
#"Filled Down" = Table.FillDown(#"Sorted Rows",{"G1"}),
#"Removed Columns1" = Table.RemoveColumns(#"Filled Down",{"Index"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Line.1]), "Line.1", "Line.2"),
#"Replaced Value" = Table.ReplaceValue(#"Pivoted Column","N/A","1.0",Replacer.ReplaceText,{"Dosage"})
in
#"Replaced Value"
Data
let
Source = Table.NestedJoin(#"Data Prep - HF Rows", {"Patient ID"}, #"Data Prep - Items", {"Patient ID"}, "Raw Data - Items", JoinKind.LeftOuter),
#"Expanded Raw Data - Items" = Table.ExpandTableColumn(Source, "Raw Data - Items", {"Seq Number", "Medicine", "Dosage"}, {"Seq Number", "Medicine", "Dosage"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded Raw Data - Items", {"Seq Number"}, #"Data Prep - Page Row", {"Seq Number"}, "Raw Data - Page Row", JoinKind.LeftOuter),
#"Expanded Raw Data - Page Row" = Table.ExpandTableColumn(#"Merged Queries", "Raw Data - Page Row", {"Title", "Date", "Page"}, {"Title", "Date", "Page"}),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded Raw Data - Page Row",{"Patient ID", "Name", "Items Count", "Prescription Date", "Medicine", "Dosage", "Old Balance", "Bill Amount", "Bill Paid", "Total Due", "Seq Number", "Title", "Date", "Page", "Address", "Pharmacy", "Pharmacist"})
in
#"Reordered Columns"
I will be happy to hear comments/updates/suggestions.
Converted as functions, used to process each file in the folder.
Thank you @Anonymous
I am thinking along these lines
(The report file looks like detailed report output, which is similar to consolidated invoice / order details report for all customers ~ based on my oracle reports knowledge. but this is pharmacy domain)
a) Load the file and add line number.
b) Categorize into 5 sections, as each may have one row (or) begin and end rows
c) Modularizing by passing sections to the functions
Similar to the function you did
I started with these, In a given file,
Still working on it ...
- Club them back and remove unwanted rows and columns and present
Coming back to your solution, I don't know Spanish and tying to understand and adapt the concept.
I try to summarize the steps of my idea.
I have divided the report into groups identified by the "Total Due" line.
Since I used the Equation Criteria parameter of the Table.Group () function, I had to insert a dummy line of this type at the beginning of the report which becomes the first line of each group (not the last) but is eliminated with Table.Skip (). If this info is needed, it can be obtained from the Bill lines.
On each of these groups I have applied the trans () function.
I am attaching a file with an application of the trans function to the more complex group (the one that has two different Pharmacy Reports).
The important part is the indexing by subgroups of the Patient Items part, so that you can pivot correctly.
I did some filldowns (but it might be useful to do some fillups too), before deleting the superfluous lines.
Some cleaning and tidying up of the labels and column names remain to be done.
For example, since I used ":" to divide the initial text into columns, the term "Page" remained with the values of the "Line.2" field ie the date. It should be deleted from there and added to the column where the page number "Line.3" is.
PS
My language is Italian not Spanish.
I don't know Spanish either 😁
Thanks @Anonymous
Sorry, I thought it is Spanish and tried translating your M query code online and tried to understand it 🙂
I looked into your latest file and my raw data (check my original post) and added the Totat due line and tried applying your code and it is not doing correctly. Please could you check.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZZfT9swEMC/ygntsWW2Q2HrW0q7jUmFCjpNWsXDkbiNhWNHjsvIt5+bBOFWDX/TiU30oUruzue78y93ns32ptqihOGS9wE+kH1C9i47sz1CKKz/JgmaFKMCznmmjS1lQ7RuGaHdcXjeZYSxlR0unIyWXiZoBVcWToZOUns+xdTpB4arGDtwnGAM01IRxrHhee4sWdCBEebWPdkEwhuultyZCltUXutI+vDV6KWKYS61NpDVYt9G5LYPp8VywRVMtap0lYHbKzIis0IrOLE8zfulfMxjEQnlQpwWkistoUdIuih1Q52XybH9KpUf1etYGyXUAtCFMkQhC6Bg8UpyCzi33MBc6/h+5/sdzuYSb1fPwHqbe9D93rY9aLmLEovEAqlNyr8yBTh29XAZM2hItDowSrpf+FV1YJT0AxKOS9MzGcMAJapoBYOPw0BICWFaeV9p6EFvTTdBEfcrUHydD9e6x5oy1jpl1KPsu5acqw78NGXBGjljQRucvV3QBhhZI1IYXrRAmhd4UmgjbpWLPI02nW6N/E684ehXYSyPnpK6HxX1QQpeBRL7L0D6xlM0JtEihshwTNcKevoxBL+iYZY5gCpuOEYJXOnfXEKqb3jqitDQWXqPdBb66s7CHugspLGveKtqIA5a7yzMA2KCS9mBCV9Vz80w9dIJNkq5cUftwmqkIXRlg4uEZxma+Dk43PWVbZ98O33Fndvt355dB/eEEeYImzx/dlHSPLtoM2Obs6vXOmGBR9gx2qR4R6wlxHbLbPAIs59ez+zhA8wePp3Zw9aZ/eyPSYyuOzBe5tcwepsj8v2u1Wp8NVVHrV++/i2qdnPxavF+vPsPYK0fHu38lhi86JZ4t+ryDw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
#"Split Column by Delimiter" = Table.SplitColumn(Source, "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2", "Line.3"}),
#"Removed Blank Rows" = Table.SelectRows(#"Split Column by Delimiter", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
#"Filled Down" = Table.FillDown(#"Removed Blank Rows",{"Line.3"}),
#"Changed Type" = Table.TransformColumnTypes(#"Filled Down",{{"Line.1", type text}, {"Line.2", type text}, {"Line.3", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Line.1"}, {{"all", each _}}, GroupKind.Local, (x,y)=>Number.From(x[#"Line.1"]="Total Due" and y = x) ),
#"Patient ID 101" = #"Grouped Rows"{1}[all],
#"Removed Top Rows" = Table.Skip(#"Patient ID 101",1),
#"Grouped Rows1" = Table.Group(#"Removed Top Rows", {"Line.1", "Line.2"}, {{"all", each _}}, GroupKind.Local, (x,y) => Number.From( Text.Contains(x[#"Line.1"], "Pharmacy") and Text.End(x[#"Line.1"],10) = Text.End(y[#"Line.1"],10) )),
#"Expanded all" = Table.ExpandTableColumn(#"Grouped Rows1", "all", {"Line.1", "Line.2", "Line.3"}, {"Line.1.1", "Line.2.1", "Line.3"}),
#"Grouped Rows2" = Table.Group(#"Expanded all", {"Line.1.1"}, {{"all", each Table.AddIndexColumn(_, "Pat_Index1", 0, 1) }}),
#"Expanded all1" = Table.ExpandTableColumn(#"Grouped Rows2", "all", {"Line.1", "Line.2", "Line.1.1", "Line.2.1", "Line.3", "Pat_Index1"}, {"Line.1", "Line.2", "Line.1.1.1", "Line.2.1", "Line.3", "Pat_Index1"}),
#"Pivoted Column" = Table.Pivot(#"Expanded all1", List.Distinct(#"Expanded all1"[Line.1.1]), "Line.1.1", "Line.2.1", (x) => x{0}?),
#"Filled Down1" = Table.FillDown(#"Pivoted Column",{"Patient ID", "Name", "Address", "Pharmacy", "Pharmacist", "Items Count", "Prescription Date", "Old Balance", "Bill Amount", "Bill Paid"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down1", each [Medicine] <> null and [Medicine] <> "")
in
#"Filtered Rows"
Also, please explain these lines what you are doing here, not able to get it
#"Grouped Rows" = Table.Group(#"Changed Type", {"Line.1"}, {{"all", each _}}, GroupKind.Local, (x,y)=>Number.From(x[#"Line.1"]="Total Due" and y = x) ),
#"Grouped Rows1" = Table.Group(#"Removed Top Rows", {"Line.1", "Line.2"}, {{"all", each _}}, GroupKind.Local, (x,y) => Number.From( Text.Contains(x[#"Line.1"], "Pharmacy") and Text.End(x[#"Line.1"],10) = Text.End(y[#"Line.1"],10) )),
#"Pivoted Column" = Table.Pivot(#"Expanded all1", List.Distinct(#"Expanded all1"[Line.1.1]), "Line.1.1", "Line.2.1", (x) => x{0}?),
Once I understand this and make it working, I will go back functions implementation above.
My reason for going to functions are
~ my data has lot of lines in each section
~ each file has 10K+ lines and 100+ patient records
~ easy in long term to add more custom logic in the PBI Service dataflow
try change at those points in the way you see in the pictures
regarding your questions about using the table.group function it takes some time to make a detailed explanation.
You can find somethink here
Thank you, I will look into the lamda explanation and try to understand.
Able to move forward further along. Appreciated , will update the post by the end of the day
Hi @sevenhills ,
I've gone about it a slightly different way from you i.e. I've built up all the columns from the original table, then filled them around the [medicine] and [dosage] columns (as these are the only ones that can be distinct within a group).
I think the addition of the STOP row could be slicker, maybe using a record function, but couldn't get my head in gear for it.
The only difference I got to your example output was on poor old Joleen's special cream, where your output stated a dosage of 1, but the data had N/A. Probably easy enough for you to add a step into my code to fix this if required.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3ZXdb9owEMD/lRPaIzDbwAtvoem2ToJGLdWkoT6Y5CCW/BE5piP//YwDIyCyrSuapuUhSu4u9+Vf7haLDiEUTq8k51bxtIIHLIx1QRZzh2MgtDeNHnqMMLaz42svo53n7qKTcCdQO7iLvYSQIJtx5fUTizrjXbjJeQbzoIiyzGJZeks26MItL51/cjlEL6g36E2Fq2qv+0zG8NGajc5gJY2xUOzFTRtRujHMqs0aNcyNrnW1gY+VWlE4YTTcOVTlOMinmIlUaJ/ivJKojYQRIWoddLEpQ3GsX5fyVL9OjdVCr4H7VGIuZAUUHF9KdMBXDi2sjMmOkY8R7leSb3fPwEbnMWh/dCkGDVG0WOcOyN4k3EIJcOP74Stm0FJofWCU9D7gsj4wSsYDEk2D6b3MYMIl16k38tc70t8f2kRICZGqve80dDg60SVcZOMalKZubhyXEG/woPvhMdwIYVenjDYo+2wkou7CFxsa1soZG1yDs38XtAlPnRUK4scrkNZIPK+MFVvtM1fpudOLmR/EZ46+VtZh+julN7OiTZAGbwKJ/RcgfULFrc2NyCC1yNVJQ2fvI2h2NCoKD1DNDfI0h6X5hhKUeUHlm9AyWUa/mCz0zZOF/WSykNa50vhqD8Tw6pOFNYBI+EZ2IcFd9/wO03+6wW4VWn/UPq1WGiLfNnjMsSi4zV6Dw2GuXPrlrzNX/Llt//buGh4JI8wTlrx+d1HSvrtoO2O15vk7", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
addSeqID = Table.AddColumn(Source, "seqID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Page:") then Text.BeforeDelimiter([Line], " ") else null, type text),
addReportName = Table.AddColumn(addSeqID, "reportName", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], [seqID], "Date:") else null, type text),
addDate = Table.AddColumn(addReportName, "date", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], "Date:", "Page:") else null),
addPage = Table.AddColumn(addDate, "page", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.AfterDelimiter([Line], "Page:") else null, type text),
addPatientID = Table.AddColumn(addPage, "patientID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Patient ID:") then Text.AfterDelimiter([Line], "Patient ID:") else null, type text),
addName = Table.AddColumn(addPatientID, "name", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Name:") then Text.AfterDelimiter([Line], "Name:") else null, type text),
addPharmacy = Table.AddColumn(addName, "pharmacy", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Pharmacy:") then Text.AfterDelimiter([Line], "Pharmacy:") else null, type text),
addMedicine = Table.AddColumn(addPharmacy, "medicine", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Medicine:") then Text.AfterDelimiter([Line], "Medicine:") else null, type text),
addDosage = Table.AddColumn(addMedicine, "dosage", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Dosage:") then Text.AfterDelimiter([Line], "Dosage:") else null),
addItemsCount = Table.AddColumn(addDosage, "itemCount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Items Count:") then Text.AfterDelimiter([Line], "Items Count:") else null),
addPrescDate = Table.AddColumn(addItemsCount, "prescriptionDate", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Prescription Date:") then Text.AfterDelimiter([Line], "Prescription Date:") else null),
addBillAmount = Table.AddColumn(addPrescDate, "billAmount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Bill Amount:") then Text.AfterDelimiter([Line], "Bill Amount:") else null),
trimTextTypeCols = Table.TransformColumns(addBillAmount,{{"seqID", Text.Trim, type text}, {"reportName", Text.Trim, type text}, {"date", Text.Trim, type text}, {"page", Text.Trim, type text}, {"patientID", Text.Trim, type text}, {"name", Text.Trim, type text}, {"pharmacy", Text.Trim, type text}, {"medicine", Text.Trim, type text}}),
remLineCol = Table.RemoveColumns(trimTextTypeCols,{"Line"}),
fillDownToMedicine = Table.FillDown(remLineCol,{"seqID", "reportName", "date", "page", "patientID", "name", "pharmacy", "medicine"}),
fillUpFromDosage = Table.FillUp(fillDownToMedicine,{"dosage", "itemCount", "prescriptionDate", "billAmount"}),
filterMedStopNull = Table.SelectRows(fillUpFromDosage, each ([medicine] <> null and [medicine] <> "STOP")),
tableDistinct = Table.Distinct(filterMedStopNull)
in
tableDistinct
Pete
Proud to be a Datanaut!
Thank you @BA_Pete
--------
I updated the raw data and output in the original post(, helps the question in long term viewers.)
--------
Few points I missed in the original post, after looking into your solution.
a) In a given report file,
i) No patient id duplicates in a given file.
ii) For a given patient,
Medicine, Dosage, Usage combinations can have duplicates.
Medicine, Dosage, Usage will always stay as combination in a given page. (no split of pages of these three row combination)
b) I went back to the requirements and saw that if dosage is blank or N/A then change to 1. So, you are right, I can add a step to fix it. Sorry, I did not mention in the original post, as I was not aware when I did the first post.
--------
My each data file is upto 25000 lines, and get each week one file. Data combinations are many.
Let me check and get back to you, (will update this reply)
I am thinking along these lines, for modularity
Work in progress
Raw Data - Step 1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
#"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID")
then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Prescription Date:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
#"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
#"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
#"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
#"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),
#"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
#"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
#"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID",
each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null , type nullable text),
#"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
#"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"})
in
#"Replaced Value"
Raw Data Prep - Step 2
let
Source = #"Raw Data - Step 1",
addPageRowInfo = Table.AddColumn(Source, "Page Row Info", each if [#"Section - Begin"] = "Page Row - Begin" then Parse_PageRow([Line]) else null, type table),
addPatientEndRowInfo = Table.AddColumn(addPageRowInfo, "addPageRowInfo", each if [#"Section - Begin"] = "End Patient - Begin" then Parse_PatientEndRow([Line]) else null, type table),
#"Grouped Rows" = Table.Group(addPatientEndRowInfo, {"Seq Number", "Section"}, {{"Rows Count", each Table.RowCount(_), Int64.Type}, {"First Row", each List.Min([Index]), type number}, {"End Row", each List.Max([Index]), type number}, {"TempTable100", each _, type table [Index=number, Line=nullable text, Section=text, #"Section - Begin"=text, Seq Number=nullable text, Patient ID=nullable text, Page Row Info=table, addPageRowInfo=table]}})
in
#"Grouped Rows"
For each row, took the "TempTable100" column (and to modularize,) added these functions and later clubbed them all. it is the same logic we did for the whole file.
...
Function: Parse_HeaderPatientRows
Function: Parse_Items
Function: Parse_FooterRows
Function: Parse_PatientEndRow
= Table.AddColumn(#"Grouped Rows", "Data Cols Info", each
if [#"Section"] = "End Patient" then Parse_PatientEndRow([allrows]{0}[Line])
else if [#"Section"] = "Footer" then Parse_FooterRows([allrows])
else if [#"Section"] = "Header Patient" then Parse_HeaderPatientRows([allrows])
else if [#"Section"] = "Items" then Parse_Items([allrows])
else null
, type table)
Note: All the functions code can be done as one function too. Because of lot more data in each row and few more transformations, I adapted as separate functions.
Thank you @BA_Pete . Stopper row identification is a cool technique.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.