Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
Hi,
I am trying to read in the data from PDF document that contains operational stats.
The data from the PDF file looks like this.
When the PDF file is read into Power BI - Power BI identifies that the data contains 4 headers Login etc, Agent ID, Agent Name and Date as shown below:
Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 | Column10 | Column11 | Column12 |
Log In | Log Out | Login Duration | Available T | Released T | Available In Call T | Released In Call T | In Call T | Media Type | Calls Handled | Avg T To Answer | Max T To Answer |
Agent Id: | 1111 | null | null | null | null | null | null | null | null | null | null |
Agent Name: | Joe Blogs | null | null | null | null | null | null | null | null | null | null |
06/01/2025 | null | null | null | null | null | null | null | null | null | null | null |
09:55:51 | 14:36:57 | 04:41:06 | 03:35:16 | 01:05:50 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:31 |
14:45:56 | 18:13:05 | 03:27:09 | 03:22:06 | 00:05:03 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:15 |
null | Total: | 08:08:15 | 06:57:22 | 01:10:53 | 00:00:00 | 00:00:00 | 00:00:00 | null | 0 | 00:00:00 | 00:00:31 |
Agent Id: | 9999 | null | null | null | null | null | null | null | null | null | null |
Agent Name: | Millie Smith | null | null | null | null | null | null | null | null | null | null |
06/01/2025 | null | null | null | null | null | null | null | null | null | null | null |
10:02:31 | 12:53:35 | 02:51:04 | 00:00:00 | 02:51:04 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:00 |
13:21:18 | 14:25:33 | 01:04:15 | 00:00:00 | 01:04:15 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:00 |
16:04:22 | 16:39:01 | 00:34:39 | 00:00:00 | 00:34:39 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:00 |
null | Total: | 04:29:58 | 00:00:00 | 04:29:58 | 00:00:00 | 00:00:00 | 00:00:00 | null | 0 | 00:00:00 | 00:00:00 |
Agent Id: | 1122 | null | null | null | null | null | null | null | null | null | null |
Agent Name: | Joe Black | null | null | null | null | null | null | null | null | null | null |
06/01/2025 | null | null | null | null | null | null | null | null | null | null | null |
09:33:02 | 11:09:31 | 01:36:29 | 00:45:59 | 00:50:30 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:16 |
12:23:38 | 12:56:05 | 00:32:27 | 00:32:26 | 00:00:01 | 00:00:00 | 00:00:00 | 00:00:00 | (All) | 0 | 00:00:00 | 00:00:00 |
null | Total: | 02:08:56 | 01:18:25 | 00:50:31 | 00:00:00 | 00:00:00 | 00:00:00 | null | 0 | 00:00:00 | 00:00:16 |
What is the most efficient way to get the data into a meaningful table to extract the operational stats for each agents profile per date?
Any help you can provide will be greatly appreciated.
Thanks
Solved! Go to Solution.
Use this
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each [Log In] <> null),
#"Added Agent Id" = Table.AddColumn(#"Filtered Rows", "Agent Id", each if Text.StartsWith(Text.From([Log In]),"Agent Id",Comparer.OrdinalIgnoreCase) then [Log Out] else null),
#"Added Agent Name" = Table.AddColumn(#"Added Agent Id", "Agent Name", each if Text.StartsWith(Text.From([Log In]),"Agent Name",Comparer.OrdinalIgnoreCase) then [Log Out] else null),
#"Added Date" = Table.AddColumn(#"Added Agent Name", "Date", each if [Log Out] = null then [Log In] else null),
#"Filled Down" = Table.FillDown(#"Added Date",{"Agent Id", "Agent Name", "Date"}),
#"Filtered Data Rows Only" = Table.SelectRows(#"Filled Down", each [Log In] <> "Agent Id:"and [Log In] <> "Agent Name:"and [Log Out] <> null),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Data Rows Only",{"Agent Id", "Agent Name", "Date", "Log In", "Log Out", "Login", "Available T", "Released T", "Available", "Released", "In Call T", "Media Type", "Calls", "Avg T To", "Max T To"})
in
#"Reordered Columns"
to produce this:
from this Excel Data:
This is exactly the awnser of your question
https://youtu.be/pofthD7-YuQ?si=rh_pW-ZYYlxhBo--
Thank you for your link to your video.
Your video helped me to understand the methodology. My source was not an originally table though as it was table contain pages and tables but I still understood your methodology and how it worked.
Thanks again for all your help
JB
Hi @Jacb, another solution:
Output (first few columns)
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("1VVNb9QwEP0r1p5AqlR/xNlkbgscWERBgtyWHgyxlgg3Qd0sH/+eN46lZNtQtd3uoUkkP4/tefb4zWSzWbzvtmLdLs4i+LjvB9S0X/C+2V+7vul4dPXLNcF9DV5U6H3ywbudr2NnHFq34rUL4XDK1DjFF75unKj+/vTosHUn3rq2Dr6OPreiElUnVu3ut7/m6e7PgeXybLNYbX3bi3VNGFd40LT7EI5tRtcf3JVn5+86L16Fbrt7QgaZn0t1rqW2T+R06rska8lyQFRGJie7BJQZZYpkztCQsaQihAlzJUNJ8fsvfLEK4SWb5mYYFblBmMEfu1YFKQPvA6FekiwT1GkbkrmlOZpb2cid4lB1vQt8b7IgfCpugIMA4uHESpK9F2vyeNeBpyos8ZxMhRdNCI0Xn6+a/vszESLiLDUHCmLQCDlUx+EDhOqyG0Gdtz5cDICRGzrDTRdDEmhLxiS5Z0kSk4Xz1sdz5+wvqg3QlCTVMMUgHctbC+etj+S+lQTYCOpBcWPJvPXBSZBID0txPPgpS7H79uOZZABKsUENjEqAyMohGaA3VGWd7pzrZYIWYji+FKOwRxVq0ki5ImVfnkoxKDCwHGE++lAnU6HmUmzTHwd/Bm3HE9+L9S4V8oEv/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t, Column6 = _t, Column7 = _t, Column8 = _t, Column9 = _t, Column10 = _t, Column11 = _t, Column12 = _t]),
ReplacedNulls = Table.TransformColumns(Source, {}, each List.ReplaceMatchingItems({_}, {{"null", null}}){0}),
PromotedHeader = Table.PromoteHeaders(ReplacedNulls),
TransformedHeader = Table.TransformColumnNames(PromotedHeader, each Text.Combine(List.RemoveMatchingItems(Text.Split(_, "#(lf)"), {""}), " ") ),
GroupedRows = Table.Group(TransformedHeader, "Log In", {{"T", each
[ a = Table.RemoveLastN(_, 1),
b = Table.FromRows(List.Transform(List.Skip(Table.ToRows(a), 3), (x)=> {a{0}[Log Out], a{1}[Log Out], a{2}[Log In]} & x), {"Agent Id", "Agent Name", "Date"} & Table.ColumnNames(a) )
][b], type table}}, 0,
(x,y)=> Byte.From( y = "Agent Id:" ) ),
CombinedT = Table.Combine(GroupedRows[T])
in
CombinedT
Use this
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each [Log In] <> null),
#"Added Agent Id" = Table.AddColumn(#"Filtered Rows", "Agent Id", each if Text.StartsWith(Text.From([Log In]),"Agent Id",Comparer.OrdinalIgnoreCase) then [Log Out] else null),
#"Added Agent Name" = Table.AddColumn(#"Added Agent Id", "Agent Name", each if Text.StartsWith(Text.From([Log In]),"Agent Name",Comparer.OrdinalIgnoreCase) then [Log Out] else null),
#"Added Date" = Table.AddColumn(#"Added Agent Name", "Date", each if [Log Out] = null then [Log In] else null),
#"Filled Down" = Table.FillDown(#"Added Date",{"Agent Id", "Agent Name", "Date"}),
#"Filtered Data Rows Only" = Table.SelectRows(#"Filled Down", each [Log In] <> "Agent Id:"and [Log In] <> "Agent Name:"and [Log Out] <> null),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Data Rows Only",{"Agent Id", "Agent Name", "Date", "Log In", "Log Out", "Login", "Available T", "Released T", "Available", "Released", "In Call T", "Media Type", "Calls", "Avg T To", "Max T To"})
in
#"Reordered Columns"
to produce this:
from this Excel Data:
Thank you - this worked!
I just had to follow your steps as my source was a PDF document but I was still able to use the other steps and it worked.
Thanks again
JB
It is always a bit of trial and error.
Here Iwould probaly
This would be a good start I think...
Did I answer your question? Then please mark my post as the solution and make it easier to find for others having a similar problem.
If I helped you, please click on the Thumbs Up to give Kudos.
Kees Stolker
A big fan of Power Query and Excel