Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
xz296
Regular Visitor

Seeking a simpler way: trouble to extract the Date format from an uncleaned text field (HTML format)

Hi all, I'm having trouble to extract the latest date (mm/dd/yy) or (mm/dd/yyyy) from an uncleaned text field. I utlized a bunch of steps in Power Query to extract the date, but some of them don't show up in the dashboard until I refresh my steps in Power Query.

The uncleaned text field ("comment"), which is a column has content as below,  contains these signs "<p>", ids, unknown date without year (I'm gonna ignore this date) and other sentences. This column is real time data would be updated daily. 

<p>11/11/21 a id(has both letters and numbers); reporting</p>
<p>a id(has both letters and numbers 2/26: xxxx via dmaap complete.</p>
<p>a id(has both letters and numbers1/26: app team is working on development.</p>

My requirement is to extract the latest correct date from it and create a column ("Old Days") which calculates the number of days from the date until today. I don't know why this column occurs so many blank rows after my calculation, even though it contains correct format of date. I looked through my steps, like I said, some of rows in the Old Days are blank but I could see my steps extract correct date after I filter the corresponding ids in Power Query. 

Can anyone provide me a simpler way to solve or tell me what's wrong?

My steps are like this (I created the table for mainly processing the date on onboarded_date table):

= Table.AddColumn(#"Renamed Columns", "Cleaned Comment", each Html.Table([comment], {{"ExtractedText",":root"}}))

then I extend the comment table of cleaned Comment, grouped the table by id and cleaned comment, split by white space.

#"Added Custom" = Table.AddColumn(#"Renamed Columns", "Cleaned Comment", each Html.Table([comment], {{"ExtractedText",":root"}})),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Added Custom", {{"Cleaned Comment", null}}),

#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom1", "Custom"),
#"Duplicated Column2" = Table.DuplicateColumn(#"Expanded Custom", "Custom", "Custom - Copy"),
#"Filtered Rows" = Table.SelectRows(#"Duplicated Column2", each Text.Contains([#"Custom - Copy"], "/")),
#"Renamed Columns3" = Table.RenameColumns(#"Filtered Rows",{{"Custom - Copy", "Custom.1"}}),
#"Added Custom3" = Table.AddColumn(#"Renamed Columns3", "Custom.2", each if ( List.Count(Text.Split([Custom.1],"/")))>=3 then [Custom.1] else null),
#"Replaced Errors1" = Table.ReplaceErrorValues(#"Added Custom3", {{"Custom.2", " "}}),
#"Filtered Rows3" = Table.SelectRows(#"Replaced Errors1", each not Text.StartsWith([Custom.2], "/") and not Text.StartsWith([Custom.2], "http") and not Text.Contains([Custom.2], "//") and not Text.Contains([Custom.2], "ele") and not Text.Contains([Custom.2], "source")),
#"Inserted Kept Characters" = Table.AddColumn(#"Filtered Rows3", "Kept Characters", each Text.Select([Custom.2], {"/".."9"}), type text),
#"Filtered Rows1" = Table.SelectRows(#"Inserted Kept Characters", each not Text.Contains([Kept Characters], "//") and not Text.Contains([Kept Characters], " /2/ ")),
#"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each ([Custom.2] <> null)),
#"Added Custom4" = Table.AddColumn(#"Filtered Rows2", "new col", each if ( List.Count(Text.Split([#"Kept Characters"],"/")))=3 then [#"Kept Characters"] else null),
#"Renamed Columns4" = Table.RenameColumns(#"Added Custom4",{{"new col", "new col.1"}}),
#"Sorted Rows" = Table.Sort(#"Renamed Columns4",{{"m_id", Order.Ascending}}),
#"Split Column by Delimiter4" = Table.SplitColumn(#"Sorted Rows", "new col.1", Splitter.SplitTextByDelimiter("6718", QuoteStyle.Csv), {"new col.1.1", "new col.1.2", "new col.1.3"}),
#"Removed Columns4" = Table.RemoveColumns(#"Split Column by Delimiter4",{"new col.1.3", "new col.1.2"}),
#"Duplicated Column3" = Table.DuplicateColumn(#"Removed Columns4", "new col.1.1", "new col.1.1 - Copy"),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Duplicated Column3", "new col.1.1", Splitter.SplitTextByDelimiter("/", QuoteStyle.Csv), {"new col.1.1.1", "new col.1.1.2", "new col.1.1.3"}),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter2",null," ",Replacer.ReplaceValue,{"new col.1.1 - Copy"}),
#"Added Custom7" = Table.AddColumn(#"Replaced Value", "mon1", each if [mon]<=12 then [mon]
else " "),
#"Filtered Rows8" = Table.SelectRows(#"Added Custom7", each ([new col.1.1.1] <> null)),
#"Extracted Last Characters" = Table.TransformColumns(#"Filtered Rows8", {{"new col.1.1.1", each Text.End(_, 2), type text}}),
#"Filtered Rows9" = Table.SelectRows(#"Extracted Last Characters", each not Text.StartsWith([new col.1.1.1], "/")),
#"Changed Type6" = Table.TransformColumnTypes(#"Filtered Rows9",{{"new col.1.1.1", Int64.Type}, {"new col.1.1.2", Int64.Type}, {"new col.1.1.3", Int64.Type}}),
#"Added Custom5" = Table.AddColumn(#"Changed Type6", "mon", each if [new col.1.1.1]>=31 then Number.FromText(Text.End(Number.ToText([new col.1.1.1]),1)) else [new col.1.1.1]),
#"Replaced Errors4" = Table.ReplaceErrorValues(#"Added Custom5", {{"mon", 0}}),
#"Added Custom6" = Table.AddColumn(#"Replaced Errors4", "day1", each if [mon]>12 then [mon]
else " "),
#"Filtered Rows4" = Table.SelectRows(#"Added Custom6", each [new col.1.1.2] <= 31),
#"Added Custom8" = Table.AddColumn(#"Filtered Rows4", "mon2", each if [new col.1.1.2] <=12 and [day1]<>" "
then [new col.1.1.2]
else " "),
#"Added Custom16" = Table.AddColumn(#"Added Custom8", "day2", each if [day1]= " "
then [new col.1.1.2]
else " "),
#"Added Custom2" = Table.AddColumn(#"Added Custom16", "Custom.3", each if [mon2] = " " then [mon]
else " "),
#"Renamed Columns5" = Table.RenameColumns(#"Added Custom2",{{"Custom.3", "mon3"}}),
#"separate year column" = Table.AddColumn(#"Renamed Columns5", "year1", each if [new col.1.1.3] < 2000 and [new col.1.1.3] >= 10 and Number.FromText(Text.End(Number.ToText([new col.1.1.3]),2))>=10 and Number.FromText(Text.End(Number.ToText([new col.1.1.3]),2))<= Number.FromText(Text.End(Number.ToText(Date.Year(DateTime.FixedLocalNow())),2)) then Number.FromText(Text.End(Number.ToText([new col.1.1.3]),2)) else if [new col.1.1.3] > Date.Year(DateTime.FixedLocalNow()) and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),2)) <= Number.FromText(Text.End(Number.ToText(Date.Year(DateTime.FixedLocalNow())),2)) and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),2))>= 10 and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),4))>=2100 then Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),2))
else " "),
#"separate year column2" = Table.AddColumn(#"separate year column", "year2", each if [new col.1.1.3]<= Date.Year(DateTime.FixedLocalNow()) and [new col.1.1.3] >= 2000 then [new col.1.1.3] else if [new col.1.1.3] > Date.Year(DateTime.FixedLocalNow()) and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),4))>= 2000 and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),4))<= Date.Year(DateTime.FixedLocalNow()) then Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),4))
else " "),
#"separate year column3" = Table.AddColumn(#"separate year column2", "year3", each if [new col.1.1.3] < 2000 and Number.FromText(Text.End(Number.ToText([new col.1.1.3]),2)) < 10 and Number.FromText(Text.End(Number.ToText([new col.1.1.3]),2))>= 0 then Number.FromText(Text.End(Number.ToText([new col.1.1.3]),2)) else if [new col.1.1.3] > Date.Year(DateTime.FixedLocalNow()) and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),2)) < 10 and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),2))>= 0 and Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),4))>=2100 then Number.FromText(Text.Start(Number.ToText([new col.1.1.3]),2))
else " "
),
#"Changed Type7" = Table.TransformColumnTypes(#"separate year column3",{{"day1", type text}, {"mon1", type text}, {"mon2", type text}, {"day2", type text}, {"year1", type text}, {"year2", type text}, {"mon3", type text}}),
#"Added Custom19" = Table.AddColumn(#"Changed Type7", "year11", each if [year1]<>" " then Text.Insert([year1],0,"20") else if [year3]<>" " then Text.Insert([year3],0,"200")
else " "),
#"Inserted Merged Column" = Table.AddColumn(#"Added Custom19", "Date", each if [mon3] = " " and [year1] = " " then Text.Combine({[mon2],[day1],[year2]}, "/") else if [mon3] = " " and [year2]= " "then Text.Combine({[mon2],[day1],[year11]}, "/") else if [mon2] = " " and [year1] = " " then Text.Combine({[mon3],[day2],[year2]}, "/") else if [mon2] = " " and [year2] = " " then Text.Combine({[mon3],[day2],[year11]}, "/") else " ", type text),
#"Replaced Value1" = Table.ReplaceValue(#"Inserted Merged Column",null," ",Replacer.ReplaceValue,{"Date"}),
#"Changed Type" = Table.TransformColumnTypes(#"Replaced Value1",{{"Date", type date}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type", {"Date"}),
#"Grouped Rows1" = Table.Group(#"Removed Errors", {"m_id","t_id"}, {{"Count", each List.First([Date]), type nullable date}}),
#"Renamed Columns2" = Table.RenameColumns(#"Grouped Rows1",{{"Count", "Date"}})
in
#"Renamed Columns2"

 

Then I merged it with another table using the m_id and replace the null value using the date 3333,3,31:

= Table.ReplaceValue(#"Expanded onboard_date",null,#date(3333, 3, 31),Replacer.ReplaceValue,{"onboard_date.Date"})

and I finally calculate the Old Days by using DAX:  

Old Days = IF(DATEDIFF('merged date'[onboard_date.Date],TODAY(),DAY)<0,BLANK(),DATEDIFF('merged date'[onboard_date.Date],TODAY(),DAY))
0 REPLIES 0

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors