Solved: Re: Extract data after html tag

heriberto_mb · ‎09-26-2024

I want to extract the system status from the following page:

https://www.akamaistatus.com/

I was able to pull the HTML data; now I just want to pull the line that says "All Systems Operational". Is there a way to use <div class="page-status status-none"> as a delimiter so I can extract the data that is two rows below it?

P.S. I'm thinking about using the HTML tag as a delimiter because the line number might change over time.

Ahmedx · ‎09-27-2024

Please try this:

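let
    // Download the page and decode the binary response as text
    Source = Text.FromBinary(Web.Contents("https://www.akamaistatus.com/")),
    // Keep only the text between the status span and the last-updated span
    Text = Text.BetweenDelimiters( Source, "<span class=""status font-large"">", "<span class=""last-updated-stamp  font-small""></span>" ),
    // Split into lines, trim each one, drop the empty ones, and take the first remaining line
    ImportedText = List.RemoveMatchingItems( List.Transform(Lines.FromText(Text),(x)=> Text.Trim(x)),{""}){0}
in
    ImportedText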

heriberto_mb · ‎09-27-2024

Thank you @Ahmedx ! That works perfectly...

Can you explain the following line and tell me where I can learn more about it?

ImportedText = List.RemoveMatchingItems( List.Transform(Lines.FromText(Text),(x)=> Text.Trim(x)),{""}){0}

Ahmedx · ‎09-27-2024

1. Lines.FromText(Text): Converts the variable Text into a list of lines. Each line of the text becomes an individual element in the list.
2. List.Transform(..., (x) => Text.Trim(x)): Applies a transformation to each line in the list created in step 1, trimming the whitespace from both the beginning and the end of each line with Text.Trim(x).
3. List.RemoveMatchingItems(..., {""}): Removes any empty strings (i.e., "") from the list produced in step 2. Lines that contained only whitespace are removed at this point because they become empty after trimming.
4. {0}: Selects the first element of the remaining list, that is, the first non-empty line of the text held in Text.
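
To see what each step produces, the same pipeline can be split into separate named steps and run against a hard-coded string instead of the live page. This is only a sketch: SampleHtml is a hypothetical stand-in assembled from the tags quoted in this thread (the real page markup may differ), and the step names Between, RawLines, Trimmed, NonEmpty and FirstLine are made up for illustration.

```powerquery
let
    // Hypothetical sample standing in for the downloaded page (assumption);
    // #(lf) is the M escape sequence for a line feed.
    SampleHtml = "<div class=""page-status status-none"">#(lf)  <span class=""status font-large"">#(lf)    All Systems Operational#(lf)  </span>#(lf)  <span class=""last-updated-stamp  font-small""></span>#(lf)</div>",
    // Keep only the text between the two span tags
    Between = Text.BetweenDelimiters(
        SampleHtml,
        "<span class=""status font-large"">",
        "<span class=""last-updated-stamp  font-small""></span>"
    ),
    // Step 1: split the extracted text into a list of lines
    RawLines = Lines.FromText(Between),
    // Step 2: trim leading and trailing whitespace from every line
    Trimmed = List.Transform(RawLines, (x) => Text.Trim(x)),
    // Step 3: drop the lines that are empty after trimming
    NonEmpty = List.RemoveMatchingItems(Trimmed, {""}),
    // Step 4: take the first remaining line
    FirstLine = NonEmpty{0}
in
    FirstLine
```

With this sample, the first non-empty line after trimming is the status text, All Systems Operational. Clicking each step in the Applied Steps pane shows the intermediate list, which makes it easier to adjust the delimiters if the page layout changes.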

lbendlin · ‎09-26-2024

It's best to use Web.BrowserContents and then experiment with Html.Table parsing:

let
    Source = Web.BrowserContents("https://www.akamaistatus.com/"),
    #"Extracted Table From Html" = Html.Table(Source, {
            {"Column1", ".font-small + *"},
            {"Column2", ".component-inner-container:nth-child(2) .name"},
            {"Column3", ".component-inner-container:nth-child(3) .name"},
            {"Column4", ".component-status.tool"},
            {"Column5", ".component-inner-container:nth-child(2) .component-status"},
            {"Column6", ".component-inner-container:nth-child(3) .component-status"},
            {"Column7", ".component-inner-container:nth-child(4) .name"},
            {"Column8", ".component-inner-container:nth-child(4) .component-status"}
        }, [RowSelector=".component-container"])
in
    #"Extracted Table From Html"

heriberto_mb · ‎09-27-2024

Thank you very much for your answer. I see multiple outputs with their statuses, but I was only looking to capture the general status of the page: "All Systems Operational".
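
For reference, the Html.Table approach can probably be narrowed to a single column when only the overall status is needed. The sketch below is untested against the live page: SampleHtml is a hypothetical snippet built from the tags quoted in this thread, and the selector ".page-status .status" is an assumption about how those elements are nested. With the real page, SampleHtml would be replaced by the Web.BrowserContents call shown above.

```powerquery
let
    // Hypothetical sample markup (assumption); in practice use
    // Web.BrowserContents("https://www.akamaistatus.com/") instead.
    SampleHtml = "<div class=""page-status status-none""><span class=""status font-large""> All Systems Operational </span><span class=""last-updated-stamp  font-small""></span></div>",
    // One column with one CSS selector; the selector is an assumed match
    // for the status span inside the page-status div.
    StatusTable = Html.Table(SampleHtml, {{"Status", ".page-status .status"}}),
    // Read the single cell and trim any surrounding whitespace
    Status = Text.Trim(StatusTable{0}[Status])
in
    Status
```

The query is intended to return one text value rather than a table of components. If the selector matches nothing on the real page, the selector would need to be adjusted after inspecting the page source.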