Microsoft Fabric Community Conference 2025, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount.
Register nowThe Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.
I am looking for a way to extract hex color codes from the code of a webpage. I have a number of websites that we manage and would like to scrape the site to extract their color schemes for marketing research purposes.
Hex color codes are always #CCC or #CCCCCC, example:
color: #FFF;
background: #57C2BD;
So if I can import the html for the site, I am fairly confident I can extract these color codes. The issue I am having is finding a way to get PowerBI to import the raw html of the page versus the document/table format which is cumbersom and only appears to let me get to the visible text on the page.
Any suggestions would be appreciated.
Solved! Go to Solution.
A couple of ways.
Something like this with but it's not pretty and doesn't always get the right results depending on the page.
let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
#"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
#"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
#"Removed Columns"
a more reliable way would be to use r and a webscraping library
library(rvest) page=read_html("WEBSITE") data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))
If you have firefox there is an eyedropper in the developer tools that shows the hex value when you hover over.
https://developer.mozilla.org/en-US/docs/Tools/Eyedropper
Chrome dev tools have a similar tool, but its not as easy to get to.
https://paul.kinlan.me/eyedropper-chrome-dev-tools/
I aware of these tools, however we need a way to programatically extract the hex color codes. We have a large number of websites we need to scrape.
OK.
Something like this should work. I used this technique it to strip out links form a web site before.
From here you can split and filter your html source code.
let Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}) in Source
Thanks, curious to know how you extract the URLs exactly.
A couple of ways.
Something like this with but it's not pretty and doesn't always get the right results depending on the page.
let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
#"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
#"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
#"Removed Columns"
a more reliable way would be to use r and a webscraping library
library(rvest) page=read_html("WEBSITE") data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))
User | Count |
---|---|
120 | |
72 | |
69 | |
57 | |
50 |
User | Count |
---|---|
167 | |
82 | |
68 | |
65 | |
53 |