Solved: Extracting Hex Color Codes from Websites

bwelsh · ‎11-13-2017

I am looking for a way to extract hex color codes from the code of a webpage. I have a number of websites that we manage and would like to scrape the site to extract their color schemes for marketing research purposes.

Hex color codes are always #CCC or #CCCCCC, example:

color: #FFF;
background: #57C2BD;

So if I can import the html for the site, I am fairly confident I can extract these color codes. The issue I am having is finding a way to get PowerBI to import the raw html of the page versus the document/table format which is cumbersom and only appears to let me get to the visible text on the page.

Any suggestions would be appreciated.

stretcharm · ‎11-14-2017

A couple of ways.

Something like this with but it's not pretty and doesn't always get the right results depending on the page.

let
 Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
 #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
 #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
 #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
 #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
 #"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
 #"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
 #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
 #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
 #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
 #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
 #"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
 #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
 #"Removed Columns"

a more reliable way would be to use r and a webscraping library

library(rvest)
page=read_html("WEBSITE")
data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))

View solution in original post

stretcharm · ‎11-13-2017

If you have firefox there is an eyedropper in the developer tools that shows the hex value when you hover over.

https://developer.mozilla.org/en-US/docs/Tools/Eyedropper

Chrome dev tools have a similar tool, but its not as easy to get to.

https://paul.kinlan.me/eyedropper-chrome-dev-tools/

bwelsh · ‎11-14-2017

I aware of these tools, however we need a way to programatically extract the hex color codes. We have a large number of websites we need to scrape.

stretcharm · ‎11-14-2017

OK.

Something like this should work. I used this technique it to strip out links form a web site before.

From here you can split and filter your html source code.

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))})
 in
    Source

bwelsh · ‎11-14-2017

Thanks, curious to know how you extract the URLs exactly.

stretcharm · ‎11-14-2017

A couple of ways.

Something like this with but it's not pretty and doesn't always get the right results depending on the page.

let
 Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
 #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
 #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
 #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
 #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
 #"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
 #"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
 #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
 #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
 #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
 #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
 #"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
 #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
 #"Removed Columns"

a more reliable way would be to use r and a webscraping library

library(rvest)
page=read_html("WEBSITE")
data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))

Extracting Hex Color Codes from Websites

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

How to Get Your Question Answered Quickly

Join us at FabCon Vienna from September 15-18, 2025

Extracting Hex Color Codes from Websites

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

How to Get Your Question Answered Quickly