
bwelsh
Helper I

Extracting Hex Color Codes from Websites

I am looking for a way to extract hex color codes from the source code of a webpage. We manage a number of websites, and I would like to scrape each site to extract its color scheme for marketing research purposes.

 

Hex color codes always take the form #CCC or #CCCCCC (three or six hex digits), for example:

 

     color: #FFF;
     background: #57C2BD;

 

So if I can import the HTML for the site, I am fairly confident I can extract these color codes. The issue I am having is finding a way to get Power BI to import the raw HTML of the page rather than the document/table format, which is cumbersome and only appears to give me the visible text on the page.

 

Any suggestions would be appreciated.


5 REPLIES
stretcharm
Memorable Member

If you have Firefox, there is an eyedropper in the developer tools that shows the hex value when you hover over an element.

 

https://developer.mozilla.org/en-US/docs/Tools/Eyedropper

 

Chrome dev tools have a similar tool, but it's not as easy to get to.

https://paul.kinlan.me/eyedropper-chrome-dev-tools/

 

I am aware of these tools; however, we need a way to programmatically extract the hex color codes. We have a large number of websites to scrape.

OK.

 

Something like this should work. I have used this technique to strip links out of a website before.

From here you can split and filter your HTML source code.

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))})
in
    Source
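
From there, a rough sketch of the hex-code step might look like the query below. It is untested and assumes the color codes appear literally in the downloaded source (styles served from separate CSS files would need to be fetched the same way); the step and column names are only illustrative.

let
    // Raw page source, one line of text per row
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
    // Keep only lines that contain a "#" so there is less text to scan
    HashLines = Table.SelectRows(Source, each Text.Contains([Column1], "#")),
    // Split each line on "#" and expand, dropping the text before the first "#",
    // so every row starts with a candidate color code
    SplitOnHash = Table.ExpandListColumn(
        Table.TransformColumns(HashLines, {{"Column1", each List.Skip(Text.Split(_, "#"), 1)}}),
        "Column1"),
    // A character counts as hex if it is 0-9, a-f or A-F
    IsHex = (c as text) as logical => List.Contains({"0".."9", "a".."f", "A".."F"}, c),
    // Take the leading run of hex characters from each candidate
    AddedHex = Table.AddColumn(SplitOnHash, "Hex", each
        Text.Combine(List.FirstN(Text.ToList([Column1]), each IsHex(_)))),
    // Hex color codes are exactly 3 or 6 hex digits
    ValidCodes = Table.SelectRows(AddedHex, each Text.Length([Hex]) = 3 or Text.Length([Hex]) = 6),
    // One row per distinct color, with the "#" put back on
    Result = Table.Distinct(
        Table.FromList(List.Transform(ValidCodes[Hex], each "#" & _), Splitter.SplitByNothing(), {"HexColor"}))
in
    Result

Anchors such as page#section1 can produce false positives that happen to look like hex digits, so the output may need a manual once-over.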

Thanks. Curious to know how exactly you extract the URLs.

A couple of ways.

 

Something like this works, but it's not pretty and doesn't always get the right results, depending on the page.

 

let
    // Pull the raw page source in as lines of text
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
    // Keep only lines that mention a link
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
    // Tag href with "~" so the line can be split there
    #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
    #"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
    // The URL sits between the first pair of double quotes after href
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
    // Keep only absolute URLs and drop the helper columns
    #"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
    #"Removed Columns"

A more reliable way would be to use R and a web-scraping library such as rvest:

 

library(rvest)
# WEBSITE is a placeholder for the page URL
page <- read_html("WEBSITE")
# grab the href attribute of every <a> element on the page
data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))
