
Anonymous

read_html error with R script in Power BI

I've written an R script to scrape a certain website. It works fine when I run it in RStudio. Now I want to integrate it into Power BI Desktop so my coworkers can use it without needing RStudio. However, I keep getting an error; it seems the read_html function doesn't work.

 

[Screenshot: Power BI error message]

 

My original script, which works in RStudio, is below. It reads a list of URLs from an .xlsx file (Input.xlsx):

rm(list = ls())
library(rvest)
library(readxl)
library(xlsx)
library(rstudioapi)

# Set working directory and import URLs
setwd(dirname(getActiveDocumentContext()$path))
dfURL <- read_xlsx("Input.xlsx")

# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL

for (i in seq_len(nrow(dfURL))) {
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  
  # Extract nodes by CSS selector
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
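
One fragile spot in the loop above is the assignment `dfETA$Vessel[i] <- toText2`: if a selector matches no node the assignment fails, and if it matches more than one it raises a warning. A minimal sketch of a guard, assuming a hypothetical helper named `first_or_na` (not part of rvest):

```r
# Hypothetical helper: return the first element of a character vector,
# or NA when the selector matched nothing.
first_or_na <- function(x) {
  if (length(x) >= 1) x[[1]] else NA_character_
}

# Plain character vectors stand in for html_text() output here
several <- first_or_na(c("VESSEL A", "VESSEL B"))
nothing <- first_or_na(character(0))
```

Inside the loop this would read `dfETA$Vessel[i] <- first_or_na(toText2)`.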

I have already searched a lot. One thing I found is that you should specify the file and library locations explicitly. In Power BI, I made sure the library locations are the same as those used by RStudio (.libPaths()). Before this error, Power BI also reported that it couldn't find the xml2 package, so I installed it and loaded it along with the other packages. The resulting code below is where I stand now and what produces the error. I've tried to write it so that anyone willing and able to help can just copy and paste it, which is why the library locations are derived generically.

libloc_rvest <- find.package('rvest')
libloc_rvest <- substr(libloc_rvest,1,nchar(libloc_rvest) - nchar("rvest") - 1)
libloc_readxl <- find.package("readxl")
libloc_readxl <- substr(libloc_readxl,1,nchar(libloc_readxl) - nchar("readxl") - 1)
libloc_rstudioapi <- find.package("rstudioapi")
libloc_rstudioapi <- substr(libloc_rstudioapi,1,nchar(libloc_rstudioapi) - nchar("rstudioapi") - 1)
libloc_xml2 <- find.package("xml2")
libloc_xml2 <- substr(libloc_xml2,1,nchar(libloc_xml2) - nchar("xml2") - 1)
library(rvest, lib.loc=libloc_rvest)
library(readxl, lib.loc=libloc_readxl)
library(rstudioapi, lib.loc=libloc_rstudioapi)
library(xml2, lib.loc=libloc_xml2)

# Import URLs from an absolute path (no setwd() here)
dfURL <- read_xlsx("N:/ETAscraper/ETAscraper/Input.xlsx")

# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL

for (i in seq_len(nrow(dfURL))) {
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  
  # Extract nodes by CSS selector
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
# write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
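
The `substr()` arithmetic at the top of the script can be written more compactly with base R's `dirname()`, which drops the last path component. A minimal sketch, using the base package `stats` as a stand-in so it runs without rvest installed; whether setting `.libPaths()` this way resolves the Power BI error is an assumption, not something confirmed in this thread:

```r
# find.package() returns the package's own folder; dirname() gives the
# library folder that contains it.
pkg_dir <- find.package("stats")   # stand-in for "rvest"
lib_dir <- dirname(pkg_dir)

# Prepend that folder to the library search path so that library() calls
# can resolve packages and their dependencies without lib.loc.
.libPaths(c(lib_dir, .libPaths()))
library(stats)
```

With the search path set once, the four `library(..., lib.loc = ...)` calls could become plain `library()` calls.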

 

If anyone could help me out, or point me in the right direction, that would be greatly appreciated!

1 REPLY
v-kelly-msft
Community Support

Hi @Anonymous,

 

This is most likely an rvest error. Check the reference below:

https://github.com/yusuzech/r-web-scraping-cheat-sheet/blob/master/README.md
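
To check whether the failure really comes from rvest, one option is to wrap each page request in `tryCatch()` so that the R error message is recorded per URL instead of aborting the whole script. A minimal sketch, assuming a hypothetical helper `fetch_status` whose `fetch` argument would be `rvest::read_html` in the real script (a stub is used here so the example runs offline):

```r
# Hypothetical helper: call fetch() on every URL and record either "ok"
# or the error message that was raised.
fetch_status <- function(urls, fetch) {
  status <- vapply(urls, function(u) {
    tryCatch({
      fetch(u)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1), USE.NAMES = FALSE)
  data.frame(URL = urls, Status = status, stringsAsFactors = FALSE)
}

# Stub that fails for one URL, standing in for read_html
stub_fetch <- function(u) {
  if (grepl("bad", u)) stop("could not open connection")
  TRUE
}

result <- fetch_status(
  c("https://example.com/good", "https://example.com/bad"),
  stub_fetch
)
```

Because Power BI imports data frames created by an R script as tables, a data frame like `result` should make the per-URL message visible inside Power BI itself.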

 

Best Regards,
Kelly
