
Anonymous

Error with read_html in an R script in Power BI

I've written an R script to scrape a certain website. It works fine when I run it in RStudio. Now I want to integrate it into Power BI Desktop so my coworkers can use it without needing RStudio. However, I keep getting an error; it seems the read_html function doesn't work.

 

[Screenshot of the error message: BillyBouw_2-1615459487344.png]

 

My original script, which works in RStudio, is below. It reads a list of URLs from an .xlsx file (Input.xlsx).

rm(list = ls())
library(rvest)
library(readxl)
library(xlsx)
library(rstudioapi)

# Set working directory and import URLs
setwd(dirname(getActiveDocumentContext()$path))
dfURL <- read_xlsx("Input.xlsx")

# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL

for (i in seq_len(nrow(dfURL))) {
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  
  # Select nodes by CSS class
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)

I've already searched a lot. One thing I found is that you should specify the file and library locations explicitly. In Power BI, I made sure the library locations are the same as those used by RStudio (.libPaths()). Before this error, Power BI reported that it couldn't find the xml2 package, so I installed it and loaded it along with the other packages. The code below is where I stand now, and it produces the error. I've written it so that the library locations are generic, so anyone willing and able to help can simply copy and paste it.

libloc_rvest <- find.package('rvest')
libloc_rvest <- substr(libloc_rvest,1,nchar(libloc_rvest) - nchar("rvest") - 1)
libloc_readxl <- find.package("readxl")
libloc_readxl <- substr(libloc_readxl,1,nchar(libloc_readxl) - nchar("readxl") - 1)
libloc_rstudioapi <- find.package("rstudioapi")
libloc_rstudioapi <- substr(libloc_rstudioapi,1,nchar(libloc_rstudioapi) - nchar("rstudioapi") - 1)
libloc_xml2 <- find.package("xml2")
libloc_xml2 <- substr(libloc_xml2,1,nchar(libloc_xml2) - nchar("xml2") - 1)
library(rvest, lib.loc=libloc_rvest)
library(readxl, lib.loc=libloc_readxl)
library(rstudioapi, lib.loc=libloc_rstudioapi)
library(xml2, lib.loc=libloc_xml2)

# Import URLs from an absolute path
dfURL <- read_xlsx("N:/ETAscraper/ETAscraper/Input.xlsx")

# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL

for (i in seq_len(nrow(dfURL))) {
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  
  # Select nodes by CSS class
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
# write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
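
The library-location lines in the script above strip the package name from the result of find.package() with substr(). As a minimal sketch of a shorter equivalent, dirname() returns the parent directory of a path, which for a package folder is its library. The helper name lib_dir_of is hypothetical, and the example uses the built-in stats package only so that it runs on any R installation.

```r
# Hypothetical helper: return the library directory that contains a package.
# find.package() gives the package's own folder; its parent is the library.
lib_dir_of <- function(pkg) {
  dirname(find.package(pkg))
}

# Demonstrated with 'stats', which ships with every R installation.
lib_stats <- lib_dir_of("stats")
library(stats, lib.loc = lib_stats)
```

The same call pattern would apply to rvest, readxl, and xml2, assuming they are installed in a library that the R installation used by Power BI can read.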

 

If anyone could help me out, or point me in the right direction, that would be greatly appreciated!

1 REPLY
v-kelly-msft
Community Support

Hi @Anonymous,

 

This looks like an rvest error. Check the reference below:

https://github.com/yusuzech/r-web-scraping-cheat-sheet/blob/master/README.md
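
To see which URL fails and what the underlying message is, one option is to wrap the fetch in tryCatch() so the loop keeps going and reports each failure. This is only a sketch: try_fetch and fake_fetch are hypothetical names, and fake_fetch is a stand-in so the example runs without network access. In the script above, read_html would be passed as the fetch argument instead.

```r
# Hypothetical helper: call fetch(url) and return NULL instead of stopping
# when it fails, printing the underlying error message.
try_fetch <- function(url, fetch) {
  tryCatch(
    fetch(url),
    error = function(e) {
      message("Failed to read ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in fetcher for illustration only; it fails for anything that
# does not start with "http".
fake_fetch <- function(url) {
  if (!startsWith(url, "http")) stop("unsupported URL: ", url)
  paste("page for", url)
}

ok  <- try_fetch("https://example.com", fake_fetch)
bad <- try_fetch("not-a-url", fake_fetch)
```

Inside the loop, a NULL result could be skipped with next, leaving that row of dfETA as NA.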

 

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!
