Error with read_html in an R script in Power BI

Anonymous · 03-11-2021

I've written an R script to scrape a certain website. It works fine when I run it in RStudio. Now I want to integrate it into Power BI Desktop so my coworkers can use it without having to run RStudio. However, I keep getting an error; it seems the read_html function doesn't work.

My original script, which works in RStudio, is below. It reads a list of URLs from an .xlsx file (Input.xlsx).

rm(list = ls())
library(rvest)
library(readxl)
library(xlsx)
library(rstudioapi)

# Set working directory and import URLs
setwd(dirname(getActiveDocumentContext()$path))
dfURL <- read_xlsx("Input.xlsx")

# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL

for (i in 1:nrow(dfURL)){
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  
  # Select nodes by CSS class
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
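
A side note on the loop: html_text() returns one string per matched node, so the assignments into dfETA assume exactly one '.st' match per page. If a page comes back with no match, the assignment fails with a "replacement has length zero" error. A minimal sketch of a guard, assuming a hypothetical helper first_or_na() that is not part of the script above:

```r
# Hypothetical helper (not in the original script): return the first
# element of a character vector, or NA when the selector matched nothing.
first_or_na <- function(x) {
  if (length(x) >= 1) x[[1]] else NA_character_
}

first_or_na(c("Vessel A", "Vessel B"))  # keeps only the first match
first_or_na(character(0))               # NA instead of an assignment error
```

In the loop this could be used as dfETA$Vessel[i] <- first_or_na(toText2).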

I have already searched a lot. One thing I found is that you should specify the file and library locations explicitly. In Power BI, I made sure the library locations are the same as those used by RStudio (.libPaths()). Before this error, Power BI also reported that it couldn't find the xml2 package, so I installed it and loaded it along with the other packages. The code below is where I stand now and is what produces the error. I've tried to write it so that the library locations are generic and anyone willing and able to help can simply copy and paste it.

libloc_rvest <- find.package('rvest')
libloc_rvest <- substr(libloc_rvest,1,nchar(libloc_rvest) - nchar("rvest") - 1)
libloc_readxl <- find.package("readxl")
libloc_readxl <- substr(libloc_readxl,1,nchar(libloc_readxl) - nchar("readxl") - 1)
libloc_rstudioapi <- find.package("rstudioapi")
libloc_rstudioapi <- substr(libloc_rstudioapi,1,nchar(libloc_rstudioapi) - nchar("rstudioapi") - 1)
libloc_xml2 <- find.package("xml2")
libloc_xml2 <- substr(libloc_xml2,1,nchar(libloc_xml2) - nchar("xml2") - 1)
library(rvest, lib.loc=libloc_rvest)
library(readxl, lib.loc=libloc_readxl)
library(rstudioapi, lib.loc=libloc_rstudioapi)
library(xml2, lib.loc=libloc_xml2)

# Import URLs (absolute path; no working directory is set here)
dfURL <- read_xlsx("N:/ETAscraper/ETAscraper/Input.xlsx")

# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL

for (i in 1:nrow(dfURL)){
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  
  # Select nodes by CSS class
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
# write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
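
The substr() arithmetic at the top of this script just strips the package name from the end of the path, which is the same as taking the parent directory. A shorter sketch of that step, assuming a hypothetical helper lib_dir() (my own name, not from any package) and using the base package stats so it runs on any R installation:

```r
# Hypothetical helper: the library directory that contains a package,
# i.e. the parent folder of the path returned by find.package().
lib_dir <- function(pkg) {
  dirname(find.package(pkg))
}

# Same result as the substr() version used in the script
p <- find.package("stats")
old_way <- substr(p, 1, nchar(p) - nchar("stats") - 1)
new_way <- lib_dir("stats")
```

With this, the library calls would read library(rvest, lib.loc = lib_dir("rvest")) and so on.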

If anyone could help me out, or point me in the right direction, that would be greatly appreciated!

v-kelly-msft · ‎03-14-2021

Hi @Anonymous,

It looks like an rvest error. Check the reference below:

https://github.com/yusuzech/r-web-scraping-cheat-sheet/blob/master/README.md
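
To narrow it down, it may help to capture the actual error message for each URL instead of letting the first failure stop the script. This is only a sketch under assumptions: safe_read() and fake_reader() are hypothetical names, and fake_reader() is a stand-in so the example runs offline. In the real script the reader would be rvest's read_html.

```r
# Hypothetical helper: call reader(url) and return any error as text
# instead of aborting the whole loop.
safe_read <- function(url, reader) {
  tryCatch(
    list(page = reader(url), error = NA_character_),
    error = function(e) list(page = NULL, error = conditionMessage(e))
  )
}

# Stand-in reader for this sketch only; replace with read_html.
fake_reader <- function(url) {
  if (!grepl("^https?://", url)) stop("not an http(s) URL: ", url)
  paste("page for", url)
}

ok  <- safe_read("https://example.com", fake_reader)
bad <- safe_read("N:/not-a-url", fake_reader)
```

The error text could then be written to an extra column of dfETA, so the message is visible in the table that Power BI loads.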

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!
