I've written an R script to scrape a certain website. It works fine when I run it in RStudio. Now I want to integrate it into Power BI Desktop so my co-workers can use it without needing RStudio. However, I keep getting an error; it seems the read_html function doesn't work.
Here is my original script, which works in RStudio. It reads an .xlsx file containing a list of URLs (Input.xlsx).
rm(list = ls())
library(rvest)
library(readxl)
library(xlsx)
library(rstudioapi)
# Set working directory and import URLs
setwd(dirname(getActiveDocumentContext()$path))
dfURL <- read_xlsx("Input.xlsx")
# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL
for (i in 1:nrow(dfURL)){
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  # Extract nodes by CSS selector
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
I have already searched a lot. One thing I found is that you should specify the file and library locations explicitly. In Power BI, I made sure the library locations are the same as those used by RStudio (.libPaths()). Also, before this error, Power BI reported that it couldn't find the xml2 package, so I installed and loaded it along with the other packages. The resulting code below is where I stand now, and it is what produces the error. I've tried to write the code so that anyone willing and able to help can simply copy and paste it, which is why the library locations are derived generically.
libloc_rvest <- find.package('rvest')
libloc_rvest <- substr(libloc_rvest,1,nchar(libloc_rvest) - nchar("rvest") - 1)
libloc_readxl <- find.package("readxl")
libloc_readxl <- substr(libloc_readxl,1,nchar(libloc_readxl) - nchar("readxl") - 1)
libloc_rstudioapi <- find.package("rstudioapi")
libloc_rstudioapi <- substr(libloc_rstudioapi,1,nchar(libloc_rstudioapi) - nchar("rstudioapi") - 1)
libloc_xml2 <- find.package("xml2")
libloc_xml2 <- substr(libloc_xml2,1,nchar(libloc_xml2) - nchar("xml2") - 1)
library(rvest, lib.loc=libloc_rvest)
library(readxl, lib.loc=libloc_readxl)
library(rstudioapi, lib.loc=libloc_rstudioapi)
library(xml2, lib.loc=libloc_xml2)
# Set working directory and import URLs
dfURL <- read_xlsx("N:/ETAscraper/ETAscraper/Input.xlsx")
# Initiate dataframe
dfETA <- data.frame(matrix(ncol = 4, nrow = nrow(dfURL)))
colnames(dfETA) <- c("URL", "Vessel", "Destination", "ETA")
dfETA$URL <- dfURL$URL
for (i in 1:nrow(dfURL)){
  # Get URL & load webpage
  url <- as.character(dfURL[i,])
  page <- read_html(url)
  # Extract nodes by CSS selector
  CSSextract1 <- html_nodes(page,'.n3ata')
  CSSextract2 <- html_nodes(page,'.st')
  # Convert to text
  toText1 <- html_text(CSSextract1)
  toText2 <- html_text(CSSextract2)
  # Extract information from text
  destination <- trimws(toText1[1])
  ETA <- toText1[2]
  # Fill df with information
  dfETA$Vessel[i] <- toText2
  dfETA$Destination[i] <- destination
  dfETA$ETA[i] <- ETA
}
# Write to xlsx
# write.xlsx(dfETA,"Output.xlsx", append = FALSE, row.names = FALSE)
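A side note on the library-location lines at the top of this script: cutting the package name off the end of `find.package()` with `substr()` should give the same directory as `dirname()`. The sketch below checks that on the base package `stats`, which is used here only because it ships with every R installation; `lib_dir_substr` and `lib_dir_dirname` are names made up for the illustration.

```r
# Derive the library directory that contains a package, two ways.
pkg <- "stats"  # stand-in for rvest, readxl, xml2, ...

pkg_path <- find.package(pkg)

# The approach used in the script: cut "/<pkg>" off the end of the path.
lib_dir_substr <- substr(pkg_path, 1, nchar(pkg_path) - nchar(pkg) - 1)

# Shorter equivalent: the parent directory of the package folder.
lib_dir_dirname <- dirname(pkg_path)

# Either value can be passed as lib.loc when attaching the package.
library(pkg, lib.loc = lib_dir_dirname, character.only = TRUE)
```

Since `rstudioapi` functions such as `getActiveDocumentContext()` only work inside a running RStudio session, and this version of the script no longer calls them, the `rstudioapi` lines can probably be dropped when running from Power BI.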
If anyone could help me out, or point me in the right direction, that would be greatly appreciated!
Hi @Anonymous,
This looks like an rvest error. Check the reference below:
https://github.com/yusuzech/r-web-scraping-cheat-sheet/blob/master/README.md
Best Regards,
Kelly
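To see what `read_html` is actually complaining about, one option is to catch the error for each URL and keep its message in the result, so that Power BI still receives a data frame. The following is a minimal sketch under stated assumptions, not a confirmed fix: `safe_fetch`, `first_or_na` and the `fetch` argument are hypothetical helper names, and the two fake fetch functions stand in for `xml2::read_html` so the example runs without network access.

```r
# Run fetch(url); return the page on success or the error message on failure.
safe_fetch <- function(url, fetch) {
  tryCatch(
    list(page = fetch(url), error = NA_character_),
    error = function(e) list(page = NULL, error = conditionMessage(e))
  )
}

# First element of a character vector, or NA when the selector matched nothing.
first_or_na <- function(x) {
  if (length(x) >= 1) x[[1]] else NA_character_
}

# Stand-ins for xml2::read_html (assumption: only here to keep the sketch offline).
fetch_ok   <- function(url) paste("page for", url)
fetch_fail <- function(url) stop("could not open connection")

ok  <- safe_fetch("https://example.com/a", fetch_ok)
bad <- safe_fetch("https://example.com/b", fetch_fail)

# One row per URL; the Error column shows why a fetch failed.
result <- data.frame(
  URL   = c("https://example.com/a", "https://example.com/b"),
  Error = c(ok$error, bad$error),
  stringsAsFactors = FALSE
)
```

In the scraping loop, `safe_fetch(url, xml2::read_html)` would take the place of the bare `read_html(url)` call, and wrapping each `html_text(...)` result in `first_or_na()` would guard the assignments, since assigning a zero-length vector to `dfETA$Vessel[i]` raises an error of its own.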