jccastelblancos
New Member

Implementation of R in a notebook

Good evening and greetings to all

I come to this forum to ask for help with the following: I load a table into a Lakehouse through a pipeline, into the Tables section.

I am reading it with the read.df() method.
So far there is no problem; the problem arises when I want to apply tidyverse functions.

For example, filter() returns an error like the following:

[1] "Error in UseMethod(\"filter\"): no applicable method for 'filter' applied to an object of class \"SparkDataFrame\""


In fact, when I call class() on it, the object returned is a SparkDataFrame.

I would like to be able to manipulate the data as I would with tidyverse. I have seen that this can be achieved:

https://learn.microsoft.com/es-es/fabric/data-science/r-use-tidyverse

but I don't know which route I should follow to do it.

I appreciate any help

1 ACCEPTED SOLUTION

@TomMartens

@Anonymous 

I found a solution. There may be a better one, but for now this is my contribution, in case anyone else facing the same problem needs an alternative.
Since a pipeline is used to get the data into the lakehouse, change the destination so that it is not a table but a file, that is, Files/. After this we can load it with readr and work with tidyverse. The cumbersome part is getting the result back into Tables: I had no alternative but to write the processed data to Files/ again, then read it from there with SparkR, and finally use the write procedure, which writes it to Tables in overwrite or append mode. It is a longer route than I expected, so if anyone knows a better alternative, please let us know in this thread. Thanks to everyone who came to my aid.
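The round trip described above might look roughly like this. All paths, file formats, and table/column names below are illustrative assumptions, not taken from the original post; the `/lakehouse/default/` mount path is a Fabric notebook convention.

```r
library(readr)
library(dplyr)

# 1. The pipeline now lands the data in Files/ instead of Tables,
#    so it can be read with readr and manipulated with tidyverse.
raw <- read_csv("/lakehouse/default/Files/raw/my_table.csv")

processed <- raw %>%
  filter(!is.na(some_column)) %>%   # hypothetical column
  mutate(loaded_at = Sys.time())

# 2. Write the processed result back to Files/ ...
write_csv(processed, "/lakehouse/default/Files/processed/my_table.csv")

# 3. ... then read it with SparkR and save it into Tables/.
sdf <- SparkR::read.df("Files/processed/my_table.csv",
                       source = "csv", header = "true", inferSchema = "true")
SparkR::saveAsTable(sdf, "my_table", mode = "overwrite")  # or mode = "append"
```

This sketch assumes CSV as the intermediate format; parquet would avoid re-inferring the schema in step 3.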

View solution in original post

9 REPLIES
FlavioLeccese
Frequent Visitor

As an R user I would suggest using DBI and dbplyr to preserve the tidyverse logic in an MS Fabric notebook.
Tables in a lakehouse are in fact tables of a database.

- You can retrieve the SQL endpoint of that lakehouse
- Create a Service Principal in Azure and use it as a "super user":

 

R code:

library(DBI)
library(odbc)
library(glue)

# Connection parameters (replace the placeholders with your own values)
driver         <- "ODBC Driver 18 for SQL Server"
server         <- "put here the SQL endpoint of the lakehouse"
database       <- "lakehouse, i.e. database name"
authentication <- "ActiveDirectoryServicePrincipal"
encrypt        <- "True"
userid         <- "ID of the Service Principal"
password       <- "password of the Service Principal"
port           <- 1433

connection_string <- glue(
  "Driver={driver}; Server={server}; UID={userid}; PWD={password}; ",
  "Port={port}; Database={database}; Encoding=UTF-8; ",
  "Encrypt={encrypt}; Authentication={authentication}"
)

con <- DBI::dbConnect(odbc::odbc(), .connection_string = connection_string)

 

This is a clean way, and you can also read the tables from outside MS Fabric.
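Building on the connection above, a sketch of how dbplyr preserves the tidyverse verbs: `tbl()` creates a lazy reference to a lakehouse table, dplyr verbs are translated to SQL and run on the SQL endpoint, and `collect()` materializes the result locally. The table and column names here are assumptions for illustration.

```r
library(dplyr)
library(dbplyr)

# Lazy reference to a lakehouse table through the DBI connection;
# nothing is pulled into R yet.
dates <- tbl(con, "DimDate")   # hypothetical table name

result <- dates %>%
  filter(CalendarYear == 2024) %>%      # hypothetical column
  count(MonthNumberOfYear)              # translated to GROUP BY in SQL

collect(result)   # execute the query and return a local tibble
```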

jccastelblancos
New Member

@TomMartens  Thank you for answering so quickly.
In fact I have an object of type:

[1] "SparkDataFrame"
attr(,"package")
[1] "SparkR"

 

The idea is to read from the lakehouse, or convert this object into a data.frame that can be manipulated with tidyverse.

Can you guide me on how to do it, please?

Hey @jccastelblancos ,

 

my assumption that you have to use a Pandas dataframe was wrong, please excuse me; I'm not much of an R, or more precisely SparkR, person.

From this article, Use SparkR - Microsoft Fabric | Microsoft Learn, the most essential part was this:

 

To access Lakehouse files using Spark packages, such as read.df or write.df, use its ADFS path or relative path for Spark. In the Lakehouse explorer, right click on the files or folder you want to access and copy its ADFS path or relative path for Spark from the contextual menu.

 

For this little demo I used the ADFS path, my code cell:

 

%%sparkr
df <- read.df("abfss://.../DimDate", source= "parquet", header = "true", inferSchema = "true")
display(df)

 


I assume from here you can use TidyVerse: Use Tidyverse - Microsoft Fabric | Microsoft Learn
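A minimal sketch of one route from the SparkDataFrame to tidyverse, using SparkR's standard collect() (an API not shown in the thread): collect() pulls the SparkDataFrame into driver memory as a plain data.frame, after which dplyr verbs apply. This is only practical when the data fits in the driver's memory; the column name below is a hypothetical example.

```r
# df is the SparkDataFrame returned by read.df() above
local_df <- SparkR::collect(df)   # pulls the data to the driver
class(local_df)                   # now a plain "data.frame"

library(dplyr)
local_df %>%
  filter(!is.na(DateKey))         # hypothetical column; dplyr::filter now works
```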

 

Hopefully, this will help to tackle this challenge.

 

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany


Anonymous
Not applicable

Hello @jccastelblancos ,

We haven't heard from you since the last response and were just checking back to see if you have a resolution yet.
If you do have a resolution, please share it with the community, as it can be helpful to others.
Otherwise, reply with more details and we will try to help.
Thanks


TomMartens
Super User

Hey @jccastelblancos ,

 

I assume that you have to convert the Spark dataframe to a pandas dataframe. This article will probably help you out with that: fabric-docs/docs/data-science/read-write-pandas.md at main · MicrosoftDocs/fabric-docs · GitHub

 

Hopefully, this provides what you are looking for.

 

Regards,

Tom




@TomMartens @Anonymous

It turned out to be much more complex than I thought. For now, what I did was use createOrReplaceTempView so that I can use SQL via sql().
I expected it to be simpler, but I have had countless problems using SparkR; I think the platform is designed more for PySpark than for tidyverse. Still, I'll try a few more things. Thanks for the assistance.
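The workaround mentioned above can be sketched like this; the view name and the filter column are illustrative assumptions, and `sdf` stands for the SparkDataFrame returned by read.df().

```r
# Register the SparkDataFrame as a temporary SQL view ...
SparkR::createOrReplaceTempView(sdf, "my_view")

# ... then filter with Spark SQL instead of tidyverse verbs.
# sql() returns another SparkDataFrame.
filtered <- SparkR::sql("SELECT * FROM my_view WHERE some_column > 0")
head(filtered)
```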

Anonymous
Not applicable

Hi @jccastelblancos ,

Glad to know that you got some insights on your query. Please continue to use the Fabric Community for any further questions.

