jccastelblancos
New Member

Implementation of R in a notebook

Good evening and greetings to all,

I'm coming to this forum to ask for help with the following: I load a table into a Lakehouse (in the Tables section) through a pipeline.

I am reading it with the read.df() method. So far there is no problem; the problem arises when I want to apply tidyverse functions.

For example, filter() returns an error like the following:

[1] "Error in UseMethod(\"filter\"): no applicable method for 'filter' applied to an object of class \"SparkDataFrame\""

In fact, when I call class() on the object, it returns SparkDataFrame.

I would like to be able to manipulate the data as I would with tidyverse. I have seen that this can be achieved:

https://learn.microsoft.com/es-es/fabric/data-science/r-use-tidyverse

but I don't know which route I should follow to do it.

I appreciate any help.

1 ACCEPTED SOLUTION

@TomMartens @Anonymous 

I found a solution. I don't think it is the best one, but for now it is my contribution in case anyone else runs into the same situation.
Since a pipeline brings the data into the lakehouse, change the destination so that it lands under Files/ rather than in Tables. From there we can load it with readr and work with tidyverse. The cumbersome part is getting it back into Tables: I had no alternative but to write the processed data back under Files/, read it from there with SparkR, and finally use write.df, which writes it to Tables in overwrite or append mode. It is a longer route than I expected, so if anyone knows a better alternative, please let us know in this thread. Thanks to everyone who came to my aid.
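The round trip described above could be sketched roughly like this. The paths, file names, and the `id` column are illustrative, not taken from the original post; Fabric notebooks mount the default lakehouse under `/lakehouse/default/`, so adjust everything to your own lakehouse:

```r
library(readr)
library(dplyr)

# 1. The pipeline now lands the data under Files/ instead of Tables/
#    ("raw/my_table.csv" is a hypothetical path)
raw <- read_csv("/lakehouse/default/Files/raw/my_table.csv")

# 2. Manipulate with tidyverse as usual
processed <- raw %>%
  filter(!is.na(id))

# 3. Write the processed result back under Files/
write_csv(processed, "/lakehouse/default/Files/processed/my_table.csv")

# 4. Read it with SparkR and write it to Tables/ as a Delta table
sdf <- SparkR::read.df("Files/processed/my_table.csv",
                       source = "csv", header = "true", inferSchema = "true")
SparkR::write.df(sdf, "Tables/my_table", source = "delta", mode = "overwrite")
```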


9 REPLIES
FlavioLeccese
Frequent Visitor

As an R user I would suggest using DBI and dbplyr to preserve the tidyverse logic in an MS Fabric notebook.
Tables in a lakehouse are in fact tables of a database.

- You can retrieve the SQL endpoint of that lakehouse
- Create a Service Principal in Azure and use it as a "super user"

 

R code:

library(DBI)
library(odbc)
library(glue)

# Connection parameters for the lakehouse SQL endpoint
driver <- "ODBC Driver 18 for SQL Server"
server <- "put here the sql end point of lakehouse"
database <- "lakehouse, ie database name"
authentication <- "ActiveDirectoryServicePrincipal"
encrypt <- "True"
userid <- "id of the Service Principal"
password <- "Password of the Service Principal"
port <- 1433

# Build the ODBC connection string
connection_string <- glue("Driver={driver}; Server={server}; UID={userid}; PWD={password}; Port={port}; Database={database}; Encrypt={encrypt}; Encoding=UTF-8; Authentication={authentication}")

con <- DBI::dbConnect(odbc::odbc(), .connection_string = connection_string)

 

This is a clean way, and you can also read the tables from outside of MS Fabric.
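With such a connection in place, dbplyr translates dplyr verbs into SQL that runs on the SQL endpoint, so you never leave tidyverse syntax. A minimal sketch, assuming the `con` object from above and a hypothetical lakehouse table named `sales` with `amount` and `region` columns:

```r
library(dplyr)
library(dbplyr)

# Lazily reference a lakehouse table through the DBI connection;
# no data is pulled yet
sales <- tbl(con, "sales")

# These verbs are translated to SQL and executed on the endpoint;
# collect() finally pulls the result into a local tibble
result <- sales %>%
  filter(amount > 100) %>%
  group_by(region) %>%
  summarise(total = sum(amount, na.rm = TRUE)) %>%
  collect()
```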

jccastelblancos
New Member

@TomMartens Thank you for answering so quickly.
In fact, I have an object of type:

[1] "SparkDataFrame"
attr(,"package")
[1] "SparkR"

The idea is to read from the lakehouse, or to convert this object into a data.frame that can be manipulated with tidyverse.

Can you guide me on how to do it, please?

Hey @jccastelblancos ,

 

my assumption that you had to use a Pandas dataframe was wrong, please excuse me, but I'm not much of an R, or more precisely a SparkR, person.

From this article, Use SparkR - Microsoft Fabric | Microsoft Learn, the most essential part was this:

 

To access Lakehouse files using Spark packages, such as read.df or write.df, use the file's ABFS path or its relative path for Spark. In the Lakehouse explorer, right-click on the file or folder you want to access and copy its ABFS path or relative path for Spark from the contextual menu.

 

For this little demo I used the ABFS path; my code cell:

 

%%sparkr
df <- read.df("abfss://.../DimDate", source= "parquet", header = "true", inferSchema = "true")
display(df)

 


I assume that from here you can use tidyverse: Use Tidyverse - Microsoft Fabric | Microsoft Learn
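One note on how the two worlds meet: SparkR and dplyr both export a filter() function, which is exactly the clash behind the error in the original post. A minimal sketch, assuming the `df` SparkDataFrame from the cell above and a hypothetical `CalendarYear` column in DimDate:

```r
library(dplyr)

# collect() pulls the SparkDataFrame to the driver as a plain R
# data.frame, so tidyverse verbs apply (fine for small/medium tables)
local_df <- SparkR::collect(df)

# dplyr::filter() now dispatches correctly on a data.frame
recent <- local_df %>%
  filter(CalendarYear >= 2020) %>%
  arrange(CalendarYear)
```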

 

Hopefully, this will help to tackle this challenge.

 

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany


Anonymous
Not applicable

Hello @jccastelblancos ,

We haven't heard back from you on the last response and were just checking to see whether you have a resolution yet.
If you do, please share it with the community, as it can be helpful to others.
Otherwise, reply with more details and we will try to help.
Thanks


TomMartens
Super User
Super User

Hey @jccastelblancos ,

 

I assume that you have to convert the Spark dataframe to a Pandas dataframe. This article will probably help you with that: fabric-docs/docs/data-science/read-write-pandas.md at main · MicrosoftDocs/fabric-docs · GitHub

 

Hopefully, this provides what you are looking for.

 

Regards,

Tom




@TomMartens @Anonymous 

It turned out to be much more complex than I thought. For now, what I did was use createOrReplaceTempView so that I can run SQL with sql().
I expected it to be simpler, but I have had countless problems using SparkR; I think the platform is geared more toward PySpark than toward tidyverse. Still, I'll try a few more things. Thanks for the assistance.
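The temp-view route described above can be sketched as follows; the view name and columns are illustrative, assuming a SparkDataFrame `df` already loaded with read.df():

```r
# Register the SparkDataFrame as a temporary view for Spark SQL
SparkR::createOrReplaceTempView(df, "my_view")

# Query it with SQL; sql() returns another SparkDataFrame
result_sdf <- SparkR::sql("
  SELECT region, COUNT(*) AS n
  FROM my_view
  GROUP BY region
")

# collect() turns the result into an ordinary R data.frame
result <- SparkR::collect(result_sdf)
```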

Anonymous
Not applicable

Hi @jccastelblancos ,

Glad to know that you got some insights into your query. Please continue to use the Fabric Community for any further queries.

