Good evening and greetings to all.
I'm coming to this forum to ask for help with the following: I load a table into a lakehouse through a pipeline, into the Tables section.
I then read it with the read.df() method.
So far there is no problem; the problem arises when I want to apply tidyverse verbs.
For example, filter() returns an error like the following:
[1] "Error in UseMethod(\"filter\"): no applicable method for 'filter' applied to an object of class \"SparkDataFrame\""
In fact, when I call class() on the object, it returns SparkDataFrame.
I would like to manipulate the data as I would with tidyverse. I have seen that this can be achieved:
https://learn.microsoft.com/es-es/fabric/data-science/r-use-tidyverse
but I don't know which route I should follow to do it.
I appreciate any help.
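For context, this is roughly what I'm running; the table and column names here are just placeholders:

library(SparkR)
library(dplyr)

# the lakehouse table loads fine as a SparkDataFrame
df <- read.df("Tables/my_table", source = "delta")
class(df)  # "SparkDataFrame"

# but dplyr::filter() has no method for a SparkDataFrame, so this errors
filter(df, amount > 0)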
@Anonymous
I found a solution, which I think may be better; for now it is my contribution in case anyone else facing the same situation needs an alternative.
Since you are working with a pipeline to get the data into the lakehouse, change the destination so that it is not a table but a file, that is, under Files/. After this we can load it with readr and work with tidyverse. The cumbersome part is taking it back to Tables: I had no alternative but to write the processed table back under Files/, then read it from there with SparkR, to finally be able to use the write procedure, which writes it to Tables in overwrite or append mode. It is a longer route than I expected, so if anyone knows a better alternative, please let us know through this thread. Thanks to everyone who came to my aid.
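Roughly, the route looks like this; all paths and names are placeholders, and I'm assuming the default lakehouse is mounted in the notebook:

# 1. the pipeline now lands the data under Files/ instead of Tables/
library(readr)
library(dplyr)
local_df <- read_csv("/lakehouse/default/Files/raw/my_data.csv")

# 2. manipulate with tidyverse as usual
processed <- local_df %>% filter(amount > 0)

# 3. write the processed result back under Files/
write_csv(processed, "/lakehouse/default/Files/processed/my_data.csv")

# 4. read that with SparkR and write it to Tables/ as a Delta table
library(SparkR)
sdf <- read.df("Files/processed/my_data.csv", source = "csv", header = "true", inferSchema = "true")
write.df(sdf, path = "Tables/my_data", source = "delta", mode = "overwrite")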
As an R user, I would suggest you use DBI and dbplyr to preserve the tidyverse logic in a MS Fabric notebook.
Tables in a lakehouse are in fact tables of a database.
- You can retrieve the SQL endpoint of that lakehouse
- Create a Service Principal in Azure and use it as a "super user":
R code:
library(DBI)
library(odbc)
library(glue)

# connection details for the lakehouse SQL endpoint
driver <- "ODBC Driver 18 for SQL Server"
server <- "put here the SQL endpoint of the lakehouse"
database <- "lakehouse, i.e. database name"
authentication <- "ActiveDirectoryServicePrincipal"
encrypt <- "True"
userid <- "id of the Service Principal"
password <- "password of the Service Principal"
port <- 1433

# build the ODBC connection string and connect
connection_string <- glue("Driver={driver}; Server={server}; UID={userid}; PWD={password}; Port={port}; Database={database}; Encrypt={encrypt}; Encoding=UTF-8; Authentication={authentication}")
con <- DBI::dbConnect(odbc::odbc(), .connection_string = connection_string)
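From there you can keep the tidyverse verbs; dbplyr translates them into SQL that runs on the endpoint. A sketch, where the table and column names are just examples:

library(dplyr)
library(dbplyr)

# lazy reference to a lakehouse table; no data is pulled yet
orders <- tbl(con, "orders")

# the verbs are translated to T-SQL and executed on the SQL endpoint
result <- orders %>%
  filter(amount > 0) %>%
  count(customer_id) %>%
  collect()  # collect() materialises the result as a local tibble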
This is a clean way, and you can also read the tables from outside MS Fabric.
@TomMartens Thank you for answering so quickly.
In fact I have an object of type:
[1] "SparkDataFrame"
attr(,"package")
[1] "SparkR"
The idea is to read the lakehouse table, or convert this object into a data.frame that can be manipulated with tidyverse.
Can you guide me on how to do it, please?
Hey @jccastelblancos ,
my assumption that you have to use a Pandas dataframe was wrong, please excuse me; I'm not much of an R, or more precisely SparkR, person.
From this article: Use SparkR - Microsoft Fabric | Microsoft Learn, the most essential part was this:
To access Lakehouse files using Spark packages, such as read.df or write.df, use its ADFS path or relative path for Spark. In the Lakehouse explorer, right click on the files or folder you want to access and copy its ADFS path or relative path for Spark from the contextual menu.
For this little demo I used the ADFS path; my code cell:
%%sparkr
df <- read.df("abfss://.../DimDate", source = "parquet", header = "true", inferSchema = "true")
display(df)
I assume from here you can use tidyverse: Use Tidyverse - Microsoft Fabric | Microsoft Learn
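If you want the plain dplyr verbs rather than the SparkR ones, one route is to collect the SparkDataFrame into a local data.frame first. A sketch, assuming the table fits into driver memory (everything is pulled locally; the column name here is made up):

# pull the SparkDataFrame into a local R data.frame
local_df <- SparkR::collect(df)
class(local_df)  # "data.frame"

# from here the usual tidyverse verbs work
library(dplyr)
local_df %>% filter(DateKey >= 20240101)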
Hopefully, this will help to tackle this challenge.
Regards,
Tom
Hi @jccastelblancos ,
We haven't heard from you since the last response and wanted to check whether you have found a resolution yet.
If you have, please share it with the community, as it can be helpful to others.
Otherwise, please reply with more details and we will try to help.
Thanks
Hey @jccastelblancos ,
I assume that you have to convert the Spark dataframe to a pandas dataframe. This article will probably help you out: fabric-docs/docs/data-science/read-write-pandas.md at main · MicrosoftDocs/fabric-docs · GitHub
Hopefully, this provides what you are looking for.
Regards,
Tom
@TomMartens @Anonymous
It turned out to be much more complex than I thought. For now, what I did was use createOrReplaceTempView so I can query the data with sql().
I expected it to be simpler, but I have had countless problems using SparkR; I think the platform is designed more for PySpark than for tidyverse. Still, I'll try a few more things. Thanks for the assistance.
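For reference, the workaround looks roughly like this; the table, view, and column names are placeholders:

library(SparkR)
df <- read.df("Tables/my_table", source = "delta")

# register the SparkDataFrame as a temporary view
createOrReplaceTempView(df, "my_table_tmp")

# query it with Spark SQL instead of tidyverse verbs
result <- sql("SELECT * FROM my_table_tmp WHERE amount > 0")
display(result)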
Hi @jccastelblancos ,
Glad to know that you got some insights on your query. Please continue using the Fabric Community for your further queries.