Good evening and greetings to all.
I'm coming to this forum to ask for help with the following: I load a table into a lakehouse through a pipeline, into the Tables section.
I then read it with the read.df() method.
So far there is no problem; the problem arises when I want to apply tidyverse verbs.
For example, filter() returns an error like the following:
[1] "Error in UseMethod(\"filter\"): no applicable method for 'filter' applied to an object of class \"SparkDataFrame\""
In fact, when I call class() on the object, it returns SparkDataFrame.
I would like to manipulate the data as I would with tidyverse. I have seen that this can be achieved:
https://learn.microsoft.com/es-es/fabric/data-science/r-use-tidyverse
but I don't know which route I should follow to do it.
I appreciate any help.
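For context, this is roughly what I'm running; the table and column names here are just placeholders:

library(SparkR)
library(dplyr)

# the lakehouse table loads fine as a SparkDataFrame
df <- read.df("Tables/my_table", source = "delta")
class(df)  # "SparkDataFrame"

# but dplyr::filter() has no method for a SparkDataFrame, so this errors
filter(df, amount > 0)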
@Anonymous
I found a solution, which I think may be better; for now it is my contribution in case anyone else facing the same situation needs an alternative.
Since you are working with a pipeline to get the data into the lakehouse, change the destination so that it is not a table but a file, that is, under Files/. After this we can load it with readr and work with tidyverse. The cumbersome part is taking it back to Tables: I had no alternative but to write the processed table back under Files/, then read it from there with SparkR, to finally be able to use the write procedure, which writes it to Tables in overwrite or append mode. It is a longer route than I expected, so if anyone knows a better alternative, please let us know through this thread. Thanks to everyone who came to my aid.
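Roughly, the route looks like this; all paths and names are placeholders, and I'm assuming the default lakehouse is mounted in the notebook:

# 1. the pipeline now lands the data under Files/ instead of Tables/
library(readr)
library(dplyr)
local_df <- read_csv("/lakehouse/default/Files/raw/my_data.csv")

# 2. manipulate with tidyverse as usual
processed <- local_df %>% filter(amount > 0)

# 3. write the processed result back under Files/
write_csv(processed, "/lakehouse/default/Files/processed/my_data.csv")

# 4. read that with SparkR and write it to Tables/ as a Delta table
library(SparkR)
sdf <- read.df("Files/processed/my_data.csv", source = "csv", header = "true", inferSchema = "true")
write.df(sdf, path = "Tables/my_data", source = "delta", mode = "overwrite")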
As an R user, I would suggest you use DBI and dbplyr to preserve the tidyverse logic in a MS Fabric notebook.
Tables in a lakehouse are in fact tables of a database.
- You can retrieve the SQL endpoint of that lakehouse
- Create a Service Principal in Azure and use it as a "super user":
R code:
library(DBI)
library(odbc)
library(glue)

# connection details for the lakehouse SQL endpoint
driver <- "ODBC Driver 18 for SQL Server"
server <- "put here the SQL endpoint of the lakehouse"
database <- "lakehouse, i.e. database name"
authentication <- "ActiveDirectoryServicePrincipal"
encrypt <- "True"
userid <- "id of the Service Principal"
password <- "password of the Service Principal"
port <- 1433

# build the ODBC connection string and connect
connection_string <- glue("Driver={driver}; Server={server}; UID={userid}; PWD={password}; Port={port}; Database={database}; Encrypt={encrypt}; Encoding=UTF-8; Authentication={authentication}")
con <- DBI::dbConnect(odbc::odbc(), .connection_string = connection_string)
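From there you can keep the tidyverse verbs; dbplyr translates them into SQL that runs on the endpoint. A sketch, where the table and column names are just examples:

library(dplyr)
library(dbplyr)

# lazy reference to a lakehouse table; no data is pulled yet
orders <- tbl(con, "orders")

# the verbs are translated to T-SQL and executed on the SQL endpoint
result <- orders %>%
  filter(amount > 0) %>%
  count(customer_id) %>%
  collect()  # collect() materialises the result as a local tibble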
This is a clean way, and you can also read the tables from outside MS Fabric.
@TomMartens Thank you for answering so quickly.
In fact I have an object of type:
[1] "SparkDataFrame"
attr(,"package")
[1] "SparkR"
The idea is to read the lakehouse table, or convert this object into a data.frame that can be manipulated with tidyverse.
Can you guide me on how to do it, please?
Hey @jccastelblancos ,
my assumption that you have to use a Pandas dataframe was wrong, please excuse me; I'm not much of an R, or more precisely SparkR, person.
From this article: Use SparkR - Microsoft Fabric | Microsoft Learn, the most essential part was this:
To access Lakehouse files using Spark packages, such as read.df or write.df, use its ADFS path or relative path for Spark. In the Lakehouse explorer, right click on the files or folder you want to access and copy its ADFS path or relative path for Spark from the contextual menu.
For this little demo I used the ADFS path; my code cell:
%%sparkr
df <- read.df("abfss://.../DimDate", source = "parquet", header = "true", inferSchema = "true")
display(df)
I assume from here you can use tidyverse: Use Tidyverse - Microsoft Fabric | Microsoft Learn
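If you want the plain dplyr verbs rather than the SparkR ones, one route is to collect the SparkDataFrame into a local data.frame first. A sketch, assuming the table fits into driver memory (everything is pulled locally; the column name here is made up):

# pull the SparkDataFrame into a local R data.frame
local_df <- SparkR::collect(df)
class(local_df)  # "data.frame"

# from here the usual tidyverse verbs work
library(dplyr)
local_df %>% filter(DateKey >= 20240101)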
Hopefully, this will help to tackle this challenge.
Regards,
Tom
Hi @jccastelblancos ,
We haven't heard from you since the last response and wanted to check whether you have found a resolution yet.
If you have, please share it with the community, as it can be helpful to others.
Otherwise, please reply with more details and we will try to help.
Thanks
Hey @jccastelblancos ,
I assume that you have to convert the Spark dataframe to a pandas dataframe. This article will probably help you out: fabric-docs/docs/data-science/read-write-pandas.md at main · MicrosoftDocs/fabric-docs · GitHub
Hopefully, this provides what you are looking for.
Regards,
Tom
@TomMartens @Anonymous
It turned out to be much more complex than I thought. For now, what I did was use createOrReplaceTempView so I can query the data with sql().
I expected it to be simpler, but I have had countless problems using SparkR; I think the platform is designed more for PySpark than for tidyverse. Still, I'll try a few more things. Thanks for the assistance.
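For reference, the workaround looks roughly like this; the table, view, and column names are placeholders:

library(SparkR)
df <- read.df("Tables/my_table", source = "delta")

# register the SparkDataFrame as a temporary view
createOrReplaceTempView(df, "my_table_tmp")

# query it with Spark SQL instead of tidyverse verbs
result <- sql("SELECT * FROM my_table_tmp WHERE amount > 0")
display(result)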
Hi @jccastelblancos ,
Glad to know that you got some insights on your query. Please continue using the Fabric Community for your further queries.