Good morning
The question is simple; I fear the solution is not. Has anyone been able to connect from a Fabric notebook to an Azure Data Lake Storage Gen2 account without having to build a data pipeline or a dataflow? That is, directly with PySpark, Spark (Scala), or SparkR.
I would greatly appreciate, if possible, a code fragment in each of these three languages.
Thank you very much in advance for any guidance.
Hi @jcamilo1985,
You need to go to the Lakehouse view and add a new shortcut:
1. Specify the DFS path of the storage account (you can get it from the endpoint properties of the storage account)
2. Authentication method - you can use your own org account, a SAS token, or a service principal. I would suggest a service principal as the best practice.
3. In the next screen, add a specific container as the path and then click on create.
4. Once added, it will be shown under the unidentified folder as a shortcut with the storage account name. You will be able to see all the files underneath it.
5. To use it in a Spark notebook, add the corresponding lakehouse to the notebook, navigate to the file you want to load, and click '...'; you will see options for loading the data using Spark or Pandas. You can use either of them, and the code for loading the data into a dataframe will be generated automatically.
This is the PySpark code that was generated for the 1.csv file shown above.
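The generated snippet was attached as an image in the original post; below is a minimal sketch of what Fabric typically generates for a CSV file reached through a lakehouse shortcut (the shortcut name and 1.csv path are illustrative, not taken from the post):

```python
# Sketch of the auto-generated PySpark code for a CSV behind a lakehouse shortcut.
# "storageaccountname" stands in for the shortcut created in the steps above;
# "1.csv" is the sample file mentioned there.
df = spark.read.format("csv") \
    .option("header", "true") \
    .load("Files/storageaccountname/1.csv")

display(df)
```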
Hi @jcamilo1985, you could use a shortcut created in a lakehouse to work with the data in the Azure Data Lake Gen2 account: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
Or you could just reference the data lake location using the abfss URL in the notebook.
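A minimal PySpark sketch of the abfss approach is below; the storage account, container, and file path are placeholders, and it assumes the identity running the notebook already has access to the account (for example, the Storage Blob Data Reader role):

```python
# Read a CSV directly from ADLS Gen2 via its abfss:// URL.
# "mycontainer", "mystorageaccount", and "folder/1.csv" are placeholders.
abfss_path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/1.csv"

df = spark.read.format("csv").option("header", "true").load(abfss_path)
display(df)
```

If you need to authenticate with a service principal instead of your own identity, you would also set the storage account's fs.azure.account OAuth Spark configurations before reading; the exact settings depend on your tenant setup.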
@AndyDDC First of all, thank you for coming to my aid.
Do you have a link that explains how the reference is made?
Hi @jcamilo1985
We haven't heard from you since the last response and wanted to check whether you have a resolution yet. If not, please reply with more details and we will try to help.
Thanks