Good morning
The question is simple; I fear the solution is not. Has anyone been able to connect from a Fabric notebook to an Azure Data Lake Storage Gen2 account without having to build a data pipeline or a dataflow? That is, directly with PySpark, Spark (Scala), or SparkR.
I would greatly appreciate, if possible, a code fragment in each of these three languages.
Thank you very much in advance for any guidance.
Hi @jcamilo1985,
You need to go to the Lakehouse view and add a new shortcut:
1. Specify the DFS path of the storage account (you can get it from the endpoint properties of the storage account)
2. Authentication method - you can use your own org account, a SAS token, or a service principal. I would suggest a service principal as the best practice.
3. In the next screen, add a specific container as the path and then click on create.
4. Once added, it will be shown under the unidentified folder as a shortcut with the storage account name. You will be able to see all the files underneath it.
5. To use it in a Spark notebook, add the corresponding lakehouse to the notebook, navigate to the file you want to load, and click '...'; you will see options for loading the data using Spark or Pandas. You can use either of them, and the code for loading the data into a dataframe will be generated automatically.
This is the PySpark code that was generated for the 1.csv file shown above.
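The generated snippet was attached as an image in the original post; below is a minimal sketch of what Fabric typically generates for a CSV file reached through a lakehouse shortcut (the shortcut name and 1.csv path are illustrative, not taken from the post):

```python
# Sketch of the auto-generated PySpark code for a CSV behind a lakehouse shortcut.
# "storageaccountname" stands in for the shortcut created in the steps above;
# "1.csv" is the sample file mentioned there.
df = spark.read.format("csv") \
    .option("header", "true") \
    .load("Files/storageaccountname/1.csv")

display(df)
```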
Hi @jcamilo1985, you could use a shortcut created in a lakehouse to work with the data in the Azure Data Lake Gen2 account: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts
Or you could just reference the data lake location using the abfss URL in the notebook.
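A minimal PySpark sketch of the abfss approach is below; the storage account, container, and file path are placeholders, and it assumes the identity running the notebook already has access to the account (for example, the Storage Blob Data Reader role):

```python
# Read a CSV directly from ADLS Gen2 via its abfss:// URL.
# "mycontainer", "mystorageaccount", and "folder/1.csv" are placeholders.
abfss_path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/1.csv"

df = spark.read.format("csv").option("header", "true").load(abfss_path)
display(df)
```

If you need to authenticate with a service principal instead of your own identity, you would also set the storage account's fs.azure.account OAuth Spark configurations before reading; the exact settings depend on your tenant setup.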
@AndyDDC First of all, thank you for coming to my aid.
Do you have a link that explains how the reference is made?
Hi @jcamilo1985
We haven't heard from you since the last response and wanted to check whether you have a resolution yet. If not, please reply with more details and we will try to help.
Thanks