VIvMouret
Frequent Visitor

Download Lakehouse Files

Hi everyone!

From a notebook in the default Spark environment, I save a table to the same Lakehouse as a file.
I need to build an application that retrieves the existing files from the Lakehouse Files section, so that they can be extracted, downloaded, etc.

To do this, I created a GraphQL API and a single-page application using React and Node.js.

[screenshot: VIvMouret_0-1737969481422.png]

Am I missing something to achieve my goal here, or do I have what I need?
While digging through the forum topics, I came across these two potential solutions:

  • Azure Data Lake Storage Gen2
  • Virtual network or on-premises data gateways

What do you think I should use? And if I'm on the wrong track, what should I do instead?
Thank you in advance!

1 ACCEPTED SOLUTION
VIvMouret
Frequent Visitor

Update on my problem:
All the technologies mentioned above (unless I'm mistaken) only serve to update and push new data into a Lakehouse, and that's not what we're looking for here.

To answer my actual question, which is:
I need to make the files I've created available to colleagues.

To do this, I changed direction and used OneLake directly.

I assigned specific, restricted roles and installed the OneLake client on all the machines so that each PC then had access to the desired folder.
By pushing the files into a new Lakehouse in a new workspace and managing the roles, I was able to export the data.

However, I'm keeping all this for development only; I wouldn't recommend it for production.

I have other ideas for putting the export of Lakehouse files into production, such as Power Pages, but since I can't find any way of extracting the files at the moment, that idea is on hold.

If anyone comes up with an alternative method, don't hesitate to share!


9 REPLIES

nilendraFabric
Community Champion

@VIvMouret 

Try this approach instead of GraphQL:
Use the Microsoft Fabric REST API. This allows you to programmatically access and download files by authenticating your app via Microsoft Entra (Azure AD). The process involves:

  1. App Registration: Register your application in Microsoft Entra ID, assign permissions (e.g., Lakehouse.Read.All), and obtain credentials (Client ID, Client Secret, Tenant ID).
  2. Authentication: Use these credentials to obtain an access token for API requests.
  3. File Access: Use the Fabric REST API to list files in the Lakehouse and construct URLs for downloading them.

https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-api
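Step 2 (authentication) can be sketched with the standard client-credentials flow against the Entra v2.0 token endpoint, using only the Python standard library. Note the scope value `https://storage.azure.com/.default` is an assumption for OneLake's storage endpoint; verify it for your tenant:

```python
import json
import urllib.parse
import urllib.request

def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://storage.azure.com/.default"):
    """Build the (url, form-encoded body) pair for a client-credentials
    token request to Microsoft Entra ID."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode()
    return url, body

def get_token(tenant_id, client_id, client_secret):
    """POST the token request and extract the bearer token from the JSON reply."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]
```

The returned token is then used as the `Bearer` value in the `Authorization` header of every API call.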



Something like this:

import requests

# Assumes `token` already holds a valid Microsoft Entra access token
file_url = "https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.lakehouse/path/to/file"
headers = {"Authorization": f"Bearer {token}"}

response = requests.get(file_url, headers=headers)
if response.status_code == 200:
    # Write the raw bytes to a local file
    with open("downloaded_file", "wb") as file:
        file.write(response.content)
    print("File downloaded successfully!")
else:
    print(f"Failed to download file: {response.status_code}")
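The snippet above downloads one known file; to discover which files exist first, OneLake's ADLS Gen2 compatibility means the standard "Path - List" operation should work, with the workspace acting as the filesystem. A stdlib-only sketch, assuming the `<lakehouse>.Lakehouse/Files` path layout (verify both the layout and casing against your tenant):

```python
import json
import urllib.parse
import urllib.request

def build_list_url(workspace, lakehouse, subdir=""):
    """Build an ADLS Gen2 'Path - List' URL for a Lakehouse's Files section.
    OneLake exposes the workspace as the filesystem; the
    <lakehouse>.Lakehouse/Files layout is an assumption to verify."""
    directory = f"{lakehouse}.Lakehouse/Files"
    if subdir:
        directory += f"/{subdir}"
    query = urllib.parse.urlencode({
        "resource": "filesystem",
        "directory": directory,
        "recursive": "true",
    })
    return f"https://onelake.dfs.fabric.microsoft.com/{workspace}?{query}"

def list_files(workspace, lakehouse, token, subdir=""):
    """Return the names of all paths under the Lakehouse Files section."""
    req = urllib.request.Request(
        build_list_url(workspace, lakehouse, subdir),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return [p["name"] for p in json.load(resp)["paths"]]
```

Each returned name can then be appended to the workspace URL to build the download URL used in the snippet above.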



See if this works.

Thanks

@nilendraFabric

I've already followed your first paragraph as a POC.

To give you an idea of my code, I followed this documentation at the beginning:

https://learn.microsoft.com/en-us/fabric/data-engineering/connect-apps-api-graphql
Then I moved on to this one, as I was prompted to along the way:
https://learn.microsoft.com/en-us/entra/identity-platform/tutorial-single-page-app-react-prepare-spa...

 

The documentation link you sent me doesn't include a query for downloading a file from a Lakehouse,
and the example is in Python anyway; I'm in NodeJS...

nilendraFabric
Community Champion

@VIvMouret Please accept the solution if this resolves your query, as it will help the community find the answer quickly.

nilendraFabric
Community Champion

Hi @VIvMouret 

In Microsoft Fabric, data in your Lakehouse is automatically stored in OneLake (backed by Azure Data Lake Storage Gen2). Since you have already created a GraphQL API layer, you should be able to query and download files through that endpoint. Your single-page React application can call the GraphQL API to list, extract, and download Lakehouse files. If your application and the API run within Microsoft Fabric, or in an environment that has direct access to Fabric resources, you do not need additional services.

https://learn.microsoft.com/en-us/fabric/data-engineering/connect-apps-api-graphql

https://community.fabric.microsoft.com/t5/Data-Science/How-to-get-lakehouse-files-into-Azure-Functio...

 

 

OneLake provides open access to Fabric items through ADLS Gen2-compatible APIs:

Assign the appropriate permissions to your application in Azure (e.g., the "Storage Blob Data Reader" role for ADLS Gen2)
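Because OneLake speaks the ADLS Gen2 protocol, the `abfss://` paths shown in a Fabric notebook map directly onto HTTPS URLs on the DFS endpoint. A small illustrative converter, assuming the `abfss://<workspace>@<host>/<item path>` layout used in notebooks:

```python
from urllib.parse import urlparse

def abfss_to_https(abfss_path):
    """Map an abfss:// OneLake path (as shown in Fabric notebooks) to the
    equivalent HTTPS URL on the DFS endpoint, suitable for a GET request
    with a bearer token. Assumes abfss://<workspace>@<host>/<item path>."""
    parsed = urlparse(abfss_path)
    workspace = parsed.username  # the part before the "@"
    return f"https://{parsed.hostname}/{workspace}{parsed.path}"
```

For example, `abfss://myws@onelake.dfs.fabric.microsoft.com/mylh.Lakehouse/Files/report.csv` becomes `https://onelake.dfs.fabric.microsoft.com/myws/mylh.Lakehouse/Files/report.csv`.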

Thanks 

Thank you for your quick reply!
I'm going to test with ADLS Gen2, since I already have the GraphQL API ready.

Thanks @VIvMouret, please keep me posted. It's interesting to learn about this use case.

I've just tested your code, but I can't access the "files" properties.

[screenshot: VIvMouret_1-1737976415079.png]

When I search in "Get data", I don't have direct access to the Lakehouse files.
Should I normally be able to view the files in the Lakehouse?

[screenshot: VIvMouret_0-1737976842713.png]

Because I can't see them...

I still need to check that I have read-only permissions for the API

Hi @VIvMouret 

I have tried the GraphQL query too; it's not working. So you are correct that it is not supported. I am trying a few other things and will share soon.

And it seems you only have access to tables from the GraphQL API, not to files. So we have to figure out a different approach to query files.

Thanks

