Solved: How to read excel file in a notebook in fabric wi...

poojitha_yeluru · ‎04-30-2024

In Databricks to read a excel file we will use com.crealytics.spark.excel by installing libraries . In case of Fabric notebook how can we read an excel file with out using data pipeline

Expiscornovus · ‎04-30-2024

Hi @poojitha_yeluru,

Yes, it should be possible to install libraries into Fabric as well. You can manage them via an environment, https://learn.microsoft.com/en-us/fabric/data-engineering/environment-manage-library

Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

View solution in original post

Anonymous · ‎05-03-2024

Hi @poojitha_yeluru
As mentioned by @Expiscornovus we can use Environments in Fabric. and also Pandas library to read an excel file. These are the available options for now.

Please let me know if you have any further questions.

View solution in original post

Calebstimothy · ‎12-06-2024

It make Microsoft fabric stand out as the ultimate analytics tool for Data Engineers

_maclura_ · ‎12-04-2024

The easiest general-purpose solution I found until now to process Excel files in Fabric using only notebooks (PySpark) is illustrated below.

Conditions:

- The Excel source file has been uploaded in a lakehouse not attached to the notebook.
- A specific range in an Excel spreadsheet must be loaded into a dataframe for processing.
- The processed dataframe must be saved in a delta table in another lakehouse not attached to the notebook.

Strategy:

Use the Pandas Python library to read the source Excel file content.

Challenges:

The Pandas library (as well as other binary I/O activities in Python, like copy an Excel File from Sharepoint to lakehouse, but this is another topic) can't manage absolute lakehouse "abfss" file paths, only locally mounted filesystem paths.

Solution:

Mount the source lakehouse in the notebook local filesystem (use Notebookutils)
Get the detailed locally mounted filesystem full path (use Notebookutils)
Read the Excel file in a Pandas dataframe (use Pandas)
Do whatever you have to do with that dataframe (use Pandas)
Convert the Pandas dataframe in a Spark dataframe (use pyspark.sql.SparkSession)
- because, in my case, Pandas can't write a dataframe in a delta table using an absolute path
Write the Spark dataframe to the destination lakehouse e.g. in a delta table (use pyspark.sql.SparkSession)

Details:

import notebookutils as nu  # required to mount lakehouse filesystem for Pandas
import pandas as pd
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate() # you can omit this row since the shell automatically creates the session in the variable spark for users.

# Source parameters
src_ws_id  = 'any_workspace_id1'
src_lh_id  = 'any_lakhouse_id1'
src_folder = 'any_subfolder' # (in the form 'subdir/.../subdir' or empty string '')
src_file   = 'any_Excel_file_name'

# Destination parameters
dst_ws_id  = 'any_workspace_id2'
dst_lh_id  = 'any_lakhouse_id2'
dst_table  = 'any_table_name'

def mount_lakehouse(workspace_id, lakehouse_id, mount_point='/mnt'):
    """
    Mount a remote lakehouse filesystem to the local filesystem of the notebook.

    :param workspace_id: ID of the lakehouse workspace.
    :param lakehouse_id: ID of the lakehouse.
    :param mount_point: Mount point in the local filesystem (optional, default is '/mnt').
    :return: The mount point in the local filesystem.
    """
    try:
        # Construct the URL of the remote filesystem
        remote_url = f'abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}'
        
        # Mount the remote filesystem to the local filesystem
        nu.fs.mount(remote_url, mount_point)
    except Exception as e:
        # Handle exceptions and exit the notebook in case of an error
        exit_value = f'Error type: {type(e).__name__}\nError description: {e}'
        nu.notebook.exit(exit_value)
    else:
        print('Lakehouse filesystem successfully mounted!')
        return mount_point

def unmount_lakehouse(mount_point='/mnt'):
    """
    Unmount a lakehouse filesystem from the local filesystem.

    :param mount_point: Mount point in the local filesystem (optional, default is '/mnt').
    :return: Result of the unmount operation.
    """
    try:
        # Unmount the filesystem from the local mount point
        result = nu.fs.unmount(mount_point)
    except Exception as e:
        # Handle exceptions and exit the notebook in case of an error
        exit_value = f'Error type: {type(e).__name__}\nError description: {e}'
        nu.notebook.exit(exit_value)
    else:
        print('Lakehouse filesystem successfully unmounted!')
        return result

# Mount source lakehouse (because Pandas can't read abfss absolute paths)
src_mount_point = mount_lakehouse(src_ws_id, src_lh_id, src_mount_point)

# Get full local fs path for source lakehouse
src_path = f'{nu.fs.getMountPath(src_mount_point)}/Files/{src_folder}/{src_file}'

# Read an Excel file into a Pandas dataframe
# in this case I read the first 4 columns in the second sheet skipping the first 4 rows and using the first row (row n#5 because 4 are skipped) as header
df = pd.read_excel(f'{src_path}', sheet_name = 2, skiprows = 4, usecols = 'A:D')

# at this point do what you have to do with this dataframe
# and then...

# Convert pandas df in spark df
spark_df = spark.createDataFrame(df) 

# Construct the URL of the remote filesystem
delta_table_abfss_path = f'abfss://{dst_ws_id}@onelake.dfs.fabric.microsoft.com/{dst_lh_id}/Tables/{dst_table}'

# Write the Spark dataframe in a delta table
spark_df.write.format("delta").mode("overwrite").option("mergeSchema", "true").save(delta_table_abfss_path)

# Exit notebook reporting success
exit_value = f'The notebook succeeded!'
nu.notebook.exit(exit_value)

The above solution is a general-purpose solution working in any situation, please make only sure you have permission to access the source and destination lakehouses for write operations.

I hope thi may help

Expiscornovus · ‎04-30-2024

Hi @poojitha_yeluru,

In this thread I have seen somebody use Pandas library for that:

https://community.fabric.microsoft.com/t5/General-Discussion/How-to-load-an-excel-file-from-Sharepoi...

Below is an example

import pandas as pd
df = pd.read_excel("abfss://<id>@onelake.dfs.fabric.microsoft.com/<id>/Files/Excel/TestBook.xlsx")
display(df)

Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

_maclura_ · ‎12-05-2024

Pandas can't read from absolute abfss:// paths.
In fact, in your screenshot we read:

I hope this may help

poojitha_yeluru · ‎04-30-2024

Hi @Expiscornovus ,

Thanks for sharing , I had already gone to above way but pandas is a costly operation. In databricks we will install library ie. (com.crealytics:spark-excel_2.12:0.13.5), is there any way like to install a library in fabric.So that we can read excel file..

Expiscornovus · ‎04-30-2024

Hi @poojitha_yeluru,

Yes, it should be possible to install libraries into Fabric as well. You can manage them via an environment, https://learn.microsoft.com/en-us/fabric/data-engineering/environment-manage-library

Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

poojitha_yeluru · ‎04-30-2024

Hi @Expiscornovus ,
Yes From public library we have to select PYPI and we have to use xlrd it is also pointing to pandas. Is there any other way.

Anonymous · ‎05-03-2024

Hi @poojitha_yeluru
As mentioned by @Expiscornovus we can use Environments in Fabric. and also Pandas library to read an excel file. These are the available options for now.

Please let me know if you have any further questions.

JosueMolina · ‎06-27-2024

I want to add to this by saying that installing libraries as .jar files for Spark use is not supported via Environments. So the previously mentioned library to read excel as Spark is not possible via Environment.

Manage Apache Spark libraries - Microsoft Fabric | Microsoft Learn

How to read excel file in a notebook in fabric without using datapipeline

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - June 2025

Fabric Community Update - June 2025

Join the #PBI10 DataViz contest

How to read excel file in a notebook in fabric without using datapipeline

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - June 2025

Fabric Community Update - June 2025