akashparthe
Regular Visitor

How to fetch the storage mode and size of a semantic model using a Fabric notebook

I want to programmatically fetch the storage mode and size of a semantic model from a Fabric notebook. Can anyone point me to a REST API or Python module that can do this?

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @akashparthe ,

 

Thanks to lbendlin for the reply.

 

The link lbendlin provided was very helpful. I ran a test based on it, and it successfully retrieved the sizes of semantic models across different workspaces.

import sempy.fabric as fabric
import pandas as pd
import datetime


def analyze_model_size(drop_default_datasets=True):
    """
    Author : Sandeep Pawar |  fabric.guru  |  02-29-2024

    This function scans the active premium workspaces in the entire tenant and calculates the total column size
    of premium datasets (import and Direct Lake). The purpose of the function is to get the total column size and
    identify size reduction opportunities by calculating percentage of size occupied by calculated tables, calculated columns,
    auto datetime tables, columns with double data type and column hierarchy size.

    Note that user hierarchy and relationship size are not used in memory size calculation.   

    args: drop_default_datasets=True
          set to False to keep default datasets in the final dataframe. Default is True, which drops them.

    """

    def get_ds_size(workspace, dataset):
        all_columns = fabric.list_columns(workspace=workspace, dataset=dataset, extended=True)
        size = sum(all_columns["Total Size"])
        calc_cols = sum(all_columns[all_columns.Type == "Calculated"]["Total Size"])
        calc_tables = sum(all_columns.query('Type == "CalculatedTableColumn"')["Total Size"])
        auto_date = sum(all_columns[all_columns['Table Name'].str.startswith(("DateTableTemplate_", "LocalDateTable_"))]["Total Size"])
        float_cols = sum(all_columns.query('`Data Type` == "Double"')["Total Size"])
        hierarchy_size = sum(all_columns["Hierarchy Size"]) #column hierarchy size
        num_tables = all_columns["Table Name"].nunique()

        return size, calc_cols, calc_tables, auto_date, float_cols, hierarchy_size, num_tables

    #Get active capacities only
    active_capacities = fabric.list_capacities().query('State == "Active"')
    #Premium and Fabric workspaces
    ws = fabric.list_workspaces().query('`Is On Dedicated Capacity`==True')
    premium_workspaces = ws[ws['Capacity Id'].isin(list(active_capacities.Id))]
    datasets = pd.concat([fabric.list_datasets(ws).assign(workspace=ws) for ws in premium_workspaces['Name']], ignore_index=True)
    col_list = ['total_columnsize_MB', 'pct_size_calculated_cols', 'pct_size_calculated_tables', 
             'pct_size_autodatetime', 'pct_size_floats', 'pct_hierarchy_size','number_of_tables']    
    catalog = datasets[["workspace", "Dataset Name"]].copy().assign(date = datetime.date.today())
    catalog[col_list] = pd.NA

    for i, row in catalog.iterrows():
        try:
            size, calc_cols, calc_tables, auto_date, float_cols, hierarchy_size, num_tables = get_ds_size(row["workspace"], row["Dataset Name"])
            catalog.loc[i, ['total_columnsize_MB', "pct_size_calculated_cols", 'pct_size_calculated_tables', 
                            'pct_size_autodatetime', 'pct_size_floats', 'pct_hierarchy_size', 'number_of_tables']] = [
                round(size/(1024**2), 1), round(100 * (calc_cols / size), 1), round(100 * (calc_tables / size), 1),
                round(100 * (auto_date / size), 1), round(100 * (float_cols / size), 1), round(100 * (hierarchy_size / size), 1), int(num_tables)
            ]
        #Exception handler for default datasets, which do not expose an XMLA endpoint
        except Exception: 
            continue
    for col in col_list:
        catalog[col]=pd.to_numeric(catalog[col], errors='coerce')

    if drop_default_datasets:
        #default datasets will show NaN values. To include default, set drop_default_datasets=False
        catalog.dropna(inplace=True) 

    catalog.sort_values(by=['total_columnsize_MB'], ascending=False, inplace=True)
    catalog.reset_index(drop=True, inplace=True)
    return catalog

df = analyze_model_size()
display(df)
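
For example, to flag the heaviest models or narrow the tenant-wide results to a single workspace, the returned dataframe can be filtered with ordinary pandas. A small illustration (the workspace name and the 100 MB threshold below are hypothetical placeholders):

# Hypothetical follow-up: filter the tenant-wide scan results
big_models = df[df["total_columnsize_MB"] > 100]     # models with more than ~100 MB of column data
one_ws = df[df["workspace"] == "Sales Analytics"]    # hypothetical workspace name
display(big_models.head(10))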

(screenshot: resulting dataframe showing total column size per semantic model)

 

As for the storage mode, I haven't found a confirmed way to get it programmatically. As a workaround, you can click "Model layout" in the SQL analytics endpoint and select the corresponding table to see its storage mode, as shown below.

(screenshot: Model layout view in the SQL analytics endpoint showing the table's storage mode)
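
That said, if you want to try it from the notebook, one avenue worth testing is listing the model's partitions, since storage mode is a partition-level property in the tabular model. A minimal sketch, assuming your sempy version exposes fabric.list_partitions and that its output includes a Mode column (verify the actual column names in your environment; the dataset and workspace names below are placeholders):

import sempy.fabric as fabric

# Assumption: list_partitions returns one row per partition, including a
# "Mode" column (e.g. Import, DirectQuery, DirectLake). Column names can
# differ between sempy versions, so inspect the dataframe first.
partitions = fabric.list_partitions(dataset="Your Semantic Model", workspace="Your Workspace")
display(partitions)

# If a Mode-like column is present, summarize the storage mode per table:
# display(partitions.groupby("Table Name")["Mode"].first())

If your sempy version does not expose such a column, the UI workaround above remains the reliable option.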

 

If you have any other questions, please feel free to contact me.

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!


2 REPLIES

lbendlin
Super User
