I want to programmatically fetch the storage mode and size of a semantic model from a Fabric notebook. Can anyone point me to a REST API or Python module that can do this?
Solved! Go to Solution.
Hi @akashparthe,
Thanks for the reply from lbendlin.
The link lbendlin provided was very helpful. I ran a test based on it, and it successfully retrieved the sizes of semantic models across different workspaces.
import sempy.fabric as fabric
import pandas as pd
import datetime

def analyze_model_size(drop_default_datasets=True):
    """
    Author : Sandeep Pawar | fabric.guru | 02-29-2024
    This function scans the active premium workspaces in the entire tenant and calculates the total column size
    of premium datasets (Import and Direct Lake). The purpose of the function is to get the total column size and
    identify size reduction opportunities by calculating the percentage of size occupied by calculated tables,
    calculated columns, auto date/time tables, columns with the Double data type, and column hierarchy size.
    Note that user hierarchy and relationship sizes are not included in the memory size calculation.

    args:
        drop_default_datasets : set to False to keep default datasets in the final dataframe. Default is True.
    """
    def get_ds_size(workspace, dataset):
        all_columns = fabric.list_columns(workspace=workspace, dataset=dataset, extended=True)
        size = sum(all_columns["Total Size"])
        calc_cols = sum(all_columns[all_columns.Type == "Calculated"]["Total Size"])
        calc_tables = sum(all_columns.query('Type == "CalculatedTableColumn"')["Total Size"])
        auto_date = sum(all_columns[all_columns['Table Name'].str.startswith(("DateTableTemplate_", "LocalDateTable_"))]["Total Size"])
        float_cols = sum(all_columns.query('`Data Type` == "Double"')["Total Size"])
        hierarchy_size = sum(all_columns["Hierarchy Size"])  # column hierarchy size
        num_tables = all_columns["Table Name"].nunique()
        return size, calc_cols, calc_tables, auto_date, float_cols, hierarchy_size, num_tables

    # Get active capacities only
    active_capacities = fabric.list_capacities().query('State == "Active"')

    # Premium and Fabric workspaces
    ws = fabric.list_workspaces().query('`Is On Dedicated Capacity` == True')
    premium_workspaces = ws[ws['Capacity Id'].isin(list(active_capacities.Id))]

    datasets = pd.concat([fabric.list_datasets(ws).assign(workspace=ws) for ws in premium_workspaces['Name']], ignore_index=True)

    col_list = ['total_columnsize_MB', 'pct_size_calculated_cols', 'pct_size_calculated_tables',
                'pct_size_autodatetime', 'pct_size_floats', 'pct_hierarchy_size', 'number_of_tables']

    catalog = datasets[["workspace", "Dataset Name"]].copy().assign(date=datetime.date.today())
    catalog[col_list] = pd.NA

    for i, row in catalog.iterrows():
        try:
            size, calc_cols, calc_tables, auto_date, float_cols, hierarchy_size, num_tables = get_ds_size(row["workspace"], row["Dataset Name"])
            catalog.loc[i, col_list] = [
                round(size / (1024**2), 1), round(100 * (calc_cols / size), 1), round(100 * (calc_tables / size), 1),
                round(100 * (auto_date / size), 1), round(100 * (float_cols / size), 1), round(100 * (hierarchy_size / size), 1), int(num_tables)
            ]
        # Exception to handle default datasets which do not have an XMLA endpoint
        except Exception:
            continue

    for col in col_list:
        catalog[col] = pd.to_numeric(catalog[col], errors='coerce')

    if drop_default_datasets:
        # Default datasets will show NaN values. To include them, set drop_default_datasets=False
        catalog.dropna(inplace=True)

    catalog.sort_values(by=['total_columnsize_MB'], ascending=False, inplace=True)
    catalog.reset_index(drop=True, inplace=True)
    return catalog

df = analyze_model_size()
display(df)
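If you only need the size of one specific semantic model rather than a tenant-wide scan, a smaller sketch built on the same fabric.list_columns call should be enough. The workspace and dataset names below are placeholders, so adjust them to your environment:

import sempy.fabric as fabric

# Placeholders - replace with your own workspace and semantic model names
workspace_name = "My Workspace"
dataset_name = "My Semantic Model"

# extended=True adds the Vertipaq statistics, including "Total Size" per column
cols = fabric.list_columns(workspace=workspace_name, dataset=dataset_name, extended=True)

total_size_mb = round(cols["Total Size"].sum() / (1024**2), 1)
print(f"{dataset_name}: {total_size_mb} MB total column size")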
As for the storage mode, I haven't found a way to get it programmatically. As a workaround, you can click "Model layout" in the SQL endpoint and select the corresponding table to see its storage mode.
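If you do want to experiment with a programmatic route for the storage mode as well, one idea (untested here, and assuming fabric.list_partitions is available in your version of semantic link) is to read the Mode of each table's partitions, since the partition mode reflects the table's storage mode:

import sempy.fabric as fabric

# Placeholders - replace with your own workspace and semantic model names
workspace_name = "My Workspace"
dataset_name = "My Semantic Model"

# Assumption: fabric.list_partitions returns one row per partition with a "Mode"
# column (e.g. Import, DirectQuery, Dual, DirectLake); verify in your environment.
partitions = fabric.list_partitions(dataset=dataset_name, workspace=workspace_name)

storage_modes = partitions[["Table Name", "Mode"]].drop_duplicates()
display(storage_modes)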
If you have any other questions, please feel free to contact me.
Best Regards,
Yang
Community Support Team
If any post here helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!