Greetings, community. I'm looking at ways to work with subsets of data in Fabric (ideally keeping Direct Lake in play) and was curious whether anyone has found a solution for parameterizing a Fabric semantic model.
In traditional PBI model scenarios, you'd parameterize your Power Query queries and then alter those parameters via deployment pipelines to align your semantic model with a particular environment/workspace.
I haven't found anything like this supported for Fabric/Direct Lake that doesn't involve copying filtered data into different workspace lakehouses. Does anyone know of a way to achieve something like this in Fabric?
Would deployment rules for semantic models work for you? https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/create-rules?tabs=new You can connect your Direct Lake semantic model to a lakehouse in another workspace. That would create the same scenario as you showed in the picture.
If you are looking for a trick to filter out part of the dataset, you may want to look at Row Level Security and see if that can work in your scenario: https://www.datazoe.blog/post/setting-up-rls-on-a-direct-lake-semantic-model
Thanks for the reply, @FabianSchut. The issue is that I don't know if/how Direct Lake models can be parameterized. I did just check and can confirm that it is possible to set a rule for a Direct Lake semantic model's server/database (I haven't tested it yet), but a common practice with traditional models is to have a query parameter that filters the data in the model deployed to each workspace. The concept is similar to RLS, except that the data that actually gets propagated is filtered by the parameter rather than simply "protected" by RLS.
For cases where data isolation is needed, I'm trying to figure out the most scalable way to handle it in Fabric without duplicating data. Storing filtered data in separate lakehouses would work, but it means storing the data twice. We may still have to go that route.
I'm curious about the use case: why do you want to work with only a subset of the data?
To avoid data duplication, we can use shortcuts, so there is no need to copy the data across dev/test/prod.
To make only a subset of the rows available to the end users, we can use RLS.
The Direct Lake semantic model will always need to load the full column height from the underlying Delta table into VertiPaq memory. It then applies RLS on the in-memory data to limit the number of rows it makes accessible to the end user. That is my understanding.
In Import mode semantic models, we have the option to filter the source data in Power Query before it enters the semantic model, and we can use M parameters for that. But Power Query does not exist in Direct Lake semantic models. A Direct Lake semantic model has no way of telling the source Delta table "I only need these rows". It can say "I only need these columns", and it always says that (column pruning), but it cannot ask the Delta table to return only a subset of the rows. So the entire column height is loaded into the Direct Lake model's VertiPaq memory for the columns required by the rendered report visuals (more precisely, the columns required by the DAX queries those visuals produce). Row filtering only happens when data is sent from the semantic model to the report, according to the DAX query and RLS.
If we for some reason wish to work with only a subset of the data in the dev workspace with Direct Lake, we would need to apply the filtering in some ETL before the data enters the dev lakehouse. That would require physically storing lakehouse data in each of dev, test, and prod if we want to work with different subsets in each environment.
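To make that concrete, here is a minimal PySpark sketch of that kind of upstream filter, run from a Fabric notebook attached to the dev lakehouse. The lakehouse and table names (prod_lakehouse, sales_all, sales) and the filter expression are hypothetical placeholders; the point is only that the rows are removed before the Delta table is written, so the Direct Lake model downstream never sees them.

```python
# Hypothetical sketch: filter rows in ETL before they land in the dev lakehouse.
# Lakehouse/table/column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

DEV_FILTER = "region = 'EMEA'"  # the "parameter" for this environment

(
    spark.read.table("prod_lakehouse.sales_all")  # full table in the source lakehouse
        .filter(DEV_FILTER)                       # keep only the subset wanted in dev
        .write.mode("overwrite")
        .saveAsTable("sales")                     # Delta table the Direct Lake model points at
)
```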
If we don't want to duplicate stored data, we can use shortcuts instead. Then we store the data in just one place and use the same physical data in dev, test, and prod through shortcuts. But we cannot filter shortcuts, so we would always work with the full dataset in the lakehouse; we just avoid duplicating the stored data.
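For what it's worth, shortcuts can also be created programmatically through the OneLake shortcuts REST API rather than the UI. The sketch below is only an illustration of that API as I understand it; the workspace/lakehouse IDs, the token handling, and the table name are placeholders.

```python
# Hedged sketch: create a OneLake shortcut in the dev lakehouse that points at a
# table in the prod lakehouse (referenced, not copied). All IDs and names are placeholders.
import requests

TOKEN = "<bearer token with Fabric API scope>"
DEV_WORKSPACE_ID = "<dev workspace id>"
DEV_LAKEHOUSE_ID = "<dev lakehouse item id>"
PROD_WORKSPACE_ID = "<prod workspace id>"
PROD_LAKEHOUSE_ID = "<prod lakehouse item id>"

url = (
    "https://api.fabric.microsoft.com/v1/workspaces/"
    f"{DEV_WORKSPACE_ID}/items/{DEV_LAKEHOUSE_ID}/shortcuts"
)
body = {
    "path": "Tables",   # create the shortcut under Tables/
    "name": "sales",    # name it will appear under in the dev lakehouse
    "target": {
        "oneLake": {
            "workspaceId": PROD_WORKSPACE_ID,
            "itemId": PROD_LAKEHOUSE_ID,
            "path": "Tables/sales",  # the table being referenced
        }
    },
}

resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
```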
Bottom line: this depends on how our Lakehouse (or Warehouse) is handled in dev-test-prod.
Direct Lake semantic models don't have Power Query, so they will load all the rows that exist in the connected Lakehouse (or Warehouse) tables into the semantic model.
But the Direct Lake semantic model will only load the specific columns it needs, as requested by DAX queries.
(I don't have experience with Direct Lake + Deployment pipelines, so I'm just talking about the difference between Import mode and Direct Lake in general).
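If it's useful, here is a small, hypothetical illustration of the column-pruning point using Semantic Link (sempy) from a Fabric notebook. The model, table, and column names are made up; the idea is that the DAX query only references two columns, so (as I understand Direct Lake) only those columns are transcoded into memory, while every row of those columns is.

```python
# Hedged sketch using Semantic Link (sempy) in a Fabric notebook.
# Model, table, and column names are placeholders.
import sempy.fabric as fabric

dax = """
EVALUATE
SUMMARIZECOLUMNS(
    'sales'[region],                  -- only the columns referenced here...
    "Total", SUM('sales'[amount])     -- ...need to be loaded into memory
)
"""

# Only 'sales'[region] and 'sales'[amount] are loaded (full column height);
# RLS and the DAX filter context then reduce the rows returned to the report.
df = fabric.evaluate_dax(dataset="Sales Direct Lake Model", dax_string=dax)
print(df)
```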