Securely Passing a Long String to a Paginated Repo...

SpartaBI · ‎07-18-2023

Implementing row level security (RLS) in a Power BI Paginated Report requires manually adding predicates to filter the data. In a Power BI Embedded App Owns Data scenario, the standard flow to accomplish RLS is as follows:

The web backend generates an Embed Token, with a desired string value in the username field, and passes it to the web frontend. The encryption of the Embed Token is managed by Microsoft.
The web frontend passes the Embed Token to the Paginated Report.
The Paginated Report incorporates the desired string value in the queries sent to the data source (using the User!UserID build-in field).

There is a caveat though - the username field cannot exceed 256 characters. While this upper bound may be raised in the future, there are cases where a much longer string is beneficial. One such case is passing a list of RLS filter values to a Paginated Report which uses Azure Data Explorer (ADX) as a data source. Some background is required to explain the reasoning.

ADX is a big data analytical PaaS offering. It runs on the Kusto engine which provides impressive performance, auto optimizations, and scale-out support.
Each ADX cluster also provides management instances which orchestrate the data ingestion processes to the Kusto engine (in streams or batches). Ingested data can be wrangled using Data Mappings and Update Policies.
The native Kusto Query Language (KQL) is user friendly and elegant. It provides many useful built-in functions for handling semi-structured data, performing advanced statistical analytics, and rendering results.
Besides KQL, ADX can be queried using a subset of the T-SQL syntax (obviously, the recommended approach is to use the native KQL). In fact, ADX supports the Tabular Data Stream (TDS) protocol, thus it can present itself as a Microsoft SQL Server DB. This is how a Paginated Report can query ADX.
The Kusto engine and the core ADX capabilities are also available, in a SaaS-like flavor, as part of Microsoft Fabric Real-Time Analytics.

Kusto stores its data in a partitioned columnar format. This is a common concept with varied implementations and augmentations.
In Kusto terminology, the main data-holding units are called Extents. In addition to compressed data structures and metadata, Extents hold an index for each column (including detailed indexes for dynamic data type columns which represent semi-structured data).
Over time, configurable automatic backend processes maintain the Extents:

Combining small Extents using Merge (recomputing indexes) or Rebuild (recompressing from scratch)
Moving Extents from hot data cache (memory/local SSD/Azure premium SSD managed disks) to cold data cache (Azure general purpose v2 blob storage hot tier) or vice versa
Dropping old or replaced Extents.

Technically speaking, Extents are immutable. Thus, ingesting data to Kusto always involves an append operation.
Deletion of a subset of an Extent’s records is accomplished by recomputing the whole Extent and transactionally switching it with the original version (this is referred to as Data Purge in Kusto terminology). A more performant option is to use the Soft Delete feature. Soft deletion creates a new Extent, with minimal changes, using pointers to the original Extent and an additional column indicating a soft deletion flag (this is a similar feature to the Delta Lake Deletion Vectors optimization).
There is no update records command. Updating records involve performing separate append and delete operations which require planning to be executed transactionally. Materialized Views can be used to easily automate update records processes transactionally. However, they have some limitations and a query performance/latency cost.

Simply put, deleting or updating records in Kusto is a resource intensive (and potentially non-trivial) task. It should be planned carefully and, if acceptable, be avoided. This is the main reason for preferring to pass the list of RLS filter values to a Paginated Report.
Permissions change over time and often require many-to-many types of logic. Implementing an RLS table (i.e., a mapping of user IDs to RLS values used as foreign keys in the fact tables) in Kusto can be cumbersome and error prone. Even more so at streaming ingestion scenarios that may require tombstone logic to imitate deletes.

Another reason for preferring to pass the list of RLS filter values to a Paginated Report has to do with performance.
Relying on an RLS table requires the use of Exists logic (i.e., Semi-Join/In) at query time.
Depending on the size of the tables and the complexity of the query, this can lead to non-neglectable performance overhead. Populating the list of RLS values in the query string is one way to achieve applicative session level caching.

So how can we securely pass a long string value from the web backend to a Paginated Report?
We can use common techniques by employing the native capability of Paginated Reports to run Visual Basic (VB) code.

A mandatory disclaimer: You should consult a professional digital security expert before attempting to implement custom security related solutions, like those described in this blog post.

One technique is to use hashing. A standard hashing algorithm takes any string as an input and produces a short, fixed length string as an output. Additionally, it has the following characteristics:

A given input string will always produce the same output
There is a small chance of collision (i.e., two different inputs which produce the same output). This possibility should be considered.
The process is not reversible. Given the hashing algorithm and the output value - the input string cannot be reproduced (except by hashing arbitrary input strings and hoping to get a match).

For example, sha-256 is a commonly used hashing algorithm. It has a practically non existing collision chance, and its output can be expressed as 64 hexadecimal characters.

In this technique, the web backend uses a known hashing algorithm and passes the following to the web frontend:

The non hashed long string value in plain text. This is treated as a simple report parameter
The hashed long string value. As the value of the username field through the encrypted Embed Token.

The Paginated Report employs VB to hash the plain text value using the known hashing algorithm. It then verifies the result by comparing it to the hashed value.
Security wise, this technique may be used only if the long string value does not contain sensitive information and the implications of a possible collision are acceptable.

Another technique is to use symmetric encryption. A standard symmetric encryption algorithm takes any string as an input and produces an encrypted output based on a set of parameters (a key and other parameters specific to the algorithm). Additionally, it has the following characteristics:

A given input string, and set of parameters, will always produce the same output
There is no chance of collision for the same set of parameters
The process is reversible. Given the symmetric encryption algorithm, the set of parameters, and the output value - the input string can be decrypted.

For example, aes-256-cbc is a commonly used symmetric encryption algorithm which requires a Key and an Initialization Vector (IV) parameters.

In this technique, the web backend uses a known symmetric encryption algorithm with randomly generated parameter values. It passes the following to the web frontend:

The encrypted long string value. This is treated as a simple report parameter
The concatenation of the parameter values. As the value of the username field through the encrypted Embed Token.

The paginated report extracts the parameter values (they can be separated based on number of characters or a predefined delimiter). It then employs VB to decrypt the long text value using the known symmetric encryption algorithm.

The two techniques can also be combined by passing an encrypted long string value that should be decrypted before comparing it to a hashed value.

Securely Passing a Long String to a Paginated Report

Get ready for Fabric Data Days Contests!

Re: Fabric Data Days | Get Certified

Dataviz Contest for Pros | Fabric Data Days

QuickViz Challenges | Fabric Data Days

Student Dataviz Contest | Fabric Data Days

Fabric Data Days starts November 4th!