A year ago, I was developing a solution for collecting and analyzing usage data from a Power BI Premium capacity. There were not only some simple log files, but also data that I had to convert into a slowly changing dimension type 2 (SCD2). Therefore, I decided on the following architecture: Azure Data Factory pipelines collect the data on a daily basis, the raw data is kept in a data lake indefinitely, and the cleansed data is then loaded into a SQL Server database. Because the data ends up in SQL Server, I can use incremental refresh in the Power BI service. It works perfectly.
But times are changing, new requirements are coming in, and I have found a new, more direct way to load logs straight from the data lake into a Power BI dataset. And all of that incrementally! Where I need an SCD, a SQL Server database will stay in the middle, whereas for all other data I can use this new approach.
In this article I want to show you how you can load parquet files stored in an Azure data lake directly into your Power BI dataset. It involves an incremental refresh and an ETL process, too!
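To give you a flavour of what that can look like, here is a minimal Power Query (M) sketch: it reads parquet files from a data lake container and restricts them to the incremental refresh window using the standard RangeStart/RangeEnd parameters. The storage URL and the idea of filtering on the file's "Date modified" column are assumptions for illustration only; the article walks through the full approach in detail.

```
let
    // Connect to an Azure Data Lake Storage Gen2 container (URL is a placeholder)
    Source = AzureStorage.DataLake("https://mystorageaccount.dfs.core.windows.net/logs"),

    // Keep only the files that fall inside the incremental refresh window.
    // RangeStart and RangeEnd are the datetime parameters Power BI uses for incremental refresh.
    Filtered = Table.SelectRows(
        Source,
        each [Date modified] >= RangeStart and [Date modified] < RangeEnd
    ),

    // Read each remaining parquet file and combine the results into one table
    Combined = Table.Combine(
        List.Transform(Filtered[Content], each Parquet.Document(_))
    )
in
    Combined
```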
Read more...