Hello,
I know how to do this in AWS, but want to make sure I start on the correct path with Microsoft Fabric Warehouse. We receive 520 files from our core system every day. We want to create a persistent stage in Fabric Warehouse so our DGO team can query this data historically (we only stage one day's worth of data in SQL Server for warehouse processing). We have to manually load the day they want to inspect into SQL Server, which is painful for everyone.
The files themselves do NOT have column headers, nor do they all have dates in them to denote the Business Effective Date of the data within. We do have metadata for each table to let a process know the structure, and we can easily transform this metadata so Fabric can use it to know the structure of each file. The names of the individual files will be the same every day. We have an internal process for moving files (move-it) within our company that can write to Azure.
In another life, we used AWS S3 to place each day's files in a directory (date-named, e.g. 20240625) and used the date on the folders as the Business Effective Date. We used a crawler that crawled the directory and made new data available. I was looking to do something similar in Fabric, but I just don't know where to start.
So my question: within the Fabric ecosystem, what is the best strategy for getting those daily files loaded into Fabric so they can be queried across multiple days?
What BI tool will you use? (Power BI?)
Will you query the files directly from your BI tool? Or will you load the data from the files into delta tables first, and then query the delta tables from your BI tool?
Could you write the contents of your files into a Delta table (by using a Notebook), and use the Notebook to add the Business Effective Date as a column in the Delta table?
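For example, a minimal sketch of that idea could look like the following (assuming PySpark in a Fabric Lakehouse Notebook, where spark is the built-in session; the path, schema, and table name are placeholders, not your actual files):

```python
# Minimal sketch, assuming PySpark in a Fabric Lakehouse notebook ("spark" is the built-in session).
# The path, schema, and table name are placeholders for illustration only.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DecimalType

# In practice this schema would come from your metadata for the file
schema = StructType([
    StructField("account_id", StringType()),
    StructField("balance", DecimalType(18, 2)),
])

business_date = "20240625"  # taken from the date-named landing folder

df = (
    spark.read
    .option("header", "false")  # the files arrive without column headers
    .schema(schema)
    .csv(f"Files/landing/{business_date}/accounts.csv")
    .withColumn("business_effective_date", F.to_date(F.lit(business_date), "yyyyMMdd"))
)

# Append the day's data to a Delta table so multiple days accumulate for querying
df.write.format("delta").mode("append").saveAsTable("accounts_staged")
```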
Thanks frithjof_v,
From what I was reading, I think you confirmed that I need to park the actual files in the Lakehouse. I was just wondering how to get the metadata layer on top of them so we can query the Lakehouse and the files using our BI tool.
We are going to continue using MicroStrategy. After months of trying, and multiple high-end consultants, we can't get Power BI to do Multi Fact Table analysis.
The plan is to query the data from MicroStrategy. I'm very new to the Microsoft ecosystem and did not know how to get started. Not having file headers is definitely a complicating matter. I'm a little familiar with Pipelines, but have never used Notebooks. Would I have to write a Notebook job for each file, or can it use metadata? I'm going to start reading about Notebooks.
I'm not quite sure what is meant by Multi Fact Table analysis.
If it is the same as Multi Fact Star Schema, then it should be quite straightforward to work with it in Power BI. https://community.fabric.microsoft.com/t5/Desktop/creating-a-multi-fact-star-schema/m-p/272010
I don't have experience with MicroStrategy.
I think it should be possible with only one Notebook for all the files. The Notebook should be able to loop through the files and do the transformations, adding schema, column headers, etc. using Python code. I don't have a lot of experience with Notebooks, and I'm not sure exactly what a crawler is, but I think a Notebook should be well suited to manipulating files in directories, adding schema (column headers), and writing to tables like Delta tables.
A Notebook can be run on a schedule, or triggered by a Data pipeline.
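To make that concrete, here is a rough sketch of how one Notebook could loop over a metadata mapping and load every file for a given day (the metadata shape, file names, paths, and the business_date parameter are all assumptions for illustration):

```python
# Hypothetical sketch of a single notebook loading all daily files via metadata.
# Assumes PySpark in a Fabric Lakehouse notebook; names and paths are illustrative only.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

# Could be passed in as a notebook parameter when triggered by a Data pipeline
business_date = "20240625"

# Example metadata: file name -> ordered column names
# (real metadata could also carry a data type per column)
file_metadata = {
    "accounts.txt": ["account_id", "branch", "balance"],
    "transactions.txt": ["txn_id", "account_id", "amount", "txn_type"],
}

for file_name, columns in file_metadata.items():
    schema = StructType([StructField(c, StringType()) for c in columns])
    df = (
        spark.read
        .option("header", "false")  # no headers in the source files
        .schema(schema)
        .csv(f"Files/landing/{business_date}/{file_name}")
        .withColumn("business_effective_date",
                    F.to_date(F.lit(business_date), "yyyyMMdd"))
    )
    # One Delta table per source file, with each day's data appended
    df.write.format("delta").mode("append").saveAsTable(file_name.split(".")[0])
```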
If you can share some screenshots or illustrations of the files and the data structures you will be working with, it could be easier to understand and give suggestions.
Hi @dnauflett
Thanks for using Microsoft Fabric Community.
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. If you have found a resolution, please share it with the community, as it can be helpful to others. Otherwise, please reply with more details and we will try to help.
Thank you.
Hi @dnauflett
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. If you have found a resolution, please share it with the community, as it can be helpful to others. If you have any questions relating to the current thread, please let us know and we will try our best to help you. If you have a question about a different issue, we request that you open a new thread.
Thank you.
Sorry for my slow responses; I'm trying to figure out how to get started. I'm going to do some reading on Notebooks to see if we could use them. We have about 520 files that we need to expose daily. I was hoping Fabric had some type of crawler that could crawl a directory structure and use metadata to define and expose the data for our BI tool.
I've never done something like that before, but if you want to use files and directories I think you need to use the Lakehouse (for storing and working with the directories and files) instead of the Warehouse.
Sounds like a Notebook (or a combination of a Notebook and a Data pipeline) could be the relevant tools.
Or a Spark job (I have never tried one).
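Once the daily loads are appended into Delta tables, querying across days is just a filter or group-by on the Business Effective Date column. A quick sanity check from a Notebook could look like this (using the placeholder table name from the earlier sketch; MicroStrategy would run an equivalent SQL query against the Lakehouse's SQL analytics endpoint):

```python
# Verify that the appended data spans multiple days.
# "accounts_staged" is the placeholder table name from the sketch above.
spark.sql("""
    SELECT business_effective_date, COUNT(*) AS row_count
    FROM accounts_staged
    GROUP BY business_effective_date
    ORDER BY business_effective_date
""").show()
```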