davidz106
Helper III

Building flat file pipeline

Hi everyone,

I've recently come across Microsoft Fabric and its connected services, and I'm intrigued by the functionality where you can simply drop files in an explorer interface, and they become available in OneLake, ready for use in tools like Power BI.

Our organization deals with a lot of semi-structured flat files (mainly CSV and XLSX). Currently, we rely on Python scripts to parse these files into a structured table format and load them into a SQL database. However, our current Luigi-based Python pipelines are not well maintained, and a transition to another (Microsoft-based) tool would be very beneficial.

I'm interested in migrating this process to the Fabric ecosystem in a way that maintains the simplicity shown in the promo videos, but also ensures that file parsing happens automatically to deliver the formatted data we need in OneLake.

I'm completely new to Fabric and Microsoft's modern data services, so I would appreciate any insights on:

  1. Which specific Microsoft services within the Fabric framework are best suited for this task?
  2. What are the recommended first steps to start this integration?

I'm aiming for a solution that minimizes manual intervention and maximizes efficiency in making these semi-structured files readily available and usable for our analytics and reporting tools.


I would love it if somebody were willing to prepare a demo on this subject, perhaps with a very simple, minimal example dataset. We can arrange it as a paid one-time job.

Anonymous
Not applicable

Hi @davidz106 ,

Thanks for using Fabric Community.

To guide you better, I have a few questions:
1. Can you please share some dummy data or sample files (2 or 3)?
2. How do you expect the data to be loaded into the SQL DB? I would like to understand the sink table structure.
3. How big is the data?
4. Can you please share the challenges in your current Python pipeline implementation?
5. Are you looking for real-time triggers? For example, whenever a file is uploaded in OneLake file explorer, processing starts and the file is loaded into the SQL DB.

Please share your responses so that I can guide you better.


Hi v-gchenna-msft,

 

1. and 3. Sample data could be a simple 20x20 table with one empty row. By 'parsing' I mean operations such as skipping empty rows in a DataFrame.

2. We have thousands of files, so uploading them directly into Power BI is not an option; a data pipeline is needed. The parsing functions are already written in Python, but we could use another tool if needed.
4. The current solutions work, but we want a more robust approach, preferably inside the Microsoft environment, since we would then use this database as input to Power BI.

5. This is the gist of my question and exactly what we are aiming for. You put it nicely: we upload a file in OneLake file explorer, processing starts, and the file is loaded into the SQL DB.

A tutorial on that would be very welcome. 
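
For reference, the kind of parsing step described above (skip empty rows, then load the result into a SQL table) can be sketched in plain Python. This is only an illustration under stated assumptions: SQLite stands in for the target SQL DB, the table name `readings` and the sample columns are made up, and nothing here uses a Fabric API or implements the upload trigger. With pandas, `DataFrame.dropna(how="all")` would play the same role as the empty-row check.

```python
import csv
import io
import sqlite3


def parse_rows(text):
    """Parse CSV text and skip rows whose cells are all empty."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for row in reader:
        # A row counts as empty when every cell is blank or whitespace.
        if not any(cell.strip() for cell in row):
            continue
        rows.append(row)
    return header, rows


def load_rows(conn, table, header, rows):
    """Create the table if it does not exist and append the parsed rows."""
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


# Hypothetical sample: three columns, one empty row in the middle.
sample = "id,name,value\n1,alpha,10\n,,\n2,beta,20\n"
header, rows = parse_rows(sample)

# In-memory SQLite is an assumption standing in for the real SQL DB.
conn = sqlite3.connect(":memory:")
load_rows(conn, "readings", header, rows)
row_count = conn.execute('SELECT COUNT(*) FROM "readings"').fetchone()[0]
```

In a real pipeline the same two functions would be called once per uploaded file, and the connection would point at the actual database instead of an in-memory one.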

Anonymous
Not applicable

Hi @davidz106 ,

We haven’t heard back from you on the last response and are just checking whether you had a chance to look into the reply above.
