davidz106
Helper III

Building flat file pipeline

Hi everyone,

I've recently come across Microsoft Fabric and its connected services, and I'm intrigued by the functionality where you can simply drop files into an explorer interface and they become available in OneLake, ready for use in tools like Power BI.

Our organization deals with a lot of semi-structured flat files (mainly CSVs and XLSX). Currently, we rely on Python scripts to parse these files into a structured table format and load them into a SQL database. However, our current Luigi-based Python pipelines are not well maintained, and a transition to another (Microsoft-based) tool would be very beneficial.
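
For a sense of what these scripts do, here is a stripped-down sketch (not our real code; the file path, table name, and connection string are placeholders, and it assumes pandas and SQLAlchemy):

    import pandas as pd
    from sqlalchemy import create_engine

    # Read one semi-structured CSV; XLSX files are handled the same way with read_excel.
    df = pd.read_csv("incoming/report.csv")  # placeholder path

    # "Parsing" here mostly means basic cleanup, e.g. dropping fully empty rows.
    df = df.dropna(how="all")

    # Load the cleaned table into the SQL database (placeholder connection string and table name).
    engine = create_engine("mssql+pyodbc://user:password@server/db?driver=ODBC+Driver+17+for+SQL+Server")
    df.to_sql("staged_report", engine, if_exists="append", index=False)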

I'm interested in migrating this process to the Fabric ecosystem in a way that maintains the simplicity shown in the promo videos, but also ensures that file parsing happens automatically to deliver the formatted data we need in OneLake.

I'm completely new to Fabric and Microsoft's modern data services, so I would appreciate any insights on:

  1. Which specific Microsoft services within the Fabric framework are best suited for this task?
  2. What are the recommended first steps to start this integration?

I'm aiming for a solution that minimizes manual intervention and maximizes efficiency in making these semi-structured files readily available and usable for our analytics and reporting tools.


I would love it if somebody were willing to prepare a demo on this subject, maybe with a very simple, minimal example dataset. We could arrange it as a paid one-time job.

4 REPLIES
Anonymous
Not applicable

Hi @davidz106 ,

Thanks for using Fabric Community.

To guide you better, I have a few queries:
1. Can you please share some dummy data or sample files (2 or 3)?
2. How are you expecting the data to load into the SQL DB? I would like to understand the sink table structure.
3. How big is the data?
4. Can you please share the challenges in your current Python pipeline implementation?
5. Are you looking for real-time triggers? For example, whenever a file is uploaded in OneLake File Explorer, it starts processing the file and loads it to the SQL DB.

Please share your responses so that I can guide you better.


Hi v-gchenna-msft,

 

1. and 3. Sample data could be a simple 20x20 table with one empty row. By 'parsing' I mean operations like skipping empty rows in a DataFrame or similar (see the sketch below this list).

2. We have thousands of files, so uploading them directly into Power BI is not an option; therefore a data pipeline is needed. The parsing functions are already written in Python, but we could use another tool if needed.
4. The current solutions are OK, but a more robust approach is wanted, preferably inside the Microsoft environment, since we would then use this database as input to Power BI.

5. This is the gist of my question and exactly what we are aiming for, and you put it nicely: we upload a file in OneLake File Explorer, it starts processing the file, and the data is loaded to the SQL DB.

A tutorial on that would be very welcome. 
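
To make that concrete, here is my rough guess at what the Fabric side could look like (I'm new to this, so treat it as a sketch only; it assumes a Fabric notebook attached to a Lakehouse where files dropped via OneLake File Explorer land under the Files area, and the folder and table names are made up):

    # Sketch of a Fabric notebook cell (the built-in `spark` session is assumed).
    # Read a CSV that was dropped into the Lakehouse Files area (hypothetical path).
    df = (
        spark.read
        .option("header", "true")
        .csv("Files/incoming/sample_report.csv")
    )

    # The 'parsing' step from above: drop rows where every column is null.
    cleaned = df.na.drop(how="all")

    # Write the cleaned data as a Lakehouse table so it can be queried and used from Power BI.
    cleaned.write.mode("append").saveAsTable("staged_reports")

The triggering part, running something like this automatically whenever a new file lands, is what I would most like a tutorial to cover.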

Anonymous
Not applicable

Hi @davidz106 ,

We haven't heard from you since the last response and were just checking back to see if you got a chance to look into the above reply.

Anonymous
Not applicable

Hi @davidz106 ,

We haven't heard from you since the last response and were just checking back to see if you got a chance to look into the above reply.
