jonsteele
Regular Visitor

Large Volume - Flat file cleaning and adding to data model workflow

I am trying to develop a more efficient workflow for handling a large volume of flat files (CSV) that allows robust data cleaning and validation before adding them to a shared Power BI model and dashboard. I have not been able to find a suitable solution and wanted to reach out here to see if anyone has suggestions. The main limitation is that my licensing is at the Pro level, and I am currently unable to upgrade to Premium.

 

Once a month, I receive 194 CSV files containing bandwidth data from various entities, similar to the table below. Each CSV file is approximately 10,000 rows, roughly 1.8 million rows in total every month.

 

Date Time | Date Time (RAW) | Traffic Total (Volume) | Traffic Total (Volume)(RAW) | Traffic Total (Speed) | Traffic Total (Speed)(RAW) | Traffic In (Volume) | Traffic In (Volume)(RAW) | Traffic In (Speed) | Traffic In (Speed)(RAW)
5/31/2025 11:55:00 PM - 12:00:00 AM | 45809.25 | 160 MB | 167972838 | 4.48 Mbit/s | 559928.1243 | 24 MB | 25264881.19 | 0.67 Mbit/s | 84219.0779
5/31/2025 11:50:00 PM - 11:55:00 PM | 45809.24653 | 95 MB | 99420075 | 2.65 Mbit/s | 331422.3448 | 29 MB | 30314714.72 | 0.81 Mbit/s | 101055.7861

 

There are a few differences between the files depending on the source, but for the most part I have been able to consolidate the files into three groups and use Power Query's import-from-folder to process multiple files at a time. There is also no data in the tables that identifies the entity the data is associated with, so I have created a bridge table that maps part of the filename to the entity data so it can be added to the final table. These are scheduled/automated reports, so the filenames generally stay consistent; however, there are times when they change, and I update the bridge table to accommodate the new filenames.

 

I currently begin with Power Query in Excel to load the folder, process the files, and load them to the data model. Once I am sure the data has been fully verified, I use DAX Studio to export the clean data table to a CSV file and add that CSV to a folder that gets pulled into my shared Power BI model.

 

I am curious if anyone out there has an alternative method that could ease this process. I would like to end my reliance on DAX Studio if possible. Perhaps an all-Power BI version?

1 ACCEPTED SOLUTION
DataNinja777
Super User

Hi @jonsteele ,

 

You can streamline your workflow entirely within Power BI and eliminate the need for DAX Studio or Excel preprocessing. Start by using Power BI Desktop's "Folder" connector to load all of the CSV files. When you connect, Power Query exposes file metadata such as Name and Folder Path, so you can extract the entity identifier using something like:

= Table.AddColumn(Source, "Entity", each Text.BeforeDelimiter([Name], "_"))

or adapt to your specific filename structure. Once the raw files are loaded, apply transformation steps in Power Query to clean and normalize each of the three schema groups. Use type conversions and conditional logic to manage schema differences, for example:

= Table.TransformColumnTypes(YourTable, {{"Traffic Total (Volume)(RAW)", Int64.Type}})

Use custom columns to parse the “Date Time” field if it needs restructuring or splitting. After transformations, load the cleaned data into the Power BI data model.
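For illustration, here is a minimal sketch of what one of the three group queries might look like end to end. The folder path, the "Entity_" filename pattern, and the handful of columns expanded are assumptions; adjust them to match your files:

let
    // Assumed folder path and "Entity_YYYYMM.csv" naming; substitute your own
    Source = Folder.Files("C:\Reports\Bandwidth"),
    CsvOnly = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
    // Parse each file's binary content into a table
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    // Derive the entity from the filename, e.g. "SiteA_202505.csv" -> "SiteA"
    WithEntity = Table.AddColumn(Parsed, "Entity",
        each Text.BeforeDelimiter([Name], "_")),
    // Only two columns expanded here to keep the sketch short
    Expanded = Table.ExpandTableColumn(WithEntity, "Data",
        {"Date Time", "Traffic Total (Volume)(RAW)"}),
    // Some RAW columns carry decimals, so type number is safer than Int64.Type
    Typed = Table.TransformColumnTypes(Expanded,
        {{"Traffic Total (Volume)(RAW)", type number}}),
    // Keep only the interval start from "5/31/2025 11:55:00 PM - 12:00:00 AM"
    WithStart = Table.AddColumn(Typed, "Interval Start",
        each DateTime.FromText(Text.BeforeDelimiter([Date Time], " - "), [Culture = "en-US"]),
        type datetime)
in
    WithStart

Each of the three schema groups can follow the same pattern with its own column mapping.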

Instead of exporting the cleaned data via DAX Studio, you can keep it within the model or publish the PBIX as a dataset to a shared workspace. If you want other reports to reference the clean data, consider publishing this model as a shared dataset, or use a Power BI Dataflow to house the logic and make the clean output accessible to other reports. Dataflows are available under Pro licensing, as long as the people who need the data have access to the workspace.

To simulate incremental refresh without Premium, you can load your historical files once into one table, and append only the latest month’s files to another table before combining them in Power Query:

= Table.Combine({HistoricalTable, NewMonthTable})
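As a sketch, assuming HistoricalTable and NewMonthTable are two folder queries that already share the same column names and types, and that the Entity/Date Time pair uniquely identifies a row (that key is an assumption):

let
    // HistoricalTable: archive folder that rarely changes
    // NewMonthTable: drop folder holding only the latest month's 194 files
    Combined = Table.Combine({HistoricalTable, NewMonthTable}),
    // Guard against a file landing in both folders
    Deduped = Table.Distinct(Combined, {"Entity", "Date Time"})
in
    Deduped

Note this is not true incremental refresh; a Pro refresh still re-reads both folders. But keeping the archive folder static means only the new month's files change between refreshes, which keeps the monthly workload predictable.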

You can also build a small validation page in Power BI itself with visuals that flag missing data, nulls, or duplicate timestamps using DAX like:

Has Nulls =
IF (
    COUNTROWS ( FILTER ( Data, ISBLANK ( Data[Traffic Total (Volume)(RAW)] ) ) ) > 0,
    "Yes",
    "No"
)
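For the duplicate-timestamps check mentioned above, a companion measure along the same lines might look like this, assuming the table is named Data and carries an Entity column:

Duplicate Timestamps =
// Counts entity/timestamp combinations that appear more than once
COUNTROWS (
    FILTER (
        SUMMARIZE ( Data, Data[Entity], Data[Date Time] ),
        CALCULATE ( COUNTROWS ( Data ) ) > 1
    )
)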

This keeps the verification interactive and avoids round-tripping to Excel or external tools. With this setup, you can go from raw file to clean model to published dashboard—all within Power BI Pro—without needing DAX Studio or creating intermediate CSVs.

 


2 REPLIES
v-menakakota
Community Support

Hi @jonsteele ,

Thanks for reaching out to the Microsoft Fabric community forum.

I would also like to take a moment to thank @DataNinja777 for actively participating in the forum and for the solutions you've been sharing. Your contributions make a real difference.

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems find the solution faster.

 

Thank you. 

 
