Anonymous
Not applicable

Prefer CSV or JSON for my dataflows

Hi,

 

I am new to Power BI and would appreciate any help with my question.

 

I have created multiple dataflows that extract data from folder locations in my file system through an on-premises gateway. Each folder contains multiple (~60) JSON files (about 15-20 MB each), which are combined and then transformed in the dataflows. These dataflows are refreshed every day.
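Outside Power Query, the "combine files from a folder" step described above can be sketched in plain Python. This is an illustration only, with throwaway sample data standing in for the real folder and files:

```python
import json
import tempfile
from pathlib import Path

# Create a throwaway folder with a few small JSON files standing in
# for the ~60 real ones (illustrative data only).
folder = Path(tempfile.mkdtemp())
for i in range(3):
    records = [{"file": i, "row": r} for r in range(2)]
    (folder / f"part{i}.json").write_text(json.dumps(records), encoding="utf-8")

# Combine every JSON file in the folder into one list of records --
# roughly what the dataflow's combine step does before transformations run.
combined = []
for path in sorted(folder.glob("*.json")):
    combined.extend(json.loads(path.read_text(encoding="utf-8")))

print(len(combined))  # 3 files x 2 rows each = 6 records
```

The same pattern applies to CSV files; only the per-file parsing step changes.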

 

I also have the alternative of using CSV files instead of JSON in my dataflows.

 

To anyone with experience using both JSON and CSV files in dataflows: which file format should I prefer in order to:

       1) minimize the use of Power BI resources/capacity as much as possible.

       2) minimize the refresh time for the dataflows.

       3) reduce refresh failures as much as possible.

 

I am aware that the transformation queries I am using will affect the points mentioned above, but I wanted to know whether the choice of file format can also make a difference.

 

Regards,

Diptanshu Lal

4 REPLIES
bcdobbs
Super User

I can't find it at the moment, but I did see a blog comparing load times, and I think CSV came out on top. Under the hood, when you run a dataflow it stores the data as CSV in Azure Data Lake Storage, so it's very optimised for that sort of thing.

 

My hunch would be that you're unlikely to notice a huge difference. (Worth testing!)



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
Anonymous
Not applicable

Hi @bcdobbs ,

 

Do you mean that, according to that blog, CSV took the least time? It would be great if you could point me to that blog or any other source discussing this issue.

 

Also, yes, I have made separate dataflows to perform the extraction and transformation operations. That does help a lot.

 

Best Regards,

Diptanshu Lal

Sorry, the blog was CSV vs Parquet:

https://www.datalineo.com/post/parquet-and-csv-querying-processing-in-power-bi

Both CSV and JSON are plain-text formats, so there won't be much in it. However, JSON carries all the field names for every record, so for the same volume of data the files will be larger. I would therefore expect CSV to load marginally faster than JSON (a lot also depends on how flat the JSON files are).
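The size overhead from repeated field names is easy to see outside Power BI. Below is a minimal Python sketch (illustrative data, not from this thread) serializing the same records as JSON and as CSV; JSON writes every field name once per record, while CSV writes the header once:

```python
import csv
import io
import json

# Hypothetical sample records standing in for one of the ~60 files.
records = [{"id": i, "name": f"item{i}", "value": i * 1.5} for i in range(1000)]

# JSON repeats "id", "name", and "value" for every single record...
json_bytes = json.dumps(records).encode("utf-8")

# ...while CSV writes the header row once, then only the values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "value"])
writer.writeheader()
writer.writerows(records)
csv_bytes = buf.getvalue().encode("utf-8")

print(len(json_bytes), len(csv_bytes))  # JSON output is noticeably larger
```

How much larger depends on field-name length relative to value length, and on how nested ("flat") the JSON is; deeply nested JSON also needs flattening steps in the dataflow that CSV does not.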

 

Honestly, I don't think you'll find a huge performance impact either way.

One thing that will make a difference if you're reading in lots of CSV (or JSON) files is to separate the ingest from the transformations in your dataflows (assuming you have Premium or Premium Per User).

 

Create a first dataflow that simply pulls in the raw data.

Then create a second that performs the transformations on the data ingested by the first.



