cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
kuraitengai
Frequent Visitor

Redshift native query file size

So the short version of the story is that when we were creating the data set in Power Query and using M, we were having a lot of issues with the dataset not refreshing and getting a lot of timeout errors. We decided to rebuild using the native query in the Redshift connector and had it all done as a single SQL statement. After doing so, there haven't been any refresh issues so far, but the filesize is 44% larger, so I'm just trying to figure out why.

 

Our old data model for our commission line items was built using the Redshift connector. It pulled in the entire table of 20 years of data, deleted half of the columns, then merged in 2-3 columns each from 3 other tables. One of those extra columns is a run date, which is then filtering to keep anything within the last 400 days. When all is said and done, (for size testing purposes) I disabled the load on those other tables and only kept the commission items table. It ends up being about 15.5 million rows and 12 columns, with a file size of 310MB.

 

When I rebuild the same merges/appends as in the description above using the optional SQL statement in the same Redshift connector, with the only difference being it's pulling the last 12 months from the start of the current month instead of the last 400 days. All the other merges/appends are identical. The table ends up being 13.8 million rows and 12 columns, but the file size is 446MB.

 

I get the row count being less because is has a month less of data. But why is the file size nearly 44% larger? The final table size is exactly the same in terms of columns and there are fewer rows. The content of what is there is identical. So why does the SQL queried table take up so much more space? We are really loving the improved refreshing with the native query, but not too thrilled with how much larger the file size is.

 

Is there any way to tweak the native query so that the file size is more in lines of the older M code way we did it? I just can't figure out where the bloat is coming from.

 
1 REPLY 1
v-eqin-msft
Community Support
Community Support

Hi @kuraitengai ,

 

As the official document said,Power Query gives you the flexibility to import data from wide variety of databases that it supports. It can run native database queries, which can save you the time it takes to build queries using the Power Query interface. 

 

Here is a blog talking about a similar scenario, you may kindly refer to:

Solved! Query folding for native SQL in Power BI

Differences between the M Language and DAX in Power BI

Query Folding On SQL Queries In Power Query

 

Best Regards,
Eyelyn Qin

Helpful resources

Announcements
PBI Sept Update Carousel

Power BI September 2023 Update

Take a look at the September 2023 Power BI update to learn more.

Learn Live

Learn Live: Event Series

Join Microsoft Reactor and learn from developers.

Dashboard in a day with date

Exclusive opportunity for Women!

Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!

MPPC 2023 PBI Carousel

Power Platform Conference-Power BI and Fabric Sessions

Join us Oct 1 - 6 in Las Vegas for the Microsoft Power Platform Conference.

Top Solution Authors
Top Kudoed Authors