Redshift native query file size

Anonymous · ‎05-09-2022

So the short version of the story is that when we were creating the data set in Power Query and using M, we were having a lot of issues with the dataset not refreshing and getting a lot of timeout errors. We decided to rebuild using the native query in the Redshift connector and had it all done as a single SQL statement. After doing so, there haven't been any refresh issues so far, but the filesize is 44% larger, so I'm just trying to figure out why.

Our old data model for our commission line items was built using the Redshift connector. It pulled in the entire table of 20 years of data, deleted half of the columns, then merged in 2-3 columns each from 3 other tables. One of those extra columns is a run date, which is then filtering to keep anything within the last 400 days. When all is said and done, (for size testing purposes) I disabled the load on those other tables and only kept the commission items table. It ends up being about 15.5 million rows and 12 columns, with a file size of 310MB.

When I rebuild the same merges/appends as in the description above using the optional SQL statement in the same Redshift connector, with the only difference being it's pulling the last 12 months from the start of the current month instead of the last 400 days. All the other merges/appends are identical. The table ends up being 13.8 million rows and 12 columns, but the file size is 446MB.

I get the row count being less because is has a month less of data. But why is the file size nearly 44% larger? The final table size is exactly the same in terms of columns and there are fewer rows. The content of what is there is identical. So why does the SQL queried table take up so much more space? We are really loving the improved refreshing with the native query, but not too thrilled with how much larger the file size is.

Is there any way to tweak the native query so that the file size is more in lines of the older M code way we did it? I just can't figure out where the bloat is coming from.

Anonymous · ‎05-12-2022

Hi @Anonymous ,

As the official document said,Power Query gives you the flexibility to import data from wide variety of databases that it supports. It can run native database queries, which can save you the time it takes to build queries using the Power Query interface.

Here is a blog talking about a similar scenario, you may kindly refer to:

Solved! Query folding for native SQL in Power BI

Differences between the M Language and DAX in Power BI

Query Folding On SQL Queries In Power Query

Best Regards,
Eyelyn Qin