Hi all,
I am working with a source data set from a SQL Server hosted on Azure; however, the data there is often not updated in a timely fashion. I can manually pull the same data set, current as of the time I run it. I then have Power Query set up to append the Excel data set to what comes from Azure SQL and remove any duplicate records, keeping the row with the latest timestamp from when the data was gathered. However, this has left the query stuck in limbo: it never finishes applying the query changes when I commit them.
Has anyone tried something similar and run into the same issue?
Your question is a bit unclear to me - I couldn't follow your last sentence in particular. Also "Azure" is a very broad term - can you be more specific?
Anyway, perhaps you should try keeping the Query against "Azure" very simple and writing it straight to a "staging" Excel Table? Then you can build your "append and remove any duplicate records" logic in a Query that starts from that "staging" Table.
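To illustrate, the downstream query would then read from the local staging Table rather than hitting the server. A minimal M sketch (the Table name "StagingAzure" is an assumption - use whatever you name your staging Table):

```m
let
    // Read the locally staged copy of the Azure SQL extract
    // from a Table in the current workbook
    Staging = Excel.CurrentWorkbook(){[Name = "StagingAzure"]}[Content]
in
    Staging
```

Your append and de-duplicate steps would then operate on this local data instead of re-querying the server on every refresh.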
Update: I went through the process step by step. It appears that the Remove Duplicates step is actually what's causing the performance lag.
let
    // Pull from the Azure SQL source
    #"Azure" = #"Azure SQL Data",
    // Append the manually refreshed Excel data
    #"Appended Query" = Table.Combine({#"Azure", #"Excel Source"}),
    // Sort newest-first, then keep one row per ID/Entity
    #"Removed Duplicates" = Table.Distinct(Table.Sort(#"Appended Query", {{"Date Imported", Order.Descending}}), {"ID", "Entity"})
in
    #"Removed Duplicates"
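One common mitigation for this pattern (a sketch, assuming the step names above) is to buffer the sorted table before Table.Distinct. Table.Buffer materializes the appended data in memory once, which both stabilizes the sort order that the de-duplication depends on and avoids the engine re-evaluating both sources during the Distinct step:

```m
let
    #"Azure" = #"Azure SQL Data",
    #"Appended Query" = Table.Combine({#"Azure", #"Excel Source"}),
    // Buffer the sorted table so Table.Distinct works against a
    // stable, in-memory snapshot instead of re-streaming the sources
    #"Buffered" = Table.Buffer(Table.Sort(#"Appended Query", {{"Date Imported", Order.Descending}})),
    #"Removed Duplicates" = Table.Distinct(#"Buffered", {"ID", "Entity"})
in
    #"Removed Duplicates"
```

Note that buffering loads the whole combined table into memory, so this trades RAM for speed - it helps most when the sources are slow to re-read, not when the data set itself is too large to hold locally.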
Actually I wasn't suggesting a staging SQL table, but a staging Excel table. So you might just run the extract from Azure SQL and load that result into an Excel table. Then your Append and Remove Duplicates steps would be working against local data, which can be faster.
But looking at your Query code, I suspect there is just a lot of data to de-duplicate. Can you quantify "the query is large" - roughly how many rows are involved?
Thanks, I clarified the post. It's a SQL Server hosted on Azure. The query is large, and I would think that adding another step - loading to an Excel table - would introduce extra issues and processing time. Does the fact that this is a SQL Server change the recommendation?
Agree with @mike_honey: you should bulk load your data into a staging table in your Azure SQL Db and then remove dups and insert your data into the destination table using a stored proc.
On large volumes, ELT is better than ETL.
Note that an Excel worksheet is limited to roughly 1 million rows (1,048,576).
If you have to load data into a SQL Db, use a true ETL solution like SSIS or Azure Data Factory.