Hi all,
I am working with a source data set from a SQL Server hosted on Azure; however, the data there is often not updated in a timely fashion. I can manually pull the same data set, current as of the time I run it. I then have Power Query set up to append the Excel data set to what comes from Azure SQL and remove any duplicate records, keeping the row with the latest timestamp from when the data was gathered. However, this has left the query stuck in limbo: it never finishes applying the query changes when I commit them.
Has anyone tried something similar and run into the same issue?
Your question is a bit unclear to me - I couldn't follow your last sentence in particular. Also "Azure" is a very broad term - can you be more specific?
Anyway, perhaps you should try keeping the Query against "Azure" very simple and writing it straight to a "staging" Excel Table? Then you can build your "append and remove any duplicate records" logic in a Query that starts from that "staging" Table.
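To illustrate, the downstream query would then read from the local staging Table rather than hitting the server. A minimal M sketch (the Table name "StagingAzure" is an assumption - use whatever you name your staging Table):

```m
let
    // Read the locally staged copy of the Azure SQL extract
    // from a Table in the current workbook
    Staging = Excel.CurrentWorkbook(){[Name = "StagingAzure"]}[Content]
in
    Staging
```

Your append and de-duplicate steps would then operate on this local data instead of re-querying the server on every refresh.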
Update: I went through the process step by step. It appears that the Remove Duplicates step is actually what's causing the performance lag.
let
    // Pull from the Azure SQL source
    #"Azure" = #"Azure SQL Data",
    // Append the manually refreshed Excel data
    #"Appended Query" = Table.Combine({#"Azure", #"Excel Source"}),
    // Sort newest-first, then keep one row per ID/Entity
    #"Removed Duplicates" = Table.Distinct(Table.Sort(#"Appended Query", {{"Date Imported", Order.Descending}}), {"ID", "Entity"})
in
    #"Removed Duplicates"
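One common mitigation for this pattern (a sketch, assuming the step names above) is to buffer the sorted table before Table.Distinct. Table.Buffer materializes the appended data in memory once, which both stabilizes the sort order that the de-duplication depends on and avoids the engine re-evaluating both sources during the Distinct step:

```m
let
    #"Azure" = #"Azure SQL Data",
    #"Appended Query" = Table.Combine({#"Azure", #"Excel Source"}),
    // Buffer the sorted table so Table.Distinct works against a
    // stable, in-memory snapshot instead of re-streaming the sources
    #"Buffered" = Table.Buffer(Table.Sort(#"Appended Query", {{"Date Imported", Order.Descending}})),
    #"Removed Duplicates" = Table.Distinct(#"Buffered", {"ID", "Entity"})
in
    #"Removed Duplicates"
```

Note that buffering loads the whole combined table into memory, so this trades RAM for speed - it helps most when the sources are slow to re-read, not when the data set itself is too large to hold locally.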
Actually I wasn't suggesting a staging SQL table, but a staging Excel table. So you might just run the extract from Azure SQL and load that result into an Excel table. Then your Append and Remove Duplicates steps would be working against local data, which can be faster.
But looking at your Query code, I suspect there is just a lot of data to de-duplicate. Can you quantify "the query is large" - roughly how many rows are involved?
Thanks, I clarified the post. It's a SQL Server hosted on Azure. The query is large, and I would think that adding another step - loading to an Excel table - would introduce extra issues and processing time. Does the fact that this is a SQL Server change the recommendation?
Agree with @mike_honey: you should bulk load your data into a staging table in your Azure SQL Db and then remove dups and insert your data into the destination table using a stored proc.
On large volumes, ELT is better than ETL.
Note that an Excel worksheet is limited to roughly 1 million rows (1,048,576).
If you have to load data into a SQL Db, use a true ETL solution like SSIS or Azure Data Factory.