The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I am working on a project where I have to pull 20+ million rows of data from a remote SQL server, It was configured through dataflow.
but it's been causing high CPU usage and crashes on the server, would simplifying my queries help to reduce the load?
Instead of performing multiple joins and If statements, would it help if I do that once my data is in the dataflow?
Also, this dataflow has to be refreshed daily and incremental does not really work with my current query, it causes more load on the server and takes 60+ minutes to refresh vs. 10 minutes for a full refresh.
there is nothing in the query that I won't be able to do Power query or DAX
SELECT O.OrderID,
#(lf)
CASE
WHEN O.CSR LIKE 'Inet%' THEN
'Inet'#(lf)
WHEN O.CSR LIKE 'Scheduler%' THEN
'Scheduler'#(lf)
WHEN O.CSR IS NULL THEN
''#(lf)
ELSE 'CSR'#(lf)
END AS CSR_Type,#(lf) C.RateChart,#(lf)O.CustID,#(lf)DR.DriverID,#(lf)FL.FleetID,#(lf)FL.Description FleetName,#(lf)C.LocalName, #(lf)C.CustSource, #(lf)C.EnsendaURL,#(lf)CT.CustType AS CustomerType, #(lf)O.Status, O.CSR, #(lf)CAST(O.OrderDate AS datetime) AS OrderDate,#(lf)O.OriginAddress,#(lf)O.OriginCity,#(lf)O.OriginState,#(lf)O.OriginZip,#(lf)o.OriginPlus4,#(lf)O.DestAddress,#(lf)O.DestCity,#(lf)O.DestState,#(lf)O.DestZip,#(lf)O.DestPlus4,#(lf)O.Caller,#(lf)OT.Description AS Order_Type,#(lf)O.Pieces, #(lf)O.ParcelType, #(lf)O.Weight,O.OrderType,#(lf)ORS.CompletionType,#(lf)CASE
WHEN O.Pieces IS NULL
OR O.Pieces<= 0 THEN
'No Qty'#(lf)WHEN O.Pieces IS NOT NULL THEN
'Qty Available'#(lf)End AS Pieces_Status,#(lf)CASE
WHEN ORS.POD IS NULL THEN
'NO'#(lf)WHEN ORS.POD IS NOT NULL THEN
'YES'#(lf)End AS POD_Status,#(lf)CASE
WHEN OS.Delivered IS NULL THEN
'NO'#(lf)WHEN OS.Delivered IS NOT NULL THEN
'YES'#(lf)End AS Delivered_Status,#(lf)CASE
WHEN OS.PickedUp IS NULL THEN
'NO'#(lf)WHEN OS.PickedUp IS NOT NULL THEN
'YES'#(lf)End AS PickedUp_Status,#(lf)CASE
WHEN OS.PickedUp IS NULL THEN
'NO'#(lf)WHEN OS.PickedUp IS NOT NULL THEN
'YES'#(lf)End AS PickedUp_Status,#(lf)CASE
WHEN OS.Dispatched IS NULL THEN
'NO'#(lf)WHEN OS.Dispatched IS NOT NULL THEN
'YES'#(lf)End AS Dispatched_Status,#(lf)CASE
WHEN S.Signature IS NULL THEN
'NO'#(lf)WHEN S.Signature IS NOT NULL THEN
'YES'#(lf)End AS Signature,#(lf)CAST(OS.PickedUp AS datetime) AS PickedUp,#(lf)CAST(OS.AtOrigin AS datetime) AS AtOrigin,#(lf)CAST(O.CreatedWhen AS datetime) CreatedWhen,#(lf)O.UpdatedWhen,#(lf)CAST(OS.Received AS datetime) AS Received, #(lf)CAST(OS.Delivered AS datetime) AS Delivered,#(lf)OS.Pod,CAST(OS.Invoiced AS datetime) AS Invoiced,#(lf)CAST(OS.AtDestination AS datetime) AS AtDestination,#(lf)CAST(O.DueTimeTo AS datetime) AS DueTimeTo,#(lf)CAST(O.ReadyTimeTo AS datetime) AS ReadyTimeTo, #(lf)OS.InvoiceNumber, #(lf)CAST(OS.Paid AS datetime) AS Paid,#(lf)CASE
WHEN CustSource = 0 THEN
'Nothing'#(lf)WHEN CustSource = 1 THEN
'Phone Directory'#(lf)WHEN CustSource = 2 THEN
'Web'#(lf)WHEN CustSource = 3 THEN
'Cold Call'#(lf)WHEN CustSource = 4 THEN
'Advertisement'#(lf)WHEN CustSource = 5 THEN
'Referral'#(lf)End AS CustomerSource,#(lf)CASE
WHEN CompletionType IS NULL THEN
'stop not completed'#(lf)WHEN CompletionType = 1 THEN
'stop completed & driver didn’t scan'#(lf)WHEN CompletionType = 2 THEN
'N/A'#(lf)WHEN CompletionType = 3 THEN
'stop completed & driver scanned'#(lf)WHEN CompletionType = 4 THEN
'stop completed – CSR manual entry/not scanned'#(lf)WHEN CompletionType = 5 THEN
'N/A'#(lf)End AS 'Completion Type',#(lf)
CASE #(lf)WHEN DATEDIFF(MINUTE, AtOrigin, PickedUp) >= 0 THEN
DATEDIFF(MINUTE, AtOrigin, PickedUp)#(lf)WHEN DATEDIFF(MINUTE, AtOrigin, PickedUp) IS NULL THEN
NULL#(lf)ELSE 0#(lf)END#(lf)AS 'Waiting Time at Origin location',#(lf)CASE #(lf)WHEN DATEDIFF(MINUTE, AtDestination, Delivered) >= 0 THEN
DATEDIFF(MINUTE, AtDestination, Delivered)#(lf)WHEN DATEDIFF(MINUTE, AtDestination, Delivered) IS NULL THEN
NULL#(lf)ELSE 0#(lf)END#(lf)AS 'Waiting Time at Destination location ',#(lf)CASE #(lf)WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) >= 0 THEN
DATEDIFF(MINUTE, DueTimeTo, Delivered)#(lf)WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) IS NULL THEN
NULL#(lf)ELSE 0#(lf)END#(lf)AS 'On Time Performance ',#(lf)DATEDIFF(MINUTE, DueTimeTo, Delivered) AS 'Performance'#(lf) ,CASE
WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) IS NULL THEN
NULL#(lf)#(tab)
WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) <= 10
AND DATEDIFF(MINUTE, DueTimeTo, Delivered) >= -100000 THEN
1#(lf)#(tab)
ELSE 0#(lf)#(tab)
END AS 'On Time Performance (10 minute)'#(lf) ,CASE
WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) IS NULL THEN
NULL#(lf)
WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) <= 10
AND DATEDIFF(MINUTE, DueTimeTo, Delivered) >= -10000 THEN
'On Time'#(lf)
WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) < -10000 THEN
'Early'#(lf)#(tab)
WHEN DATEDIFF(MINUTE, DueTimeTo, Delivered) > 10 THEN
'Late'#(lf)#(tab)
END AS 'Early/Late'#(lf)#(tab) ,CASE
WHEN Delivered IS NULL
AND DATEDIFF(MINUTE, DueTimeTo, getdate()) <= 10
AND DATEDIFF(MINUTE, DueTimeTo, getdate()) >= -10 THEN
'Due within 10 mins'#(lf)
ELSE 'Due within' + CAST(DATEDIFF(MINUTE, DueTimeTo, getDate()) AS VARCHAR) + ' mins'#(lf)#(tab)
END AS 'Expected On time',#(lf)CASE
WHEN Delivered IS NULL THEN
DATEDIFF(MINUTE, getdate(),DueTimeTo)#(lf)#(tab)
ELSE NULL
END AS 'Minutes until delivery',#(lf)CASE #(lf)WHEN DATEDIFF(MINUTE, ReadyTimeTo, PickedUp) >= 0 THEN
DATEDIFF(MINUTE, ReadyTimeTo, PickedUp)#(lf)WHEN DATEDIFF(MINUTE, ReadyTimeTo, PickedUp) IS NULL THEN
NULL#(lf)ELSE 0#(lf)END#(lf)AS 'Pickup Performance ',#(lf)DT.Price AS Price,#(lf)DT.Base_Rate_Price,DT.Fuel_Surcharge_Price,DT.GST_Price,DT.Loading_Time_Price,DT.Pieces_Price,DT.Unloading_Time_Price,DT.Waiting_Time_Price,DT.Weight_Price#(lf)FROM dbo.tblOrder AS O
WITH (NOLOCK) INNER JOIN#(lf) (#(lf) SELECT#(lf) OrderID,
#(lf) MAX(CASE itemid
WHEN 0 THEN
Price END) AS 'Base_Rate_Price',#(lf) MAX(CASE itemid
WHEN 1 THEN
Price END) AS 'Waiting_Time_Price',#(lf) MAX(CASE itemid
WHEN 2 THEN
Price END) AS 'Loading_Time_Price',#(lf) MAX(CASE itemid
WHEN 3 THEN
Price END) AS 'Pieces_Price', #(lf) MAX(CASE itemid
WHEN 4 THEN
Price END) AS 'Weight_Price', #(lf) MAX(CASE itemid
WHEN 5 THEN
Price END) AS 'Fuel_Surcharge_Price', #(lf) MAX(CASE itemid
WHEN 8 THEN
Price END) AS 'GST_Price',#(lf) MAX(CASE itemid
WHEN 25 THEN
Price END) AS 'Unloading_Time_Price',#(lf) SUM(Price) AS Price#(lf)
FROM tblOrderItems
GROUP BY OrderID#(lf)) DT #(lf)ON DT.OrderID = O.OrderID
INNER JOIN #(lf) dbo.tblCustomer AS C
WITH (NOLOCK)
ON O.CustID = C.CustID INNER JOIN#(lf) dbo.tblOrderStatus AS OS
WITH (NOLOCK)
ON O.OrderID = OS.OrderID LEFT OUTER JOIN#(lf) dbo.tblNFO AS NFO
WITH (NOLOCK)
ON O.OrderID = NFO.OrderID LEFT OUTER JOIN#(lf) dbo.tblCustomerTypes AS CT
WITH (NOLOCK)
ON CT.CustTypeID = C.CustType LEFT OUTER JOIN#(lf) dbo.tblOrderRouteStops AS ORS
WITH (NOLOCK)
ON ORS.StampID = O.OrderID LEFT OUTER JOIN#(lf) dbo.tblOrderTypes AS OT
WITH (NOLOCK)
ON OT.OrderTypeID = O.OrderType LEFT OUTER JOIN#(lf)#(tab) dbo.tblOrderDrivers AS OD
WITH (NOLOCK)
ON OD.OrderID = O.OrderID LEFT OUTER JOIN#(lf) tblSignature AS S WITH(NOLOCK)
ON S.PKID = O.OrderID
AND S.Type = 0#(lf)#(tab) LEFT OUTER
JOIN tblDrivers DR
WITH (NOLOCK)
ON OD.DriverID = DR.DriverID#(lf)#(tab) LEFT OUTER
JOIN tblFleets FL
WITH (NOLOCK)
ON FL.FleetID = OS.AssignedFleet#(lf) #(lf)Where Delivered >= '2016-1-1'
Solved! Go to Solution.
Instead of performing multiple joins and If statements, would it help if I do that once my data is in the dataflow?
Performing merges in Power Query is extremely costly. Usually the SQL query engine is much (much) better at that. The best merge/join is the one you don't have to do until the data is in the in-memory Vertipaq engine. Let the data model do the work for you.
Dataflows are there to shield you (the developer) from slow source systems. If your SQL server is not slow, then ditch the dataflows and use datasets with incremental refresh instead.
Instead of performing multiple joins and If statements, would it help if I do that once my data is in the dataflow?
Performing merges in Power Query is extremely costly. Usually the SQL query engine is much (much) better at that. The best merge/join is the one you don't have to do until the data is in the in-memory Vertipaq engine. Let the data model do the work for you.
Dataflows are there to shield you (the developer) from slow source systems. If your SQL server is not slow, then ditch the dataflows and use datasets with incremental refresh instead.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.