Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have a dataset and I want Table.SelectRows to return the rows that has max fileIndex group by file Name.
I can currently do this
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
#"Added Custom" = Table.AddColumn(ct, "Custom", each Table.SelectRows(
ct,
(r) => r[fileName]
= [fileName] and
r[fileIndex]
= List.Max(Table.SelectRows(ct, (q) => q[fileName] = [fileName])[fileIndex])
))
in
#"Added Custom"
but instead of adding a column I want to utilize that in Table .SelectRows so that it returns the filtered rows only that have the max fileIndex group by fileName.
This is what I tried which does not work
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
Custom1 = Table.SelectRows(
ct, each
(r) => r[fileName]
= [fileName] and
r[fileIndex]
= List.Max(Table.SelectRows(ct, (q) => q[fileName] = [fileName])[fileIndex])
)
in
Custom1
DAX equivalent is
pbix is attached
Thank you in advance
Solved! Go to Solution.
Group and filter:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
#"Max fileIndex per filName" = Table.Group(ct,{"fileName"},{
{"filtered", each Table.FromRecords({Table.Max(_,"fileIndex" )}),
type table[fileName=text, fileIndex=Int64.Type, Content=text]}
}),
#"Expanded filtered" = Table.ExpandTableColumn(#"Max fileIndex per filName", "filtered", {"fileIndex", "Content"})
in
#"Expanded filtered"
This might help. Check out my Approach #4 in particular:
Here's a version with Table.Join instead of Table.NestedJoin:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
#"Grouped Rows" = Table.Group(ct, {"fileName"}, {{"fileIndex", each List.Max([fileIndex]), type nullable number}}),
#"Merged Queries" = Table.Join(ct, {"fileName", "fileIndex"}, #"Grouped Rows", {"fileName", "fileIndex"}, JoinKind.Inner)
in
#"Merged Queries"
In terms of performance, Table.Group() is better. All in all, try to use database to finish this task.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
Grouped = Table.Group(ct, "fileName", {"grp", each Table.Max(_, "fileIndex")}),
#"Expanded grp" = Table.ExpandRecordColumn(Grouped, "grp", {"fileIndex", "Content"}, {"fileIndex", "Content"})
in
#"Expanded grp"
-- for SQL Server
WITH cte_rn
AS (SELECT *
, Row_number ()
OVER (
partition BY filename
ORDER BY fileindex DESC) AS RN
FROM yearly_sales)
SELECT *
FROM cte_rn
WHERE rn = 1
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
In terms of performance, Table.Group() is better. All in all, try to use database to finish this task.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
Grouped = Table.Group(ct, "fileName", {"grp", each Table.Max(_, "fileIndex")}),
#"Expanded grp" = Table.ExpandRecordColumn(Grouped, "grp", {"fileIndex", "Content"}, {"fileIndex", "Content"})
in
#"Expanded grp"
-- for SQL Server
WITH cte_rn
AS (SELECT *
, Row_number ()
OVER (
partition BY filename
ORDER BY fileindex DESC) AS RN
FROM yearly_sales)
SELECT *
FROM cte_rn
WHERE rn = 1
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
How much I wish that the data already went to a server so that I did not have to use PQ. I have nothing against PQ but it is not scalable. Datamart is also switched off (org-level) which is why can't take advantage of that too.
While we are still at it, other TSQL to pull the same string
declare @t1 as table (fileName int, fileIndex int, cat varchar(max))
insert into @t1
select * from
(values(1,1,'bin'),(1,2,'bin'),(1,3,'bin'),(2,1,'bin'),(2,2,'bin')) t(a,b,c)
--option#1
select a.fileName, a.fileIndex, a.cat
from @t1 a
where exists
(select c.fileName, c.fileIndex from
(select b.fileName, max(b.fileIndex) as fileIndex from @t1 b group by b.fileName) as c
where a.fileName =c.fileName and a.fileIndex=c.fileIndex)
--option#2
select top 1 with ties
a.fileName, a.fileIndex, a.cat
from @t1 a
order by row_number() over (partition by a.fileName order by a.fileIndex desc)
This might help. Check out my Approach #4 in particular:
Here's a version with Table.Join instead of Table.NestedJoin:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
#"Grouped Rows" = Table.Group(ct, {"fileName"}, {{"fileIndex", each List.Max([fileIndex]), type nullable number}}),
#"Merged Queries" = Table.Join(ct, {"fileName", "fileIndex"}, #"Grouped Rows", {"fileName", "fileIndex"}, JoinKind.Inner)
in
#"Merged Queries"
Group and filter:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjNU0lEC4WinzLzEospYpVgdqKgRVlFjDFEjrCYYYTXBCKcJJiiisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [fileName = _t, fileIndex = _t, Content = _t]),
ct = Table.TransformColumnTypes(Source,{{"fileName", type text}, {"fileIndex", Int64.Type}, {"Content", type text}}),
#"Max fileIndex per filName" = Table.Group(ct,{"fileName"},{
{"filtered", each Table.FromRecords({Table.Max(_,"fileIndex" )}),
type table[fileName=text, fileIndex=Int64.Type, Content=text]}
}),
#"Expanded filtered" = Table.ExpandTableColumn(#"Max fileIndex per filName", "filtered", {"fileIndex", "Content"})
in
#"Expanded filtered"
@ronrsnfld thanks for this, I am going to use this on a 12mil data source which has not been incorporated into any server yet.
Hence, I am looking for a working and optimized solution. Is it the most optimized solution?
I don't know how optimal this is. Often things may be faster if you can get it done on the server. Maybe someone will contribute a different solution and you can test it.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.