Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
Hello,
We have a very large dataset we are trying to manage (100 million+ rows) but this data is accross the past couple of months and gets updated/uploaded daily to our sharepoint. We can get our data into a more manageable size if we can cut out everything out in the Query to only the files that were created in the current month.
I have created the "PeriodM" column which will return the month the file was created. I want to create a flag that will return a 1 in a new column if the "PeriodM" month is equal to the max of all those values.
Example in the following image the latest/current month is 6/1/2022 so Latest Period should return a 1. And for 5/1/2022 it should return a 0.
This way I will be able to apply a step in our query to only filter rows with 1 in this new column, hopefully soving our size issue.
Solved! Go to Solution.
Your goal is to filter down to just the latest period, right? Although it doesn't create a flag column as an intermediate step, that's exactly what I'm doing in the example I linked to. You can add a flag column too but that adds extra intermediate steps.
Here's an example that adds a flag an then uses it to filter:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTLVN9Q3MjAyAjKNTA2UYnUgwmYIYUNTU7CwEapqZGEk1UbGBnBhcyTVBgjVFkjCZkDVsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, PeriodM = _t, Value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"PeriodM", type date}, {"Value", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"PeriodMax", each List.Max([PeriodM]), type nullable date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"ID", "PeriodM"}, #"Grouped Rows", {"ID", "PeriodMax"}, "Grouped Rows", JoinKind.LeftOuter),
#"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"PeriodMax"}, {"Flag"}),
#"Replaced Value" = Table.ReplaceValue(#"Expanded Grouped Rows",each [Flag], each if [Flag] = null then 0 else 1,Replacer.ReplaceValue,{"Flag"}),
#"Filtered Rows" = Table.SelectRows(#"Replaced Value", each ([Flag] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Flag"})
in
#"Removed Columns"
Here's a simpler version that bypasses that extra work:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTLVN9Q3MjAyAjKNTA2UYnUgwmYIYUNTU7CwEapqZGEk1UbGBnBhcyTVBgjVFkjCZkDVsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, PeriodM = _t, Value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"PeriodM", type date}, {"Value", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"PeriodMax", each List.Max([PeriodM]), type nullable date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"ID", "PeriodM"}, #"Grouped Rows", {"ID", "PeriodMax"}, "Grouped Rows", JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"Grouped Rows"})
in
#"Removed Columns"
This post might help with this.
https://community.powerbi.com/t5/Community-Blog/Select-Distinct-Rows-Ordered-by-Another-Column-Power...
I'd recommend approach #4 so that it can be folded into a native query.
@AlexisOlson Thanks for the suggestion but this is not what I'm looking to do and I'm not sure how to apply it to my our case. I was really looking for the "flag" or 1/0 solution as this is also something we can apply to other projects.
Your goal is to filter down to just the latest period, right? Although it doesn't create a flag column as an intermediate step, that's exactly what I'm doing in the example I linked to. You can add a flag column too but that adds extra intermediate steps.
Here's an example that adds a flag an then uses it to filter:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTLVN9Q3MjAyAjKNTA2UYnUgwmYIYUNTU7CwEapqZGEk1UbGBnBhcyTVBgjVFkjCZkDVsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, PeriodM = _t, Value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"PeriodM", type date}, {"Value", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"PeriodMax", each List.Max([PeriodM]), type nullable date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"ID", "PeriodM"}, #"Grouped Rows", {"ID", "PeriodMax"}, "Grouped Rows", JoinKind.LeftOuter),
#"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"PeriodMax"}, {"Flag"}),
#"Replaced Value" = Table.ReplaceValue(#"Expanded Grouped Rows",each [Flag], each if [Flag] = null then 0 else 1,Replacer.ReplaceValue,{"Flag"}),
#"Filtered Rows" = Table.SelectRows(#"Replaced Value", each ([Flag] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Flag"})
in
#"Removed Columns"
Here's a simpler version that bypasses that extra work:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTLVN9Q3MjAyAjKNTA2UYnUgwmYIYUNTU7CwEapqZGEk1UbGBnBhcyTVBgjVFkjCZkDVsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, PeriodM = _t, Value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"PeriodM", type date}, {"Value", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"PeriodMax", each List.Max([PeriodM]), type nullable date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"ID", "PeriodM"}, #"Grouped Rows", {"ID", "PeriodMax"}, "Grouped Rows", JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"Grouped Rows"})
in
#"Removed Columns"
Please note I did not try this solution and the PBI forum auto selected this as a correct answer.
I was able to come up with my own solution.
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.
User | Count |
---|---|
35 | |
16 | |
12 | |
11 | |
9 |
User | Count |
---|---|
44 | |
24 | |
16 | |
14 | |
14 |