Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get inspired! Check out the entries from the Power BI DataViz World Championships preliminary rounds and give kudos to your favorites. View the vizzies.

Reply
vivi_nainai2021
Frequent Visitor

Categorize data by number of entries

Hi PowerBi experts, I have 2 questions:

1. I have a survey dataset where I'm trying to create a category with a condition that if the ID column has more than 5 entries, it should be tagged as either "global" or "regional". For example, if ID 1 contains 4 regions, it should be tagged as regional. Any ID with more than 5 or more regions should be tagged "global". How can I do this in PowerQuery? (see image below)

 

vivi_nainai2021_0-1622849479421.png

 

2. In the same dataset, I have a column that contains open-ended responses. The responses contain a lot of text but I need to extract out only certain words such as the name of countries they mention. For example, in the image below, one respondent has mentioned that they work in Australia and Canada. How can I create another column which only extracts the name of countries by ID? I have a list of countries in a seperate table as reference if needed.

vivi_nainai2021_0-1622850436330.png

 

 

 

Thank you! 

1 ACCEPTED SOLUTION
edhans
Super User
Super User

For the first question, use a Group BY @vivi_nainai2021 grouping on the ID.

edhans_0-1622852891653.pngI then added a custom column with the formula:

if [Count] >= 5 then "Global" else "Regional"

It returns this:

edhans_1-1622852949938.png

I then expanded the Regional Column that was inside the All Rows column and deleted the temporary Count column, and final result is this:

edhans_2-1622853011832.png

 

The code for this is here:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWBsJLgrGQ4KwXMMgayUsEsEyArDc5Kh7My4KxMOCsLzspWio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Region = _t]),
    #"Grouped Rows" = Table.Group(Source, {"ID"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"All Rows", each _, type table [ID=nullable text, Region=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Type", each if [Count] >= 5 then "Global" else "Regional", type text),
    #"Expanded All Rows" = Table.ExpandTableColumn(#"Added Custom", "All Rows", {"Region"}, {"Region"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded All Rows",{"Count"})
in
    #"Removed Columns"

How to use M code provided in a blank query:
1) In Power Query, select New Source, then Blank Query
2) On the Home ribbon, select "Advanced Editor" button
3) Remove everything you see, then paste the M code I've given you in that box.
4) Press Done
5) See this article if you need help using this M code in your model.

 

On the last question, one approach could be this - very easy to do with the UI.

  1. Remove all punctuation with the Replace. Replace it with nothing, so leave the 2nd box blank. Get rid of commas, periods, etc.
  2. Split the column by spaces.
  3. Add an Index column
  4. Select the index column and Unpviot other columns.

You will have something like this:

edhans_3-1622853358181.png

You can now merge the Value column shown with your list of countries, and filter out wherever there was a null. You know which ones belonged with each other by the index. So the 0 record had both Australia and Canada, but the 1 record had only Germany. Again, here it the full code example:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCsnILFbISCxWcCwtLilKzMlMVEjMS1FwTsxLTElUyE3NK8nMz0tN0VOK1YEqzgIqVMjJLC4pVnBPLcpNzKsESsYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,".","",Replacer.ReplaceText,{"Column1"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",",","",Replacer.ReplaceText,{"Column1"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value1", "Column1", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6"}),
    #"Added Index" = Table.AddIndexColumn(#"Split Column by Delimiter", "Index", 0, 1, Int64.Type),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value")
in
    #"Unpivoted Other Columns"

You could definitely get more fancy using List.Generate to go through each record, but the code would be about 4 times a long and 10 times more complex, and I think the above logic will get the job done.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

2 REPLIES 2
edhans
Super User
Super User

For the first question, use a Group BY @vivi_nainai2021 grouping on the ID.

edhans_0-1622852891653.pngI then added a custom column with the formula:

if [Count] >= 5 then "Global" else "Regional"

It returns this:

edhans_1-1622852949938.png

I then expanded the Regional Column that was inside the All Rows column and deleted the temporary Count column, and final result is this:

edhans_2-1622853011832.png

 

The code for this is here:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWBsJLgrGQ4KwXMMgayUsEsEyArDc5Kh7My4KxMOCsLzspWio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Region = _t]),
    #"Grouped Rows" = Table.Group(Source, {"ID"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"All Rows", each _, type table [ID=nullable text, Region=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Type", each if [Count] >= 5 then "Global" else "Regional", type text),
    #"Expanded All Rows" = Table.ExpandTableColumn(#"Added Custom", "All Rows", {"Region"}, {"Region"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded All Rows",{"Count"})
in
    #"Removed Columns"

How to use M code provided in a blank query:
1) In Power Query, select New Source, then Blank Query
2) On the Home ribbon, select "Advanced Editor" button
3) Remove everything you see, then paste the M code I've given you in that box.
4) Press Done
5) See this article if you need help using this M code in your model.

 

On the last question, one approach could be this - very easy to do with the UI.

  1. Remove all punctuation with the Replace. Replace it with nothing, so leave the 2nd box blank. Get rid of commas, periods, etc.
  2. Split the column by spaces.
  3. Add an Index column
  4. Select the index column and Unpviot other columns.

You will have something like this:

edhans_3-1622853358181.png

You can now merge the Value column shown with your list of countries, and filter out wherever there was a null. You know which ones belonged with each other by the index. So the 0 record had both Australia and Canada, but the 1 record had only Germany. Again, here it the full code example:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCsnILFbISCxWcCwtLilKzMlMVEjMS1FwTsxLTElUyE3NK8nMz0tN0VOK1YEqzgIqVMjJLC4pVnBPLcpNzKsESsYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,".","",Replacer.ReplaceText,{"Column1"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",",","",Replacer.ReplaceText,{"Column1"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value1", "Column1", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6"}),
    #"Added Index" = Table.AddIndexColumn(#"Split Column by Delimiter", "Index", 0, 1, Int64.Type),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value")
in
    #"Unpivoted Other Columns"

You could definitely get more fancy using List.Generate to go through each record, but the code would be about 4 times a long and 10 times more complex, and I think the above logic will get the job done.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
vivi_nainai2021
Frequent Visitor

I figured out question 1. Simply counted the number of rows per ID by name of region. Still looking for an asnwer to the second question 🙂

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code FABINSIDER for a $400 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

March2025 Carousel

Fabric Community Update - March 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors