Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get inspired! Check out the entries from the Power BI DataViz World Championships preliminary rounds and give kudos to your favorites. View the vizzies.

Reply
Luzadriana255
Helper II
Helper II

Help Conversion of information contained in many lines into one line for 1 identifier

We're aiming to transform a table with unstructured data into a more organized format using Power Query. The goal is to group information by an "Identifier" column, extracting specific data points like company, address, and ZIP code into their respective columns. Now I have a table with the  information mix into many rows in just one column, in other column is the identificatory appearing in every row containing information about that identificator.  I need to identify with information in the rows that belong to the Company column, those to Address and those to Postcode. I want to ignore those that do not belong anywhere. The information about the company can appear in two rows. How can I perform this conversion in Power Query? I show one table as it looks now and the other as I want it to look.

I have other information in the table in other columns that cannot be deleted (e.g. by deleting all rows that contain @)

 

Current table

Identifikator

Information

6

 

6

Unternehmen 1

6

Abteilung 3

6

Hügelstr. 69

6

Kommentar

6

60433 Frankfurt

8

 

8

Unternehmen 2

8

 

8

Zeil 5

8

60313 Frankfurt am Main

8

kk.g@gmail.com

9

 

9

Unternehmen 3

9

Elbestrasse 48

9

60329 Frankfurt

9

 

10

Unternehmen 4

10

Abteilung 6

10

Gutleutstr.7-11

10

60329 Frankfurt

10

info@yahoo.de

 

Final table

Identificator

Company

Adress

Zip Code

 

6

Unternehmen 1 - Abteilung 3

Hügelstr. 69

60433 Frankfurt

 

8

Städtische Kitas Frankfurt am Main

Zeil 5

60313 Frankfurt am Main

 

9

ABG Frankfurt Holding

Elbestrasse 48

60329 Frankfurt

 

10

Stadt Frankfurt am Main -Amt für Bau und Immobilien

Gutleutstr.7-11

60329 Frankfurt

 

 

Notes on the transformations

- Identifiers 6 and 10 have company information in two rows, they need to be combined in the column for the company

- Emails should be ignored

- Address goes in one column

- Zip code also goes in one column

 

I will very grateful if you have an idea on how to proceed for this transformation. Thank you!

1 ACCEPTED SOLUTION
v-stephen-msft
Community Support
Community Support

Hi @Luzadriana255 ,

 

@Shravan133 's reply is great and vaild.

Here's my solution with .pbix file attached.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bZC7DoIwFIZfpemshFKssOngJTGOLhKGqgck9JKUMvhubr6YiPFg0K39/vT/Tk+WUUEnlOaT9+FgPDgDVw2GMKTLk4dKtaYkHNn2cS9BNd4FRKRId1Z3T710SEQYc07WTpq6aJ3vefJRJiNl9BsfOzOZ4VWEnH3VEanJXlYG87oOykWpZaWCs9U9Tj916cjGka7UCbqvyKYBEieIO1mUjmbHNhaO6uIBDwsTA9y0XkHrXyubTxkbgn+aPqhMYRc3ebU2uADN8yc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Identifikator = _t, Information = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Identifikator", Int64.Type}, {"Information", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Information] <> "")),
    //Rows where the Information column is empty are filtered out.
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each not Text.Contains([Information], "@")),
    //Rows where the Information column contains “@” are filtered out.
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each if Text.Contains([Information], "Zeil ") or Text.Contains([Information], "Hügelstr") or Text.Contains([Information], "Elbestrasse") or Text.Contains([Information], "Gutleutstr") then "Address"

else if Text.Contains([Information], "603") or Text.Contains([Information], "604") then "ZIP Code"

else if Text.Contains([Information], "Unternehmen") or Text.Contains([Information], "Abteilung") then "Company"

else null),
//A new column Custom is added based on the content of the Information column:
//If Information contains certain street names, it is labeled as “Address”.
//If Information contains certain ZIP codes, it is labeled as “ZIP Code”.
//If Information contains certain company-related terms, it is labeled as “Company”.
//Otherwise, it is labeled as null.
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Table.Group(#"Added Custom", {"Custom","Identifikator"}, {{"Information", each Text.Combine([Information], "-"), type text}})),
    //Another custom column is added, grouping the table by Custom and Identifikator, and combining the Information values into a single string separated by “-”.
    #"Expanded Custom.1" = Table.ExpandTableColumn(#"Added Custom1", "Custom.1", {"Information"}, {"Information.1"}),
    //The grouped table is expanded back into the main table, adding a new column Information.1.
    #"Added Custom2" = Table.AddColumn(#"Expanded Custom.1", "Custom.1", each Text.Contains([Information.1],[Information])),
    //A new column Custom.1 is added, checking if Information.1 contains the original Information value.
    #"Filtered Rows2" = Table.SelectRows(#"Added Custom2", each ([Custom.1] = true)),
    //Rows where Custom.1 is true are kept.
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows2",{"Information"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
    #"Filtered Rows3" = Table.SelectRows(#"Removed Duplicates", each ([Custom] <> null)),
    //Rows where Custom is not null are kept.
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows3",{"Custom.1"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Custom]), "Custom", "Information.1")
    //The table is pivoted, using the distinct values in the Custom column as new columns, and the Information.1 values as the data.
in
    #"Pivoted Column"

Current table:

vstephenmsft_1-1722567070725.png

Final result:

vstephenmsft_0-1722567063190.png

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

View solution in original post

3 REPLIES 3
Luzadriana255
Helper II
Helper II

Thank you so much!
v-stephen-msft
Community Support
Community Support

Hi @Luzadriana255 ,

 

@Shravan133 's reply is great and vaild.

Here's my solution with .pbix file attached.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bZC7DoIwFIZfpemshFKssOngJTGOLhKGqgck9JKUMvhubr6YiPFg0K39/vT/Tk+WUUEnlOaT9+FgPDgDVw2GMKTLk4dKtaYkHNn2cS9BNd4FRKRId1Z3T710SEQYc07WTpq6aJ3vefJRJiNl9BsfOzOZ4VWEnH3VEanJXlYG87oOykWpZaWCs9U9Tj916cjGka7UCbqvyKYBEieIO1mUjmbHNhaO6uIBDwsTA9y0XkHrXyubTxkbgn+aPqhMYRc3ebU2uADN8yc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Identifikator = _t, Information = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Identifikator", Int64.Type}, {"Information", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Information] <> "")),
    //Rows where the Information column is empty are filtered out.
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each not Text.Contains([Information], "@")),
    //Rows where the Information column contains “@” are filtered out.
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each if Text.Contains([Information], "Zeil ") or Text.Contains([Information], "Hügelstr") or Text.Contains([Information], "Elbestrasse") or Text.Contains([Information], "Gutleutstr") then "Address"

else if Text.Contains([Information], "603") or Text.Contains([Information], "604") then "ZIP Code"

else if Text.Contains([Information], "Unternehmen") or Text.Contains([Information], "Abteilung") then "Company"

else null),
//A new column Custom is added based on the content of the Information column:
//If Information contains certain street names, it is labeled as “Address”.
//If Information contains certain ZIP codes, it is labeled as “ZIP Code”.
//If Information contains certain company-related terms, it is labeled as “Company”.
//Otherwise, it is labeled as null.
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Table.Group(#"Added Custom", {"Custom","Identifikator"}, {{"Information", each Text.Combine([Information], "-"), type text}})),
    //Another custom column is added, grouping the table by Custom and Identifikator, and combining the Information values into a single string separated by “-”.
    #"Expanded Custom.1" = Table.ExpandTableColumn(#"Added Custom1", "Custom.1", {"Information"}, {"Information.1"}),
    //The grouped table is expanded back into the main table, adding a new column Information.1.
    #"Added Custom2" = Table.AddColumn(#"Expanded Custom.1", "Custom.1", each Text.Contains([Information.1],[Information])),
    //A new column Custom.1 is added, checking if Information.1 contains the original Information value.
    #"Filtered Rows2" = Table.SelectRows(#"Added Custom2", each ([Custom.1] = true)),
    //Rows where Custom.1 is true are kept.
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows2",{"Information"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
    #"Filtered Rows3" = Table.SelectRows(#"Removed Duplicates", each ([Custom] <> null)),
    //Rows where Custom is not null are kept.
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows3",{"Custom.1"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Custom]), "Custom", "Information.1")
    //The table is pivoted, using the distinct values in the Custom column as new columns, and the Information.1 values as the data.
in
    #"Pivoted Column"

Current table:

vstephenmsft_1-1722567070725.png

Final result:

vstephenmsft_0-1722567063190.png

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Shravan133
Super User
Super User

To transform your unstructured data into a more organized format using Power Query, follow these steps:

  1. Load the Data into Power Query
  1. Open Power BI Desktop or Excel.
  2. Go to the Data tab and select Get Data.
  3. Choose the appropriate data source (e.g., Excel, CSV) and load your current table into Power Query.
  1. Filter Out Emails
  1. In Power Query, select the column that contains the Information data.
  2. Use the Filter dropdown in the column header to exclude rows containing "@".
  1. Add Custom Columns for Data Categorization
  1. Go to the Add Column tab and choose Custom Column.
  2. Add a custom column to identify whether a row contains company information, address, or ZIP code.

Example Custom Column Formula:

if Text.Contains([Information], "Str.") or Text.Contains([Information], "Hauptstraße") then "Address"

else if Text.Contains([Information], "603") or Text.Contains([Information], "604") then "ZIP Code"

else if Text.Contains([Information], "Unternehmen") or Text.Contains([Information], "Firma") then "Company"

else null

  1. Pivot the Data to Reshape It
  1. Select the Identifier column.
  2. Go to the Transform tab and choose Pivot Column.
  3. For the Values Column, use the newly created custom column (e.g., Company, Address, ZIP Code).
  1. Combine Multiple Rows into Single Cells
  1. After pivoting, you will have multiple columns for company information and other attributes. For combining multiple rows into single cells (e.g., multiple company rows):
    • Go to Group By under the Home tab.
    • Group by Identifikator, and use the All Rows operation or custom aggregation to concatenate values.

Example of concatenating company information:

Text.Combine([Information], " - ")

  1. Clean Up and Finalize the Table
  1. Remove any unnecessary columns.
  2. Rename columns as needed (e.g., rename concatenated columns to Company, Address, ZIP Code).
  3. Filter out any remaining null or unwanted rows.
  1. Load Data Back to Excel or Power BI
  1. Close & Load the data back into Excel or Power BI.

 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code FABINSIDER for a $400 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

March2025 Carousel

Fabric Community Update - March 2025

Find out what's new and trending in the Fabric community.