Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
Syndicate_Admin
Administrator
Administrator

Problem performing Text.Split to columns that are formatted (Data1; Data2; Data3; etc)

Good morning dear,

It is my first post in the Power BI forum since I am starting with this, I hope to receive your help to strengthen my learning and knowledge, very grateful in advance. The doubt is mainly in Power Query, it is not a . PBIX I have it in Excel first but I think I can't share the file because it's data from the company I'm in.

I have 3 columns, EVT_INV, EVT_INS, EVT_ASI that refer to Invited, Registered and Attended Events. Of these columns EVT_INV has much more content than the others, because not everyone signs up or attends. My problem is that to test, I divided only the column EVT_INV that would correspond to the number of guests, which gives me 79256 rows, but when applying the Split to the others and filtering them with the required, I have 77094 rows left, that is, I lost 2000 records approx.

I would like to specify that for the columns EVT_INS, EVT_ASI when they do not "fit" with the information of EVT_INV and are null, I replaced them with "Does not specify" since that information still serves me. If in the filter I remove those that have "Does not specify" I would lose data from the column EVT_INV corresponding to the guests so I filter the codes that I require and also add "Does not specify". If anyone is kind enough to have a few minutes to make maybe a call or something to explain, I would be enormously grateful and willing :D!

Thank you very much in advance, please excuse the inconvenience and hope you understand my problem.

1 ACCEPTED SOLUTION

Hello!

First, thank you very much for taking the time to try to fix my problem, but I found another solution that suited my needs. This consists of the following steps:

    #"EVT_INV_SPL agregada" = Table.AddColumn(#"null a NE EVT_ASI", "EVT_INV_SPL", each Text.Split([EVT_INV], "; ")),
    #"EVT_INS_SPL agregada" = Table.AddColumn(#"EVT_INV_SPL agregada", "EVT_INS_SPL", each Text.Split([EVT_INS], "; ")),
    #"EVT_ASI_SPL agregada" = Table.AddColumn(#"EVT_INS_SPL agregada", "EVT_ASI_SPL", each Text.Split([EVT_ASI], "; ")),
    #"Se expandió EVT_INV_SPL" = Table.ExpandListColumn(#"EVT_ASI_SPL agregada", "EVT_INV_SPL"),
    #"Filas filtradas" = Table.SelectRows(#"Se expandió EVT_INV_SPL", each ([EVT_INV_SPL] = "QV1" or [EVT_INV_SPL] = "QV2" or [EVT_INV_SPL] = "QV3" or [EVT_INV_SPL] = "QV4")),
    #"Inscrito (Si-No)" = Table.AddColumn(#"Filas filtradas", "Inscrito (Si-No)", each if List.Contains([EVT_INS_SPL], [EVT_INV_SPL]) then "Si"
else "No"),
    #"Asistente (Si-No)" = Table.AddColumn(#"Inscrito (Si-No)", "Asistente (Si-No)", each if List.Contains([EVT_ASI_SPL], [EVT_INV_SPL]) then "Si"
else "No")

Which is based on using Text.Split() in the three columns (EVT_INV, EVT_INS and EVT_ASI) and then expanding only EVT_INV so I can have the contacts who were invited to the events with "QV" codes, then with List.Contains() I ask if a record in the EVT_INS list and EVT_ASI matches the EVT_INV code and if it matches then it is "Yes" , otherwise "No".

Again, thank you so much for taking the time to respond.

Best regards.

View solution in original post

9 REPLIES 9
lbendlin
Super User
Super User

Yes, but that should not be a surprise to you. Effectively you are multiplying all split values in EVT_INV by all split values in EVT_INS by all split values in EVT_ASI - unless you can provide a business logic that only picks individual combinations like the QV filter you mentioned.  Should that be applied to all three splits?

That's the problem I have, I'm starting with Power Query and Power BI and I have no knowledge of whether it is possible to split the 3 columns and leave only the information that is related to the "QV" filter, without the problem of repeating the values of EVT_INV. Create a column with a result If - No, that will query whether EVT_INS or EVT_ASI had the same code as EVT_INV but filtering by If it leaves me only the matches as expected and removes the other codes associated with Invitations that do not have Enrollments or that have not been attended.

Anonymous
Not applicable

Hi @Syndicate_Admin ,

Please try to apply the below codes in your Advanced Editor and check whether it can get what you want.

let
    Source = Excel.Workbook(File.Contents("\*\Ejemplo columnas con SPLIT (EVT_INV, EVT_INS, EVT_ASI).xlsx"), null, true),
    TablaContactos_Table = Source{[Item="TablaContactos",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(TablaContactos_Table,{{"Nombre Completo", type text}, {"Correo", type text}, {"Empresa", type text}, {"Tipo de Cargo (C)", type text}, {"País (C)", type text}, {"EVT_INV", type text}, {"EVT_INS", type text}, {"EVT_ASI", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Contain QV?", each if Text.Contains([EVT_INV], "QV") or Text.Contains([EVT_INS], "QV") or Text.Contains([EVT_ASI], "QV") then "Y" else "N"),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each true)
in
    #"Filtered Rows"

yingyinr_0-1628159396497.png

Best Regards

Hello!

First, thank you very much for taking the time to try to fix my problem, but I found another solution that suited my needs. This consists of the following steps:

    #"EVT_INV_SPL agregada" = Table.AddColumn(#"null a NE EVT_ASI", "EVT_INV_SPL", each Text.Split([EVT_INV], "; ")),
    #"EVT_INS_SPL agregada" = Table.AddColumn(#"EVT_INV_SPL agregada", "EVT_INS_SPL", each Text.Split([EVT_INS], "; ")),
    #"EVT_ASI_SPL agregada" = Table.AddColumn(#"EVT_INS_SPL agregada", "EVT_ASI_SPL", each Text.Split([EVT_ASI], "; ")),
    #"Se expandió EVT_INV_SPL" = Table.ExpandListColumn(#"EVT_ASI_SPL agregada", "EVT_INV_SPL"),
    #"Filas filtradas" = Table.SelectRows(#"Se expandió EVT_INV_SPL", each ([EVT_INV_SPL] = "QV1" or [EVT_INV_SPL] = "QV2" or [EVT_INV_SPL] = "QV3" or [EVT_INV_SPL] = "QV4")),
    #"Inscrito (Si-No)" = Table.AddColumn(#"Filas filtradas", "Inscrito (Si-No)", each if List.Contains([EVT_INS_SPL], [EVT_INV_SPL]) then "Si"
else "No"),
    #"Asistente (Si-No)" = Table.AddColumn(#"Inscrito (Si-No)", "Asistente (Si-No)", each if List.Contains([EVT_ASI_SPL], [EVT_INV_SPL]) then "Si"
else "No")

Which is based on using Text.Split() in the three columns (EVT_INV, EVT_INS and EVT_ASI) and then expanding only EVT_INV so I can have the contacts who were invited to the events with "QV" codes, then with List.Contains() I ask if a record in the EVT_INS list and EVT_ASI matches the EVT_INV code and if it matches then it is "Yes" , otherwise "No".

Again, thank you so much for taking the time to respond.

Best regards.

Anonymous
Not applicable

Hi @Syndicate_Admin ,

It's glad to hear that you got the solution for your problem. And thanks for sharing the solution here. Could you please mark your post as Answered? It will help the others in the community find the solution easily if they face the same problem with yours. Thank you.

Best Regards

lbendlin
Super User
Super User

 

let
    Source = Excel.Workbook(File.Contents("Ejemplo columnas con SPLIT (EVT_INV, EVT_INS, EVT_ASI).xlsx"), null, true),
    TablaContactos_Table = Source{[Item="TablaContactos",Kind="Table"]}[Data],
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(TablaContactos_Table, {{"EVT_INV", Splitter.SplitTextByDelimiter("; ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "EVT_INV"),
    #"Added Index" = Table.AddIndexColumn(#"Split Column by Delimiter", "Index", 0, 1, Int64.Type)
in
    #"Added Index"

 

 

Go to Power Query, Get Data...Blank Query... Advanced Editor  and then paste this code in.  Adjust the path to the Excel file accordingly. I am counting 88150 rows, by the way. So your transforms must add filters.

You also seem to have some inconsistency in your sample data, for example people attended events they were not invited to etc.

Indeed, I add the filters because I require only the information of the EVT columns that contain the QV code and mainly there the problem is generated, by generating the split in the EVT_INS column and EVT_ASI since EVT_INV begins to double 😞

lbendlin
Super User
Super User

To avoid this kind of data loss/aggregation you may want to consider to add an index column to your data (at the beginning of your transform steps) , to make sure each row is unique.

Thank you very much for the answer, could you be so kind as to guide me a little bit with the idea?

I gave myself the job of creating an imitation of the DataSet that I am using, the important columns such as EVT_INV, EVT_INS and EVT_ASI are the same as the original and there are some steps taken in which I was comparing the results after performing the split to the other columns. Here is the link to the file.

Thank you very much in advance and apologies for the inconvenience.

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors