Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
vizdor
Frequent Visitor

split long string by a varies of string delimiter

Hello, 

I have a table {[ID], [au_aff]} and a list of countries (from worldbankData). The column [au_aff] contains some institutes with many details, all of the details are separated by comma. In this [au_aff] could be more than one institute, and even the institutes are separated by comma. The only available separator could be the name of the country of the institute. 

I have to split the institutes. I could manage to match the list of countries, but some institutes could be from the same country. So the code finds the first, but stops there.

Could you help me with any idea to solve and an M code?

 

Thanks

1 ACCEPTED SOLUTION

Hi!

 

The problem seemed to be so complex, and there were 14K records. A kind of solution was found, some python code was translated into M. Thank you all for the time and energy to find solutions. 

 

Regards

Dori

View solution in original post

12 REPLIES 12
v-sgandrathi
Community Support
Community Support

Hi @vizdor

Thankyou for using Microsoft Community Forum!

 

Thankyou @wini_R and @Akash_Varuna for your reponse regarding the query.

Has your issue been resolved?If the response provided by the community member addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.

If yes, kindly accept the useful reply as a solution and give us Kudos. It would be appreciated.

Thank you for your understanding!

Hi!

 

The problem seemed to be so complex, and there were 14K records. A kind of solution was found, some python code was translated into M. Thank you all for the time and energy to find solutions. 

 

Regards

Dori

HI @vizdor,

 

I'm glad to hear that you found a solution and resolved the query! Please consider sharing your solution and marking your reply as the accepted answer to help others in the community find it easily.

 

Thankyou for connecting with Microsoft Community Forum!

Hi @vizdor,


we haven't heard back from you regarding our last response from community member and wanted to check if your issue has been resolved.

If our response addressed by the community member for  your query, please mark it as Accept Answer and give Kudos if you found it helpful.

Should you have any further questions, feel free to reach out.


Thank you for being a part of the Microsoft Fabric Community Forum!

Hi @vizdor,

 

Has your issue been resolved?If the response provided by the community member addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.

If yes, kindly accept the useful reply as a solution and give us Kudos. It would be appreciated.

 

Thank you for your understanding!

 

v-sgandrathi
Community Support
Community Support

Hi @vizdor,

 

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.

Thanks for using Microsoft Fabric Community Forum.

wini_R
Solution Supplier
Solution Supplier

Hey @vizdor,

Can you please provide a sample of your data in text format as well as the expected outcome?


How to Get Your Question Answered Quickly 

vizdor
Frequent Visitor

Hello, 

We have managed to create a kind find of solution to split long text in a table of 80K rows and also have a table with a list from World Bank with all possible countries in the world. Institutes in cells are separated mostly with comma, but sometimes only with 'and'. The perfect solution would lead to a possibility to count the institutes in the cell, or better to split the institutes by a delimiter, f.e. | (pipe)

The examples: 

example1: two institute, both from the same country

Sántha G., Faculty of Health Sciences, Doctoral School of Health Sciences, University of Pécs, Pécs, Hungary, 'Juhász Gyula' Faculty of Education, University of Szeged, Szeged, Hungary


example2: two institutes, one of them is from a country wich name contains two words (United States)

Longobardi S., Department of Internal Medicine, HCA Healthcare, USF Morsani College of Medicine GME, HCA Florida Blake Hospital, Bradenton, FL, United States, Department of Internal Medicine and Hematology, Semmelweis University Alumnus, Budapest, Hungary


example3: three institutes, one is from Gibraltar, which word is also a name of a city and a country, but must be count only once

Demetrovics Z., Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary, Centre of Excellence in Responsible Gaming, University of Gibraltar, Gibraltar, College of Education, Psychology and Social Work, Flinders University, Adelaide, SA, Australia


example4: 3 institutes, two of them from a country which name's contains the word: 'and'

Đogo M., University of East Sarajevo, Faculty of Economics Pale, Bosnia and Herzegovina; Gligorić D., University of Banja Luka, Faculty of Economics, Bosnia and Herzegovina; Berecz M., Ministry of Foreign Affairs of Hungary, Budapest, Hungary

 

example5: two institutes separated by 'and'

Kiss L., Department of Organic and Medicinal Chemistry, Faculty of Pharmacy, University of Pécs, Honvéd Street 1, Pécs, H-7624, Hungary and János Szentágothai Research Center, University of Pécs, Ifjúság Street 20, Pécs, H-7624, Hungary

 

Thanks for trying!

wini_R
Solution Supplier
Solution Supplier

Hi @vizdor,

 

The only logic I can see based on your examples is counting occurrences or splitting the text by this dynamic substring: ", [countryName]", i.e.

wini_R_0-1741675879501.png

 

You can transform your list of countries to this format:

wini_R_1-1741675931742.png

and then use it to replace the string with some delimiter, i.e. "|". Final step should be splitting the text or just counting the occurrences:

wini_R_2-1741676075862.png

 

To test the solution use queries below:

countries table

 

// countries
let
    Source = List.Transform ({"Hungary", "United States", "Gibraltar", "Australia", "Bosnia and Herzegovina"}, each ", " & _)
in
    Source

 

 

 

 

// tbl4
let
//sample data
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fVTLUttAEPyVKZ8VilCpcMjJb0PsiitKKlUhHIbVIA+sdly7KyfmB3LNL3DkBzjkKj4sI9mADXZO0m5punt6enR21kqrWxdnCMODBAZoShuXIJcwIrRxBqlhcoZCAj0xUTxavZqJ2J3ffHW8IB94BTGt7oxerh+j0uXolwn8KA8Pj45Py1l1G25guCwtrq426ftZaTCyuJeg6Q3llCVPzzVs6zw5a43F5XKBPmNItZsezdHHglysC09cJO9U/4QyNuxIa7vtdQ8GvZ6/pgOYiA/oGLpirTLUlY8FMJz0V0UDK54zhI7Fa4KRhDlHtAl0PGZKV6sejBvlkTJII8bGwf/rAXSZyikwipVcjUqpKMj+JA6bHrRtWbhS4TplhnMKcduDHhUUvSzYBPiuJpy4EDmWselkGpY6vBV6f/ylD/3qPi6q+wBj8ZqDbINoB0ECXRXvG6j+L0PqkM4d2MFnCnNxgS+suoQFu/zl3IZ8oeGJ6JPN1w2XNyb+LLPxJBXD6tM38dfqq2WXKeyW0nZGFjnTEaZtPZUhKgFjY8jDH8kFJgcvBfUxREjR4xUtZCv6fSNOitrAKVrF7EhwjOvxeM2duuvwAwwt55qDh9/Qe4XeQXeFMC6vcTf0ftQOeTI3jeAJO9ZWmsqBeOLcQfvyElnbr9fvcSq7k/CRg8711R588rnm2zTE6+ipud0ZFQ3XltzpDH2BZrlns0fiFtVdHXBPFOHt866/OX5/9O5JTsN1qvmSUO+ti9VtLvrP4To3hN7MmmCR38NzcnlV/Q1a9Mh0dLiPqnV+/g8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [text = _t]),

//transformations start here
    ReplaceCountries = List.Accumulate(List.Buffer(countries), Source, (current, state) => Table.TransformColumns(current, {{"text", each Text.Replace(_, state, "|") }}) ),
    #"Added count" = Table.AddColumn(ReplaceCountries, "count", each List.Count(Text.PositionOf([text], "|", Occurrence.All)))
in
    #"Added count"

 

Hi @vizdor,

 

Thank you for providing the information. However, we are unable to reproduce the scenario based on the data you have provided.

We kindly request you to share additional details or clarification so that we can accurately reproduce the issue and provide a solution that meets your requirements.

Thank you for your Paitence and Understanding.

vizdor
Frequent Visitor

Hello, 

Unfortunatelly this solution gave me an error "Expression.Error: We cannot convert a value of type List to type Text.
Details:
Value=[List]
Type=[Type]"

 

this part of the code was already tested: Text.SplitAny(Text.Lower([au_aff]), CountriesList)
but always gave me this error. 

Akash_Varuna
Super User
Super User

Hi @vizdor Could you try this please

let
    Source = YourTable,
    // Create a list of countries
    CountriesList = List.Transform(WorldBankCountries[Country], Text.Lower),
    // Split the [au_aff] column by country names
    SplitInstitutes = Table.AddColumn(Source, "SplitInstitutes", each 
        Text.SplitAny(Text.Lower([au_aff]), CountriesList)
    ),
    ExpandedTable = Table.ExpandListColumn(SplitInstitutes, "SplitInstitutes"),
    CleanedTable = Table.TransformColumns(ExpandedTable, {{"SplitInstitutes", Text.Trim}})
in
    CleanedTable

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Top Kudoed Authors