The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hello,
I have a table {[ID], [au_aff]} and a list of countries (from worldbankData). The column [au_aff] contains some institutes with many details, all of the details are separated by comma. In this [au_aff] could be more than one institute, and even the institutes are separated by comma. The only available separator could be the name of the country of the institute.
I have to split the institutes. I could manage to match the list of countries, but some institutes could be from the same country. So the code finds the first, but stops there.
Could you help me with any idea to solve and an M code?
Thanks
Solved! Go to Solution.
Hi!
The problem seemed to be so complex, and there were 14K records. A kind of solution was found, some python code was translated into M. Thank you all for the time and energy to find solutions.
Regards
Dori
Hi @vizdor
Thankyou for using Microsoft Community Forum!
Thankyou @wini_R and @Akash_Varuna for your reponse regarding the query.
Has your issue been resolved?If the response provided by the community member addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.
If yes, kindly accept the useful reply as a solution and give us Kudos. It would be appreciated.
Thank you for your understanding!
Hi!
The problem seemed to be so complex, and there were 14K records. A kind of solution was found, some python code was translated into M. Thank you all for the time and energy to find solutions.
Regards
Dori
HI @vizdor,
I'm glad to hear that you found a solution and resolved the query! Please consider sharing your solution and marking your reply as the accepted answer to help others in the community find it easily.
Thankyou for connecting with Microsoft Community Forum!
Hi @vizdor,
we haven't heard back from you regarding our last response from community member and wanted to check if your issue has been resolved.
If our response addressed by the community member for your query, please mark it as Accept Answer and give Kudos if you found it helpful.
Should you have any further questions, feel free to reach out.
Thank you for being a part of the Microsoft Fabric Community Forum!
Hi @vizdor,
Has your issue been resolved?If the response provided by the community member addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.
If yes, kindly accept the useful reply as a solution and give us Kudos. It would be appreciated.
Thank you for your understanding!
Hi @vizdor,
Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.
Thanks for using Microsoft Fabric Community Forum.
Hey @vizdor,
Can you please provide a sample of your data in text format as well as the expected outcome?
Hello,
We have managed to create a kind find of solution to split long text in a table of 80K rows and also have a table with a list from World Bank with all possible countries in the world. Institutes in cells are separated mostly with comma, but sometimes only with 'and'. The perfect solution would lead to a possibility to count the institutes in the cell, or better to split the institutes by a delimiter, f.e. | (pipe)
The examples:
example1: two institute, both from the same country
Sántha G., Faculty of Health Sciences, Doctoral School of Health Sciences, University of Pécs, Pécs, Hungary, 'Juhász Gyula' Faculty of Education, University of Szeged, Szeged, Hungary
example2: two institutes, one of them is from a country wich name contains two words (United States)
Longobardi S., Department of Internal Medicine, HCA Healthcare, USF Morsani College of Medicine GME, HCA Florida Blake Hospital, Bradenton, FL, United States, Department of Internal Medicine and Hematology, Semmelweis University Alumnus, Budapest, Hungary
example3: three institutes, one is from Gibraltar, which word is also a name of a city and a country, but must be count only once
Demetrovics Z., Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary, Centre of Excellence in Responsible Gaming, University of Gibraltar, Gibraltar, College of Education, Psychology and Social Work, Flinders University, Adelaide, SA, Australia
example4: 3 institutes, two of them from a country which name's contains the word: 'and'
Đogo M., University of East Sarajevo, Faculty of Economics Pale, Bosnia and Herzegovina; Gligorić D., University of Banja Luka, Faculty of Economics, Bosnia and Herzegovina; Berecz M., Ministry of Foreign Affairs of Hungary, Budapest, Hungary
example5: two institutes separated by 'and'
Kiss L., Department of Organic and Medicinal Chemistry, Faculty of Pharmacy, University of Pécs, Honvéd Street 1, Pécs, H-7624, Hungary and János Szentágothai Research Center, University of Pécs, Ifjúság Street 20, Pécs, H-7624, Hungary
Thanks for trying!
Hi @vizdor,
The only logic I can see based on your examples is counting occurrences or splitting the text by this dynamic substring: ", [countryName]", i.e.
You can transform your list of countries to this format:
and then use it to replace the string with some delimiter, i.e. "|". Final step should be splitting the text or just counting the occurrences:
To test the solution use queries below:
countries table
// countries
let
Source = List.Transform ({"Hungary", "United States", "Gibraltar", "Australia", "Bosnia and Herzegovina"}, each ", " & _)
in
Source
// tbl4
let
//sample data
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fVTLUttAEPyVKZ8VilCpcMjJb0PsiitKKlUhHIbVIA+sdly7KyfmB3LNL3DkBzjkKj4sI9mADXZO0m5punt6enR21kqrWxdnCMODBAZoShuXIJcwIrRxBqlhcoZCAj0xUTxavZqJ2J3ffHW8IB94BTGt7oxerh+j0uXolwn8KA8Pj45Py1l1G25guCwtrq426ftZaTCyuJeg6Q3llCVPzzVs6zw5a43F5XKBPmNItZsezdHHglysC09cJO9U/4QyNuxIa7vtdQ8GvZ6/pgOYiA/oGLpirTLUlY8FMJz0V0UDK54zhI7Fa4KRhDlHtAl0PGZKV6sejBvlkTJII8bGwf/rAXSZyikwipVcjUqpKMj+JA6bHrRtWbhS4TplhnMKcduDHhUUvSzYBPiuJpy4EDmWselkGpY6vBV6f/ylD/3qPi6q+wBj8ZqDbINoB0ECXRXvG6j+L0PqkM4d2MFnCnNxgS+suoQFu/zl3IZ8oeGJ6JPN1w2XNyb+LLPxJBXD6tM38dfqq2WXKeyW0nZGFjnTEaZtPZUhKgFjY8jDH8kFJgcvBfUxREjR4xUtZCv6fSNOitrAKVrF7EhwjOvxeM2duuvwAwwt55qDh9/Qe4XeQXeFMC6vcTf0ftQOeTI3jeAJO9ZWmsqBeOLcQfvyElnbr9fvcSq7k/CRg8711R588rnm2zTE6+ipud0ZFQ3XltzpDH2BZrlns0fiFtVdHXBPFOHt866/OX5/9O5JTsN1qvmSUO+ti9VtLvrP4To3hN7MmmCR38NzcnlV/Q1a9Mh0dLiPqnV+/g8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [text = _t]),
//transformations start here
ReplaceCountries = List.Accumulate(List.Buffer(countries), Source, (current, state) => Table.TransformColumns(current, {{"text", each Text.Replace(_, state, "|") }}) ),
#"Added count" = Table.AddColumn(ReplaceCountries, "count", each List.Count(Text.PositionOf([text], "|", Occurrence.All)))
in
#"Added count"
Hi @vizdor,
Thank you for providing the information. However, we are unable to reproduce the scenario based on the data you have provided.
We kindly request you to share additional details or clarification so that we can accurately reproduce the issue and provide a solution that meets your requirements.
Thank you for your Paitence and Understanding.
Hello,
Unfortunatelly this solution gave me an error "Expression.Error: We cannot convert a value of type List to type Text.
Details:
Value=[List]
Type=[Type]"
this part of the code was already tested: Text.SplitAny(Text.Lower([au_aff]), CountriesList)
but always gave me this error.
Hi @vizdor Could you try this please
let
Source = YourTable,
// Create a list of countries
CountriesList = List.Transform(WorldBankCountries[Country], Text.Lower),
// Split the [au_aff] column by country names
SplitInstitutes = Table.AddColumn(Source, "SplitInstitutes", each
Text.SplitAny(Text.Lower([au_aff]), CountriesList)
),
ExpandedTable = Table.ExpandListColumn(SplitInstitutes, "SplitInstitutes"),
CleanedTable = Table.TransformColumns(ExpandedTable, {{"SplitInstitutes", Text.Trim}})
in
CleanedTable