The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.
Hello. I'm new in using Power Query and I'm trying to make a query but I'm strugling to make it work. As logic differs from usual language (ie: Python), I have difficulty ti make it work properly.
If you could help, it will be great.
Below is an explanation of what I try to do.
I have a table, containing several columns. One of them is "Job Title". What I want is to classify the various "Job Title" (as it is a free text field, almost all values are different) according to a Dictionary of keywords. Each keyword is in a 2nd table with keyword and associated classification.
Table 1: Job Title
Job Title |
gestion de projet |
Comptable |
Acheteur |
y.Assistant(e) |
Dir/Resp Informatique |
Commercial |
y.Autre Fct Achat/Serv. Gen. |
Dir/Resp Admin. Et Financier |
Gérant/Co Gérant |
Dir/Resp Achat |
Directeur Général |
Directeur/Sous-Directeur |
Directeur general |
Dir/Resp Système Information |
Gerant/Co Gerant |
à repréciser |
y.Autre Fonction Adm et Financ |
President |
Dirigeant/Gérant |
Dir/Responsable Télécom |
IT manager |
Table 2: Dictionary
Keyword | Classification |
projet | gestion de projet |
project | gestion de projet |
prj | gestion de projet |
consultant | gestion de projet |
assistant | administratif |
comptable | administratif |
acheteur | administratif |
finance | administratif |
responsable | responsable / manager |
dir | directeur / direction |
resp | responsable / manager |
directeur | directeur / direction |
gérant | directeur / direction |
manager | responsable / manager |
dsi | directeur / direction |
ceo | directeur / direction |
cfo | directeur / direction |
chef | directeur / direction |
dg | directeur / direction |
rssi | directeur / direction |
head | responsable / manager |
snr | responsable / manager |
senior | responsable / manager |
Objective is to perform a classification based on Keyword if "Job title" contains the keyword. A kind of Table.NestedJoin but instead of getting and exact match, we are looking for a contains. And once we got at least a match, we take the corresponding classification. We stop at first match as dictionary is set in a proper order. Please note that we are not case sensitive in our match.
So results we are expecting in our example should be :
Job Title | Classification |
Chef de projet | gestion de projet |
Comptable | administratif |
Acheteur | administratif |
y.Assistant(e) | administratif |
Dir/Resp Informatique | directeur / direction |
Commercial | administratif |
y.Autre Fct Achat/Serv. Gen. | |
Dir/Resp Admin. Et Financier | directeur / direction |
Gérant/Co Gérant | directeur / direction |
Dir/Resp Achat | directeur / direction |
Directeur Général | directeur / direction |
Directeur/Sous-Directeur | directeur / direction |
Directeur general | directeur / direction |
Dir/Resp Système Information | directeur / direction |
Gerant/Co Gerant | directeur / direction |
à repréciser | |
y.Autre Fonction Adm et Financ | |
President | directeur / direction |
Dirigeant/Gérant | directeur / direction |
Dir/Responsable Télécom | directeur / direction |
IT manager | responsable / manager |
Job Title ClassificationTo do so, we first put our keywords in a list called "KEYWORD"
Then, we try to compare "job title" string to the list in order to see if we have some match.
So we did a "do while" like to compare each "job title" string with any value of "KEYWORD" list. We tried using List.Generated to generate a new column to do so:
Table.AddColumn(#"previous step", "CHECK RES", each List.Generate(
() => [COUNTER = 0, MAX=List.Count(KEYWORD), res="n.a", char=[Job Title] ] ,
each [COUNTER] < [MAX]+1 ,
each if Text.Contains([char],KEYWORD{[COUNTER]})
then [ res= KEYWORD{[COUNTER]}, COUNTER=[COUNTER]+1, MAX=[MAX], char=[char] ]
else [ res= "n.a", COUNTER=[COUNTER]+1, MAX=[MAX], char=[char] ],
each [res] ))
As a result, we got in each row a list containing either a "n.a" for each comparison with our list that failed, and the corresponding keyword for the matching test.
So we got something like:
List = {"n.a", ""n.a", "n.a", "n.a", "n.a", "n.a", "compatble","n.a", …}
Job Title | CHECK RES |
Chef de projet | [List] |
Comptable | [List] |
Acheteur | [List] |
y.Assistant(e) | [List] |
Dir/Resp Informatique | [List] |
Commercial | [List] |
y.Autre Fct Achat/Serv. Gen. | [List] |
Dir/Resp Admin. Et Financier | [List] |
Gérant/Co Gérant | [List] |
Dir/Resp Achat | [List] |
Directeur Général | [List] |
If we look at [List] in row2, we have something like : List = {"n.a", ""n.a", "n.a", "n.a", "n.a", "n.a", "compatble","n.a", "n.a", …}
So it seems to work.
But then we are stucked at next step. We tried then to extract matching value with KEYWORD by using feature such as List.Intersect and then to extract first value but we don't succeed to make the List.Intersect work.
= Table.AddColumn(#"Check Responsability", "Intersect", each List.Intersect({[CHECK RES],KEYWORD}))
So could you help here?
• What is wrong with our List.Intersect? It looks like it doesn't recognized our "cell row" [CHECK RES] as a list. But how to make it then?
• Is the way we are doing is the right one or is there easier way to make it?
Thanks in advance for your advices and sorry for this long post.
Solved! Go to Solution.
Hello @MathMar
I think List.Generate here is an overkill. I tried to use a approach with Table.SelectRows. Here an example
let
JobTitle =
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bVFLTsMwEL3KKCuQIDlDVGjVHSLdRV0Y5xGM4nEYO0i9DUtyjlyMOCEprdh5Pu83Lsukhg/GMVWgVtw7QnK8K5ONs21QLw2mKtdvCOhkKk5p7r3xQXG4we3UejCSPcO3tOdXJ1YF89Fh4bEQbVSzYLsgoK0ONJKqkBWQz5R24PSSKa+s4ZQeA20NK9YGs/pu6GVUzjaOfp9XuMi6tKCj67jIcbe57GeF6/z9Wl6BajD+QGb24uTD8G1xDup4toXVFVZTwxcJWhl6bTzk8gCO9XT2MSdhCTmtPAm8qXAOZmpE7v/iOvbxk+gw9M0o4+w03R/IKlZ11Dz+AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Job Title" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Job Title", type text}})
in
#"Changed Type",
Dictionary =
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZDBDcMgDEVXQTlH6kBRDi4Y4igxESZDdY4uVlpo1UPiXBDwvr/tPwzdluKMueu7gJIpsnFo2t/YN2w1Pp8yG1n2JQOfl4MISVOAW4nLK0Em3wzWLcN9wUMKdsKMezqEnhjYHhcmlK2M1oz/XuZmVmAImD46R2/vcpYASp9C672s8fO5Mqilqk14PlJN4FzzdVW7CakeFqPO/QWf0KsCF/S45GLACcGpGwrrCQgyRUUyvgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Keyword = _t, Classification = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Keyword", type text}, {"Classification", type text}})
in
#"Changed Type",
BufferDictionary = Table.Buffer(Dictionary),
AddDictionaryToJobTitle = Table.AddColumn
(
JobTitle,
"Dictionary",
(add)=> Table.SelectRows
(
BufferDictionary,
each Text.Contains(Text.Lower(add[Job Title]),Text.Lower([Keyword]))
)
),
#"Added Custom" = Table.AddColumn(AddDictionaryToJobTitle, "Classification", each try [Dictionary][Classification]{0} otherwise null)
in
#"Added Custom"
Be aware that the last AddColumn is not needed, as you could do this job already in the step before. This is only to show you what table you are getting with the Table.SelectRows. There you can see that in some cases more then one row is found (the selectrows is checking if the keyword is somehow included in the Job Title)
Copy paste this code to the advanced editor in a new blank query to see how the solution works.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Thanks again.
I correctly writted it. But it is not recognized.
Expression.Error : Le nom « Table.FuzzyNestedJoin » n'a pas été reconnu. Veuillez vérifier qu'il est correctement orthographié.
I also saw in some screenshot some Fuzzy option when trying to Nest table via the menu. I don’t got these options on my version. So it looks like depending on version of Power Query, some instructions are missing.
Regards,
Thanks for your suggestion related to write my last instruction in a different way: L_Keyword_Responsability{List.PositionOfAny([CHECK RES],List.Buffer(L_Keyword_Responsability))-1}
I succeed to make what I want by then using a Table.NestedJoin until I got latest version of Office to be able to use directly Table.FuzzyNestedJoin.
Until we succeed to get from our IT proper version, we would like to use what we built. However, on very large file (>200K records), now request is taking ages. We tried to use Table.Buffer and List.Buffer to speed things but still very very low.
Do you have suggestion to improve our code here to speed up things?
Here is our full code:
let
Source = Csv.Document(File.Contents("F:\all contact.csv"),[Delimiter=",", Columns=41, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
#"En-têtes promus" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"move Job Title to Lower case" = Table.TransformColumns(#"En-têtes promus",{{"Job Title", Text.Lower, type text}}),
#"Check match with Responsability Matrix" = Table.AddColumn(#"move Job Title to Lower case", "CHECK RES", each List.Generate(
() => [COUNTER = 0, MAX=List.Count(L_Keyword_Responsability), res="n.a", char=[Job Title] ] ,
each ([COUNTER] < [MAX]+1) ,
each if Text.Contains([char],L_Keyword_Responsability{[COUNTER]})
then [ inter=List.Buffer(L_Keyword_Responsability) ,res= inter{[COUNTER]}, COUNTER=[COUNTER]+1, MAX=[MAX], char=[char] ]
else [ res= "n.a", COUNTER=[COUNTER]+1, MAX=[MAX], char=[char] ],
each [res] )),
#"Get corresponding value in the table" = Table.AddColumn(#"Check match with Responsability Matrix", "Intersect", each L_Keyword_Responsability{List.PositionOfAny([CHECK RES],List.Buffer(L_Keyword_Responsability))-1}),
#"Erreurs remplacées" = Table.ReplaceErrorValues(#"Get corresponding value in the table", {{"Intersect", "vide"}}),
#"croisement matrice Responsabilité" = Table.NestedJoin(#"Erreurs remplacées",{"Intersect"},Table.Buffer(Matrix_Responsability),{"Mot clé autour de la responsabilité"},"Matrix_Responsability",JoinKind.LeftOuter),
#"Récupération responsabilité estimée" = Table.ExpandTableColumn(#"croisement matrice Responsabilité", "Matrix_Responsability", {"résponsabilité résumée"}, {"résponsabilité résumée"})
in
#"Récupération responsabilité estimée"
Hello @MathMar
I think List.Generate here is an overkill. I tried to use a approach with Table.SelectRows. Here an example
let
JobTitle =
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bVFLTsMwEL3KKCuQIDlDVGjVHSLdRV0Y5xGM4nEYO0i9DUtyjlyMOCEprdh5Pu83Lsukhg/GMVWgVtw7QnK8K5ONs21QLw2mKtdvCOhkKk5p7r3xQXG4we3UejCSPcO3tOdXJ1YF89Fh4bEQbVSzYLsgoK0ONJKqkBWQz5R24PSSKa+s4ZQeA20NK9YGs/pu6GVUzjaOfp9XuMi6tKCj67jIcbe57GeF6/z9Wl6BajD+QGb24uTD8G1xDup4toXVFVZTwxcJWhl6bTzk8gCO9XT2MSdhCTmtPAm8qXAOZmpE7v/iOvbxk+gw9M0o4+w03R/IKlZ11Dz+AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Job Title" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Job Title", type text}})
in
#"Changed Type",
Dictionary =
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZDBDcMgDEVXQTlH6kBRDi4Y4igxESZDdY4uVlpo1UPiXBDwvr/tPwzdluKMueu7gJIpsnFo2t/YN2w1Pp8yG1n2JQOfl4MISVOAW4nLK0Em3wzWLcN9wUMKdsKMezqEnhjYHhcmlK2M1oz/XuZmVmAImD46R2/vcpYASp9C672s8fO5Mqilqk14PlJN4FzzdVW7CakeFqPO/QWf0KsCF/S45GLACcGpGwrrCQgyRUUyvgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Keyword = _t, Classification = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Keyword", type text}, {"Classification", type text}})
in
#"Changed Type",
BufferDictionary = Table.Buffer(Dictionary),
AddDictionaryToJobTitle = Table.AddColumn
(
JobTitle,
"Dictionary",
(add)=> Table.SelectRows
(
BufferDictionary,
each Text.Contains(Text.Lower(add[Job Title]),Text.Lower([Keyword]))
)
),
#"Added Custom" = Table.AddColumn(AddDictionaryToJobTitle, "Classification", each try [Dictionary][Classification]{0} otherwise null)
in
#"Added Custom"
Be aware that the last AddColumn is not needed, as you could do this job already in the step before. This is only to show you what table you are getting with the Table.SelectRows. There you can see that in some cases more then one row is found (the selectrows is checking if the keyword is somehow included in the Job Title)
Copy paste this code to the advanced editor in a new blank query to see how the solution works.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Hello @MathMar
use Tabel.FuzzyNestedJoind to join both tables. Use the treshholp option to find more similarities.
In your solution a code in a new column has to look like this
= Table.AddColumn(#"Check Responsability", "Intersect", each [KEYWORD]{List.PositionOf([CHECK RES],"compatble")})
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Hello, thank you for your prompt reply. I wasn’t aware about Fuzzy option.
However, I have the impression that my Power Query version hasn’t this instruction as I got a message error stating that it is not correctly written. Is Fuzzy feature linked to a specific version of Power Query?
Current version I used for Power Query is : Version : 2.59.5135.201 32 bits
Regards,
Hello @MathMar
the correct spelling is this
I don't know when exactly was added. So try to update to the latest version and it should be available
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Thanks again.
I correctly writted it. But it is not recognized.
Expression.Error : Le nom « Table.FuzzyNestedJoin » n'a pas été reconnu. Veuillez vérifier qu'il est correctement orthographié.
I also saw in some screenshot some Fuzzy option when trying to Nest table via the menu. I don’t got these options on my version. So it looks like depending on version of Power Query, some instructions are missing.
Regards,