Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.

Reply
Anonymous
Not applicable

Wildcard Matching 2024

Hi,

 

First of all, I'm not a BI user, I do support for Excel.

 

Wildcard comparisons in PQ are a real challenge and everything I've found so far are rudimentary solutions or workarounds to the actual problem. And everything was created several years ago.

 

Now I've tried it myself and found a solution that works (for my purposes), but it contains at least one error.

 

The code below supports the wildcards "*", "?" and "#". One problem is when the search pattern contains a "*" before a "#" and I just can't see a solution to the problem. For example, comparing "08erig1" and "*8*#*1" should give a FALSE as result...

 

If anyone has an idea how to fix the error or has a better solution, I'm grateful for any suggestions.
Or if anyone notices any other problems, I'd like to know about them too. 😀

 

EDIT: Code removed, follow up thread here:
https://community.fabric.microsoft.com/t5/Power-Query/Wildcard-Matching-2024-Attempt-2/m-p/4045339#M...

 

Andreas.

 

 

 

 

 

 

 

 

 

 

 

 

 
 

 

 

 
 

 

 

 
 

 

 

18 REPLIES 18
ronrsnfld
Super User
Super User

You can use Regular Expressions. This can be implemented in PQ for Power BI using Python or R, and in Power Query for Excel using a javascript construction as a custom function.

 

Here is an example of a javascript implementation in PQ for Excel to extract a substring. 

 

//see http://www.thebiccountant.com/2018/04/25/regex-in-power-bi-and-power-query-in-excel-with-java-script/
// and https://gist.github.com/Hugoberry/4948d96b45d6799c47b4b9fa1b08eadf

let   fx=(text,regex)=>
    Web.Page(
        "<script>
            var x='"&text&"';
            var y=new RegExp('"&regex&"','g');
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}

in
fx

 

 

 There are other examples using Python or R in PQ for Power BI.

Anonymous
Not applicable

Please don't be mad, I'm not interested in more workarounds.

 

EDIT: And I said I'm an Excel user, your workaround did not work in Excel.

Not mad at all. But I think what you are really looking for is implementation of wild cards in M code. You're doing that with your custom function. The JavaScript implementation is also a custom function using only functions that are available in M

, unlike the python or r implementations which require something external. 

So I don't really see it as any more of a workaround than your custom function.

I agree the use of python or or might be considered a workaround since they require external libraries to be installed. 

To their own, but the JavaScript implementation certainly does work in Excel, I have used it many times.

 

 

 

 

Anonymous
Not applicable

If you look back to my first post, I said my code did not work comparing "08erig1" and "*8*#*1" should give a FALSE as result.

 

You query did not work at all for that:

Unbenannt.PNG

 

In real life I ​​need a wildcard routine that has a very simple syntax because the end user specifies the mask. If I come to my customer with RegEx, he'll kick me out. No normal end user uses that, it's far too complicated to get the correct pattern.

 

* ? and # is pretty simple to use for everyone and that's what I'm after. My code works to filter filenames that contains names and years, but I know it's not perfect, there is a bug that I can't find.

 

I can't use calls to external libraries, I don't know if every end user really has that or if the admin has disabled it for security reasons. I need a solution in MCode, no detours.

 

To my 2nd question: Any hint how to post an MCode with recursive calls here and prevent this forum editor to modify the code?

 

Andreas.

Now that I realize the limitations of the person entering the mask, your post makes more sense. If it were me, I would write a routine that translates the user input into a valid JavaScript regex and then feed that into the regex function I posted. 

So far as your second problem, I don't have a clue.I've yet to figure out what invalid HTML means in that context. Perhaps you could post a link to a text file that you've uploaded someplace. 

  • Unfortunately, I can't work on a conversion routine right now as my computer is having fits leaving unable to run programs from my main drive. I somehow am in the position of having to rebuild a raid 1 array from scratch unless someone comes up with an idea that circumvents that necessity.

 

 

 

Anonymous
Not applicable

I've looked around a bit and a conversion should be possible with a few small replacements.

One difficulty is that the dot in RegEx itself is a wildcard, all the sources I've found say you should use \. instead of . and if I want to ignore the spelling, I should append /i.

Everything else later and thus a simple test:

 

// fnRegEx
let   fx=(text,regex)=>
    Web.Page(
        "<script>
            var x='"&text&"';
            var y=new RegExp('"&regex&"','g');
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}

in
fx

// Test_fnRegEx
// Test_fnRegEx
let
    a = fnRegEx("qweXxls", "qwe.xls"),   // Result: qweXxls => wong
    b = fnRegEx("qweXxls", "qwe\.xls"),  // Result: qweXxls => wrong
    c = fnRegEx("Qwe.xls", "qwe\.xls/i") // Result: null => wrong
in
    c

 

 

 

Doesn't even come close to working. What now?
Does the RegEx that we call with the script use a different syntax??

 

Andreas.

 

Finally got my computer working.

For your three wild cards which I assume as similar meanings as the Like vba operator, something like:

(text,regex)=>

let 
    regList = Text.ToList(regex),
    translate = 
        List.ReplaceMatchingItems(regList, List.Zip({{"*","#","?","."}, {".*","[0-9]",".","\\."}})),
    reg = Text.Combine(translate,""),


    fx = Web.Page(
        "<script>
            var x='"&text&"';
            var y=new RegExp('"& reg &"','g');
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}

in
  fx

Results

fnRegexExtr("08erig1", "*8*#*1") => null
fnRegexExtr("qweXxls", "qwe.xls") => null

 

Anonymous
Not applicable

Hi Ron,

 

thanks for that function and that you take up this challange. If I assume correctly then you know me and you know what you are getting yourself into. 😁

 

You are correct VBA.Like is my proof function. And as I need a true/false result I modified the last line of your function:

EDIT: if fx="null" then false else true

 

If this is wrong, or you have a better solution, tell me. I will make a completely fair comparison.

 

Here is my test file:
https://www.dropbox.com/scl/fi/hnv86prqiewyajra5h0yr/CompareRegEx.xlsm?rlkey=70ge7534tuugp17ottiw2ww... 

 

In C1 is the number of test patterns that are generated if you click the Generate button.

 

The query steps and used VBA codes are exactly the same for both queries to obtain a comparable result (as best we can do). If you have any concerns about how I did it, tell me. I only want to make a fair comparison of both methods, nothing else.

 

In column O we can see there are a lot of cases where your function fails, more then 50% is wrong.

 

Once you have resolved these problems, increase the number of test patterns to 10000, click the Generate button, a blink later they are generated. Click the refresh button above TestCompareWildcards, my function need around a half second.

 

Click the refresh button above TestRegEx... and take a coffee break.

 

To be fair, my function is also still not perfect. Therefore, the comparison is not yet really meaningful. But given these time differences, do you think RegEx is a really viable solution?

 

Andreas.

Yes, my regex function is much slower than yours.

  1. You can shorten the last line of my function to "fx <> null" and it will return the logical
  2. The errors with my function are from two issues, both of which can be fixed simply in the code:
    1. Case sensitivity, handled by an argument in the regex call:
      1. "var y=new RegExp('"& reg &"','i');"
    2. Your function apparently needs the entire word to match, and not just part of the word. So this needs the addition of the boundary tokens to the regex translation.

Having said that, it still takes a lot longer than yours.

 

Some of your function errors occur when the data type in text column is a number. These can be handled most simply by making that column text type.

 

Others I need to look at more closely.

 

So far as which to use, that's up to you. Obviously, if your actual use case involves data such as you present in your example table, a corrected routine of yours would be inarguably better. Under different use cases,  the time difference might be irrelevant.

 

Anonymous
Not applicable

Hi Ron,

 

Thank you for the code change and the tip for my function, I made a few changes, test file updated. I have expanded the generator and included special characters that can appear in file paths.

 

Although the main use is to compare file paths and file names, I would still like to be able to compare unstructured data and thus numbers. I agree with you, this could be avoided by changing the data type. However, if I do it within the function, there are no significant disadvantages, but the function is more universally applicable.

 

At the moment it seems as if I only have to fix one problem with my function. But my experience tells me that I will encounter more.

 

Your routine already works much better, but there seems to be a problem when calling RegEx, the function often fails with an error.

 

Even though I am close to the goal and RegEx is painfully slow, I (and not only me) would appreciate it if you have a solution for this. I am sure that our codes will be used by many followers for a long time to come.

 

Before I wrote this post, I spent a few days researching and having conversations with ChatGPT, Gemini and Copilot... without any really useful results. A wildcard comparison is a real problem and is often requested, but I have not really found any working solutions or code in any high-level languages. There are only partial solutions and a lot of workarounds. RegEx is also often recommended, but there is no RegEx routine that accepts a wildcard pattern anywhere.

 

Thanks again for taking your time into this.

 

Andreas.

You write: "when calling RegEx, the function often fails with an error.".

Using my routine modified as above, I found no errors, even on the 10000 row Data table.

And I had removed your line in your test code that was filtering out the errors.

 

Here is my current function code: I have it set to case-insensitive, but it could be changed so that the case sensitivity is an option

(text,regex)=>

let 
    regList = Text.ToList(regex),
    translate = 
        List.ReplaceMatchingItems(regList, List.Zip({{"*","#","?","."}, {".*","[0-9]",".","\\."}})),
    reg = "\\b" & Text.Combine(translate,"") & "\\b",


    fx = Web.Page(
        "<script>
            var x='"&Text.From(text)&"';
            var y=new RegExp('"& reg &"','i');
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
    fx <> "null"

 

and here is the current "test" code: (with the #"Replaced Errors" step commented out)

let
    Source = Data,
    #"Invoked Custom Function" = Table.Buffer(Table.AddColumn(Source, "fnRegExWildcard", each fnRegexExtr([text], [mask]))),
    //#"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function", {{"fnRegExWildcard", "Error"}}),
    #"Added Custom" = Table.AddColumn(#"Invoked Custom Function", "Correct", each [fnLIKE]=[fnRegExWildcard])
in
    #"Added Custom"

I would appreciate data that produces the errors.

Anonymous
Not applicable

Hi Ron,

I tested your routine on Win10 English (USA) system and get a lot of errors and wrong results.

Andreas_Killer_1-1721663883172.png

 

It seem I got my routine working, I have to test many regular pattern during the next days, fingers crossed.

 

PQ function here:
https://www.dropbox.com/scl/fi/nw4nwwxa4g99dvbm0vl0s/fnCompareWildcards.pq?rlkey=kw8ze79eytk7uzdpm4m... 

 

Test file here:
https://www.dropbox.com/scl/fi/hnv86prqiewyajra5h0yr/CompareRegEx.xlsm?rlkey=70ge7534tuugp17ottiw2ww... 

 

Seems my routine is several 1000% faster then RegEx!

 

Andreas.

Glad you got yours working as it is clearly faster with large data.

 

Not sure what happened to my last response, but the reason for the errors and mismatches in my regex solution is two-fold.

  • There are characters in some of your masks (other than the dot) that have a special meaning in regex and need to be escaped. EG "( ) { } [ ] +"
  • I used the word boundary token to ensure that only whole words were returned. But some of your text strings begin or end with a non-Word character, so the word boundary is not at the beginning/end of the string. Would need to know a bit more about your data in order to properly translate this. (A word character is anything in the set "[A-Za-z0-9_]"). 
Anonymous
Not applicable

Hi Ron,

 

The main use is to filter paths and/or file names in order to reduce the amount of data loaded into the data model.

 

An example of file names in a folder:
Report 2023 Qtr-1.xlsx
Report 2023-Qtr2.xlsm
Whatever.pdf
Report 2023 Quarter 3.xlsx
Report 2024 Quarter1.xlsx
Report 2024 Qtr2.xlsx
Data 2024.xlsx
Report 2024 Q3.xlsx
Report 2024 Q4.xlsx

 

A user could request:
a) Analyze all reports from 2023 => Rep*2023*.xls* =>
Report 2023 Qtr-1.xlsx
Report 2023-Qtr2.xlsm
Report 2023 Quarter 3.xlsx

 

b) Compare the second quarter of all years => Rep*####*Q*2* =>
Report 2023-Qtr2.xlsm
Report 2024 Qtr2.xlsx

 

Here is a reduced example from our company, as you see it's the worst:
I:\LOG\Kapazitätsplanung\2013\offene Auftragszeiten 30.05.13 KW22.xlsx
I:\LOG\Kapazitätsplanung\2013\offene Auftragszeiten 31.01.13 KW05.xlsx
I:\LOG\Kapazitätsplanung\2013\offene Auftragszeiten 31.10.13 KW44.xlsx
I:\LOG\Kapazitätsplanung\2014\Kapazitätsplanung-KW24-01.xlsm
I:\LOG\Kapazitätsplanung\2014\Kapazitätsplanung-KW25.xlsm
I:\LOG\Kapazitätsplanung\2014\Kapazitätsplanung-KW26.xlsm
I:\LOG\Kapazitätsplanung\2014\offene Auftragszeiten 27.02.14 KW09.xlsm
I:\LOG\Kapazitätsplanung\2014\offene Auftragszeiten 27.03.14 KW13.xlsm
I:\LOG\Kapazitätsplanung\2014\offene Auftragszeiten 29.01.14 KW05.xlsx
I:\LOG\Kapazitätsplanung\2015\Fertigungsliste ab 07.04.15.xlsx
I:\LOG\Kapazitätsplanung\2015\Gesamt VK-Aufträge 2014.xlsx
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW11.xlsm
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW12.xlsm
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW12_alt.xlsm
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW13.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-17.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-18 - neu.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-18.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-20.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-22.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-24.xlsm
I:\LOG\Kapazitätsplanung\2024\Produktion Kapazitätsgrobplanung 2024.xlsx
I:\LOG\Kapazitätsplanung\2024\Produktion Kapazitätsgrobplanung 35000000.xlsx

 

I have 1604 files in just that folder... welcome to my world of crap data. 😆

 

We produce systems for the food industry worldwide, and almost every customer has different requirements. If you have a new request, you can remember "yes, we've done something like that for customer XY before." Where is the data for it?

 

And in most companies it's not much different, you start somehow and then over time you develop a system that grows.

 

Andreas.

That explains your problem better.

 

A suggestion: Instead of relying on the user to construct a valid wild-card "like" input, why not have them submit the parameters utilizing a User Form. Then you'd get valid data input and your filtering code would probably be much simpler.

Anonymous
Not applicable

That means I would have to split the process into PQ.

 

I would first have to read in all the files with PQ, then call up the user form, then filter the data with VBA and LIKE, write the result into a table, read this into PQ and now I can continue with the actual data import.

 

IMHO this is quite a big detour and fraught with further difficulties. Thanks, but no thanks.

Let’s stick to the topic: Wildcard comparison.

 

During my research I often read: Use RegEx. But I have never seen how to filter real-life data with RegEx.

 

Don't get me wrong, you can make amazing comparisons with RegEx that you would otherwise have to write an entire parser for, very impressive.

 

The big problem with RegEx is the syntax, it looks like a rattlesnake has walked across the screen, no normal user understands it and even for me as an experienced programmer it is too complicated. A wildcard comparison, on the other hand, is simple and intuitive to use.

 

And it seems as it is too difficult to do a wildcard comparison with file paths using RegEx for you too. (Don’t get me wrong, I didn’t manage it either). Or is there an update that I missed?

 

Andreas.

a. I stopped working on the regex when you reported you had your routine working, and also that the regex was considerably slower.  But if your routine is not returning what you need, I'll be happy to finish up the regex now that I have more information.

 

b. I didn't realize you would have to read in all the files if you chose to use a user form. My idea of that was just to replace whatever you are using now to obtain the wild card string from the user, with a form that had a more structured input.

 

Anonymous
Not applicable

You wrote: Using my routine modified as above, I found no errors, even on the 10000 row Data table.

 

If you open my text file as is, you can see that I got errors and the error is always the same:

Andreas_Killer_0-1721621067788.png

Expression.Error: The operation could not be completed because the enumeration did not contain enough elements.

Even with your updated code, I get a lot of wrong results:

Andreas_Killer_1-1721621325138.png

 

As you can see I have a German (DE) system, I that an issue for RefEx?
Which locale do you have?

 

Andreas.

Helpful resources

Announcements
September Power BI Update Carousel

Power BI Monthly Update - September 2025

Check out the September 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.