Hi, I am working with data in Korean and would like to remove punctuation from a column while keeping the characters and numbers. It would be great if I could do this in one step rather than removing each type of punctuation one by one.
I tried using this, but realised it doesn't work for languages that do not use the Latin alphabet (Korean/Japanese):
=Text.Select([Column1],{"a".."z","A".."Z","0".."9"})
Is there another way to do so?
Hi @kellyylx
I would recommend creating a separate list query containing all characters you want to keep.
You would need to identify all ranges of Unicode characters you want to keep, which may require a little research.
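One way to start that research is to look at the code points that actually occur in your data. The following is only a rough sketch: the sample string and the step names are made up for illustration, and it assumes the "X" (hexadecimal) format of Number.ToText.

```m
let
    // Hypothetical sample value; substitute text from your own column
    Sample = "한글, ㄱ!",
    // Split the text into single characters
    Chars = Text.ToList(Sample),
    // Pair each character with its code point in hexadecimal
    CodePoints = List.Transform(
        Chars,
        each [Character = _, Hex = Number.ToText(Character.ToNumber(_), "X")]
    )
in
    Table.FromRecords(CodePoints)
```

The hex values can then be compared against the Unicode block charts to decide which ranges to keep.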
For example, here is a list containing (probably) all Korean characters plus the alphabet and numerals. You can add/remove whatever is required:
// CharactersToKeep
{
Character.FromNumber(0x1100)..Character.FromNumber(0x11FF), // Hangul Jamo
Character.FromNumber(0xA960)..Character.FromNumber(0xA97F), // Hangul Jamo Extended-A
Character.FromNumber(0xD7B0)..Character.FromNumber(0xD7FF), // Hangul Jamo Extended-B
Character.FromNumber(0xAC00)..Character.FromNumber(0xD7A3), // Hangul Syllables
Character.FromNumber(0x3130)..Character.FromNumber(0x318F), // Hangul Compatibility Jamo
"0".."9", // Numerals
"a".."z", // Alphabet lowercase
"A".."Z" // Alphabet uppercase
}
Then reference this list as the 2nd argument of Text.Select:
Text.Select([Original Text],CharactersToKeep)
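Put together, a self-contained query might look roughly like the sketch below. The sample rows, the "Cleaned Text" column name and the step names are assumptions for illustration only, and the list is shortened to Hangul syllables for brevity. Note that Text.Select drops every character that is not in the list, including spaces, so a space is added here on the assumption that you want to keep word breaks.

```m
let
    // Hypothetical sample data; replace with your own source step
    Source = #table(
        type table [Original Text = text],
        {
            {"안녕하세요, 세계! 123"},
            {"Power BI (파워 BI) - 2024년."}
        }
    ),
    // Shortened keep-list; extend it with the other ranges as needed
    CharactersToKeep = {
        Character.FromNumber(0xAC00)..Character.FromNumber(0xD7A3), // Hangul Syllables
        "0".."9", // Numerals
        "a".."z", // Alphabet lowercase
        "A".."Z", // Alphabet uppercase
        " "       // Space (assumption: spaces should survive)
    },
    // Keep only the listed characters in each row
    Cleaned = Table.AddColumn(
        Source,
        "Cleaned Text",
        each Text.Select([Original Text], CharactersToKeep),
        type text
    )
in
    Cleaned
```

If CharactersToKeep already exists as a separate list query, the inline list step can be removed and the query name referenced directly.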
Small example attached.
Regards