The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hi Community,
I have a column with multiple information that I would like to split up into several columns.
The input is roughly like this:
Row 1 | aaa.bb.cccc.dd.xxx.eeee.ff |
Row 2 | aaa.bbbb.cc.ddd.eee.ffff |
As you can see, both rows contain similar information categories split up in different blocks, separated by "." as delimiter. The information itself may vary within a category. However, some rows contain an additional information block ("xxx." in row 1).
I would like to create a separate column for each information category block, however the additional information block ("xxx." in row 1) is disturbing as I cannot separate the columns only by referring to the delimiter.
As I don't necessarily need the information in the additional information block ("xxx."), I thought it could make sense to delete this text part first and then separate columns based on "." as delimiter.
Finally, the desired output should look like this:
Row 1 | aaa | bb | cccc | dd | eeee | ff |
Row 2 | aaa | bbbb | cc | ddd | eee | ffff |
or
Row 1 | aaa | bb | cccc | dd | xxx | eeee | ff |
Row 2 | aaa | bbbb | cc | ddd | eee | ffff |
Do you have any suggestions how to solve this task?
Many thanks 🙂
Solved! Go to Solution.
Hi,
if the pattern is always the same you can get :
only by :
- for semplicity i added custom
- then with
and only chosing from the start for the firts 3 columns and from the end for the last 2.
If this post is useful to help you to solve your issue consider giving the post a thumbs up
and accepting it as a solution !
Ok, so how do we know what to remove? Are we removing the letter that is not part of an otherwise contiguous sequence letters that do not have a fixed number of items? What is the logic here?
I can't help but feel like this is some made up question that was put into the ether to make me expend my limited brainpower on some old haphazardness, only to realize that I don't even know what the question is?
How in the world did this data get into this predicament? Is this based on something from a dataset, or a log, or replacing all of the spaces in Hamlet with dots? Just what is this anyway?! And can you put it back the way it was? Or is this the original un"finessed" data? Please help me understand.
--Nate
@Anonymous ,
the dummy data I shared is based on I/O-maschine data (text data) which I cannot share.
There is supposed to be a pattern, however in some cases it is broken and I cannot recognize the logic behind it.
Anyway, especially the approach from @serpiva64 already helped to solve some of the tasks and I'll combine this with some manual adaptions.
Thank you!
If you know that you want to replace all x's and y's, or some other specific characters, you can just use Replace values to and replace y with nothing. Then change your newly made .. values with .
let
Source = Table,
#"Replaced Value" = Table.ReplaceValue(Source,"x","",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","y","",Replacer.ReplaceValue,{"Column2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1","..",".",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value2", "Column2", Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv), {"Column2.1", "Column2.2", "Column2.3", "Column2.4", "Column2.5", "Column2.6"})
in
#"Split Column by Delimiter1"
--Nate
Hi @Anonymous
unfortunately, that's not the case as the information may vary within a certain information category. Therefore, it would be quite some manual work replacing all these values and I'd prefer avoiding this.
Thank you anyway! 🙂
Hi,
if the pattern is always the same you can get :
only by :
- for semplicity i added custom
- then with
and only chosing from the start for the firts 3 columns and from the end for the last 2.
If this post is useful to help you to solve your issue consider giving the post a thumbs up
and accepting it as a solution !
Hi @serpiva64
thanks for your input.
Indeed, that looks like an acceptable workaround for simple cases.
I implemented it as follows:
- First, I created a custom column with extra dots at the start and end
- Then, another coustom column extracting the first 4 categories
- Then, another custom column extracting the last 2 categories.
However, in some cases I actually have more than one information category that needs to be removed, like this:
Column |
aaa.bb.cccc.dd.xxx.eeee.ff |
aaa.bbbb.cc.ddd.eee.ffff.yyy |
So in this case, the approach would lead to the following:
Column | Custom | Cutsom.1 | Custom.2 |
aaa.bb.cccc.dd.xxx.eeee.ff | .aaa.bb.cccc.dd.xxx.eeee.ff. | aaa.bb.cccc.dd | eeee.ff |
aaa.bbbb.cc.ddd.eee.ffff.yyy | .aaa.bbbb.cc.ddd.eee.ffff.yyy. | aaa.bbbb.cc.ddd | ffff.yyy |
However, I'd like to have the following:
Column | Custom | Cutsom.1 | Custom.2 |
aaa.bb.cccc.dd.xxx.eeee.ff | .aaa.bb.cccc.dd.xxx.eeee.ff. | aaa.bb.cccc.dd | eeee.ff |
aaa.bbbb.cc.ddd.eee.ffff.yyy | .aaa.bbbb.cc.ddd.eee.ffff.yyy. | aaa.bbbb.cc.ddd | eee.ffff |
Do you have further ideas how to solve such cases?
Thank you! 🙂
I think that is not possible to solve your exemple. The only possibility is to try to find a schema that you have not seen in some real data (column names, spaces, something else?)
if the items can be made into a list, can we have an algorithm that says:
if there are 7 items, remove the 5th.
Is that the only way to identify if there is an extra item?