I want to compare paragraphs sentence by sentence. I can already split a paragraph into sentences where there is a line feed and/or carriage return. Do you have any suggestions for how to split a paragraph that contains a numbered list into its separate items?
1. Bicycle racks shall be installed on a durable surface, preferably near the associate entrance without pedestrian route conflicts. 2. Designated bicycle corridors connecting the public right-of-way with the bicycle parking shall not be provided unless required per local code. If required and provided, the bike corridor shall consist of two (1.2m) wide pavement lanes (on the outside of travel lanes, one in each direction) separated from the vehicle travel lane by pavement striping. Bike lanes should be signed and striped in accordance to local code and regulations for all traffic signs. 3. At sites with a designated bike corridor, bikes should not travel on sidewalks (with exception of parking) or in vehicle travel lanes.
I can split this paragraph on a period followed by a space, but that also splits off the list numbers (1., 2., 3., etc.).
The desired output is three records, one for each numbered item.
Please advise.
One possible method is finding the positions of all the digits and then selecting those digit positions immediately followed by a period and a space. Those are the positions you need to split on.
Here's an example query putting this all together:
let
    // Sample data: one row holding the whole numbered paragraph
    Source = Table.FromRows({{"1. Bicycle racks shall be installed on a durable surface, preferably near the associate entrance without pedestrian route conflicts. 2. Designated bicycle corridors connecting the public right-of-way with the bicycle parking shall not be provided unless required per local code. If required and provided, the bike corridor shall consist of two (1.2m) wide pavement lanes (on the outside of travel lanes, one in each direction) separated from the vehicle travel lane by pavement striping. Bike lanes should be signed and striped in accordance to local code and regulations for all traffic signs. 3. At sites with a designated bike corridor, bikes should not travel on sidewalks (with exception of parking) or in vehicle travel lanes."}}, type table [Text = text]),
    SplitToList = Table.AddColumn(Source, "Split", each
        [
            // Zero-based positions of every digit in the text
            Digits = Text.PositionOfAny([Text], {"0".."9"}, Occurrence.All),
            // Keep only the digits immediately followed by a period and a space
            Positions = List.Select(Digits, (i) => Text.Middle([Text], i + 1, 2) = ". "),
            // Cut the text at each of those positions
            Split = Splitter.SplitTextByPositions(Positions)([Text])
        ][Split], type {text}),
    // One output row per list item
    ExpandList = Table.ExpandListColumn(SplitToList, "Split")
in
    ExpandList
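One caveat with the query above: it tests only the last digit before ". ", so an item numbered 10 or higher would be cut between its digits. The following is a minimal sketch of one way to handle multi-digit markers, not part of the original answer. The `Input` text and the step names `Starts` and `Items` are made up for illustration. It keeps a digit only if it begins a run of digits and that run is followed by a period and a space.

```powerquery
let
    // Hypothetical sample with a two-digit marker and a decimal in the text
    Input = "9. Ninth item. 10. Tenth item (1.2m wide). 11. Eleventh item.",
    // Zero-based positions of every digit
    Digits = Text.PositionOfAny(Input, {"0".."9"}, Occurrence.All),
    // Keep a digit only if (a) the previous character is not a digit and
    // (b) after stripping the leading digits, the rest starts with ". "
    Starts = List.Select(Digits, (i) =>
        not List.Contains(Digits, i - 1)
        and Text.StartsWith(Text.TrimStart(Text.Middle(Input, i), {"0".."9"}), ". ")),
    // Cut at each marker start and trim the trailing space of each piece
    Items = List.Transform(Splitter.SplitTextByPositions(Starts)(Input), Text.Trim)
in
    Items
```

With this sample the result should be three items, beginning with "9.", "10." and "11."; the "1.2m" inside the text is not treated as a marker. A sentence that ends in a number (for example "... in 2025. Next ...") would still be picked up as a marker, so this remains a heuristic.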
Hi @MarkusEng1998, another solution here:
let
    // Sample data: one row holding the whole numbered paragraph
    Source = Table.FromRows({{"1. Bicycle racks shall be installed on a durable surface, preferably near the associate entrance without pedestrian route conflicts. 2. Designated bicycle corridors connecting the public right-of-way with the bicycle parking shall not be provided unless required per local code. If required and provided, the bike corridor shall consist of two (1.2m) wide pavement lanes (on the outside of travel lanes, one in each direction) separated from the vehicle travel lane by pavement striping. Bike lanes should be signed and striped in accordance to local code and regulations for all traffic signs. 3. At sites with a designated bike corridor, bikes should not travel on sidewalks (with exception of parking) or in vehicle travel lanes."}}, type table [Text = text]),
    Ad_Splitted = Table.AddColumn(Source, "Splitted", each
        [
            // Split the text into space-separated tokens
            a = Text.Split([Text], " "),
            // Keep the tokens that end with a period
            b = List.Select(a, (x)=> Text.EndsWith(x, ".")),
            // Of those, keep the ones that are a number once the period is removed ("1.", "2.", ...)
            delimiters = List.Select(b, (x)=> (try Number.From(Text.Remove(x, ".")) otherwise false) is number),
            // For each marker, take the text up to the next marker; the last marker has no successor, so take everything after it
            d = List.Accumulate( {0..List.Count(delimiters)-1}, {}, (s,c)=> s & { try delimiters{c} & " " & Text.Trim(Text.BetweenDelimiters([Text], delimiters{c}, delimiters{c+1})) otherwise delimiters{c} & " " & Text.Trim(Text.AfterDelimiter([Text], delimiters{c})) } )
        ][d], type list),
    // One output row per list item
    ExpandedSplitted = Table.ExpandListColumn(Ad_Splitted, "Splitted")
in
    ExpandedSplitted
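To see how this second query decides which tokens are list markers, the detection steps can be run on their own. This is only an illustrative sketch: the short sample string and the step names `Tokens`, `EndsWithPeriod` and `Markers` are made up here, while the logic mirrors steps a, b and delimiters above.

```powerquery
let
    // Hypothetical short sample containing a decimal and ordinary sentence ends
    Tokens = Text.Split("1. Racks (1.2m) wide. 2. Lanes code.", " "),
    // Tokens ending with a period: "1.", "wide.", "2.", "code."
    EndsWithPeriod = List.Select(Tokens, (x) => Text.EndsWith(x, ".")),
    // Keep only tokens that convert to a number once the period is removed
    Markers = List.Select(EndsWithPeriod, (x) =>
        (try Number.From(Text.Remove(x, ".")) otherwise false) is number)
in
    Markers
```

With this sample the result should be the list {"1.", "2."}. As with the position-based approach, a sentence that happens to end in a number would also pass this test, so the input is assumed to contain numbers followed by a period only as list markers.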