When I access the HTML body of an email through an Exchange Server connection, characters such as ä, ü, ö, and ß are not displayed correctly, e.g. "Gem�se" instead of "Gemüse". Re-encoding the text with
Text.FromBinary(Text.ToBinary([HtmlBody],TextEncoding.Utf8), TextEncoding.Windows)
also results in the incorrect "Gem�se". Every other combination of the common code page identifiers (https://learn.microsoft.com/de-de/windows/win32/intl/code-page-identifiers) that I tried failed as well.
It does not matter whether the original mail was sent with charset windows-1252 or iso-8859-1.
Interestingly, the characters are displayed correctly in the text body. However, the text body lacks the table structure, so I cannot use it.
Is there any other way to decode the text correctly?
Hello @v-jingzhan-msft
Maybe I wasn't entirely clear with the description of the problem.
I receive an email with the following structured content:
I access it with the Exchange connector:
Source = Exchange.Contents("somebody@example.com"),
Mail = Source{[Name="Mail"]}[Data],
SelectedFolder = Table.SelectRows(Mail, each ([Folder Path] = "\Somewhere\")),
Body = Record.ToTable(SelectedFolder{0}[Body]),
which gives me the following result for the record in the [Body] field:
The TextBody shows the correct characters but without the structural elements, while the HtmlBody shows the structural elements but with the wrong characters.
When I then re-encode the HtmlBody with every combination of code page identifiers (CPI) and filter for the expected string from the table, I get an empty table:
SelectHtmlBody = Table.SelectRows(Body, each ([Name] = "HtmlBody")),
AddConnecterColumn = Table.AddColumn(SelectHtmlBody, "connect", each 1),
CPI =
let
Source = Table.TransformColumnTypes(Web.Page(Web.Contents("https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers")){0}[Data],{{"Identifier", Int64.Type}}),
AddConnecterColumn = Table.AddColumn(Source, "connect", each 1),
JoinTable = Table.NestedJoin(AddConnecterColumn, {"connect"}, AddConnecterColumn, {"connect"}, "AddConnecterColumn", JoinKind.LeftOuter),
ExpandIdentifierColumn = Table.ExpandTableColumn(JoinTable, "AddConnecterColumn", {"Identifier"}, {"Identifier.1"})
in
ExpandIdentifierColumn,
JoinCPI = Table.NestedJoin(AddConnecterColumn, {"connect"}, CPI, {"connect"}, "CPI", JoinKind.LeftOuter),
ExpandIdentifier = Table.ExpandTableColumn(JoinCPI, "CPI", {"Identifier", "Identifier.1"}, {"Identifier", "Identifier.1"}),
AddEncodingColumn = Table.AddColumn(ExpandIdentifier, "Encoding", each Text.FromBinary(Text.ToBinary([Value], [Identifier]),[Identifier.1])),
RemoveAndSelect = Table.SelectRows(Table.RemoveRowsWithErrors(AddEncodingColumn, {"Encoding"}), each Text.Contains([Encoding],"Üäöüß"))
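One possible explanation for the empty result, offered as an assumption rather than a confirmed diagnosis: if the "�" in HtmlBody is already the Unicode replacement character U+FFFD by the time the connector returns the text, the original byte is gone, and no pair of code pages can bring "ü" back. The sketch below uses a hard-coded sample string (not real mailbox data) to test for U+FFFD and to show what the UTF-8 to windows-1252 round trip does to it. The step and field names are only illustrative.

```powerquery
let
    // Hypothetical sample: "Gemüse" after the ü was replaced by U+FFFD (decimal 65533)
    Damaged = "Gem" & Character.FromNumber(65533) & "se",
    // Code point of every character in the string
    CodePoints = List.Transform(Text.ToList(Damaged), Character.ToNumber),
    // true when the replacement character is present, i.e. the original byte is already lost
    HasReplacementChar = List.Contains(CodePoints, 65533),
    // Same round trip as in the question: encode as UTF-8, decode as windows-1252
    RoundTrip = Text.FromBinary(Text.ToBinary(Damaged, TextEncoding.Utf8), TextEncoding.Windows),
    // The round trip only re-reads the UTF-8 bytes of U+FFFD, so "ü" should not reappear
    HasUmlaut = Text.Contains(RoundTrip, "ü"),
    // Field names differ from the step names to avoid a cyclic reference inside the record
    Result = [ContainsFFFD = HasReplacementChar, AfterRoundTrip = RoundTrip, UmlautRecovered = HasUmlaut]
in
    Result
```

To run the same check against the real message, replace Damaged with the HtmlBody value, for example SelectHtmlBody{0}[Value]. If ContainsFFFD comes back true, the substitution happened before the text reached the query, and re-encoding inside M is unlikely to help.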
Since the characters are displayed correctly in the text body, you may try extracting them from the HTML code with the Html.Table function. This may be an alternative.
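A minimal sketch of how Html.Table can pull table rows out of HTML text, assuming the mail contains a plain table. The sample HTML, the column names, and the CSS selectors are hypothetical and would need to be adapted to the real message. Note that Html.Table works on the HTML text it is given, so it carries over whatever characters that text contains.

```powerquery
let
    // Hypothetical HTML standing in for the mail's HtmlBody
    SampleHtml = "<table><tr><td>Gemüse</td><td>3</td></tr><tr><td>Äpfel</td><td>5</td></tr></table>",
    // RowSelector picks the elements that become rows;
    // each column selector picks the cell that fills that column
    Rows = Html.Table(
        SampleHtml,
        {
            {"Item", "TABLE > * > TR > :nth-child(1)"},
            {"Quantity", "TABLE > * > TR > :nth-child(2)"}
        },
        [RowSelector = "TABLE > * > TR"]
    )
in
    Rows
```

For the real mail, SampleHtml would be replaced by the HtmlBody value, for example SelectHtmlBody{0}[Value].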
Best Regards,
Jing