Incorrect encoding of html body of an email from E...

FelipeSeptimo · 04-11-2024

When accessing the HTML body of an email via an Exchange Server connection, the characters ä, ü, ö, ß, etc. are not displayed correctly: e.g. "Gem�se" instead of "Gemüse". The attempt using

Text.FromBinary(Text.ToBinary([HtmlBody],TextEncoding.Utf8), TextEncoding.Windows)

also gives an incorrect result, "Gemï¿½se". Every other combination of common code page identifiers (https://learn.microsoft.com/de-de/windows/win32/intl/code-page-identifiers) that I tried also failed to produce the correct text.
It does not matter whether the original mail was sent in charset windows-1252 or iso-8859-1.
Interestingly, the characters are displayed correctly in the text body. However, as the table structures are missing here, this cannot be used.
Is there any other way to encode the text correctly?
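The two symptoms described above are consistent with the original byte for "ü" having already been replaced before the text reaches the query. The following is a minimal sketch of that mechanism, not a confirmed description of what the Exchange connector does internally. It assumes that Text.FromBinary substitutes the Unicode replacement character U+FFFD for bytes that are invalid in UTF-8; the step names are illustrative.

```powerquery
let
    // Sample word from the post; in a real mail the bytes come from the server.
    Original = "Gemüse",
    // Bytes as a windows-1252 mail would carry them: "ü" is the single byte 0xFC.
    Bytes1252 = Text.ToBinary(Original, TextEncoding.Windows),
    // Reading those bytes as UTF-8: 0xFC is not valid UTF-8 on its own, so the
    // decoder is assumed to substitute U+FFFD for it.
    MisDecoded = Text.FromBinary(Bytes1252, TextEncoding.Utf8),
    // The attempt from the post: encode as UTF-8, then decode as windows-1252.
    // U+FFFD is the three bytes EF BF BD in UTF-8, which windows-1252 reads as
    // three separate characters.
    RoundTrip = Text.FromBinary(
        Text.ToBinary(MisDecoded, TextEncoding.Utf8),
        TextEncoding.Windows
    )
in
    [MisDecoded = MisDecoded, RoundTrip = RoundTrip]
```

If the assumption holds, MisDecoded corresponds to the "Gem�se" form and RoundTrip to the "Gemï¿½se" form. Once a byte has been replaced by U+FFFD, the information about which character it was is gone, so no later pair of encodings can bring it back.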

FelipeSeptimo · 04-12-2024

Hello @Anonymous

Maybe I wasn't entirely clear with the description of the problem.

I receive an email with the following structured content:

I access it with the Exchange connector:

Source = Exchange.Contents("somebody@example.com"),
Mail = Source{[Name="Mail"]}[Data],
SelectedFolder = Table.SelectRows(Mail, each ([Folder Path] = "\Somewhere\")),
Body = Record.ToTable(SelectedFolder{0}[Body]),

where I then get the following result for the record of the [Body] field:

The TextBody shows the correct characters but lacks the structural elements, whereas the HtmlBody keeps the structural elements but shows the wrong characters.

Now, when I try to re-encode the HtmlBody with every conceivable pair of code page identifiers (CPI) and filter for the string from the table, I get an empty table:

    // Keep only the HtmlBody row and add a constant key for the cross join below.
    SelectHtmlBody = Table.SelectRows(Body, each ([Name] = "HtmlBody")),
    AddConnecterColumn = Table.AddColumn(SelectHtmlBody, "connect", each 1),
    // Build every ordered pair of code page identifiers (encode with, decode with).
    CPI =
        let
            Source = Table.TransformColumnTypes(Web.Page(Web.Contents("https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers")){0}[Data], {{"Identifier", Int64.Type}}),
            AddConnecterColumn = Table.AddColumn(Source, "connect", each 1),
            JoinTable = Table.NestedJoin(AddConnecterColumn, {"connect"}, AddConnecterColumn, {"connect"}, "AddConnecterColumn", JoinKind.LeftOuter),
            ExpandIdentifierColumn = Table.ExpandTableColumn(JoinTable, "AddConnecterColumn", {"Identifier"}, {"Identifier.1"})
        in
            ExpandIdentifierColumn,
    // Attach every identifier pair to the HtmlBody row.
    JoinCPI = Table.NestedJoin(AddConnecterColumn, {"connect"}, CPI, {"connect"}, "CPI", JoinKind.LeftOuter),
    ExpandIdentifier = Table.ExpandTableColumn(JoinCPI, "CPI", {"Identifier", "Identifier.1"}, {"Identifier", "Identifier.1"}),
    // Re-encode the HTML with each pair, drop rows that error, and keep rows containing the test string.
    AddEncodingColumn = Table.AddColumn(ExpandIdentifier, "Encoding", each Text.FromBinary(Text.ToBinary([Value], [Identifier]), [Identifier.1])),
    RemoveAndSelect = Table.SelectRows(Table.RemoveRowsWithErrors(AddEncodingColumn, {"Encoding"}), each Text.Contains([Encoding], "Üäöüß"))
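One possible reason for the empty table, offered as an assumption rather than a confirmed diagnosis: if the HtmlBody text already contains the Unicode replacement character U+FFFD, the original bytes are lost and no identifier pair can restore them. A quick check along these lines could confirm or rule that out. CountReplacementChars and SampleHtml are hypothetical names; SampleHtml stands in for the real HtmlBody value.

```powerquery
let
    // Hypothetical helper: counts U+FFFD (code point 65533) characters in a text value.
    CountReplacementChars = (html as text) as number =>
        List.Count(
            List.Select(Text.ToList(html), each _ = Character.FromNumber(65533))
        ),
    // Stand-in for the HtmlBody value; swap in the real field from the Body record.
    SampleHtml = "<td>Gem" & Character.FromNumber(65533) & "se</td>",
    Result = CountReplacementChars(SampleHtml)
in
    Result
```

A nonzero count on the real HtmlBody would mean the damage happened before the query received the text, which would make a search across code pages pointless.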

Anonymous · 04-11-2024

Hi @FelipeSeptimo

Since the characters are displayed correctly in the text body, you could try extracting the content from the HTML code with the Html.Table function. This may be an alternative.
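As a rough sketch of what that could look like: Html.Table takes the HTML text, a list of column name and CSS selector pairs, and an optional RowSelector. The sample HTML, column names, and selectors below are assumptions for illustration; the selector pattern is modeled on the code the Power Query web connector generates and would need adjusting to the actual mail markup.

```powerquery
let
    // Stand-in HTML; in the thread this would be the HtmlBody text.
    SampleHtml = "<table><tr><td>Gemüse</td><td>3</td></tr><tr><td>Äpfel</td><td>5</td></tr></table>",
    // One output row per table row; each column picks the n-th cell of that row.
    Cells = Html.Table(
        SampleHtml,
        {
            {"Item", "TABLE > * > TR > :nth-child(1)"},
            {"Quantity", "TABLE > * > TR > :nth-child(2)"}
        },
        [RowSelector = "TABLE > * > TR"]
    )
in
    Cells
```

Html.Table only restructures the text it is given, so whether the umlauts survive still depends on the HtmlBody value itself.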

Best Regards,
Jing