On 11/06/2020 11:24, Martin Man via mbed-tls wrote:
That code is from an earlier era (mid 2000s, I think) when most systems used an 8-bit encoding, but non-8-bit-clean systems were still common. A '\x80' in text might be transformed to '\x00' with disastrous consequences.
But over a decade later, I don't think non-8-bit-clean systems are a concern anymore. Leaving all non-ASCII characters alone sounds reasonable to me.
We are not going to do Unicode normalization in Mbed TLS: that would be far too complex for a library that runs on systems with ~1e5 bytes available for code. So Unicode strings would only be processed correctly if the application passes normalized strings and CAs only generate certificates with normalized strings. But that would be an improvement on converting non-ASCII characters to '?'.
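To make the difference concrete, here is a minimal sketch of the two behaviours in C. It is not the actual Mbed TLS code, and both function names are hypothetical. It assumes the DN attribute value arrives as a byte buffer with an explicit length (as ASN.1 strings do) and that the caller's output buffer has room for at least the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "old" policy: any byte outside printable ASCII
 * (0x20..0x7E) becomes '?', so every byte of a UTF-8 multibyte
 * sequence is replaced. */
static void escape_non_ascii(const unsigned char *in, size_t len,
                             char *out, size_t out_size)
{
    size_t i;

    assert(out_size > 0); /* need room for at least the NUL */
    for (i = 0; i < len && i < out_size - 1; i++) {
        unsigned char c = in[i];
        out[i] = (c < 0x20 || c > 0x7E) ? '?' : (char) c;
    }
    out[i] = '\0';
}

/* Hypothetical "proposed" policy: replace only ASCII control
 * characters (0x00..0x1F and 0x7F) and copy bytes >= 0x80
 * unchanged. No Unicode validation or normalization is done. */
static void escape_controls_only(const unsigned char *in, size_t len,
                                 char *out, size_t out_size)
{
    size_t i;

    assert(out_size > 0); /* need room for at least the NUL */
    for (i = 0; i < len && i < out_size - 1; i++) {
        unsigned char c = in[i];
        out[i] = (c < 0x20 || c == 0x7F) ? '?' : (char) c;
    }
    out[i] = '\0';
}
```

With the UTF-8 input "Z\xC3\xBCrich" ("Zürich"), the first function produces "Z??rich", while the second copies all seven bytes through unchanged. The second still truncates at a byte boundary, so an output buffer that is too small could cut a multibyte sequence in half.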
mbed-tls@lists.trustedfirmware.org