What do HN people have to say about Unicode and UTF-{8,16,32}? Are there parts you've never really understood? Have you had unexpected bugs due to misunderstood properties of text?
It turns out that changing case sometimes changes not only the number of bytes (in UTF-8) but also the number of encoded characters! This led to my post "UTF-8 characters that behave oddly when the case is changed" [1], which sparked a lot of conversation that taught me a great deal. After that, I started reading the Unicode documentation in earnest and building up an idea of what a new tool should show. I'm trying to make clear the things I didn't (and sometimes still don't) understand, so I'd love to know what causes pain in the wild and where the gaps in people's understanding are.
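To make the case-mapping surprise concrete, here is a small Python sketch. Python is my choice for illustration; the post doesn't name a language. It relies on Python's str.upper() and str.lower() applying Unicode's full, locale-independent case mappings. It counts code points with len() and UTF-8 bytes with len(s.encode("utf-8")). The helper name `counts` is hypothetical.

```python
def counts(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


# Each pair is (original string, case mapping to apply).
cases = [
    ("ß", str.upper),  # sharp s uppercases to "SS"
    ("ﬁ", str.upper),  # the fi ligature uppercases to "FI"
    ("İ", str.lower),  # capital I with dot lowercases to "i" + U+0307
    ("ı", str.upper),  # dotless i uppercases to plain ASCII "I"
]

for original, mapping in cases:
    mapped = mapping(original)
    (cp_before, b_before), (cp_after, b_after) = counts(original), counts(mapped)
    print(f"{original!r} -> {mapped!r}: "
          f"code points {cp_before} -> {cp_after}, "
          f"UTF-8 bytes {b_before} -> {b_after}")
```

Running it, ß and the ﬁ ligature grow from one code point to two. İ grows from two UTF-8 bytes to three. Dotless ı shrinks from two bytes to one while staying a single code point.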
Then came emojis, and now the Unicode Consortium's work on new Unicode versions seems to be mostly about adding more kinds of poop emoji and more skin-tone variations. Well, maybe that accurately reflects the language and culture of our time.
UTF-8 is great because it is a superset of ASCII: every ASCII text is also valid UTF-8, byte for byte. However, its byte width varies, which makes encoding and decoding more complex. This is similar to fixed-width versus variable-width instruction encodings in CPU ISAs.
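Both properties come from the UTF-8 bit layout. Bytes below 0x80 stand for themselves. Any other lead byte announces how many 10xxxxxx continuation bytes follow. The minimal Python decoder below illustrates this; its function names are hypothetical. It deliberately skips some validation (overlong three- and four-byte forms, surrogates, values above U+10FFFF), so in real code you would use bytes.decode("utf-8") instead.

```python
def utf8_sequence_length(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:             # 0xxxxxxx: ASCII, one byte
        return 1
    if 0xC2 <= lead <= 0xDF:    # 110xxxxx: two bytes (0xC0/0xC1 would be overlong)
        return 2
    if 0xE0 <= lead <= 0xEF:    # 1110xxxx: three bytes
        return 3
    if 0xF0 <= lead <= 0xF4:    # 11110xxx: four bytes (caps out near U+10FFFF)
        return 4
    raise ValueError(f"invalid lead byte 0x{lead:02X}")


def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of code points.

    Simplified: checks continuation bytes, but not overlong 3/4-byte
    forms, surrogates, or values above U+10FFFF.
    """
    code_points = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence")
        if n == 1:
            cp = data[i]
        else:
            # Keep the payload bits of the lead byte: 5, 4, or 3 bits.
            cp = data[i] & (0x7F >> n)
            for b in data[i + 1 : i + n]:
                if b & 0xC0 != 0x80:
                    raise ValueError("expected continuation byte 10xxxxxx")
                cp = (cp << 6) | (b & 0x3F)
        code_points.append(cp)
        i += n
    return code_points


print(decode_utf8("A\u20ac\U0001F600".encode("utf-8")))
```

Calling decode_utf8(b"hello") returns the same values as list(b"hello"), which is the ASCII-superset property in action. By contrast, "A", "€" and "😀" need 1, 3 and 4 bytes respectively.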
Different languages have different concepts. One example is text direction, i.e. flow (left-to-right or right-to-left, top-to-bottom). Others are alphabetic characters vs. logograms and different kinds of visual cues. Humans create problems when they want to combine different languages at the same time. For example, mathematical notation is, in my opinion, 2D graphics, and it cannot always be inlined with text glyphs in an aesthetically pleasing way. The same kinds of problems can come up when trying to inline languages with different flow directions. It's like trying to combine native GUI widgets from Win32, Cocoa/SwiftUI, and GTK/Qt/wxWidgets: the (visual) languages don't share the same concepts, or their concepts conflict.
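Unicode handles the direction part of this by storing text in logical (reading) order. Every character also gets a bidirectional class, which the Unicode Bidirectional Algorithm (UAX #9) uses to decide display order. The rough Python sketch below uses only the standard unicodedata module. It guesses a paragraph's base direction from its first strong character (class L, R, or AL), loosely following UAX #9 rules P2 and P3. The function name is hypothetical. It does not skip text inside directional isolates as P2 requires, so it is not a full bidi implementation.

```python
import unicodedata

# Strong bidi classes and the base direction each one implies.
STRONG_CLASSES = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess a paragraph's base direction from its first strong character.

    Simplified take on UAX #9 rules P2-P3: it does not skip text inside
    directional isolates (LRI/RLI/FSI ... PDI) as P2 requires.
    """
    for ch in text:
        direction = STRONG_CLASSES.get(unicodedata.bidirectional(ch))
        if direction is not None:
            return direction
    # No strong character (e.g. only digits, spaces, punctuation).
    return default


for sample in ["hello", "שלום world", "123 שלום", "123"]:
    print(f"{sample!r}: {base_direction(sample)}")
```

For example, base_direction("123 שלום") returns "rtl". Digits and spaces are weak or neutral in bidi terms, so the Hebrew letters decide.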