Unicode in Wine's registry files

Leseratte10 · Post by **Leseratte10** » Fri Nov 19, 2021 2:10 pm

Hello everyone,

I hope this topic fits this forum; I didn't know where else to ask something like this.
I'm trying to parse a Wine registry file (user.reg) manually in Python, on a machine where Wine is not installed, so I can't just use regedit. I'm just trying to extract one particular value from that registry file.

If that value contains ASCII-chars only, that's pretty simple. Read file line by line until I find the correct section, then read line by line until I find the correct value, then split on the "=" sign, done.
But I am very confused with how this file encodes non-ASCII characters in the Windows registry.
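
A minimal sketch of the line-by-line approach described above. It assumes that section headers look like `[Key\\Path]` (optionally followed by a timestamp) and that values look like `"Name"="data"`. The function name `find_value` and the sample key are made up for illustration, and the naive split on "=" breaks if a value name itself contains "=".

```python
def find_value(lines, section, name):
    """Return the raw text after '=' for `name` inside `section`, or None."""
    in_section = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("["):
            # Assumed header form: [Key\\Path] optional-timestamp
            in_section = line[1:].split("]", 1)[0] == section
        elif in_section and line.startswith('"'):
            key, sep, data = line.partition("=")
            if sep and key == f'"{name}"':
                return data
    return None


# Hypothetical file content, not taken from a real user.reg
sample = [
    "WINE REGISTRY Version 2",
    "",
    r"[Software\\Example] 1637330000",
    '"Test"="abc"',
    "",
    "[Other]",
    '"Test"="zzz"',
]
print(find_value(sample, r"Software\\Example", "Test"))  # "abc" (quotes included)
```

The returned text still has its surrounding quotes and any escape sequences in it, which is where the non-ASCII problem starts.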

I have started regedit through Wine and created a new registry entry with the following value:

aé𝥷aa

In case that's not displayed correctly on the forum (it probably won't be), that's one lowercase ASCII "a", one lowercase "e" with an acute accent, then the Unicode character U+1D977 (Signwriting Movement-Floorplane Check), then another two ASCII "a". In UTF-32-BE that would be "00000061 000000e9 0001d977 00000061 00000061".
I just picked that Unicode character at random to see how Unicode characters work in the registry.

Looking at the user.reg in a hex editor, there is the following ASCII text: "a\xe9\xd836\xdd77aa". This looks a bit like C-escaped UTF-16-BE, but corrupted. Considering there's an escape sequence "\xe9" for the "é" character, how is the application reading this supposed to know whether "\xd836" means "\x{d836}" or "\xd8" followed by a "3" and a "6"?
Parsing that like "normal" escape sequences would lead to the bytes "61 e9 d8 33 36 dd 37 37 61 61", which is not what it is intended to mean...
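
For comparison, the code units of the test string can be listed with a few lines of Python. This is only a sketch to show where "d836" and "dd77" come from (they are the UTF-16 surrogate pair for U+1D977); the helper name `code_units` is made up.

```python
def code_units(text, encoding, width):
    """Encode `text` and return each `width`-byte code unit as a hex string."""
    raw = text.encode(encoding)
    return [raw[i:i + width].hex() for i in range(0, len(raw), width)]


value = "a\u00e9\U0001d977aa"
print(code_units(value, "utf-32-be", 4))
# ['00000061', '000000e9', '0001d977', '00000061', '00000061']
print(code_units(value, "utf-16-be", 2))
# ['0061', '00e9', 'd836', 'dd77', '0061', '0061']
```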

Am I completely misunderstanding something? Am I missing something? I checked the Wiki, but the only thing I found was a sentence in the FAQ like "don't edit these files manually due to their special encoding" - but what IS that encoding? I don't want to edit them, just read them, so that's unlikely to break stuff. I just don't know how to read them. There has to be an unambiguous way to parse these, as Wine itself seems to handle that just fine ...

julliard · Post by **julliard** » Sat Nov 20, 2021 6:45 am

The \x escape sequences represent 16-bit chars (UTF-16), so they are normally 4 hex digits long. However, like C escapes, they terminate at the first non-hex digit. So \xe9z is 00E9 007A, and \xdd77a is DD77 0061.

It also uses standard C escapes for chars < 32. The code to read and write them is in https://source.winehq.org/git/wine.git/ ... /unicode.c
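
The rule above can be sketched in Python roughly as follows. This is a simplified illustration, not a port of Wine's code: the names `decode_units` and `decode_reg_string` are made up, octal escapes are left out, and any unrecognized escape is assumed to stand for the character itself (which covers `\\` and `\"`).

```python
import string

# Single-letter C escapes, as 16-bit values
SIMPLE_ESCAPES = {"a": 7, "b": 8, "e": 27, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11}
HEX_DIGITS = set(string.hexdigits)


def decode_units(text):
    """Turn the escaped text into a list of 16-bit UTF-16 code units."""
    units = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 >= len(text):
            # Literal character: append its UTF-16 code unit(s)
            raw = ch.encode("utf-16-le", errors="surrogatepass")
            units.extend(int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2))
            i += 1
            continue
        nxt = text[i + 1]
        if nxt == "x":
            # Take at most 4 hex digits, stopping at the first non-hex character
            j = i + 2
            while j < len(text) and j < i + 6 and text[j] in HEX_DIGITS:
                j += 1
            if j == i + 2:
                units.append(ord("x"))  # assumption: "\x" without digits is a plain "x"
            else:
                units.append(int(text[i + 2:j], 16))
            i = max(j, i + 2)
        elif nxt in SIMPLE_ESCAPES:
            units.append(SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            units.append(ord(nxt))  # e.g. \\ or \"
            i += 2
    return units


def decode_reg_string(text):
    """Decode the escaped text into a Python str via UTF-16."""
    raw = b"".join(u.to_bytes(2, "little") for u in decode_units(text))
    return raw.decode("utf-16-le", errors="surrogatepass")


print([f"{u:04X}" for u in decode_units(r"\xe9z")])     # ['00E9', '007A']
print([f"{u:04X}" for u in decode_units(r"\xdd77a")])   # ['DD77', '0061']
print(decode_reg_string(r"a\xe9\xd836\xdd77aa") == "a\u00e9\U0001d977aa")  # True
```

Collecting 16-bit units first and decoding them as UTF-16 at the end lets the surrogate pair D836 DD77 combine into the single character U+1D977.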

Leseratte10 · Post by **Leseratte10** » Thu Nov 25, 2021 9:37 am

Thanks for that hint, I just reimplemented parse_strW in Python and that seems to work well enough.

Guess I didn't program enough C in the past - I thought the escapes worked like in Bash, where "\xdd77a" would be the one byte DD followed by the ASCII text "77a".
