Unicode in Wine's registry files

Leseratte10
Newbie
Posts: 2
Joined: Fri Nov 19, 2021 1:37 pm

Post by Leseratte10 »

Hello everyone,

I hope this topic fits this forum; I didn't know where else I could ask something like this.
I'm trying to parse a Wine registry file (user.reg) manually in Python, on a machine where Wine is not installed, so I can't just use regedit. I'm just trying to extract one particular value from that registry file.

If that value contains ASCII-chars only, that's pretty simple. Read file line by line until I find the correct section, then read line by line until I find the correct value, then split on the "=" sign, done.
But I am very confused by how this file encodes non-ASCII characters from the Windows registry.
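The line-by-line scan described above could be sketched roughly as follows. This is only an illustration: the function name `find_raw_value`, the sample key and value names, and the assumed layout (a `[Section]` header line followed by `"Name"="value"` lines) are not taken from any Wine documentation.

```python
def find_raw_value(lines, section, name):
    """Return the raw text after '=' for `name` inside `[section]`, or None.

    Assumes (hypothetically) that section headers start with '[' and that
    values look like "Name"="value" on a single line.
    """
    header = "[" + section + "]"
    prefix = '"' + name + '"='
    in_section = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("["):
            # A new section begins; remember whether it is the wanted one.
            in_section = line.startswith(header)
            continue
        if in_section and line.startswith(prefix):
            # Split only on the first '=' so the value may contain '='.
            return line.split("=", 1)[1]
    return None


# Made-up sample data, not copied from a real user.reg.
sample = [
    "[Software\\\\Example] 1637325000",
    '"Greeting"="hello"',
    "[Software\\\\Other] 1637325001",
    '"Greeting"="bye"',
]
print(find_raw_value(sample, "Software\\\\Example", "Greeting"))
```

The returned text still carries its surrounding quotes and any escape sequences, which is exactly where the non-ASCII question starts.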

I have started regedit through Wine and created a new registry entry with the following value:

aé𝥷aa
In case that's not displayed correctly on the forum (it probably won't be), that's one lowercase ASCII "a", one lowercase "e" with an acute accent, then the Unicode character U+1D977 (Signwriting Movement-Floorplane Check), then another two ASCII "a". In UTF-32-BE that would be "00000061 000000e9 0001d977 00000061 00000061".
I just picked that Unicode character at random to see how such characters work in the registry.
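For reference, the code units of that test string can be checked with a few lines of Python. This is just a quick sanity check using the standard codecs; the variable names are arbitrary.

```python
# The test value: a, e-acute, U+1D977, a, a
s = "a\u00e9\U0001d977aa"

# Group the hex dump into 4-byte (UTF-32) and 2-byte (UTF-16) units.
utf32 = s.encode("utf-32-be").hex(" ", 4)
utf16 = s.encode("utf-16-be").hex(" ", 2)

print(utf32)  # 00000061 000000e9 0001d977 00000061 00000061
print(utf16)  # 0061 00e9 d836 dd77 0061 0061
```

In UTF-16, U+1D977 lies outside the Basic Multilingual Plane, so it is stored as the surrogate pair D836 DD77.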

Looking at the user.reg in a hex editor, there is the following ascii text: "a\xe9\xd836\xdd77aa". Looking at that text this looks a bit like C-escaped UTF-16-BE, but corrupted. Considering there's an escape sequence "\xe9" for the "é" character, how is the application reading this thing supposed to know if "\xd836" means "\x{d836}" or "\xd8" followed by a "3" and a "6"?
Parsing that like "normal" escape sequences would lead to the bytes "61 e9 d8 33 36 dd 37 37 61 61" which is not what that's intended to mean...
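To make that byte sequence concrete, here is a small sketch of the "normal" reading, where every \x escape is exactly two hex digits and yields one byte. The helper name `naive_unescape` is made up for this illustration, and it assumes the input is plain ASCII.

```python
import re

# Exactly two hex digits after \x, i.e. one byte per escape.
_TWO_DIGIT_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def naive_unescape(text):
    """Decode \\xNN as single bytes and copy every other character as-is."""
    out = bytearray()
    i = 0
    while i < len(text):
        m = _TWO_DIGIT_ESCAPE.match(text, i)
        if m:
            out.append(int(m.group(1), 16))
            i = m.end()
        else:
            out.append(ord(text[i]))  # assumes ASCII input
            i += 1
    return bytes(out)


raw = r"a\xe9\xd836\xdd77aa"
print(naive_unescape(raw).hex(" "))  # 61 e9 d8 33 36 dd 37 37 61 61
```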

Am I completely misunderstanding something, or missing something? I checked the wiki, but the only thing I found was a sentence in the FAQ like "don't edit these files manually due to their special encoding" - but what IS that encoding? I don't want to edit them, just read them, so that's unlikely to break stuff. I just don't know how to read them. There has to be an unambiguous way to parse these, as Wine itself seems to handle that just fine ...
julliard
Level 2
Posts: 11
Joined: Sat Mar 30, 2013 12:22 pm

Re: Unicode in Wine's registry files

Post by julliard »

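The \x escape sequences represent 16-bit chars (UTF-16 code units), so they are normally 4 hex digits long. However, like C escapes, they can be shorter: they terminate at the first non-hex digit, and, as the second example shows, they also stop once 4 digits have been read. So \xe9z is 00E9 007A, and \xdd77a is DD77 0061.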

It also uses standard C escapes for chars < 32. The code to read and write them is in https://source.winehq.org/git/wine.git/ ... /unicode.c
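The rule described above could be sketched in Python like this. To be clear, this is not Wine's code and not a port of it: the function name `decode_reg_string`, the small table of C escapes, and the handling of unknown escapes (keep the character after the backslash) are assumptions made for illustration. It also assumes the escaped text itself is plain ASCII.

```python
import re
import struct

# \x followed by one to four hex digits: one UTF-16 code unit.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{1,4})")

# A few standard C escapes for control characters (assumed subset).
_C_ESCAPES = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13}


def decode_reg_string(text):
    """Turn escaped registry text into a Python str via UTF-16 code units."""
    units = []
    i = 0
    while i < len(text):
        m = _HEX_ESCAPE.match(text, i)
        if m:
            units.append(int(m.group(1), 16))
            i = m.end()
        elif text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Known C escape, otherwise keep the escaped char (\\ and \").
            units.append(_C_ESCAPES.get(nxt, ord(nxt)))
            i += 2
        else:
            units.append(ord(text[i]))
            i += 1
    # Pack as big-endian 16-bit units, then let the codec pair surrogates.
    data = struct.pack(">%dH" % len(units), *units)
    return data.decode("utf-16-be", errors="surrogatepass")


print(decode_reg_string(r"a\xe9\xd836\xdd77aa") == "a\u00e9\U0001d977aa")  # True
```

Collecting 16-bit units first and decoding them as UTF-16 afterwards means surrogate pairs such as D836 DD77 are combined by the codec rather than by hand; "surrogatepass" only matters for unpaired surrogates, which would otherwise raise an error.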
Leseratte10
Newbie
Posts: 2
Joined: Fri Nov 19, 2021 1:37 pm

Re: Unicode in Wine's registry files

Post by Leseratte10 »

Thanks for that hint, I just reimplemented parse_strW in Python and that seems to work well enough.

Guess I didn't program enough C in the past - I thought the escapes worked like in Bash, where "\xdd77a" would be the single byte DD followed by the ASCII text "77a".