I hope this topic fits this forum; I didn't know where else to ask something like this.
I'm trying to parse a Wine registry file (user.reg) manually in Python, on a machine where Wine is not installed, so I can't just use regedit. I'm just trying to extract one particular value from that registry file.
If that value contains only ASCII characters, that's pretty simple: read the file line by line until I find the correct section, keep reading until I find the correct value, then split on the "=" sign, done.
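For reference, a minimal sketch of that line-by-line lookup. The sample lines are made up, and the layout they show (a bracketed key path with doubled backslashes followed by a timestamp, then quoted "name"="value" lines) is my assumption about what user.reg looks like; `find_value` is just a name I picked.

```python
def find_value(lines, section, name):
    """Return the raw text after '=' for `name` inside `[section]`, or None."""
    prefix = '"' + name + '"='
    in_section = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("["):
            # Assumed header shape: [Key\\Path] optionally followed by a timestamp
            in_section = line[1:].split("]", 1)[0] == section
        elif in_section and line.startswith(prefix):
            # Everything after the first '=' that follows the quoted name
            return line[len(prefix):]
    return None

# Hypothetical sample data, not copied from a real user.reg
sample = [
    "[Software\\\\Example] 1700000000",
    '"Greeting"="hello"',
    "",
    "[Software\\\\Other] 1700000000",
    '"Greeting"="other"',
]
print(find_value(sample, "Software\\\\Example", "Greeting"))
```

The result still carries its surrounding quotes and any escape sequences, which is exactly where the non-ASCII problem starts.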
But I am very confused about how this file encodes non-ASCII characters.
I have started regedit through Wine and created a new registry entry with the following value:
aé𝥷aa
I just picked that Unicode character at random to see how Unicode characters work in the registry.
Looking at user.reg in a hex editor, I find the following ASCII text: "a\xe9\xd836\xdd77aa". This looks a bit like C-escaped UTF-16-BE, but corrupted. Given that the "é" character is written as the escape sequence "\xe9", how is an application reading this supposed to know whether "\xd836" means "\x{d836}" or "\xd8" followed by a "3" and a "6"?
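To back up the UTF-16 impression, here is a quick check of the 16-bit code units of the test string. The string is spelled with Python escapes so the snippet does not depend on how the forum renders the character; I take it to be U+1D977, since that is the code point the surrogate pair d836/dd77 works out to.

```python
# "a", "é" (U+00E9), U+1D977, "a", "a"
text = "a\u00e9\U0001d977aa"

encoded = text.encode("utf-16-be")
# Group the bytes into 16-bit code units, printed as hex
code_units = [encoded[i:i + 2].hex() for i in range(0, len(encoded), 2)]
print(code_units)
# ['0061', '00e9', 'd836', 'dd77', '0061', '0061']
```

So the digits after each "\x" in the file match the UTF-16 code units of the non-ASCII characters, with leading zeros dropped for "é".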
Parsing that text like "normal" escape sequences, where "\x" is followed by exactly two hex digits, would give the bytes "61 e9 d8 33 36 dd 37 37 61 61", which is not what it is intended to mean...
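This is easy to reproduce with Python's own `unicode_escape` codec, which treats "\x" as taking exactly two hex digits:

```python
raw = r"a\xe9\xd836\xdd77aa"  # the ASCII text as it appears in user.reg

# Python-style decoding: every \x escape consumes exactly two hex digits
naive = raw.encode("ascii").decode("unicode_escape")
naive_hex = " ".join(f"{b:02x}" for b in naive.encode("latin-1"))
print(naive_hex)
# 61 e9 d8 33 36 dd 37 37 61 61
```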
Am I completely misunderstanding something, or missing something? I checked the Wiki, but the only thing I found was a sentence in the FAQ along the lines of "don't edit these files manually due to their special encoding" - but what IS that encoding? I don't want to edit them, just read them, so that's unlikely to break anything. I just don't know how to read them. There has to be an unambiguous way to parse these, as Wine itself seems to handle that just fine...
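One reading that would at least decode my test value, sketched below, is that "\x" is followed by one to four hex digits naming a single UTF-16 code unit, with surrogate pairs combined afterwards. This is a guess, not something I have confirmed against Wine's source: it can only be unambiguous if the writer pads the escape to four digits whenever the next character is itself a hex digit, and the sketch ignores other escapes such as an escaped backslash. `decode_value` is a name I made up.

```python
import re

# Assumption: "\x" is followed by 1 to 4 hex digits naming one UTF-16 code unit
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{1,4})")

def decode_value(escaped):
    # Replace each escape with the code unit it names (may be a lone surrogate)
    units = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    # Round-trip through UTF-16 so surrogate pairs merge into one code point
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

decoded = decode_value(r"a\xe9\xd836\xdd77aa")
print(decoded == "a\u00e9\U0001d977aa")
# True
```

With this reading, "\xdd77aa" stops after "dd77" and the trailing "aa" stays literal, which matches the value I entered.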