View Issue Details

ID: 0002422
Project: GNUnet
Category: Win32 port
View Status: public
Last Update: 2018-06-07 01:19
Reporter: LRN
Assigned To:
Status: confirmed
Resolution: open
Platform: W32
OS: NT
OS Version: 6.1.7601
Product Version: SVN HEAD
Target Version:
Fixed in Version:
Summary: 0002422: Correct UTF-8 conversion when interacting with environment variables.
Description: I've made a sample program that reads a string from three different files (CP1251-, UTF8- and UTF16-encoded), then uses SetEnvironmentVariableA and SetEnvironmentVariableW to put this string into a variable (using a different variable name for each call), then uses GetEnvironmentVariableA/W on each of those variables and writes the result into a file.

In the following list "X->Y" means "Variable was set with SetEnvironmentVariableX and retrieved with GetEnvironmentVariableY". "OK" means that output is byte-identical to the input.

For CP1251-encoded text:
A->A - OK
A->W - OK (converted to wide-string using CP1251->UTF16)
W->A - garbage (0x3f3f61)
W->W - OK (with extra zero byte, which might be my fault, or may be due to the way string contents are interpreted by SetEnvironmentVariableW)

For UTF8-encoded text:
A->A - OK
A->W - misencoded (converted to wide-string using CP1251->UTF16; should look fine after UTF16->CP1251 conversion and re-interpreting as UTF8)
W->A - garbage (filled with 0x3F, which indicates a string converter failure)
W->W - OK
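
The A->W result for UTF8 text can be reproduced outside of W32 with plain codec calls. The following is only a sketch: it assumes the ANSI code page is CP1251 (as in this report), the sample string is made up, and Python's codecs stand in for the conversion the system performs.

```python
# Assumption: the ANSI code page is CP1251, as in this report.
original = "Привет"                     # hypothetical sample text
utf8_bytes = original.encode("utf-8")    # bytes handed to SetEnvironmentVariableA

# A->W: the bytes are treated as CP1251 and widened, which yields mojibake.
wide = utf8_bytes.decode("cp1251")

# Recovery: UTF16->CP1251 restores the original bytes, which are then
# re-interpreted as UTF8.
recovered = wide.encode("cp1251").decode("utf-8")

assert wide != original
assert recovered == original
```

Note that Python's cp1251 codec rejects byte 0x98, which is unassigned in that code page, so a string whose UTF8 form contains it (for example "И", which is 0xD0 0x98) would raise an error in this sketch. The real system conversion may behave differently there.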

For UTF16-encoded text:
A->A - OK
A->W - misencoded (converted to wide-string using CP1251->UTF16; should look fine after UTF16->CP1251 conversion and re-interpreting as UTF16)
W->A - OK (CP1251-encoded)
W->W - OK

I think it is very likely that W32 stores environment variables internally in UTF16-encoded form (just as it does with filenames). As long as you do not lie about the string encoding when setting a variable (that is, you pass the CP* encoding matching your locale to the A function, or UTF16 to the W function), the output of GetEnvironmentVariable* will be OK. The exception is W->A for strings containing characters that are not representable in the CP* your locale uses; this case was not covered by this test, since I had a hard time producing such a string. It is also possible to pass UTF8 through, as long as the reader of that variable knows to expect UTF8 encoding.
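
The untested W->A case (characters with no representation in the locale's code page) can be approximated as follows. This is a sketch under assumptions: CP1251 is taken as the code page, the sample strings are made up, and Python's errors="replace" handler only imitates the substitution with 0x3F seen in the results above; the actual W32 conversion may additionally apply best-fit mappings.

```python
# Assumption: the locale's ANSI code page is CP1251, as in this report.
wide_value = "日本語"  # hypothetical value set via the W function; no CP1251 form

# errors="replace" substitutes '?' (0x3F) for each unrepresentable character.
narrow = wide_value.encode("cp1251", errors="replace")
print(narrow.hex())  # 3f3f3f

# A value that does fit into CP1251 survives the narrowing unchanged.
fits = "Привет"
assert fits.encode("cp1251").decode("cp1251") == fits
```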

More to the topic: on W32, variables such as PATH should be read using wide-character-aware functions and converted to UTF8 as necessary (since GNUnet internally uses UTF8 almost exclusively these days). Likewise, on W32 variables should be set using wide-character-aware functions, after performing UTF8->UTF16 conversion on their arguments.
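
A minimal sketch of that flow, written in Python with ctypes purely for illustration (it is not GNUnet code). The helper names set_env_utf8, get_env_utf8, utf8_to_wide and wide_to_utf8 are hypothetical. The two kernel32 wrappers only work on Windows, where ctypes hands a str to the W functions as a UTF16 wchar_t string; the conversion helpers run anywhere.

```python
import ctypes


def utf8_to_wide(value: bytes) -> str:
    """UTF8 bytes -> str (passed by ctypes as a UTF16 wchar_t* on Windows)."""
    return value.decode("utf-8")


def wide_to_utf8(value: str) -> bytes:
    """Wide string returned by a W function -> UTF8 bytes."""
    return value.encode("utf-8")


def set_env_utf8(name: bytes, value: bytes) -> None:
    """Windows only: set a variable through SetEnvironmentVariableW."""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    ok = kernel32.SetEnvironmentVariableW(utf8_to_wide(name), utf8_to_wide(value))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())


def get_env_utf8(name: bytes) -> bytes:
    """Windows only: read a variable through GetEnvironmentVariableW."""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    wname = utf8_to_wide(name)
    # With a zero-sized buffer the call reports the required size in
    # characters, including the terminating NUL; 0 means failure.
    size = kernel32.GetEnvironmentVariableW(wname, None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(size)
    kernel32.GetEnvironmentVariableW(wname, buf, size)
    return wide_to_utf8(buf.value)
```

On Windows, set_env_utf8(b"GNUNET_TEST", "Привет".encode("utf-8")) followed by get_env_utf8(b"GNUNET_TEST") would be expected to return the same UTF8 bytes; the variable name here is made up. The sketch does not handle the variable changing between the two GetEnvironmentVariableW calls.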
Tags: No tags attached.


There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2012-06-13 13:05 LRN New Issue
2012-06-13 13:05 LRN Status new => assigned
2012-06-13 13:05 LRN Assigned To => LRN
2012-09-29 21:27 Christian Grothoff Severity minor => tweak
2018-06-07 01:19 Christian Grothoff Assigned To LRN =>
2018-06-07 01:19 Christian Grothoff Status assigned => confirmed