View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0002422||GNUnet||Win32 port||public||2012-06-13 13:05||2018-06-07 01:19|
|Product Version||SVN HEAD|
|Target Version||Fixed in Version|
|Summary||0002422: Correct UTF-8 conversion when interacting with environment variables.|
|Description||I wrote a sample program that reads a string from three different files (CP1251-, UTF8- and UTF16-encoded), then uses SetEnvironmentVariableA and SetEnvironmentVariableW to store this string in environment variables (a different variable name for each call), then calls GetEnvironmentVariableA/W on each of those variables and writes the results into a file.|
In the following list "X->Y" means "Variable was set with SetEnvironmentVariableX and retrieved with GetEnvironmentVariableY". "OK" means that output is byte-identical to the input.
For CP1251-encoded text:
A->A - OK
A->W - OK (converted to wide-string using CP1251->UTF16)
W->A - garbage (0x3f3f61)
W->W - OK (with extra zero byte, which might be my fault, or may be due to the way string contents are interpreted by SetEnvironmentVariableW)
For UTF8-encoded text:
A->A - OK
A->W - misencoded (converted to wide-string using CP1251->UTF16; should look fine after UTF16->CP1251 conversion and re-interpreting as UTF8)
W->A - garbage (filled with 0x3F, which indicates a string converter failure)
W->W - OK
For UTF16-encoded text:
A->A - OK
A->W - misencoded (converted to wide-string using CP1251->UTF16; should look fine after UTF16->CP1251 conversion and re-interpreting as UTF16)
W->A - OK (CP1251-encoded)
W->W - OK
I think it is very likely that W32 stores environment variables internally in UTF16-encoded form (just as it does with filenames). As long as you don't lie about the string encoding when setting a variable (that is, the string is in the CP* matching your locale or in UTF16), the output of GetEnvironmentVariable* will be OK. The exception is W->A for strings with characters not representable in the CP* that your locale uses; this test did not cover that case, since I had a hard time producing such a string. It is also possible to pass UTF8 through, as long as the reader of that variable knows to expect UTF8 encoding.
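The hypothesis above can be checked against the observed results with a small model. This is only a sketch, not the sample program from this report: it assumes that the environment block holds UTF16 text, that the A functions convert through the ANSI code page (taken to be CP1251 here), and that unrepresentable characters become "?" (0x3F). The function names set_a, set_w, get_a and get_w are hypothetical stand-ins for the W32 calls, and the sample word is chosen so that none of its UTF8 bytes is 0x98, which CP1251 leaves undefined.

```python
ANSI = "cp1251"  # assumption: ANSI code page of the reporter's locale


def set_a(raw: bytes) -> str:
    """Model of SetEnvironmentVariableA: bytes are taken as ANSI text."""
    return raw.decode(ANSI)


def set_w(raw: bytes) -> str:
    """Model of SetEnvironmentVariableW: bytes are taken as UTF-16LE text."""
    return raw.decode("utf-16-le")


def get_a(stored: str) -> bytes:
    """Model of GetEnvironmentVariableA: unrepresentable chars become '?'."""
    return stored.encode(ANSI, errors="replace")


def get_w(stored: str) -> bytes:
    """Model of GetEnvironmentVariableW: returns the stored UTF-16LE text."""
    return stored.encode("utf-16-le")


text = "привет"
cp = text.encode("cp1251")
u8 = text.encode("utf-8")
u16 = text.encode("utf-16-le")

for label, data in (("CP1251", cp), ("UTF8", u8), ("UTF16", u16)):
    print(label, "A->A", get_a(set_a(data)) == data)
    print(label, "A->W", get_w(set_a(data)).hex())
    print(label, "W->A", get_a(set_w(data)).hex())
    print(label, "W->W", get_w(set_w(data)) == data)

# Recovering UTF8 text that went through A->W, as described in the list:
recovered = get_w(set_a(u8)).decode("utf-16-le").encode(ANSI).decode("utf-8")
print(recovered == text)
```

Under these assumptions the model gives the same pattern as the list: A->A and W->W are byte-identical for every input, W->A turns CP1251 and UTF8 input into 0x3F bytes, and the misencoded UTF8 A->W result can be recovered by converting back through CP1251.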
More to the topic: on W32, variables such as PATH should be read using wide-character-aware functions and converted to UTF8 as necessary (since GNUnet internally uses UTF8 almost exclusively these days). Likewise, on W32 variables should be set using wide-character-aware functions, after performing UTF8->UTF16 conversion on their arguments.
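The two conversions proposed above can be sketched as follows. This only illustrates the data flow and is not GNUnet code: the helper names utf8_to_wide and wide_to_utf8 are hypothetical, and in C on W32 the equivalent steps would presumably be done with MultiByteToWideChar and WideCharToMultiByte using CP_UTF8 around the SetEnvironmentVariableW and GetEnvironmentVariableW calls.

```python
def utf8_to_wide(value: bytes) -> bytes:
    """UTF-8 bytes -> NUL-terminated UTF-16LE buffer, as a W function expects.

    Raises UnicodeDecodeError if the input is not valid UTF-8.
    """
    return value.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def wide_to_utf8(buf: bytes) -> bytes:
    """UTF-16LE buffer (optionally NUL-terminated) -> UTF-8 bytes."""
    decoded = buf.decode("utf-16-le")
    # Keep only the part before the first NUL terminator, if there is one.
    return decoded.split("\x00", 1)[0].encode("utf-8")


# Hypothetical PATH-like value with non-ASCII characters.
path_utf8 = "C:\\Users\\Пользователь\\bin".encode("utf-8")
wide = utf8_to_wide(path_utf8)           # would be passed to the W setter
round_trip = wide_to_utf8(wide)          # what a UTF8-only caller gets back
print(round_trip == path_utf8)
```

Failing loudly on invalid UTF8 input, rather than substituting replacement characters, is a design choice of this sketch; it avoids silently storing a mangled value in the environment.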
|Tags||No tags attached.|
|2012-06-13 13:05||LRN||New Issue|
|2012-06-13 13:05||LRN||Status||new => assigned|
|2012-06-13 13:05||LRN||Assigned To||=> LRN|
|2012-09-29 21:27||Christian Grothoff||Severity||minor => tweak|
|2018-06-07 01:19||Christian Grothoff||Assigned To||LRN =>|
|2018-06-07 01:19||Christian Grothoff||Status||assigned => confirmed|