View Issue Details

ID: 0002475
Project: GNUnet
Category: namestore service
View Status: public
Last Update: 2012-12-21 16:49
Reporter: schanzen
Assigned To: Christian Grothoff
Priority: high
Severity: feature
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version: Git master
Target Version: 0.9.5
Fixed in Version: 0.9.5
Summary: 0002475: Namestore / GNS record names and IDNs
Description: If GNS names are supposed to support multibyte characters (IDNs), the code needs to be changed accordingly. Basically, we need guidelines on label length (63 bytes vs. 63 characters). We also need to change the maximum name and label length variables, as well as any name-checking functions. I am also not sure whether the regular expressions in the GNS proxy work well with multibyte characters (I assume REs only work with ASCII).
Tags: No tags attached.
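To make the bytes-versus-characters question concrete: UTF-8 encodes a CJK character such as 大 (U+5927) in three bytes. A label can therefore have far fewer than 63 characters and still exceed 63 bytes. The Python sketch below is only an illustration, not GNUnet code, and the helper `label_sizes` is hypothetical.

```python
def label_sizes(label: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for one label."""
    return len(label), len(label.encode("utf-8"))

for n in (21, 22):
    # 大 is U+5927, which UTF-8 encodes as 3 bytes
    chars, nbytes = label_sizes("大" * n)
    print(f"{chars} characters -> {nbytes} bytes, fits 63-byte limit: {nbytes <= 63}")
```

With three-byte characters, 21 of them is the most that fits into a 63-byte label, while a 63-character limit would allow 189 bytes.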

Activities

Christian Grothoff

2012-07-02 19:14

manager   ~0006191

Can't we, for GNS, just standardize/define/assume that IDNs are UTF-8 encoded? Regex should work with UTF-8, and we can use 'strlen' to check the 63-byte limit. Now, if DNS IDNs are not in a UTF-8-compatible encoding, we'd have to convert back and forth in libgnunetdnsparser. That by itself is not so bad, except if a label fits within the limit in UTF-8 but not in some other encoding (or vice versa). Is that a real problem?
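A minimal sketch of the UTF-8 plus strlen proposal, written in Python for illustration. It assumes that labels are separated only by the ASCII '.'. C's `strlen` counts bytes, which corresponds to `len(label.encode("utf-8"))` here. The function `utf8_labels_ok` is hypothetical, and it deliberately ignores the additional label separators that IDNA recognizes.

```python
MAX_LABEL_BYTES = 63  # DNS allows at most 63 octets per label

def utf8_labels_ok(name: str) -> bool:
    """Check every '.'-separated label of a UTF-8 name against the 63-byte limit.

    Lengths are measured in bytes (like strlen() on a UTF-8 C string),
    not in characters. Empty labels (e.g. "a..b" or a trailing dot) are rejected.
    """
    for label in name.split("."):
        size = len(label.encode("utf-8"))  # byte length, like strlen()
        if size == 0 or size > MAX_LABEL_BYTES:
            return False
    return True

print(utf8_labels_ok("大学.schanzen.fcfs.gnunet"))  # True
print(utf8_labels_ok("大" * 22 + ".gnunet"))        # False: first label is 66 bytes
```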

Christian Grothoff

2012-07-02 19:24

manager   ~0006192

Alternatively, we can just handle IDNs the way DNS does, using GNU libidn: http://www.gnu.org/software/libidn/

schanzen

2012-07-02 22:32

administrator   ~0006193

I think this can get ugly: http://www.gnu.org/software/libidn/manual/libidn.html#On-Label-Separators
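The label-separator issue comes from IDNA itself. RFC 3490 (section 3.1) requires that U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP be recognized as dots, in addition to the ASCII '.'. A splitting routine that only looks for '.' therefore sees one label where IDNA sees several. A small Python sketch (the helper `split_labels` is hypothetical):

```python
import re

# Characters that IDNA (RFC 3490, section 3.1) requires to be treated as label separators
IDNA_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")

def split_labels(name: str) -> list[str]:
    """Split a domain name into labels, honoring all IDNA label separators."""
    return IDNA_DOTS.split(name)

name = "大学\u3002schanzen.fcfs.gnunet"  # ideographic full stop after 大学
print(name.split("."))     # naive split: ['大学。schanzen', 'fcfs', 'gnunet']
print(split_labels(name))  # ['大学', 'schanzen', 'fcfs', 'gnunet']
```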

It should be noted that multibyte names are not impossible per se at the moment:
$ gnunet-gns -u 大学.schanzen.fcfs.gnunet
大学.schanzen.fcfs.gnunet:
Got A record: 1.1.1.1
$

It's just that I don't know whether the code is UTF-8-proof (especially with respect to label separators; see above).
Also, I thought the limit was 63 characters, not bytes. If it is bytes, then I think there is less of a problem.

Regex: Maybe I didn't make myself clear, but how can you write a regex for UTF-8 multibyte strings without escaping them beforehand? And even then, you have a problem actually specifying the regex string.
See also: http://en.wikipedia.org/wiki/Regular_expression, section "Unicode", especially bullet point 3.

Christian Grothoff

2012-07-03 16:15

manager   ~0006201

1) The limit is definitely 63 bytes: DNS uses length-prefixing on the wire and treats labels as byte sequences, so limiting labels to 63 characters could produce labels longer than 63 bytes.
2) libidn label separators: this doesn't seem like a killer issue to me.
3) Regex: you're right that multibyte strings would cause trouble. However, the HTML should contain the domain names in IDN (ASCII, "xn--") format, so the regex just has to use IDN format as well, and then we're in ASCII territory for the matching. The real issue is the theoretical possibility that the entire web page is encoded in UTF-16 or something similar (not UTF-8). In that case, the proxy would first have to re-encode the entire HTML.
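Point 3 can be sketched as follows. The sketch assumes the proxy knows the page's charset (for example from the Content-Type header). It decodes the HTML bytes to text first, converts the GNS name to its ASCII ("xn--") form, and matches with an escaped, purely ASCII pattern. It uses Python's built-in `idna` codec (IDNA 2003) as a stand-in for libidn. `to_ace` and `find_name_in_html` are hypothetical helpers, not the GNS proxy's actual code.

```python
import re

def to_ace(name: str) -> str:
    """Convert a (possibly Unicode) domain name to its ASCII-compatible (IDN) form."""
    return name.encode("idna").decode("ascii")

def find_name_in_html(html: bytes, charset: str, name: str) -> list[str]:
    """Return all occurrences of `name`, in IDN form, in an HTML document.

    The page is decoded with its declared charset first, so a UTF-16 page
    is handled like a UTF-8 one. The regex pattern itself is pure ASCII.
    """
    text = html.decode(charset)
    # DNS names are case-insensitive, hence IGNORECASE
    pattern = re.compile(re.escape(to_ace(name)), re.IGNORECASE)
    return pattern.findall(text)

ace = to_ace("大学.schanzen.fcfs.gnunet")
page = f'<a href="http://{ace}/">link</a>'.encode("utf-16")
print(ace)
print(find_name_in_html(page, "utf-16", "大学.schanzen.fcfs.gnunet"))
```

Because the pattern is escaped ASCII, the regex engine never has to deal with multibyte characters in the pattern. The only encoding-specific step is decoding the page.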

Christian Grothoff

2012-10-17 23:01

manager   ~0006449

OK, in terms of implementation, I'm thinking of doing most of the work in libgnunetdnsparser:

1) dnsparser can specify that names returned from parsing are in UTF-8
2) dnsparser can specify that names given for serialization are in UTF-8 (and then fail if the 63-byte limit in IDN format cannot be met)

Furthermore, it makes sense for dnsparser to have a test function that applications can use to check whether an individual label fits within the 63-character limit (in IDN format, which is ASCII, characters and bytes coincide). I've added such a function (and configure tests for GNU libidn) in SVN 24384. Conversion from/to UTF-8 in dnsparser is still missing.
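The plan above (UTF-8 names in the API, IDN form on the wire, plus a per-label test function) can be illustrated in Python. The sketch again uses the built-in `idna` codec in place of GNU libidn. `check_label`, `name_to_wire` and `name_from_wire` are hypothetical names and are not GNUnet's C API in libgnunetdnsparser. The wire format shown is plain length-prefixed labels without compression.

```python
import re

IDNA_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")  # RFC 3490 label separators
MAX_LABEL_LEN = 63  # octets per label (RFC 1035); IDN labels are ASCII, so octets == characters

def check_label(label: str) -> bool:
    """Return True if a single Unicode label fits into 63 octets in IDN form."""
    if not label or IDNA_DOTS.search(label):
        return False  # empty, or actually more than one label
    try:
        ace = label.encode("idna")  # raises UnicodeError if the IDN label is too long
    except UnicodeError:
        return False
    return len(ace) <= MAX_LABEL_LEN

def name_to_wire(name: str) -> bytes:
    """Serialize a Unicode name as length-prefixed IDN labels; fail if a label is too long."""
    out = bytearray()
    for label in IDNA_DOTS.split(name.rstrip(".")):
        if not check_label(label):
            raise ValueError(f"label {label!r} does not fit into 63 octets")
        ace = label.encode("idna")
        out.append(len(ace))  # one length octet per label
        out += ace
    out.append(0)  # the empty root label terminates the name
    return bytes(out)

def name_from_wire(wire: bytes) -> str:
    """Parse length-prefixed IDN labels (no compression) back into a Unicode name."""
    labels, pos = [], 0
    while True:
        if pos >= len(wire):
            raise ValueError("truncated name")
        length = wire[pos]
        if length == 0:
            break
        if length > MAX_LABEL_LEN:
            raise ValueError("invalid label length (compression not supported)")
        labels.append(wire[pos + 1:pos + 1 + length].decode("idna"))
        pos += 1 + length
    return ".".join(labels)

wire = name_to_wire("大学.schanzen.fcfs.gnunet")
print(wire)
print(name_from_wire(wire))  # 大学.schanzen.fcfs.gnunet
```

Doing the conversion inside the parser/serializer keeps applications in UTF-8 and leaves the 63-octet check to the one place that knows the wire form.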

Christian Grothoff

2012-10-18 11:43

manager   ~0006450

Ok, so theoretically IDN is implemented as of SVN 24396. However, we don't have a good testcase. Martin, can you test it?

Christian Grothoff

2012-12-04 20:12

manager   ~0006640

Tested, works.

Issue History

Date Modified Username Field Change
2012-07-02 11:20 schanzen New Issue
2012-07-02 19:14 Christian Grothoff Note Added: 0006191
2012-07-02 19:24 Christian Grothoff Note Added: 0006192
2012-07-02 22:32 schanzen Note Added: 0006193
2012-07-03 16:15 Christian Grothoff Note Added: 0006201
2012-07-04 09:27 Christian Grothoff Status new => confirmed
2012-07-04 09:27 Christian Grothoff Product Version => Git master
2012-07-04 09:27 Christian Grothoff Target Version => 0.9.5
2012-09-29 21:24 Christian Grothoff Priority normal => high
2012-10-07 14:19 Christian Grothoff Assigned To => Christian Grothoff
2012-10-07 14:19 Christian Grothoff Status confirmed => assigned
2012-10-17 23:01 Christian Grothoff Note Added: 0006449
2012-10-18 11:43 Christian Grothoff Note Added: 0006450
2012-10-18 11:43 Christian Grothoff Status assigned => feedback
2012-12-04 20:12 Christian Grothoff Note Added: 0006640
2012-12-04 20:12 Christian Grothoff Status feedback => resolved
2012-12-04 20:12 Christian Grothoff Fixed in Version => 0.9.5
2012-12-04 20:12 Christian Grothoff Resolution open => fixed
2012-12-21 16:49 Christian Grothoff Status resolved => closed
2013-10-02 13:56 Christian Grothoff Category namestore => namestore service