View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002475 | GNUnet | namestore service | public | 2012-07-02 11:20 | 2012-12-21 16:49 |
Reporter | schanzen | Assigned To | Christian Grothoff | ||
Priority | high | Severity | feature | Reproducibility | have not tried |
Status | closed | Resolution | fixed | ||
Product Version | Git master | ||||
Target Version | 0.9.5 | Fixed in Version | 0.9.5 | ||
Summary | 0002475: Namestore / GNS record names and IDNs | ||||
Description | If GNS names are supposed to support multibyte characters (IDNs) then the code needs to be changed for this. Basically we need guidelines on label length (63 bytes vs 63 characters) and change the maximum name and label length variables as well as any name checking functions. Also I am not sure if the regular expressions in the GNS proxy work well with multibyte (I assume REs only work with ASCII). | ||||
Tags | No tags attached. | ||||
|
Can't we for GNS just standardize/define/assume that IDNs are utf-8 encoded? Regex should work with utf-8, and we can use 'strlen' to check for the 63-byte limitation. Now, if DNS IDNs are not in UTF-8 compatible encodings, we'd have to convert back & forth in libgnunetdnsparser. That by itself is not so bad, except if UTF-8 is fine with 63 chars but some other encoding is not (or vice versa). Is that a real problem? |
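To make the byte-versus-character question concrete, here is a minimal sketch in Python (used only for brevity; GNUnet itself is written in C). The helper name `utf8_label_fits` is hypothetical. It applies the 63-byte bound to the UTF-8 form of a label, which is what a `strlen` check on a UTF-8 C string would measure.

```python
MAX_LABEL_BYTES = 63  # DNS label limit, counted in bytes (assumed to apply to the UTF-8 form)


def utf8_label_fits(label: str) -> bool:
    """Return True if the UTF-8 encoding of `label` is between 1 and 63 bytes long."""
    n = len(label.encode("utf-8"))
    return 0 < n <= MAX_LABEL_BYTES


label = "大学"
chars = len(label)                    # number of Unicode code points
octets = len(label.encode("utf-8"))   # number of bytes, i.e. what strlen would count
```

For the label above the two counts differ (2 code points, 6 bytes), so a limit of "63 characters" and a limit of "63 bytes" accept different sets of names.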
|
Alternatively, we can just do it the way DNS deals with IDN using http://www.gnu.org/software/libidn/ |
|
I think this can get ugly: http://www.gnu.org/software/libidn/manual/libidn.html#On-Label-Separators It should be noted that multibyte names are not impossible per se at the moment: $ gnunet-gns -u 大学.schanzen.fcfs.gnunet 大学.schanzen.fcfs.gnunet: Got A record: 1.1.1.1 $ It's just that I don't know whether the code is UTF-8 safe (especially with respect to label separators, see the link above). Also, I thought the limit was 63 characters, not bytes. If it is bytes, then there is less of a problem, I think. Regex: maybe I didn't make myself clear, but how can you write a regex for UTF-8 multibyte strings without escaping them beforehand? And even then you have the problem of actually specifying the RE string. See also: http://en.wikipedia.org/wiki/Regular_expression, section "Unicode", especially bullet point 3. |
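The separator concern can be illustrated with a short Python sketch. RFC 3490 (section 3.1) lists four code points that must be recognized as label separators; the function name `split_labels` is hypothetical, and nothing here is taken from the GNUnet code.

```python
import re

# Code points that RFC 3490, section 3.1, requires to be treated as dots:
# U+002E full stop, U+3002 ideographic full stop,
# U+FF0E fullwidth full stop, U+FF61 halfwidth ideographic full stop.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name):
    """Split a Unicode domain name into labels on any of the four IDNA dots."""
    return IDNA_DOTS.split(name)


labels = split_labels("大学。schanzen.fcfs．gnunet")
```

Code that splits only on the ASCII dot would treat `大学。schanzen` as one label, which is the kind of mismatch the libidn manual warns about.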
|
1) the limit is 63 bytes for sure, as DNS uses length-prefixing on the wire and treats the labels as byte sequences; thus limiting to 63 characters has the problem of potentially creating more than 63 bytes... 2) libidn label separators: doesn't seem like a killer issue to me 3) regex: you're right that the multibyte strings would cause trouble, but as the HTML should contain the domain names in IDN format, the regex just also has to use IDN format (and then we're in ASCII territory for the matching). The real issue is the theoretical possibility that the entire webpage is encoded in UTF-16 or something like that (not UTF-8), in which case the entire HTML would have to first be re-coded by the proxy. |
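Point 1 can be sketched as follows. This is a simplified illustration of the length-prefixed DNS wire encoding (RFC 1035), not the libgnunetdnsparser implementation. The function name `encode_name_wire` is hypothetical, and name compression and the overall name-length limit are left out.

```python
MAX_LABEL_BYTES = 63  # the two high bits of the length octet are reserved, leaving 6 bits


def encode_name_wire(labels):
    """Serialize a list of byte-string labels as length-prefixed DNS labels.

    Each label is written as one length octet followed by its raw bytes;
    the name ends with a zero-length root label.
    """
    out = bytearray()
    for label in labels:
        if not 0 < len(label) <= MAX_LABEL_BYTES:
            raise ValueError("label must be 1..63 bytes, got %d" % len(label))
        out.append(len(label))  # length prefix counts bytes, not characters
        out += label
    out.append(0)  # root label terminates the name
    return bytes(out)


wire = encode_name_wire([b"www", b"gnunet", b"org"])
```

Because the prefix counts bytes, a label of 22 three-byte UTF-8 characters (66 bytes) would be rejected here even though it has fewer than 63 characters.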
|
Ok, in terms of implementation, I'm thinking of mostly doing stuff in libgnunetdnsparser: 1) dnsparser can specify that names returned from parsing are in UTF-8 2) dnsparser can specify that names given for serialization are in UTF-8 (and then fail if the 63-byte limit in IDN format cannot be met) Furthermore, it makes sense for dnsparser to have a test function that applications can use to check if an individual label fits within the 63-byte limit in IDN format. I've added such a function (and configure tests for GNU libidn) in SVN 24384. Conversion from/to UTF-8 in dnsparser is still missing. |
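A rough sketch of such a label test, using Python's built-in `idna` codec (IDNA 2003) as a stand-in for GNU libidn. The function name `label_fits_idn` is hypothetical and is not the function added in SVN 24384. The sketch assumes it is given a single label without dots.

```python
MAX_LABEL_BYTES = 63


def label_fits_idn(label):
    """Return True if a single UTF-8 label converts to an IDN (ASCII) form of 1..63 bytes.

    Python's "idna" codec applies nameprep and Punycode and raises
    UnicodeError for labels it cannot encode, including overlong ones.
    """
    try:
        ace = label.encode("idna")
    except UnicodeError:
        return False
    return 0 < len(ace) <= MAX_LABEL_BYTES


ace = "大学".encode("idna")      # ASCII-compatible form, carries the "xn--" prefix
roundtrip = ace.decode("idna")   # back to the Unicode label
```

The check is applied to the converted form rather than to the UTF-8 bytes, matching the plan above of accepting UTF-8 names at the API and failing serialization when the IDN form is too long.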
|
Ok, so theoretically IDN is implemented as of SVN 24396. However, we don't have a good testcase. Martin, can you test it? |
|
Tested, works. |
Date Modified | Username | Field | Change |
---|---|---|---|
2012-07-02 11:20 | schanzen | New Issue | |
2012-07-02 19:14 | Christian Grothoff | Note Added: 0006191 | |
2012-07-02 19:24 | Christian Grothoff | Note Added: 0006192 | |
2012-07-02 22:32 | schanzen | Note Added: 0006193 | |
2012-07-03 16:15 | Christian Grothoff | Note Added: 0006201 | |
2012-07-04 09:27 | Christian Grothoff | Status | new => confirmed |
2012-07-04 09:27 | Christian Grothoff | Product Version | => Git master |
2012-07-04 09:27 | Christian Grothoff | Target Version | => 0.9.5 |
2012-09-29 21:24 | Christian Grothoff | Priority | normal => high |
2012-10-07 14:19 | Christian Grothoff | Assigned To | => Christian Grothoff |
2012-10-07 14:19 | Christian Grothoff | Status | confirmed => assigned |
2012-10-17 23:01 | Christian Grothoff | Note Added: 0006449 | |
2012-10-18 11:43 | Christian Grothoff | Note Added: 0006450 | |
2012-10-18 11:43 | Christian Grothoff | Status | assigned => feedback |
2012-12-04 20:12 | Christian Grothoff | Note Added: 0006640 | |
2012-12-04 20:12 | Christian Grothoff | Status | feedback => resolved |
2012-12-04 20:12 | Christian Grothoff | Fixed in Version | => 0.9.5 |
2012-12-04 20:12 | Christian Grothoff | Resolution | open => fixed |
2012-12-21 16:49 | Christian Grothoff | Status | resolved => closed |
2013-10-02 13:56 | Christian Grothoff | Category | namestore => namestore service |