0006198
Reporter: fefe
Assigned To: Christian Grothoff
Status: closed
Resolution: fixed
Product Version: 0.7.0
Target Version: 0.7.1
Fixed in Version: 0.7.1
Summary: 0006198: language_matches appears to be functionally incorrect
Description: language_matches comes from exchange/src/mhd/mhd_legal.c (lines 128-130):

    for (char *tok = strtok_r (p, ", ", &sptr);
         NULL != tok;
         tok = strtok_r (NULL, ", ", &sptr))

strtok_r treats its second argument as a set of delimiter characters, not as a substring.
That means this loop would also accept a space-separated list, where the spec clearly says a comma is required.
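
To make the delimiter-set behavior concrete, here is a minimal sketch. The helper count_tokens is hypothetical (it is not part of mhd_legal.c); it only counts how many tokens strtok_r produces for a given header value and delimiter string.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* needed for strtok_r() in strict modes */
#endif
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the tokens strtok_r() yields for `value`
   when every character in `delims` is treated as a separator. */
static unsigned int
count_tokens (const char *value,
              const char *delims)
{
  char buf[256];
  char *sptr;
  unsigned int n = 0;

  /* strtok_r() modifies its input, so work on a (possibly truncated) copy */
  snprintf (buf, sizeof (buf), "%s", value);
  for (char *tok = strtok_r (buf, delims, &sptr);
       NULL != tok;
       tok = strtok_r (NULL, delims, &sptr))
    n++;
  return n;
}
```

With the delimiter string ", ", both "de-DE, en-US" and the comma-less "de-DE en-US" split into two tokens. With "," alone, the second value stays a single token.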

Christian Grothoff

2020-04-22 15:09

manager   ~0015731

Last edited: 2020-04-22 15:10

Yes, but there is Jon Postel saying "be generous in what you accept". HTTP clients are way too often _broken_. So this was intentional, to tolerate all kinds of (<space>|<comma>)+ combinations. So unless we reject a valid syntax, match a valid syntax badly, or fail to reject a syntax that the spec says we MUST reject, I'm not sure this is 'wrong'.

Christian Grothoff

2020-04-22 18:59

manager   ~0015734

I dug around in my files and found a collection of HTTP headers. Attached is an excerpt of the Accept-Language headers I collected in 2013 over plaintext HTTP for a small research project. What we see: spaces do occur, but not as the _exclusive_ separator. That is, only 'strtok (foo,";")' would be wrong/insufficient; we would then at least have to additionally implement space skipping or something like that. Doable, but I am not sure it would really make things _better_.
sample (3,212 bytes)   
Accept-Language: ca-es
Accept-Language: ca-ES,ca;q=0.8
Accept-Language: de
Accept-Language: de-AT
Accept-Language: de-de
Accept-Language: de-DE
Accept-Language: de_DE
Accept-Language: de-de, de;q=0.75, en-us;q=0.50, en;q=0.25
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Language: de-DE,de;q=0.9,en;q=0.8
Accept-Language: de-DE,en,*
Accept-Language: de-DE, en-US
Accept-Language: de-DE, en-US, en, *
Accept-Language: de-de,en-us;q=0.7,en;q=0.3
Accept-Language: de_DE;q=1.00
Accept-Language: de_DE;q=1.00,en_US;q=0.83,ja_JP;q=0.67,fr_FR;q=0.50,es_ES;q=0.33
Accept-Language: de, en, fr, ja, nl, it, es, pt, pt-PT, da, fi, nb, sv, ko, zh-Hans, zh-Hant, ru, pl, tr, uk, ar, hr, cs, el, he, ro, sk, th, id, ms, en-GB, ca, hu, vi, en-us;q=0.8
Accept-Language: de,en;q=0.9
Accept-Language: de,en-US;q=0.9,en;q=0.8
Accept-Language: de;q=1.00,en;q=0.95,ja;q=0.89,fr;q=0.84,es;q=0.79,it;q=0.74,pt;q=0.68,pt-PT;q=0.63,nl;q=0.58,sv;q=0.53,nb;q=0.47,da;q=0.42,fi;q=0.37,ru;q=0.32,pl;q=0.26,zh-Hans;q=0.21,zh-Hant;q=0.16,ko;q=0.11
Accept-Language: de;q=1.00,en;q=0.95,ja;q=0.89,fr;q=0.84,es;q=0.79,it;q=0.74,pt;q=0.68,pt_PT;q=0.63,nl;q=0.58,sv;q=0.53,nb;q=0.47,da;q=0.42,fi;q=0.37,ru;q=0.32,pl;q=0.26,zh-Hans;q=0.21,zh-Hant;q=0.16,ko;q=0.11
Accept-Language: de;q=1.00,en;q=0.96,ja;q=0.91,fr;q=0.87,es;q=0.83,it;q=0.78,pt;q=0.74,pt_PT;q=0.70,nl;q=0.65,sv;q=0.61,nb;q=0.57,da;q=0.52,fi;q=0.48,ru;q=0.43,pl;q=0.39,zh-Hans;q=0.35,zh-Hant;q=0.30,ko;q=0.26,ar;q=0.22,cs;q=0.17,hu;q=0.13,tr;q=0.09
Accept-Language: de;q=1.00,en;q=0.97,ja;q=0.94,fr;q=0.90,es;q=0.87,it;q=0.84,pt;q=0.81,pt_PT;q=0.77,nl;q=0.74,sv;q=0.71,nb;q=0.68,da;q=0.65,fi;q=0.61,ru;q=0.58,pl;q=0.55,zh-Hans;q=0.52,zh-Hant;q=0.48,ko;q=0.45,ar;q=0.42,cs;q=0.39,hu;q=0.35,tr;q=0.32,th;q=0.29,ca;q=0.26,hr;q=0.23,el;q=0.19,he;q=0.16,ro;q=0.13,sk;q=0.10,uk;q=0.06
Accept-Language: de,zh-HK;q=0.8,z
Accept-Language: de,zh-HK;q=0.8,zh-MO;q=0.7,zh-SG;q=0.5,zh-TW;q=0.3,zh-CN;q=0.2
Accept-Language: en
Accept-Language: en-AU, en-US
Accept-Language: en-gb
Accept-Language: en-GB
Accept-Language: en-gb, en
Accept-Language: en-gb,en;q=0.5
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Accept-Language: en-us
Accept-Language: en-US
Accept-Language: en-us,de;q=0.5
Accept-Language: en-us, en
Accept-Language: en-us,en
Accept-Language: en-us,en;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Language: en-US,en;q=0.7,en;q=0.5
Accept-Language: en-US, en;q=0.8
Accept-Language: en-US,en;q=0.8
Accept-Language: en-US,ja;q=0.5
Accept-Language: es
Accept-Language: es-es
Accept-Language: es-ES
Accept-Language: es-ES,es;q=0.8
Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Language: es-ES,es;q=0.9,en;q=0.8
Accept-Language: es_ES;q=1.00,en_US;q=0.83,ja_JP;q=0.67,fr_FR;q=0.50,de_DE;q=0.33
Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Language: ja-JP
Accept-Language: pl,en-us;q=0.7,en;q=0.3
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Language: sk,cs;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Language: zh
accept-language: zh-cn
Accept-Language: zh-cn
Accept-Language: zh-CN
Accept-Language: zh-CN,zh;q=0.8
Accept-Language: zh-tw


2020-04-23 15:38

developer   ~0015767

AFAIK the right behavior is to tokenize by ',' and ';' and then skip spaces.
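
A rough sketch of that approach, for illustration only: it is not the code from the actual fix, and the names trim_spaces and language_listed are hypothetical. It splits on ',' only, cuts each element at the first ';' (dropping parameters such as q=0.8), strips surrounding whitespace, and compares case-insensitively.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* needed for strtok_r() and strcasecmp() */
#endif
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: strip leading and trailing spaces/tabs in place. */
static char *
trim_spaces (char *s)
{
  char *end;

  while ( (' ' == *s) || ('\t' == *s) )
    s++;
  end = s + strlen (s);
  while ( (end > s) &&
          ( (' ' == end[-1]) || ('\t' == end[-1]) ) )
    end--;
  *end = '\0';
  return s;
}

/* Hypothetical helper: return 1 if `lang` appears as a language range in
   the Accept-Language value `header_value`, 0 otherwise.  Quality values
   are ignored; values longer than the buffer are truncated. */
static int
language_listed (const char *header_value,
                 const char *lang)
{
  char buf[1024];
  char *sptr;

  snprintf (buf, sizeof (buf), "%s", header_value);
  /* only ',' separates list elements */
  for (char *tok = strtok_r (buf, ",", &sptr);
       NULL != tok;
       tok = strtok_r (NULL, ",", &sptr))
  {
    char *semi = strchr (tok, ';');

    if (NULL != semi)
      *semi = '\0';   /* drop parameters such as ";q=0.8" */
    if (0 == strcasecmp (trim_spaces (tok), lang))
      return 1;
  }
  return 0;
}
```

Under this scheme a value such as "de-DE en-US" is a single (non-matching) element rather than two languages, while "de-de, de;q=0.75" still yields "de-de" and "de".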

Christian Grothoff

2020-04-23 16:11

manager   ~0015770

Fixed in 8b99abbe..de61e06e

Christian Grothoff

2021-09-02 18:14

manager   ~0018253

Fix committed to master branch.

