View Issue Details

ID: 0008620
Project: libextractor
Category: libextractor main library
View Status: public
Last Update: 2024-04-17 20:26
Reporter: apteryx
Assigned To: Christian Grothoff
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 1.13
Target Version: 1.14
Fixed in Version: 1.14
Summary: 0008620: libextractor searches tidy-html include as <tidy/tidy.h>; is packaged simply as <tidy.h>
Description: The tidy-html detection in configure.ac is flawed; it only looks for a header named <tidy/tidy.h>. A recent version of tidy-html (5.8.0) has its include files installed by its CMake build system like so:

$ find /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include
/gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include
/gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidybuffio.h
/gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidy.h
/gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidyenum.h
/gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidyplatform.h

Thus the check/usage should simply use <tidy.h> rather than <tidy/tidy.h>. See the tidy-html build system here: https://github.com/htacg/tidy-html5/blob/d08ddc2860aa95ba8e301343a30837f157977cba/CMakeLists.txt#L361 to confirm that the headers are indeed installed directly under 'include/', not under a 'tidy/' subdirectory.

tidy/tidy.h should still be considered for older versions.
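
As a rough illustration of supporting both layouts on the C side, here is a minimal sketch. It is not the fix that went into libextractor, which lives in configure.ac. It assumes a compiler that provides `__has_include` (standard in C23, and an extension in GCC and Clang before that). The macro `TIDY_HEADER_LAYOUT` and the helper `tidy_header_layout()` are hypothetical names made up for this example.

```c
/* Prefer the flat layout (<tidy.h>, as installed by tidy-html 5.8.0),
   then fall back to the older <tidy/tidy.h> layout. */
#ifdef __has_include
#  if __has_include(<tidy.h>)
#    include <tidy.h>
#    define TIDY_HEADER_LAYOUT "tidy.h"
#  elif __has_include(<tidy/tidy.h>)
#    include <tidy/tidy.h>
#    define TIDY_HEADER_LAYOUT "tidy/tidy.h"
#  endif
#endif

/* Neither header was found, or the compiler lacks __has_include. */
#ifndef TIDY_HEADER_LAYOUT
#  define TIDY_HEADER_LAYOUT "none"
#endif

/* Hypothetical helper: report which header layout was picked at compile time. */
static const char *
tidy_header_layout (void)
{
  return TIDY_HEADER_LAYOUT;
}
```

Calling `tidy_header_layout()` returns "tidy.h", "tidy/tidy.h", or "none", depending on what the compiler could find. In an Autoconf-based project the same decision would more typically be made at configure time, with the C code only testing the resulting macros.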
Tags: No tags attached.

Activities

apteryx

2024-03-13 03:40

reporter   ~0021882

Patch sent to bug-libextractor@gnu.org, with Message-ID 20240313023849.16390-1 ... ("html_extractor: Add support for modern tidy-html.")

Christian Grothoff

2024-04-10 23:57

manager   ~0022201

Hmm. I didn't get the patch. Maybe the list / alias is broken? Care to attach it here?

apteryx

2024-04-17 20:05

reporter   ~0022268

Oh, not sure what went wrong. Perhaps my email is stuck in the moderation queue or similar?

Christian Grothoff

2024-04-17 20:26

manager   ~0022270

I had to patch around some more to keep it *also* working on Debian. Result is in a75f40b..d68210a, should now work for *both* types of installations.

Issue History

Date Modified Username Field Change
2024-03-10 03:45 apteryx New Issue
2024-03-13 03:40 apteryx Note Added: 0021882
2024-04-10 23:57 Christian Grothoff Note Added: 0022201
2024-04-17 20:05 apteryx Note Added: 0022268
2024-04-17 20:05 apteryx File Added: 0001-html_extractor-Add-support-for-modern-tidy-html.patch
2024-04-17 20:26 Christian Grothoff Note Added: 0022270
2024-04-17 20:26 Christian Grothoff Assigned To => Christian Grothoff
2024-04-17 20:26 Christian Grothoff Status new => resolved
2024-04-17 20:26 Christian Grothoff Resolution open => fixed
2024-04-17 20:26 Christian Grothoff Fixed in Version => 1.14
2024-04-17 20:26 Christian Grothoff Target Version => 1.14