View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0008620 | libextractor | libextractor main library | public | 2024-03-10 03:45 | 2024-04-17 20:26 |
Reporter | apteryx | Assigned To | Christian Grothoff | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Product Version | 1.13 | ||||
Target Version | 1.14 | Fixed in Version | 1.14 | ||
Summary | 0008620: libextractor searches tidy-html include as <tidy/tidy.h>; is packaged simply as <tidy.h> | ||||
Description | The tidy-html detection in configure.ac is flawed: it only looks for a header named <tidy/tidy.h>. A recent version of tidy-html (5.8.0) has its headers installed by its CMake build system like so: $ find /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidybuffio.h /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidy.h /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidyenum.h /gnu/store/pbabjp0f5z86dhb064hg2abqkw6wx2r9-tidy-html-5.8.0/include/tidyplatform.h Thus the configure check and the #include lines should simply use <tidy.h> rather than <tidy/tidy.h>. See the tidy-html build system here: https://github.com/htacg/tidy-html5/blob/d08ddc2860aa95ba8e301343a30837f157977cba/CMakeLists.txt#L361 to confirm that the headers are indeed installed directly under 'include/', not under a 'tidy/' subdirectory. <tidy/tidy.h> should still be considered for older versions. | ||||
Tags | No tags attached. | ||||
|
Patch sent to bug-libextractor@gnu.org, with Message-ID 20240313023849.16390-1 ... ("html_extractor: Add support for modern tidy-html.") |
|
Hmm. I didn't get the patch. Maybe the list / alias is broken? Care to attach it here? |
|
Oh, not sure what went wrong. Perhaps my email is stuck in the moderation queue or similar?
0001-html_extractor-Add-support-for-modern-tidy-html.patch (2,414 bytes)
From 1fc6daaeaf829fb941a176831c011888a73c43b9 Mon Sep 17 00:00:00 2001
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Date: Mon, 11 Mar 2024 09:36:26 -0400
Subject: [PATCH] html_extractor: Add support for modern tidy-html.

* configure.ac: Use PKG_PROG_PKG_CONFIG to initialize pkg-config detection.
<tidy>: Check for library via pkg-config.
* src/plugins/html_extractor.c: Standardize tidy include file names.
---
 configure.ac                 | 28 +++++++++-------------------
 src/plugins/html_extractor.c |  4 ++--
 2 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/configure.ac b/configure.ac
index d17ff39..e89d70c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -176,6 +176,8 @@ AS_CASE(["$target_os"],

 AM_ICONV

+PKG_PROG_PKG_CONFIG()
+
 # We define the paths here, because MinGW/GCC expands paths
 # passed through the command line ("-DLOCALEDIR=..."). This would
 # lead to hard-coded paths ("C:\mingw\mingw\bin...") that do
@@ -424,25 +426,13 @@ AC_CHECK_LIB(magic, magic_open,
 AM_CONDITIONAL(HAVE_MAGIC, false))],
 AM_CONDITIONAL(HAVE_MAGIC, false))

-AC_MSG_CHECKING(for tidyNodeGetValue -ltidy)
-AC_LANG_PUSH(C++)
-SAVED_LIBS=$LIBS
-LIBS="$LIBS -ltidy"
-AC_LINK_IFELSE(
- [AC_LANG_PROGRAM([[#include <tidy/tidy.h>]],
- [[ Bool b = tidyNodeGetValue (NULL, NULL, NULL); ]])],
- [AC_MSG_RESULT(yes)
- AM_CONDITIONAL(HAVE_TIDY, true)
- AC_DEFINE(HAVE_TIDY,1,[Have tidyNodeGetValue in libtidy])],
- [AC_MSG_RESULT(no)
- AM_CONDITIONAL(HAVE_TIDY, false)])
-LIBS=$SAVED_LIBS
-AC_LANG_POP(C++)
-
-# restore LIBS
-LIBS=$LIBSOLD
-
-
+dnl tidyNodeGetValue was already available in 5.0.0, released in 2015.
+PKG_CHECK_MODULES([TIDY], [tidy >= 5.0.0],
+ [AC_DEFINE(HAVE_TIDY, 1, [Have tidy])
+ AM_CONDITIONAL(HAVE_TIDY, true)],
+ [AM_CONDITIONAL(HAVE_TIDY, false)])
+CFLAGS="$CFLAGS $TIDY_CFLAGS"
+LIBS="$LIBS $TIDY_LIBS"

 # should 'make check' run tests?
 AC_MSG_CHECKING(whether to run tests)
diff --git a/src/plugins/html_extractor.c b/src/plugins/html_extractor.c
index 5ebf97b..88100d3 100644
--- a/src/plugins/html_extractor.c
+++ b/src/plugins/html_extractor.c
@@ -26,8 +26,8 @@
 #include "platform.h"
 #include "extractor.h"
 #include <magic.h>
-#include <tidy/tidy.h>
-#include <tidy/tidybuffio.h>
+#include <tidy.h>
+#include <tidybuffio.h>

 /**
  * Mapping of HTML META names to LE types.

base-commit: a75f40b64b5868967c95ea214e8eaac4f7088b23
--
2.41.0 |
|
I had to patch around some more to keep it *also* working on Debian. Result is in a75f40b..d68210a, should now work for *both* types of installations. |
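For readers wondering how one source file can cope with both header layouts, here is a minimal sketch. It is not the change committed in a75f40b..d68210a, which is not shown in this issue and may work differently (for example through configure-time checks). The sketch assumes a compiler that provides `__has_include` (standard in C23, and available as an extension in GCC and Clang); the macro `TIDY_HEADER_LAYOUT` and the function `tidy_header_layout` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Prefer the flat layout (<tidy.h>, as installed by tidy-html 5.8.0) and
 * fall back to the older <tidy/tidy.h> layout.  TIDY_HEADER_LAYOUT is a
 * hypothetical macro that records which variant was found. */
#if defined(__has_include)
#  if __has_include(<tidy.h>)
#    include <tidy.h>
#    define TIDY_HEADER_LAYOUT "tidy.h"
#  elif __has_include(<tidy/tidy.h>)
#    include <tidy/tidy.h>
#    define TIDY_HEADER_LAYOUT "tidy/tidy.h"
#  endif
#endif

/* Neither header was found, or the compiler lacks __has_include. */
#ifndef TIDY_HEADER_LAYOUT
#  define TIDY_HEADER_LAYOUT "none"
#endif

/* Hypothetical helper: report the layout selected at compile time. */
static const char *
tidy_header_layout (void)
{
  return TIDY_HEADER_LAYOUT;
}
```

Calling `tidy_header_layout ()` returns "tidy.h", "tidy/tidy.h", or "none" depending on the build machine. A real build would more likely make this decision in configure.ac, so that a missing library disables the plugin instead of silently compiling without it.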
Date Modified | Username | Field | Change |
---|---|---|---|
2024-03-10 03:45 | apteryx | New Issue | |
2024-03-13 03:40 | apteryx | Note Added: 0021882 | |
2024-04-10 23:57 | Christian Grothoff | Note Added: 0022201 | |
2024-04-17 20:05 | apteryx | Note Added: 0022268 | |
2024-04-17 20:05 | apteryx | File Added: 0001-html_extractor-Add-support-for-modern-tidy-html.patch | |
2024-04-17 20:26 | Christian Grothoff | Note Added: 0022270 | |
2024-04-17 20:26 | Christian Grothoff | Assigned To | => Christian Grothoff |
2024-04-17 20:26 | Christian Grothoff | Status | new => resolved |
2024-04-17 20:26 | Christian Grothoff | Resolution | open => fixed |
2024-04-17 20:26 | Christian Grothoff | Fixed in Version | => 1.14 |
2024-04-17 20:26 | Christian Grothoff | Target Version | => 1.14 |