View Issue Details

ID: 0010570
Project: Taler
Category: Web site(s)
View Status: public
Last Update: 2025-11-09 23:24
Reporter: hga3
Assigned To: hga3
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Product Version: git (master)
Summary: 0010570: Site gen failed when parsing news in function cut_text
Description:
env "BASEURL=" ./inc/build-site
Traceback (most recent call last):
  File "/home/user/www.taler.net/./inc/build-site", line 27, in <module>
    main()
  File "/home/user/www.taler.net/./inc/build-site", line 24, in main
    x.run()
  File "/home/user/www.taler.net/inc/sitegen/site.py", line 307, in run
    self.run_localized(locale, tr)
  File "/home/user/www.taler.net/inc/sitegen/site.py", line 225, in run_localized
    content = tmpl.render(
              ^^^^^^^^^^^^
  File "/home/user/www.taler.net/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 1295, in render
    self.environment.handle_exception()
  File "/home/user/www.taler.net/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/home/user/www.taler.net/template/rss.xml.j2", line 39, in top-level template code
    {{ get_abstract('news/' + newspostitem['page'], 1000) }}
^^^^^^^^^^^^^^^^
  File "/home/user/www.taler.net/inc/sitegen/site.py", line 152, in get_abstract
    return cut_text(root / "template" / (name + ".j2"), length)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/www.taler.net/inc/sitegen/site.py", line 62, in cut_text
    for i in soup.findAll("p")[1]:
             ~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
make: *** [Makefile:11: site] Error 1
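
The traceback ends in `soup.findAll("p")[1]`, which can only raise `IndexError` when the parsed template contains fewer than two `<p>` elements. The following is a minimal sketch of that condition using only the standard library; `ParagraphCollector`, `paragraphs_of`, `second_paragraph_unchecked` and `triggers_index_error` are hypothetical names for illustration, not part of sitegen, and `html.parser` merely stands in for BeautifulSoup here. The report does not say which news template triggered the failure.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element (stand-in for soup.findAll("p"))."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Accumulate text only while inside a <p> element.
        if self._in_p:
            self.paragraphs[-1] += data


def paragraphs_of(html):
    """Return the text of all <p> elements in document order."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs


def second_paragraph_unchecked(html):
    # Mirrors the failing expression: index 1 without a length check.
    return paragraphs_of(html)[1]


def triggers_index_error(html):
    """Report whether the unchecked lookup fails for this input."""
    try:
        second_paragraph_unchecked(html)
    except IndexError:
        return True
    return False
```

Under these assumptions, any news page with zero or one paragraph reproduces the error, while a page with two or more paragraphs does not.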
Tags: No tags attached.

Activities

hga3 (developer)   2025-11-09 23:24   ~0026352

Resolved with the attached patch. The patch has been applied and, after successful local testing, pushed upstream onto "c76de6d942874529eaba97bc3d065863bf1b27d1" of git://git.gnunet.org/www_shared.git.
sitegen.site.py.patch (1,248 bytes)   
diff --git a/sitegen/site.py b/sitegen/site.py
index b7fbf6b..4aaddd3 100644
--- a/sitegen/site.py
+++ b/sitegen/site.py
@@ -58,13 +58,25 @@ def cut_text(filename, count):
         soup = BeautifulSoup(html, features="lxml")
         for script in soup(["script", "style"]):
             script.extract()
-        k = []
-        for i in soup.findAll("p")[1]:
-            k.append(i)
-        b = "".join(str(e) for e in k)
-        text = html2text(b.replace("\n", " "))
-        textreduced = (text[:count] + " [...]") if len(text) > count else (text)
-        return textreduced
+        paragraphs = soup.find_all("p")
+
+        # No <p> tags at all → return empty string
+        if not paragraphs:
+            return ""
+
+        # If only one <p>, use that one; otherwise use the second
+        target = paragraphs[1] if len(paragraphs) > 1 else paragraphs[0]
+
+        # Convert contents of the <p> to HTML string
+        b = "".join(str(e) for e in target.contents)
+
+        # Convert to text
+        text = html2text(b.replace("\n", " ")).strip()
+
+        # Truncate
+        if len(text) > count:
+            return text[:count] + " [...]"
+        return text
 
 
 def extract_body(text, content_id="newspost-content"):
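
The selection and truncation rules introduced by the patch can be exercised in isolation. The sketch below restates them on plain strings, without BeautifulSoup or html2text; `pick_paragraph`, `truncate` and `abstract_from` are hypothetical helper names used for illustration only and do not appear in `sitegen/site.py`.

```python
def pick_paragraph(paragraphs):
    """Return the second item if present, else the first, else None."""
    if not paragraphs:
        return None
    return paragraphs[1] if len(paragraphs) > 1 else paragraphs[0]


def truncate(text, count):
    """Cut text to count characters and mark the cut with ' [...]'."""
    text = text.strip()
    if len(text) > count:
        return text[:count] + " [...]"
    return text


def abstract_from(paragraphs, count):
    """Combine both steps; an empty list yields an empty abstract."""
    target = pick_paragraph(paragraphs)
    if target is None:
        return ""
    return truncate(target, count)


print(repr(abstract_from([], 10)))                      # ''
print(repr(abstract_from(["only"], 10)))                # 'only'
print(repr(abstract_from(["first", "second"], 10)))     # 'second'
print(repr(abstract_from(["a", "0123456789abc"], 10)))  # '0123456789 [...]'
```

Note that, as in the patch, the ` [...]` marker is appended after the cut, so a truncated abstract can be up to six characters longer than `count`.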

Issue History

Date Modified     | Username | Field                             | Change
2025-11-09 23:19  | hga3     | New Issue                         |
2025-11-09 23:19  | hga3     | Status                            | new => assigned
2025-11-09 23:19  | hga3     | Assigned To                       | => hga3
2025-11-09 23:24  | hga3     | Note Added: 0026352               |
2025-11-09 23:24  | hga3     | File Added: sitegen.site.py.patch |