View Issue Details

IDProjectCategoryView StatusLast Update
0002701GNUnetfile-sharing servicepublic2012-12-21 16:49
ReporterLRN Assigned ToLRN  
PrioritynormalSeveritycrashReproducibilityhave not tried
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.9.5Fixed in Version0.9.5 
Summary0002701: Crash in fs service
DescriptionFirst crash i've had in, like, a week.

fs service crashed.

There were no warnings/errors from that process in the log.
Additional Information
Reading symbols from D:\Progs\GNUnet\lib\gnunet\libexec\gnunet-service-fs.exe...done.
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 13492.0x1d5c]
GSF_peer_transmit_cancel_ (pth=0xdf0adba) at gnunet-service-fs_cp.c:1447
1447      if (GNUNET_SCHEDULER_NO_TASK != pth->timeout_task)
(gdb) bt
#0  GSF_peer_transmit_cancel_ (pth=0xdf0adba) at gnunet-service-fs_cp.c:1447
#1  0x0040ad07 in schedule_peer_transmission (cls=0x25e4d98, tc=0x28fd34) at gnunet-service-fs_pe.c:450
#2  0x62b7b673 in run_ready (ws=0xc8a000, rs=0xc88fe8) at scheduler.c:597
#3  GNUNET_SCHEDULER_run (task=task@entry=0x62b83ec4 <service_task>, task_cls=task_cls@entry=0x28fdb8) at scheduler.c:785
#4  0x62b86b39 in GNUNET_SERVICE_run (argc=3, argv=0x253be98, service_name=0x418188 <__register_frame_info+4293000> "fs", options=GNUNET_SERVICE_OPTION_NONE, task=0x401a20 <run>, task_cls=0x0) at service.c:1813
#5  0x004163a8 in main (argc=3, argv=0x253be98) at gnunet-service-fs.c:709
(gdb) p *pth
Cannot access memory at address 0xdf0adba
(gdb) up
#1  0x0040ad07 in schedule_peer_transmission (cls=0x25e4d98, tc=0x28fd34) at gnunet-service-fs_pe.c:450
450         GSF_peer_transmit_cancel_ (pp->pth);
TagsNo tags attached.
Attached Files
0001-Don-t-forget-to-cancel-old-tasks-in-FS.patch (1,786 bytes)   
From 1b745661e4c5086e48ca5484814ad2c6220ba7b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=A0=C2=A0=D0=A1=D1=93=D0=A1=D0=83=D0=A0=C2=BB=D0=A0=C2=B0?=
 =?UTF-8?q?=D0=A0=D0=85=20=D0=A0=C2=98=D0=A0=C2=B6=D0=A0=C2=B1=D0=A1=D1=93=D0?=
 =?UTF-8?q?=A0=C2=BB=D0=A0=C2=B0=D0=A1=E2=80=9A=D0=A0=D1=95=D0=A0=D0=86?=
 <lrn1986@gmail.com>
Date: Fri, 14 Dec 2012 15:06:20 +0400
Subject: [PATCH] Don't forget to cancel old tasks in FS

---
 src/fs/gnunet-service-fs_pe.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/fs/gnunet-service-fs_pe.c b/src/fs/gnunet-service-fs_pe.c
index c7cac75..0992d21 100644
--- a/src/fs/gnunet-service-fs_pe.c
+++ b/src/fs/gnunet-service-fs_pe.c
@@ -391,6 +391,9 @@ transmit_message_callback (void *cls, size_t buf_size, void *buf)
   if (NULL == buf)
   {
     /* failed, try again... */
+    if (GNUNET_SCHEDULER_NO_TASK != pp->task)
+      GNUNET_SCHEDULER_cancel (pp->task);
+
     pp->task = GNUNET_SCHEDULER_add_now (&schedule_peer_transmission, pp);
     GNUNET_STATISTICS_update (GSF_stats,
                               gettext_noop
@@ -401,12 +404,16 @@ transmit_message_callback (void *cls, size_t buf_size, void *buf)
   rp = GNUNET_CONTAINER_heap_peek (pp->priority_heap);
   if (NULL == rp)
   {
+    if (GNUNET_SCHEDULER_NO_TASK != pp->task)
+      GNUNET_SCHEDULER_cancel (pp->task);
     pp->task = GNUNET_SCHEDULER_add_now (&schedule_peer_transmission, pp);
     return 0;
   }
   msize = GSF_pending_request_get_message_ (get_latest (rp), buf_size, buf);
   if (msize > buf_size)
   {
+    if (GNUNET_SCHEDULER_NO_TASK != pp->task)
+      GNUNET_SCHEDULER_cancel (pp->task);
     /* buffer to small (message changed), try again */
     pp->task = GNUNET_SCHEDULER_add_now (&schedule_peer_transmission, pp);
     return 0;
-- 
1.7.11

Activities

Matthias Wachs

2012-12-13 17:31

manager   ~0006700

On every free of a GSF_PeerTransmitHandle timeout task is canceled

Matthias Wachs

2012-12-13 17:32

manager   ~0006701

timeout_task is only scheduled in on place: 1430

LRN

2012-12-13 17:33

developer   ~0006702

Is that true for r25337? Because that is what i'm currently running.

Matthias Wachs

2012-12-13 17:38

manager   ~0006703

Last edited: 2012-12-13 17:49

Try to cancel an invalid request (Use after free)?
Possible ...

Matthias Wachs

2012-12-13 17:53

manager   ~0006704

Last edited: 2012-12-13 17:55

Places where transmit_request can be forgotten:

GSF_block_peer_migration_
migration_pth
-> Checked

fs_pe.c
schedule_peer_transmission
-> checked

GSF_push_stop_
-> checked

Matthias Wachs

2012-12-13 18:00

manager   ~0006705

Is that true for r25337? Because that is what i'm currently running.
-> Last change in 24484 10/23/12

LRN

2012-12-14 00:33

developer   ~0006706

I think my gdb log does not describe the problem completely.
The problem is that the cls (pp) given to schedule_peer_transmission() is freed before use (pp->pth pointer itself is poisoned -> pp contents are poisoned).

So the only suspect is GSF_plan_notify_peer_disconnect_ not cancelling a schedule_peer_transmission() before freeing pp.

Or something else in the code does things with pp, and loses track of it.

LRN

2012-12-14 12:27

developer   ~0006709

Uploaded a patch. I think it would fix this issue.

I _THINK_, because i have not been able to reproduce it.

I did run my node with a modification of this patch that does not cancel tasks, but logs an error message instead, and proceeds normally. Despite a large number of error messages logged this way, my node did not crash for 14 hours straight.

Apparently, the condition is rare. From what i can see in the code, schedule_peer_transmission() is almost always scheduled "now", which makes it unlikely that another function would have a chance to free the plan before scheduled schedule_peer_transmission() is called.

Christian Grothoff

2012-12-15 15:28

manager   ~0006714

I've reviewed your patch and found an execution path (GSF_plan_add_ called 2nd time and finds an existing 'pp', resulting in a call to 'plan' which happens to not have a 'task' but an active transmission request (NULL != pp->pth); that one may then be run before the task added in 'plan', and then may overwrite 'task' without checking (which is fixed by LRN's patch); if then the pp is destroyed before the dangling task is executed, we get the crash. Rare indeed.

Issue History

Date Modified Username Field Change
2012-12-13 17:11 LRN New Issue
2012-12-13 17:31 Matthias Wachs Note Added: 0006700
2012-12-13 17:32 Matthias Wachs Note Added: 0006701
2012-12-13 17:33 LRN Note Added: 0006702
2012-12-13 17:38 Matthias Wachs Note Added: 0006703
2012-12-13 17:49 Matthias Wachs Note Edited: 0006703
2012-12-13 17:53 Matthias Wachs Note Added: 0006704
2012-12-13 17:55 Matthias Wachs Note Edited: 0006704
2012-12-13 18:00 Matthias Wachs Note Added: 0006705
2012-12-14 00:33 LRN Note Added: 0006706
2012-12-14 12:21 LRN File Added: 0001-Don-t-forget-to-cancel-old-tasks-in-FS.patch
2012-12-14 12:27 LRN Note Added: 0006709
2012-12-15 15:28 Christian Grothoff Note Added: 0006714
2012-12-15 15:28 Christian Grothoff Status new => resolved
2012-12-15 15:28 Christian Grothoff Fixed in Version => 0.9.5
2012-12-15 15:28 Christian Grothoff Resolution open => fixed
2012-12-15 15:28 Christian Grothoff Assigned To => LRN
2012-12-15 15:28 Christian Grothoff Product Version => Git master
2012-12-15 15:28 Christian Grothoff Target Version => 0.9.5
2012-12-21 16:49 Christian Grothoff Status resolved => closed