View Issue Details

IDProjectCategoryView StatusLast Update
0003021GNUnetset servicepublic2013-12-24 20:54
ReporterLRN Assigned ToLRN  
PrioritynormalSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.10.0Fixed in Version0.10.0 
Summary0003021: W32: set service crashes (bad read/write sets) during test_set_api
Descriptionbacktrace is shown below
Additional Information
Reading symbols from d:\progs\gnunet\lib\gnunet\libexec\gnunet-service-set.exe...done.
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 8356.0x2ae0]
0x62b6baec in GNUNET_NETWORK_fdset_add (dst=0x2449cd8, src=0x1) at network.c:949
949       for (i = 0; i < src->sds.fd_count; i++)
(gdb) bt
#0  0x62b6baec in GNUNET_NETWORK_fdset_add (dst=0x2449cd8, src=0x1) at network.c:949
#1  0x62b796f9 in update_sets (rs=0x2449cd8, ws=0x244acf0, timeout=0x28fc60) at scheduler.c:322
#2  0x62b7a4ba in GNUNET_SCHEDULER_run (task=0x62b873f4 <service_task>, task_cls=0x28fda8) at scheduler.c:749
#3  0x62b88245 in GNUNET_SERVICE_run (argc=5, argv=0x243ca38, service_name=0x412429 <_Jv_RegisterClasses+4269097> "set", options=GNUNET_SERVICE_OPTION_NONE, task=0x404f35 <run>, task_cls=0x0)
    at service.c:1825
#4  0x00405117 in main (argc=5, argv=0x243ca38) at gnunet-service-set.c:1164
(gdb) up
#1  0x62b796f9 in update_sets (rs=0x2449cd8, ws=0x244acf0, timeout=0x28fc60) at scheduler.c:322
322           GNUNET_NETWORK_fdset_add (rs, pos->read_set);
(gdb) p rs
$1 = (struct GNUNET_NETWORK_FDSet *) 0x2449cd8
(gdb) p *rs
$2 = {nsds = 0, sds = {fd_count = 0, fd_array = {620, 640, 4, 8, 8, 580, 580, 580, 0 <repeats 1016 times>}}, handles = 0xcdd7c0}
(gdb) p pos
$3 = (struct Task *) 0x244be88
(gdb) p *pos
$4 = {next = 0x244c038, callback = 0x62b50823 <transmit_ready>, callback_cls = 0xcdda30, read_set = 0x1, write_set = 0x2726162, id = 83, timeout = {abs_value_us = 18446744073709551615},
  reason = GNUNET_SCHEDULER_REASON_STARTUP, priority = GNUNET_SCHEDULER_PRIORITY_DEFAULT, read_fd = -1, write_fd = -1, lifeness = 1}
(gdb) p pos->write_set
$5 = (struct GNUNET_NETWORK_FDSet *) 0x2726162
(gdb) p *pos->write_set
Cannot access memory at address 0x2726162
(gdb) p *pos->next
$6 = {next = 0x244c080, callback = 0x62b4f87f <receive_ready>, callback_cls = 0x2449748, read_set = 0x2454028, write_set = 0x0, id = 67, timeout = {abs_value_us = 18446744073709551615},
  reason = GNUNET_SCHEDULER_REASON_STARTUP, priority = GNUNET_SCHEDULER_PRIORITY_DEFAULT, read_fd = -1, write_fd = -1, lifeness = 1}
(gdb) p *pos->next->read_set
$7 = {nsds = 621, sds = {fd_count = 1, fd_array = {620, 0 <repeats 1023 times>}}, handles = 0xcdd868}
TagsNo tags attached.

Activities

LRN

2013-09-07 16:20

reporter   ~0007417

Sometimes it crashes in _msize() instead:
Reading symbols from d:\progs\gnunet\lib\gnunet\libexec\gnunet-service-set.exe...done.
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 3284.0x5ec]
0x771b30bd in ntdll!RtlSizeHeap () from C:\Windows\SysWOW64\ntdll.dll
(gdb) bt
#0  0x771b30bd in ntdll!RtlSizeHeap () from C:\Windows\SysWOW64\ntdll.dll
#1  0x74ebf480 in msvcrt!_msize () from C:\Windows\syswow64\msvcrt.dll
#2  0x00c50000 in ?? ()
#3  0x62b4672f in GNUNET_xfree_ (ptr=0xc5eab0, filename=0x62ba2440 <__FUNCTION__.97729+60> "scheduler.c", linenumber=530) at common_allocation.c:227
#4  0x62b79b97 in destroy_task (t=0xc5eab0) at scheduler.c:530
#5  0x62b79f06 in run_ready (rs=0x24ea628, ws=0x24eb640) at scheduler.c:606
#6  0x62b7a722 in GNUNET_SCHEDULER_run (task=0x62b873f4 <service_task>, task_cls=0x28fda8) at scheduler.c:786
#7  0x62b88245 in GNUNET_SERVICE_run (argc=5, argv=0x24dc818, service_name=0x412429 <_Jv_RegisterClasses+4269097> "set", options=GNUNET_SERVICE_OPTION_NONE, task=0x404f35 <run>, task_cls=0x0)
    at service.c:1825
#8  0x00405117 in main (argc=5, argv=0x24dc818) at gnunet-service-set.c:1164
(gdb) up
#1  0x74ebf480 in msvcrt!_msize () from C:\Windows\syswow64\msvcrt.dll
(gdb)
#2  0x00c50000 in ?? ()
(gdb)
#3  0x62b4672f in GNUNET_xfree_ (ptr=0xc5eab0, filename=0x62ba2440 <__FUNCTION__.97729+60> "scheduler.c", linenumber=530) at common_allocation.c:227
227         size_t s = M_SIZE (ptr);
(gdb) up
#4  0x62b79b97 in destroy_task (t=0xc5eab0) at scheduler.c:530
530       GNUNET_free (t);
(gdb) p t
$1 = (struct Task *) 0xc5eab0
(gdb) p *t
$2 = {next = 0x0, callback = 0x62b43d95 <receive_task>, callback_cls = 0xc5c490, read_set = 0x0, write_set = 0x0, id = 87, timeout = {abs_value_us = 1378563519781664},
  reason = GNUNET_SCHEDULER_REASON_TIMEOUT, priority = GNUNET_SCHEDULER_PRIORITY_DEFAULT, read_fd = -1, write_fd = -1, lifeness = 1}

LRN

2013-09-07 17:55

reporter   ~0007418

I've added task checking routine that goes through all the tasks and checks for read_set == 0x01. Here are the results:
сен 07 19:41:11-00000000000255026977 util-scheduler-2524 DEBUG Checking tasks sets
сен 07 19:41:11-00000000000255026977 util-scheduler-2524 DEBUG Running task: 83 / 00e7a8e0
сен 07 19:41:11-00000000000255026977 util-2524 DEBUG Received message of type 285 and size 15
сен 07 19:41:11-00000000000255026977 mesh-api-2524 DEBUG 
сен 07 19:41:11-00000000000255026977 mesh-api-2524 DEBUG Received a message: GNUNET_MESSAGE_TYPE_MESH_LOCAL_DATA
сен 07 19:41:11-00000000000255026977 mesh-api-2524 DEBUG Got a data message!
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG   fwd data on tunnel M1RG [B0000000]
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG   payload type 584
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG     checking handler for type 581
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG     checking handler for type 583
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG     checking handler for type 584
сен 07 19:41:11-00000000000255026978 set-2524 DEBUG dispatching mesh message (type: 584)
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG Sending ACK on tunnel B0000000
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG  Sending message to service: GNUNET_MESSAGE_TYPE_MESH_LOCAL_ACK
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG   queued
сен 07 19:41:11-00000000000255026978 mesh-api-2524 DEBUG   calling ntfy tmt rdy for 8 bytes
сен 07 19:41:11-00000000000255026978 util-2524 DEBUG Scheduling transmission (00d99d18).
сен 07 19:41:11-00000000000255026978 util-scheduler-2524 DEBUG Checking tasks sets
сен 07 19:41:11-00000000000255026979 util-scheduler-2524 DEBUG Adding task: 84 / 00d99d18
сен 07 19:41:11-00000000000255026979 set-2524 DEBUG received p2p message (t: 584, s: 7)
сен 07 19:41:11-00000000000255026979 set-2524 DEBUG got element from peer
сен 07 19:41:11-00000000000255026979 set-2524 DEBUG sending element (size 3) to client
сен 07 19:41:11-00000000000255026979 util-2524 DEBUG Scheduling transmission (00d99e38).
сен 07 19:41:11-00000000000255026979 util-scheduler-2524 DEBUG Checking tasks sets
сен 07 19:41:11-00000000000255026979 util-scheduler-2524 DEBUG task 00d9ed98 (68) has read set == 0x01!

The task that was corrupted is one of the pending tasks (set code should not have any references to it, corruption happens purely by (un)luck).
The taskset was ok before task 83 was run, and was ok when 84 was added. It was corrupted between addition of task 84 and addition of task 85 (well, it would have been a task 85, but the code didn't reach that point, as it checks the tasksset at the beginning of the GNUNET_SCHEDULER_add_*)

LRN

2013-09-07 18:06

reporter   ~0007419

Found the source of corruption:

      GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, "got element from peer\n");
      check_tasks (NULL);/* <------------- does not crash here */
      if ( (eo->phase != PHASE_EXPECT_ELEMENTS) &&
           (eo->phase != PHASE_EXPECT_ELEMENTS_AND_REQUESTS) )
      {
        fail_union_operation (eo);
        GNUNET_break (0);
        return;
      }
      element_size = ntohs (mh->size) - sizeof (struct GNUNET_MessageHeader);
      ee = GNUNET_malloc (sizeof *eo + element_size);
      memcpy (&ee[1], &mh[1], element_size);
      ee->element.size = element_size;
      ee->element.data = &ee[1];
      ee->remote = GNUNET_YES;
      check_tasks (NULL); /* <------------- crashes here */

Florian Dold

2013-09-07 18:19

developer   ~0007420

One thing that is definitely wrong here is the argument of GNUNET_malloc, which should take the sizeof *ee instead, but that does only allocate too much memory.

LRN, what exactly does your check_tasks (NULL) do? Would you have any idea how the above could could possibly corrupt a task?

LRN

2013-09-07 19:53

reporter   ~0007421

Fixed in r29091

Issue History

Date Modified Username Field Change
2013-09-05 19:25 LRN New Issue
2013-09-05 19:25 LRN Status new => assigned
2013-09-05 19:25 LRN Assigned To => Florian Dold
2013-09-07 16:20 LRN Note Added: 0007417
2013-09-07 17:55 LRN Note Added: 0007418
2013-09-07 18:06 LRN Note Added: 0007419
2013-09-07 18:19 Florian Dold Note Added: 0007420
2013-09-07 19:53 LRN Note Added: 0007421
2013-09-07 19:53 LRN Status assigned => resolved
2013-09-07 19:53 LRN Fixed in Version => Git master
2013-09-07 19:53 LRN Resolution open => fixed
2013-09-07 19:53 LRN Assigned To Florian Dold => LRN
2013-09-10 00:08 Christian Grothoff Product Version => Git master
2013-09-10 00:08 Christian Grothoff Fixed in Version Git master => 0.10.0
2013-09-10 00:08 Christian Grothoff Target Version => 0.10.0
2013-12-24 20:54 Christian Grothoff Status resolved => closed