View Issue Details

IDProjectCategoryView StatusLast Update
0002090GNUnetcadet servicepublic2012-02-28 11:05
ReporterBart Polot Assigned ToBart Polot  
PriorityurgentSeverityminorReproducibilitysometimes
Status closedResolutionfixed 
Product VersionGit master 
Target Version0.9.2Fixed in Version0.9.2 
Summary0002090: Improper VPN cleanup
DescriptionJan 19 23:22:22-063860 vpn-11574 DEBUG Peer CF6H disconnected from tunnel.
Jan 19 23:22:22-071765 vpn-11574 ERROR Assertion failed at mesh_api.c:1290.
Jan 19 23:22:22-072147 vpn-11574 ERROR Assertion failed at mesh_api.c:1290.
Jan 19 23:22:22-072343 vpn-11574 ERROR Assertion failed at mesh_api.c:1290.
==11574== Invalid write of size 8
==11574== at 0x4032F4: destroy_tunnel_task (gnunet-service-vpn.c:584)
==11574== by 0x5271FA3: GNUNET_SCHEDULER_run (scheduler.c:684)
==11574== by 0x527ABD5: GNUNET_SERVICE_run (service.c:1712)
==11574== by 0x4020C8: main (gnunet-service-vpn.c:2447)
==11574== Address 0x67fc518 is 280 bytes inside a block of size 336 free'd
==11574== at 0x4C28A9E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11574== by 0x402D9D: free_tunnel_state (gnunet-service-vpn.c:567)
==11574== by 0x4033CB: cleanup_tunnel (gnunet-service-vpn.c:2159)
==11574== by 0x5256E25: GNUNET_CONTAINER_multihashmap_iterate (container_multihashmap.c:213)
==11574== by 0x404687: cleanup (gnunet-service-vpn.c:2193)
==11574== by 0x5271FA3: GNUNET_SCHEDULER_run (scheduler.c:684)
==11574== by 0x527ABD5: GNUNET_SERVICE_run (service.c:1712)
==11574== by 0x4020C8: main (gnunet-service-vpn.c:2447)
==11574==
==11574== Invalid read of size 8
==11574== at 0x4032FF: destroy_tunnel_task (gnunet-service-vpn.c:585)
==11574== by 0x5271FA3: GNUNET_SCHEDULER_run (scheduler.c:684)
==11574== by 0x527ABD5: GNUNET_SERVICE_run (service.c:1712)
==11574== by 0x4020C8: main (gnunet-service-vpn.c:2447)
==11574== Address 0x67fc400 is 0 bytes inside a block of size 336 free'd
==11574== at 0x4C28A9E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11574== by 0x402D9D: free_tunnel_state (gnunet-service-vpn.c:567)
==11574== by 0x4033CB: cleanup_tunnel (gnunet-service-vpn.c:2159)
==11574== by 0x5256E25: GNUNET_CONTAINER_multihashmap_iterate (container_multihashmap.c:213)
==11574== by 0x404687: cleanup (gnunet-service-vpn.c:2193)
==11574== by 0x5271FA3: GNUNET_SCHEDULER_run (scheduler.c:684)
==11574== by 0x527ABD5: GNUNET_SERVICE_run (service.c:1712)
==11574== by 0x4020C8: main (gnunet-service-vpn.c:2447)
==11574==
Jan 19 23:22:22-085159 vpn-11574 ERROR Assertion failed at gnunet-service-vpn.c:585.
Steps To ReproduceCtrl-C during an active wget.
Additional InformationAssertion failed at mesh_api.c:1290 means that something was in the transmit queue on mesh disconnect. Looks like this is not mesh_api's fault, but I could be wrong...
TagsNo tags attached.

Relationships

has duplicate 0002094 closedBart Polot aborting gnunet-service-vpn fails (assertion at mesh_api.c:1296). 
child of 0002064 closedChristian Grothoff new VPN service (working with new exit/dns services) needs to be fully implemented and tested 

Activities

Christian Grothoff

2012-01-20 09:36

manager   ~0005318

Partial fix (the valgrind error) is in SVN 19276. The rest is clearly more complicated.

Christian Grothoff

2012-01-20 09:44

manager   ~0005319

I think what is confusing is who is to call whom. On MESH_disconnect, mesh notifies vpn about all tunnels going down (presumably after all peers disconnect?). Then, VPN goes into 'free_tunnel_state' and there tells MESH to destroy the tunnel (which MESH just told us was being destroyed!).

Similarly, it is possible that VPN tell MESH to destroy a tunnel, and then VPN comes back to tell VPN that a tunnel is being destroyed. VPN currently handles this by setting the internal 'tunnel' field in the TunnelState to NULL just before calling MESH_tunnel_destroy on the tunnel (which, for this, is cached in a local variable).

I suspect the assertion failures relate to this mess; maybe mesh_api doesn't handle a MESH_tunnel_destroy being called while it is currently notifying clients about the same tunnel going down. In any case, this is all pretty ugly and I wonder if we cannot have a cleaner MESH API design here.

What about we simply specify that MESH API does *not* notify a client about a tunnel going down if that client did call MESH_tunnel_destroy, and conversely that clients must *not* call MESH_tunnel_destroy if MESH told them that the tunnel went down (via tunnel cleanup callback)? If we then additionally put some assertions to check that this invariant is maintained on both sides, I think we'd have a much cleaner API --- and likely this issue would go away.

Christian Grothoff

2012-01-21 16:47

manager   ~0005330

I believe VPN/Exit now behave properly wrt tunnel destruction (in line with note 5319 above). Is the mesh api fixed as well?

Bart Polot

2012-01-23 16:03

manager   ~0005353

Steps to reproduce:
- start mesh
- start exit
- start vpn
- do gnunet-vpn -t -i 131.159.74.67
- ctrl-c vpn

Christian Grothoff

2012-01-23 21:23

manager   ~0005357

This is totally a bug in mesh_api.c:
$ gnunet-service-vpn
Created tunnel 0x23bc3d0
^CDestroying tunnel 0x23bc3d0
Jan 23 21:18:55-002053 vpn-24698 ERROR Assertion failed at mesh_api.c:1298.
Jan 23 21:18:55-002073 vpn-24698 ERROR Assertion failed at mesh_api.c:1298.
Jan 23 21:18:55-002078 vpn-24698 ERROR Assertion failed at mesh_api.c:1298.

(this is after adding fprintf's for the tunnel PTR before the respective tunnel_create/tunnel_destroy calls in VPN). Note that this all happens in the same task, and you check that "no" messages of certain types are left in your queue. I don't see how that assertion can hold, as you just yourself queued the '_MESH_LOCAL_TUNNEL_DESTROY' notification (and had no chance to send it yet) and there is in fact no guarantee that you had a chance to send anything related to the tunnel yet.

Bart Polot

2012-01-24 03:51

manager   ~0005363

Fixed in r19335.

Issue History

Date Modified Username Field Change
2012-01-19 23:27 Bart Polot New Issue
2012-01-19 23:27 Bart Polot Status new => assigned
2012-01-19 23:27 Bart Polot Assigned To => Christian Grothoff
2012-01-20 09:29 Christian Grothoff Priority normal => urgent
2012-01-20 09:29 Christian Grothoff Target Version => 0.9.2
2012-01-20 09:36 Christian Grothoff Note Added: 0005318
2012-01-20 09:44 Christian Grothoff Note Added: 0005319
2012-01-20 09:44 Christian Grothoff Assigned To Christian Grothoff => Bart Polot
2012-01-21 16:47 Christian Grothoff Note Added: 0005330
2012-01-21 16:47 Christian Grothoff Relationship added child of 0002064
2012-01-23 15:40 Bart Polot Relationship added has duplicate 0002094
2012-01-23 16:03 Bart Polot Assigned To Bart Polot => Christian Grothoff
2012-01-23 16:03 Bart Polot Note Added: 0005353
2012-01-23 21:23 Christian Grothoff Note Added: 0005357
2012-01-23 21:23 Christian Grothoff Category VPN service => mesh service
2012-01-23 21:24 Christian Grothoff Assigned To Christian Grothoff => Bart Polot
2012-01-24 03:51 Bart Polot Note Added: 0005363
2012-01-24 03:51 Bart Polot Status assigned => resolved
2012-01-24 03:51 Bart Polot Fixed in Version => Git master
2012-01-24 03:51 Bart Polot Resolution open => fixed
2012-01-24 09:40 Christian Grothoff Fixed in Version Git master => 0.9.2
2012-02-28 11:05 Christian Grothoff Status resolved => closed
2014-05-09 18:34 Christian Grothoff Category mesh service => cadet service