View Issue Details

IDProjectCategoryView StatusLast Update
0002578GNUnetfile-sharing servicepublic2012-11-05 18:33
Reporterbratao Assigned ToChristian Grothoff  
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionfixed 
PlatformDebian SqueezeOSLinux 
Product VersionGit master 
Target Version0.9.4 
Summary0002578: Busy loop trying to connect on localhost.
DescriptionHello !

Trying to use any application that uses the FS-services, the application keeps on busy loop, consuming 100% of the cpu.
I can reproduce it on W32 and Linux using the gnunet-fs-gtk.

The debug log keep showing non-stop in a infine loop those messages:

Oct 03 22:00:46-00000000000101731350 util-7372 INFO Trying to connect to `127.0.0.1:2094' (0x1b591db0)
Oct 03 22:00:46-00000000000101731350 util-7372 INFO Trying to connect to `[::1]:2094' (0x1b591c00)
...
Oct 03 22:01:03-00000000000101748384 util-7372 INFO Failed to establish TCP connection to `localhost:2094', no further addresses to try
Oct 03 22:01:03-00000000000101748384 util-7372 INFO Failed to establish TCP connection to `localhost:2094', no further addresses to try
...
TagsNo tags attached.

Relationships

has duplicate 0002432 closedChristian Grothoff gnunet-gtk High CPU load when displaying search results 

Activities

bratao

2012-10-05 19:50

developer   ~0006400

Thank you Christian for looking at it. I can reproduce it on W32 and Linux, with fs_gtk. But is not deterministic, happen randomly.

But r24188 do not fix it.

Both in my client and in gnunet-fs-gtk we have a saved search with hundreds of results to "mp3."

I trying to debug , but my architecture knowledge about GNUNet still very rusty.

From my debug trace, we have a "loop" that is exectuted hundreds of time, and at the end prints this "Failed to establish TCP connection to..." message. For my debug session it look like this:

This message Come from loopback_resolution
--That is a task added by GNUNET_RESOLVER_ip_get
---Called from GNUNET_CONNECTION_create_from_connect
-----Come from fs_search do_reconnect
-------That is a task added by fs_download try_reconnect
---------Called from transmit_download_request
-----------That is a task added by schedule_block_download
-------------Called by GNUNET_FS_download_start_task_
---------------That is a task added by create_download_context
-----------------Called by GNUNET_FS_download_start
-------------------Called by GNUNET_FS_search_start_probe_
---------------------Called by signal_probe_result
-----------------------Called by probe_failure_handler(fs_search)


Another interesting thing is those errors that only happen on Linux, I didn't see it on W32.
I receive a lot of those messages :
Oct 05 09:47:24-379889 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/bx4U51.dir/79AUSZ' at disk.c:517 with error: Too many open files
Oct 05 09:47:24-392335 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/QQmaLx.dir/CpHZEN' at disk.c:517 with error: Too many open files
Oct 05 09:47:24-392462 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/QQmaLx.dir/J835qB' at disk.c:517 with error: Too many open files
Oct 05 09:47:24-450842 util-3210 WARNING `open' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/bx4U51.dir/2Zihv2' at disk.c:1666 with error: Too many open files
Oct 05 09:47:24-450887 cangote-3210 ERROR Assertion failed at fs_api.c:2009.

bratao

2012-10-06 00:06

developer   ~0006401

I keeped tracing , and something looks suspicious to me:
This message Come from loopback_resolution
--That is a task added by GNUNET_RESOLVER_ip_get
---Called from GNUNET_CONNECTION_create_from_connect
-----Come from fs_search do_reconnect
-------That is a task added by fs_download try_reconnect
---------Called from transmit_download_request
-----------That is a task added by schedule_block_download
-------------Called by GNUNET_FS_download_start_task_
---------------That is a task added by create_download_context
-----------------Called by GNUNET_FS_download_start
-------------------Called by GNUNET_FS_search_start_probe_
---------------------Called by signal_probe_result
-----------------------Called by probe_failure_handler(fs_search)
-------------------------Event added by GNUNET_FS_search_probe_progress_
----------------------------Called by GNUNET_FS_download_make_status_
------------------------------Called by GNUNET_FS_download_make_status_
--------------------------------Called by deactivate_fs_download
----------------------------------Called by stop_job
------------------------------------Called by GNUNET_FS_dequeue_
-------------------------------------Called by GNUNET_FS_download_stop
--------------------------------------Called by probe_failure_handler
----------------------------------------Event added by GNUNET_FS_search_probe_progress_

Look how probe_failure_handler is written, it call GNUNET_FS_download_stop that can get to probe_failure_handler, and we have a loop.

Maybe I'm missing something,as I'm not used to debug GNUNet. But something smells fishy to me.

Christian Grothoff

2012-10-06 14:51

manager   ~0006402

Yes, there is something wrong. Now, there are different ways I can try to fix it, but the best/cleanest way is in a rather unexpected place (util!) and might thus need some special caution. Anyway, can you try again with SVN 24201?

bratao

2012-10-06 19:42

developer   ~0006408

Thanks again !
I just recompiled and tried to run, and the same 100% CPU bug is still happening.
But I didn't debugged yet,

Still appearing those messages, every 5 seconds :
Oct 06 10:41:17-00000000000172760300 util-12848 INFO Trying to connect to `[::1]:2094' (0x1b2ab130)
Oct 06 10:41:17-00000000000172760300 util-12848 INFO Trying to connect to `127.0.0.1:2094' (0x1b2ab130)
Oct 06 10:41:18-00000000000172761689 util-12848 INFO Failed to establish TCP connection to `localhost:2091', no further addresses to try.
Oct 06 10:41:18-00000000000172761707 util-12848 INFO Failed to establish TCP connection to `localhost:2091', no further addresses to try

Christian Grothoff

2012-10-06 20:45

manager   ~0006410

Every 5s doesn't explain 100% CPU...

Christian Grothoff

2012-10-06 21:52

manager   ~0006411

Ok, this time I'm pretty confident I found it for good. Try #24212.

bratao

2012-10-07 01:33

developer   ~0006412

You're amazing !
Is working great now !

Closed.

bratao

2012-10-07 01:33

developer   ~0006413

Working good on both W32 and Linux.

Issue History

Date Modified Username Field Change
2012-10-04 07:13 bratao New Issue
2012-10-05 19:50 bratao Note Added: 0006400
2012-10-06 00:06 bratao Note Added: 0006401
2012-10-06 13:06 Christian Grothoff Assigned To => Christian Grothoff
2012-10-06 13:06 Christian Grothoff Status new => assigned
2012-10-06 14:51 Christian Grothoff Note Added: 0006402
2012-10-06 14:51 Christian Grothoff Status assigned => feedback
2012-10-06 14:51 Christian Grothoff Target Version => 0.9.4
2012-10-06 19:42 bratao Note Added: 0006408
2012-10-06 19:42 bratao Status feedback => assigned
2012-10-06 20:45 Christian Grothoff Note Added: 0006410
2012-10-06 21:52 Christian Grothoff Note Added: 0006411
2012-10-06 21:52 Christian Grothoff Status assigned => feedback
2012-10-07 01:33 bratao Note Added: 0006412
2012-10-07 01:33 bratao Status feedback => assigned
2012-10-07 01:33 bratao Note Added: 0006413
2012-10-07 01:33 bratao Status assigned => resolved
2012-10-07 01:33 bratao Resolution open => fixed
2012-10-07 12:49 Christian Grothoff Relationship added has duplicate 0002432
2012-11-05 18:33 Christian Grothoff Status resolved => closed