0002578: Busy loop trying to connect on localhost.

ID	Project	Category	View Status	Date Submitted	Last Update

0002578	GNUnet	file-sharing service	public	2012-10-04 07:13	2012-11-05 18:33

Reporter	bratao	Assigned To	Christian Grothoff
Priority	normal	Severity	major	Reproducibility	always
Status	closed	Resolution	fixed
Platform	Debian Squeeze	OS	Linux
Product Version	Git master
Target Version	0.9.4

Summary	0002578: Busy loop trying to connect on localhost.
Description	Hello ! Trying to use any application that uses the FS-services, the application keeps on busy loop, consuming 100% of the cpu. I can reproduce it on W32 and Linux using the gnunet-fs-gtk. The debug log keep showing non-stop in a infine loop those messages: Oct 03 22:00:46-00000000000101731350 util-7372 INFO Trying to connect to `127.0.0.1:2094' (0x1b591db0) Oct 03 22:00:46-00000000000101731350 util-7372 INFO Trying to connect to `[::1]:2094' (0x1b591c00) ... Oct 03 22:01:03-00000000000101748384 util-7372 INFO Failed to establish TCP connection to `localhost:2094', no further addresses to try Oct 03 22:01:03-00000000000101748384 util-7372 INFO Failed to establish TCP connection to `localhost:2094', no further addresses to try ...
Tags	No tags attached.

bratao 2012-10-05 19:50 developer ~0006400	Thank you Christian for looking at it. I can reproduce it on W32 and Linux, with fs_gtk. But is not deterministic, happen randomly. But r24188 do not fix it. Both in my client and in gnunet-fs-gtk we have a saved search with hundreds of results to "mp3." I trying to debug , but my architecture knowledge about GNUNet still very rusty. From my debug trace, we have a "loop" that is exectuted hundreds of time, and at the end prints this "Failed to establish TCP connection to..." message. For my debug session it look like this: This message Come from loopback_resolution --That is a task added by GNUNET_RESOLVER_ip_get ---Called from GNUNET_CONNECTION_create_from_connect -----Come from fs_search do_reconnect -------That is a task added by fs_download try_reconnect ---------Called from transmit_download_request -----------That is a task added by schedule_block_download -------------Called by GNUNET_FS_download_start_task_ ---------------That is a task added by create_download_context -----------------Called by GNUNET_FS_download_start -------------------Called by GNUNET_FS_search_start_probe_ ---------------------Called by signal_probe_result -----------------------Called by probe_failure_handler(fs_search) Another interesting thing is those errors that only happen on Linux, I didn't see it on W32. I receive a lot of those messages : Oct 05 09:47:24-379889 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/bx4U51.dir/79AUSZ' at disk.c:517 with error: Too many open files Oct 05 09:47:24-392335 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/QQmaLx.dir/CpHZEN' at disk.c:517 with error: Too many open files Oct 05 09:47:24-392462 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/QQmaLx.dir/J835qB' at disk.c:517 with error: Too many open files Oct 05 09:47:24-450842 util-3210 WARNING `open' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/bx4U51.dir/2Zihv2' at disk.c:1666 with error: Too many open files Oct 05 09:47:24-450887 cangote-3210 ERROR Assertion failed at fs_api.c:2009.

bratao 2012-10-06 00:06 developer ~0006401	I keeped tracing , and something looks suspicious to me: This message Come from loopback_resolution --That is a task added by GNUNET_RESOLVER_ip_get ---Called from GNUNET_CONNECTION_create_from_connect -----Come from fs_search do_reconnect -------That is a task added by fs_download try_reconnect ---------Called from transmit_download_request -----------That is a task added by schedule_block_download -------------Called by GNUNET_FS_download_start_task_ ---------------That is a task added by create_download_context -----------------Called by GNUNET_FS_download_start -------------------Called by GNUNET_FS_search_start_probe_ ---------------------Called by signal_probe_result -----------------------Called by probe_failure_handler(fs_search) -------------------------Event added by GNUNET_FS_search_probe_progress_ ----------------------------Called by GNUNET_FS_download_make_status_ ------------------------------Called by GNUNET_FS_download_make_status_ --------------------------------Called by deactivate_fs_download ----------------------------------Called by stop_job ------------------------------------Called by GNUNET_FS_dequeue_ -------------------------------------Called by GNUNET_FS_download_stop --------------------------------------Called by probe_failure_handler ----------------------------------------Event added by GNUNET_FS_search_probe_progress_ Look how probe_failure_handler is written, it call GNUNET_FS_download_stop that can get to probe_failure_handler, and we have a loop. Maybe I'm missing something,as I'm not used to debug GNUNet. But something smells fishy to me.

Christian Grothoff 2012-10-06 14:51 manager ~0006402	Yes, there is something wrong. Now, there are different ways I can try to fix it, but the best/cleanest way is in a rather unexpected place (util!) and might thus need some special caution. Anyway, can you try again with SVN 24201?

bratao 2012-10-06 19:42 developer ~0006408	Thanks again ! I just recompiled and tried to run, and the same 100% CPU bug is still happening. But I didn't debugged yet, Still appearing those messages, every 5 seconds : Oct 06 10:41:17-00000000000172760300 util-12848 INFO Trying to connect to `[::1]:2094' (0x1b2ab130) Oct 06 10:41:17-00000000000172760300 util-12848 INFO Trying to connect to `127.0.0.1:2094' (0x1b2ab130) Oct 06 10:41:18-00000000000172761689 util-12848 INFO Failed to establish TCP connection to `localhost:2091', no further addresses to try. Oct 06 10:41:18-00000000000172761707 util-12848 INFO Failed to establish TCP connection to `localhost:2091', no further addresses to try

Christian Grothoff 2012-10-06 20:45 manager ~0006410	Every 5s doesn't explain 100% CPU...

Christian Grothoff 2012-10-06 21:52 manager ~0006411	Ok, this time I'm pretty confident I found it for good. Try #24212.

bratao 2012-10-07 01:33 developer ~0006412	You're amazing ! Is working great now ! Closed.

bratao 2012-10-07 01:33 developer ~0006413	Working good on both W32 and Linux.

Date Modified	Username	Field	Change
2012-10-04 07:13	bratao	New Issue
2012-10-05 19:50	bratao	Note Added: 0006400
2012-10-06 00:06	bratao	Note Added: 0006401
2012-10-06 13:06	Christian Grothoff	Assigned To	=> Christian Grothoff
2012-10-06 13:06	Christian Grothoff	Status	new => assigned
2012-10-06 14:51	Christian Grothoff	Note Added: 0006402
2012-10-06 14:51	Christian Grothoff	Status	assigned => feedback
2012-10-06 14:51	Christian Grothoff	Target Version	=> 0.9.4
2012-10-06 19:42	bratao	Note Added: 0006408
2012-10-06 19:42	bratao	Status	feedback => assigned
2012-10-06 20:45	Christian Grothoff	Note Added: 0006410
2012-10-06 21:52	Christian Grothoff	Note Added: 0006411
2012-10-06 21:52	Christian Grothoff	Status	assigned => feedback
2012-10-07 01:33	bratao	Note Added: 0006412
2012-10-07 01:33	bratao	Status	feedback => assigned
2012-10-07 01:33	bratao	Note Added: 0006413
2012-10-07 01:33	bratao	Status	assigned => resolved
2012-10-07 01:33	bratao	Resolution	open => fixed
2012-10-07 12:49	Christian Grothoff	Relationship added	has duplicate 0002432
2012-11-05 18:33	Christian Grothoff	Status	resolved => closed

View Issue Details

Relationships

Activities

Issue History