View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002578 | GNUnet | file-sharing service | public | 2012-10-04 07:13 | 2012-11-05 18:33 |
Reporter | bratao | Assigned To | Christian Grothoff | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | Debian Squeeze | OS | Linux | ||
Product Version | Git master | ||||
Target Version | 0.9.4 | ||||
Summary | 0002578: Busy loop trying to connect on localhost. | ||||
Description | Hello ! Trying to use any application that uses the FS-services, the application keeps on busy loop, consuming 100% of the cpu. I can reproduce it on W32 and Linux using the gnunet-fs-gtk. The debug log keep showing non-stop in a infine loop those messages: Oct 03 22:00:46-00000000000101731350 util-7372 INFO Trying to connect to `127.0.0.1:2094' (0x1b591db0) Oct 03 22:00:46-00000000000101731350 util-7372 INFO Trying to connect to `[::1]:2094' (0x1b591c00) ... Oct 03 22:01:03-00000000000101748384 util-7372 INFO Failed to establish TCP connection to `localhost:2094', no further addresses to try Oct 03 22:01:03-00000000000101748384 util-7372 INFO Failed to establish TCP connection to `localhost:2094', no further addresses to try ... | ||||
Tags | No tags attached. | ||||
has duplicate | 0002432 | closed | Christian Grothoff | gnunet-gtk | High CPU load when displaying search results |
|
Thank you Christian for looking at it. I can reproduce it on W32 and Linux, with fs_gtk. But is not deterministic, happen randomly. But r24188 do not fix it. Both in my client and in gnunet-fs-gtk we have a saved search with hundreds of results to "mp3." I trying to debug , but my architecture knowledge about GNUNet still very rusty. From my debug trace, we have a "loop" that is exectuted hundreds of time, and at the end prints this "Failed to establish TCP connection to..." message. For my debug session it look like this: This message Come from loopback_resolution --That is a task added by GNUNET_RESOLVER_ip_get ---Called from GNUNET_CONNECTION_create_from_connect -----Come from fs_search do_reconnect -------That is a task added by fs_download try_reconnect ---------Called from transmit_download_request -----------That is a task added by schedule_block_download -------------Called by GNUNET_FS_download_start_task_ ---------------That is a task added by create_download_context -----------------Called by GNUNET_FS_download_start -------------------Called by GNUNET_FS_search_start_probe_ ---------------------Called by signal_probe_result -----------------------Called by probe_failure_handler(fs_search) Another interesting thing is those errors that only happen on Linux, I didn't see it on W32. I receive a lot of those messages : Oct 05 09:47:24-379889 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/bx4U51.dir/79AUSZ' at disk.c:517 with error: Too many open files Oct 05 09:47:24-392335 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/QQmaLx.dir/CpHZEN' at disk.c:517 with error: Too many open files Oct 05 09:47:24-392462 util-3210 ERROR `mkstemp' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/QQmaLx.dir/J835qB' at disk.c:517 with error: Too many open files Oct 05 09:47:24-450842 util-3210 WARNING `open' failed on file `/home/bratao/.gnunet/fs/persistence//gnunet-fs-gtk/search/bx4U51.dir/2Zihv2' at disk.c:1666 with error: Too many open files Oct 05 09:47:24-450887 cangote-3210 ERROR Assertion failed at fs_api.c:2009. |
|
I keeped tracing , and something looks suspicious to me: This message Come from loopback_resolution --That is a task added by GNUNET_RESOLVER_ip_get ---Called from GNUNET_CONNECTION_create_from_connect -----Come from fs_search do_reconnect -------That is a task added by fs_download try_reconnect ---------Called from transmit_download_request -----------That is a task added by schedule_block_download -------------Called by GNUNET_FS_download_start_task_ ---------------That is a task added by create_download_context -----------------Called by GNUNET_FS_download_start -------------------Called by GNUNET_FS_search_start_probe_ ---------------------Called by signal_probe_result -----------------------Called by probe_failure_handler(fs_search) -------------------------Event added by GNUNET_FS_search_probe_progress_ ----------------------------Called by GNUNET_FS_download_make_status_ ------------------------------Called by GNUNET_FS_download_make_status_ --------------------------------Called by deactivate_fs_download ----------------------------------Called by stop_job ------------------------------------Called by GNUNET_FS_dequeue_ -------------------------------------Called by GNUNET_FS_download_stop --------------------------------------Called by probe_failure_handler ----------------------------------------Event added by GNUNET_FS_search_probe_progress_ Look how probe_failure_handler is written, it call GNUNET_FS_download_stop that can get to probe_failure_handler, and we have a loop. Maybe I'm missing something,as I'm not used to debug GNUNet. But something smells fishy to me. |
|
Yes, there is something wrong. Now, there are different ways I can try to fix it, but the best/cleanest way is in a rather unexpected place (util!) and might thus need some special caution. Anyway, can you try again with SVN 24201? |
|
Thanks again ! I just recompiled and tried to run, and the same 100% CPU bug is still happening. But I didn't debugged yet, Still appearing those messages, every 5 seconds : Oct 06 10:41:17-00000000000172760300 util-12848 INFO Trying to connect to `[::1]:2094' (0x1b2ab130) Oct 06 10:41:17-00000000000172760300 util-12848 INFO Trying to connect to `127.0.0.1:2094' (0x1b2ab130) Oct 06 10:41:18-00000000000172761689 util-12848 INFO Failed to establish TCP connection to `localhost:2091', no further addresses to try. Oct 06 10:41:18-00000000000172761707 util-12848 INFO Failed to establish TCP connection to `localhost:2091', no further addresses to try |
|
Every 5s doesn't explain 100% CPU... |
|
Ok, this time I'm pretty confident I found it for good. Try #24212. |
|
You're amazing ! Is working great now ! Closed. |
|
Working good on both W32 and Linux. |
Date Modified | Username | Field | Change |
---|---|---|---|
2012-10-04 07:13 | bratao | New Issue | |
2012-10-05 19:50 | bratao | Note Added: 0006400 | |
2012-10-06 00:06 | bratao | Note Added: 0006401 | |
2012-10-06 13:06 | Christian Grothoff | Assigned To | => Christian Grothoff |
2012-10-06 13:06 | Christian Grothoff | Status | new => assigned |
2012-10-06 14:51 | Christian Grothoff | Note Added: 0006402 | |
2012-10-06 14:51 | Christian Grothoff | Status | assigned => feedback |
2012-10-06 14:51 | Christian Grothoff | Target Version | => 0.9.4 |
2012-10-06 19:42 | bratao | Note Added: 0006408 | |
2012-10-06 19:42 | bratao | Status | feedback => assigned |
2012-10-06 20:45 | Christian Grothoff | Note Added: 0006410 | |
2012-10-06 21:52 | Christian Grothoff | Note Added: 0006411 | |
2012-10-06 21:52 | Christian Grothoff | Status | assigned => feedback |
2012-10-07 01:33 | bratao | Note Added: 0006412 | |
2012-10-07 01:33 | bratao | Status | feedback => assigned |
2012-10-07 01:33 | bratao | Note Added: 0006413 | |
2012-10-07 01:33 | bratao | Status | assigned => resolved |
2012-10-07 01:33 | bratao | Resolution | open => fixed |
2012-10-07 12:49 | Christian Grothoff | Relationship added | has duplicate 0002432 |
2012-11-05 18:33 | Christian Grothoff | Status | resolved => closed |