View Issue Details

IDProjectCategoryView StatusLast Update
0003684GNUnetUDP transportpublic2018-06-07 00:25
ReporterellAssigned ToChristian Grothoff 
PriorityhighSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
PlatformOSXubuntu-14.04-64amdOS Version
Product VersionSVN HEAD 
Target Version0.11.0pre66Fixed in Version0.11.0pre66 
Summary0003684: Revision 35254: Overnight run fills memory and swap completely
DescriptionRunning gnunet over night makes RAM and SWAP completely full.
Killing with
 gnunet-arm -e
results in a never ending process.
The tail of gnunet.log is added.
TagsNo tags attached.

Relationships

related to 0003687 closedChristian Grothoff WARNING `accept' failed at connection.c:345 with error: Resource temporarily unavailable 
related to 0003695 closedChristian Grothoff UDP reports multiple session instances to same peer 

Activities

ell

2015-02-14 08:20

reporter  

gnunet.log (93,121 bytes)

ell

2015-02-14 09:47

reporter   ~0008879

Last edited: 2015-02-15 15:00

View 2 revisions

Looks like to be an issue of gnunet-gtk.

--No, it is not an issue of gnunet-gtk, it seems to be an issue of
 gnunet-service-transport as admin-tools indicate.

It is "gnunet-service-transport" that fills in a strictly monoton increasing way memory and swap and never releases any data. So I can not leave the machine really unguided. To get rid of this data, I have to restart gnunet,
 and refilling restarts also again.

Christian Grothoff

2015-02-17 19:35

manager   ~0008884

Ok, sounds like we have a major memory leak. Setting

[transport]
PREFIX = valgrind --leak-check=yes --log-file=/tmp/leak.log

in the configuration file should cause memory leaks to be logged to '/tmp/leak.log' -- but only on 'regular' exit. So running a peer for a few hours with this setting and then grabbing /tmp/leak.log should help speed up our diagnostics. (I'd do so right now, but I'm not at work this week.)

ell

2015-02-18 09:35

reporter   ~0008887

leak.log after 1/2 hour run is uploaded.

The strictly monoton memory consumption of memcheck-amd64- (=disguised gnunet-service-transport) started at approximately 173 972 K and came to about
729 300 K when gnunet was stopped via gnunet-arm -e.

If wanted, I can upload gnunet.log of this gnunet-run.

ell

2015-02-18 09:41

reporter  

leak.log (1,119 bytes)

Christian Grothoff

2015-02-18 11:41

manager   ~0008888

Sorry, no leaks after 1/2h. Now, the increase to just 730k may even be expected, as it is conceivable that the peer creates more connections and thus needs more buffer space for those connections. 600k for buffers is not _that_ much.

OTOH, it is odd that after 1/2h a major leak does not even show. One possible explanation is typically that it is not a classical leak but memory that is freed on exit, just allocated, used, and then not used for a long time before being finally freed. Such a leak is much harder to find, but might be identified using

PREFIX = valgrind --tool=massif --massif-out-file=/tmp/massif.out --alloc-fn=GNUNET_malloc

However, the output of this tool is likely a rather large file and we might have to play some more with the options to get those right. So maybe Matthias can try this?

ell

2015-02-18 13:21

reporter   ~0008889

Last edited: 2015-02-18 14:39

View 3 revisions

Additional observations:

1.) The incremental chunks seem to be of size 264 K (or integer multiples of 264 K)

2.) I have seen datastore-warnings from 1 m to 10 m the like:
    fs-16875 WARNING Datastore lookup already took 1 m!
    Can this warning be connected to the problem?

I will test:
 PREFIX = valgrind --tool=massif --massif-out-file=/tmp/massif.out --alloc-fn=GNUNET_malloc

++++No massif.out-file is created.
Sorry, I was too impatient, I retry.

Christian Grothoff

2015-02-18 15:10

manager   ~0008891

Oh, wait, did you say datastore??? In that case the valgrind commands must be in [datastore] instead of [transport]. (And I need glasses to read more carefully.)

ell

2015-02-18 16:22

reporter  

massif.out (274,559 bytes)

ell

2015-02-18 16:29

reporter   ~0008893

No glasses needed.

massif.out is uploaded.

Point 2.) was an idea, maybe it could be connected to the problem.

Should I test [datastore] with valgrind or is this not needed?

Christian Grothoff

2015-02-18 16:37

manager   ~0008894

Well, if datastore is the process that uses too much memory, then yes, we should analyze that one ;-) (first with the leak checker...).

ell

2015-02-18 17:27

reporter   ~0008895

The consuming process is definitely gnunet-service-transport.

Christian Grothoff

2015-02-21 21:46

manager   ~0008907

Ok, I fixed one particular leak in transport. If this still happens, one question I have is how much memory your system has, and how much bandwidth you have configured for GNUnet (as more bandwidth = higher expected memory consumption).

ell

2015-02-22 10:31

reporter   ~0008911

Really sorry, but the problem still exists.
Gnunet-service-transport consumes very slowly but continously and does never free the consumed memory, so after some hours memory and swap are completely full.
The only way for me to free memory and swap is stop/start gnunet.

Configured memory is 4 GiB and configured swap is also 4 GiB. Filling of
memory by gnunet starts approximately at 0.8 GiB memory.

Bandwidth is not expressly configured for gnunet, I let it work with the built-in standards of gnunet.
What I see in watching network-history is that receiving signals seldom exceed 140 KiB/s and that my sending signals in average are about 200 KiB/s, peaks may come up to 500 KiB/s and very seldom peaks go up to 1000 KiB/s, so it would not be my favorite idea that it is a network problem.

Christian Grothoff

2015-02-22 12:38

manager   ~0008912

Ok, let's look further than. Which transport plugins do you have active (in section [transport], the option PLUGINS)?

ell

2015-02-22 13:31

reporter   ~0008913

In /etc/gnunet.conf:
...
[transport]
PLUGINS = tcp udp http_client https_client http_server https_server
...

amatus

2015-02-22 18:22

developer   ~0008914

Try:

[transport]
NEIGHBOUR_LIMIT = 5

And see if that caps memory usage.

Christian Grothoff

2015-02-22 23:00

manager   ~0008915

Amatus: I'd rather see him try with/without udp and/or http server/client. I suspect it's one of the plugins.

ell

2015-02-22 23:50

reporter   ~0008918

Test result of:
[transport]
NEIGHBOUR_LIMIT = 5

Test was running from about 19:00 to 23:30 and filled nearly the complete memory.

On monday I will test switching off udp, http_server, http_client.
Now I go for sleep and dreams...

ell

2015-02-23 09:46

reporter   ~0008921

svn up ->35298
Yes, the bug is alive in 35298.

Test no udp configured:
"PLUGINS = tcp http_client https_client http_server https_server"

Result:
Sequence of memory consumption begins at 17976 K comes up to 19400 K and then comes down to 18672 K and again up to 19400 K and down to 19356 K.
Well, looks like the bug lives somewhere in udp.

Also seen: lots of
"transport-11133 ERROR Assertion failed at plugin_transport_http_server.c:1666"
but unclear if connected to this bug.

Matthias Wachs

2015-02-24 10:12

manager   ~0008925

Last edited: 2015-02-24 10:17

View 2 revisions

I am running a peer for server quite some days with:

[transport]
PLUGINS = tcp https_client http_client udp

[nat]
BEHIND_NAT=YES

Peer is connected to several other (3-6) peers (core)

I have attached some stats showing the difference between 4 days of running

Transport memory consumption is slightly increasing from 110 to 117 MB ... but not thrashing the system...

Interesting is that ATS is running with 100% CPU load ... see related bug 0003689

Matthias Wachs

2015-02-24 10:13

manager  

stat-feb-20.png (362,684 bytes)

Matthias Wachs

2015-02-24 10:13

manager  

stat-feb-24.png (862,473 bytes)

ell

2015-02-24 11:21

reporter   ~0008927

Last edited: 2015-02-25 09:34

View 3 revisions

I reactivated udp:
[transport]
PLUGINS = tcp http_client https_client http_server https_server udp
and again I see the strictly monotone memory consuming.

I can observe connections using tcp, http_server, https_server to me.
I do not observe udp- and *-client-connections.

CPU-load of gnunet-service-ats is about 1%.
Most cpu-load is for datastore, fs, gtk.

Ok, you can not verify the problem. I can do tests for you, assuming I am equipped with the necessary test utensils and test directives.

Christian Grothoff

2015-03-01 23:55

manager   ~0008954

I found and fixed one (minor) leak in SVN 35330. But didn't seem big enough to justify running out of memory overnight.

ell

2015-03-02 10:23

reporter   ~0008958

Last edited: 2015-03-02 10:45

View 2 revisions

Maybe you can get some helpful info via the uploaded video.

Upload seems not to work, so use
http://srv1.nerdcamp.net/html/Bildschirmaufnahme_2015-03-02_10-06-19.mp4

Christian Grothoff

2015-03-08 21:38

manager   ~0009008

I've been running a peer now for quite some time with TCP + UDP, and I don't see any significant memory utilization by transport. FS takes like 250 MB, but the rest is virtually invisible. Please let me know if you still get this with SVN HEAD.

ell

2015-03-09 08:07

reporter   ~0009009

Two news, the good one first:

Trashing memory and swap is gone with svn 35363 ( disappeared already in 35339)

The bad news:
udp-connections shown by peerinfo-gtk in gnunet-gtk are not displayed in the same
way as tcp, http, https. udp-connections are displayed under 64TF-item, contrasting to tcp-connections listed not under 64TF-peer.

Christian Grothoff

2015-03-09 09:11

manager   ~0009010

Ok, thanks for the confirmation. As for the gnunet-peerinfo-gtk issue, I don't understand from what you write, but please simply file a separate bug and maybe attach a screenshot for illustration ;-).

Issue History

Date Modified Username Field Change
2015-02-14 08:20 ell New Issue
2015-02-14 08:20 ell File Added: gnunet.log
2015-02-14 09:47 ell Note Added: 0008879
2015-02-15 15:00 ell Note Edited: 0008879 View Revisions
2015-02-17 19:35 Christian Grothoff Note Added: 0008884
2015-02-18 09:35 ell Note Added: 0008887
2015-02-18 09:41 ell File Added: leak.log
2015-02-18 11:41 Christian Grothoff Note Added: 0008888
2015-02-18 11:43 Christian Grothoff Assigned To => Matthias Wachs
2015-02-18 11:43 Christian Grothoff Status new => assigned
2015-02-18 13:21 ell Note Added: 0008889
2015-02-18 13:27 ell Note Edited: 0008889 View Revisions
2015-02-18 14:39 ell Note Edited: 0008889 View Revisions
2015-02-18 15:10 Christian Grothoff Note Added: 0008891
2015-02-18 16:22 ell File Added: massif.out
2015-02-18 16:29 ell Note Added: 0008893
2015-02-18 16:37 Christian Grothoff Note Added: 0008894
2015-02-18 17:08 Christian Grothoff Assigned To Matthias Wachs => Christian Grothoff
2015-02-18 17:27 ell Note Added: 0008895
2015-02-18 22:53 Christian Grothoff Assigned To Christian Grothoff => Matthias Wachs
2015-02-18 22:53 Christian Grothoff Category datastore service => transport service
2015-02-19 14:44 Christian Grothoff Relationship added related to 0003687
2015-02-21 21:46 Christian Grothoff Note Added: 0008907
2015-02-21 21:47 Christian Grothoff Status assigned => feedback
2015-02-22 10:31 ell Note Added: 0008911
2015-02-22 10:31 ell Status feedback => assigned
2015-02-22 12:38 Christian Grothoff Note Added: 0008912
2015-02-22 13:31 ell Note Added: 0008913
2015-02-22 18:22 amatus Note Added: 0008914
2015-02-22 23:00 Christian Grothoff Note Added: 0008915
2015-02-22 23:50 ell Note Added: 0008918
2015-02-23 09:46 ell Note Added: 0008921
2015-02-24 10:12 Matthias Wachs Note Added: 0008925
2015-02-24 10:13 Matthias Wachs File Added: stat-feb-20.png
2015-02-24 10:13 Matthias Wachs File Added: stat-feb-24.png
2015-02-24 10:17 Matthias Wachs Note Edited: 0008925 View Revisions
2015-02-24 11:21 ell Note Added: 0008927
2015-02-24 12:14 ell Note Edited: 0008927 View Revisions
2015-02-25 09:34 ell Note Edited: 0008927 View Revisions
2015-02-28 14:44 Christian Grothoff Category transport service => UDP transport
2015-02-28 14:44 Christian Grothoff Priority normal => high
2015-02-28 17:51 Christian Grothoff Relationship added related to 0003695
2015-03-01 23:55 Christian Grothoff Note Added: 0008954
2015-03-02 10:23 ell Note Added: 0008958
2015-03-02 10:45 ell Note Edited: 0008958 View Revisions
2015-03-07 20:39 Christian Grothoff Assigned To Matthias Wachs => Christian Grothoff
2015-03-08 21:38 Christian Grothoff Note Added: 0009008
2015-03-08 21:38 Christian Grothoff Status assigned => feedback
2015-03-09 08:07 ell Note Added: 0009009
2015-03-09 08:07 ell Status feedback => assigned
2015-03-09 09:11 Christian Grothoff Note Added: 0009010
2015-03-09 09:11 Christian Grothoff Status assigned => resolved
2015-03-09 09:11 Christian Grothoff Fixed in Version => 0.11.0pre66
2015-03-09 09:11 Christian Grothoff Resolution open => fixed
2015-03-09 09:11 Christian Grothoff Target Version => 0.11.0pre66
2018-06-07 00:25 Christian Grothoff Status resolved => closed