View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006230||Taler||deployment and operations||public||2020-05-04 09:04||2021-08-24 16:23|
|Priority||normal||Severity||minor||Reproducibility||have not tried|
|Target Version||0.8||Fixed in Version||0.8|
|Summary||0006230: Inferred from Christian: buildbot with weblatetest|
|Description||> You access buildbot under buildbot.taler.net for the Web interface, and|
> $ git clone git+ssh://firstname.lastname@example.org/deployment.git
> (and git commit/push) to configure (buildbot/master.cfg).
> You can use the 'checker-worker' to add additional tasks for experiments.
> Maybe better: why not use the email@example.com user (which AFAIK is
> now useless, right?) to configure/setup your own buildslave and add that
> to the slave list?
|Tags||No tags attached.|
> You access buildbot under buildbot.taler.net for the Web interface
Right but I'm asking what credentials to use?
||Those credentials are also in the Git. But that Web UI is not really used for anything, all the real stuff is done via Git.|
||Okay I'll look online for some buildbot howto instructions so I can start learning this.|
||I'm estimating 32 hours of research/testing/virtual machine work before I'm ready to create a test project. But if there is someone willing to provide some doc on how we handle this, that goes down to about two or three hours. If I'm to understand deployment.git/README.md and deployment.git/buildbot/README, I add 24 hours to my estimate.|
||http://buildbot.net/ really is the documentation for buildbot. The buildbot/README merely says which Debian packages you need for the features of Buildbot we use. The top-level README only describes the overall deployment.git structure, but you really only need the stuff in buildbot/master.cfg to get started. Basically, that file already contains 'workers' with 'passphrases', so setting up a new buildslave for testing is basically adding another entry to the c["workers"] array and running the buildbot tool to configure and launch the buildslave under the respective account.|
||When I know what buildbot is and how we use it, I am sure that comment will make a lot of sense. :) I will begin my research tomorrow.|
||Small note: this is starting to make sense.|
||Maybe it's still making sense. Without info it is extremely slow going but I continue to read and grep and guess.|
> Basically, that file already contains 'workers' with 'passphrases'
Does it? I don't see those "workers" anywhere else.
> , so setting up a new buildslave for testing is basically adding another entry to the c["workers"] array
Well, I can copypasta with new words like "testing" but I don't know what exactly that would be doing.
> and running the buildbot tool
What buildbot tool? How do I run this?
> to configure and launch the buildslave under the respective account.
What account? I don't know what you mean. I think there must be some secrets I am not aware of. I've read the Buildbot Manual and the Buildbot Tutorial, and these statements and master.cfg do not indicate steps I can take based on what I have learned from those docs.
To add a new buildslave, you:
1) add a line like:
to the c["workers"] array. This specifies the $NAME and $PASS for the worker. This is how the buildbot *client*
will authenticate to the buildmaster. We don't really care about security here, as the worker simply 'volunteers' to
run code on behalf of the master. These are NOT the username/password you would find in /etc/passwd.
2) You log in (say via SSH) as some user (say weblatetest). Here, you use
$ buildbot-worker create-worker DIRECTORY http://buildbot.taler.net:9989/ $NAME $PASS
to create a buildbot worker directory using $NAME and $PASS to log into the buildmaster on port 9989
(or localhost, whatever is applicable)
3) you use
$ buildbot-worker start
to launch the buildslave.
4) You configure which jobs the buildslave should run (and when) in the master.cfg.
Not sure how any of the above CANNOT be documented in the buildbot documentation, we got it from there.
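Steps 2 and 3 above can be sketched as a small Python helper that only assembles the two command lines, so they can be reviewed before being run. The function names and the example values are illustrative assumptions, not something taken from deployment.git; the argument order follows the `buildbot-worker create-worker` invocation quoted above.

```python
import shlex


def create_worker_argv(basedir, master, name, password):
    """Command line for step 2: create the worker directory."""
    # Argument order as in the comment above: DIRECTORY, master, $NAME, $PASS.
    return ["buildbot-worker", "create-worker", basedir, master, name, password]


def start_worker_argv(basedir):
    """Command line for step 3: launch the worker from its directory."""
    return ["buildbot-worker", "start", basedir]


if __name__ == "__main__":
    # Hypothetical values; the real $NAME/$PASS come from c["workers"] in master.cfg.
    commands = (
        create_worker_argv("buildbot-weblatetest", "localhost:9989",
                           "buildbot-weblatetest", "PASSWORD"),
        start_worker_argv("buildbot-weblatetest"),
    )
    for argv in commands:
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch harmless; passing either list to `subprocess.run` would be the obvious next step once the values are right.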
||Aha! (2) actually helps illuminate things a lot. Thank you!|
Here is the result of "$ buildbot-worker create-worker DIRECTORY http://buildbot.taler.net:9989/ $NAME $PASS" when run as "weblatetest" user on taler.net:
Traceback (most recent call last):
  File "/usr/local/bin/buildbot-worker", line 6, in <module>
    from buildbot_worker.scripts.runner import run
ModuleNotFoundError: No module named 'buildbot_worker'
buildbot-worker create-worker buildbot-weblatetest buildbot.taler.net:9989 buildbot-weblatetest <password>
Creating info/admin, you need to edit it appropriately.
Creating info/host, you need to edit it appropriately.
Not creating info/access_uri - add it if you wish
Please edit the files in /home/weblatetest/buildbot-weblatetest/info appropriately.
worker configured in /home/weblatetest/buildbot-weblatetest
There is still something missing. Using for reference: https://docs.buildbot.net/latest/tutorial/firstrun.html#creating-a-worker
When I start the new worker (that I created above) with "buildbot-worker start buildbot-weblatetest", it fails. I am supposed to look at "twistd.log" but it is not in /var/log or in the /home/weblatetest directory.
If you see https://docs.buildbot.net/latest/tutorial/firstrun.html#creating-a-master, there are two locations being discussed but I do not necessarily know what these are. I learned above that one of them (where I create the worker) is the /home/weblatetest directory on taler.net. The only other location I have access to is my own local terminal. But I am looking for what is referenced here:
"Meanwhile, from the other terminal, in the master log (twisted.log in the master directory)"
I really just need a brief run-down on the setup. It would save so much time and I could focus on buildbot, instead of learning our setup.
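For what it's worth, `buildbot-worker` writes `twistd.log` into the worker's own base directory (here that would be /home/weblatetest/buildbot-weblatetest, the directory named in the create-worker output), not into /var/log. A minimal Python sketch for printing the tail of that log follows; the function name and the default line count are illustrative assumptions.

```python
from collections import deque
from pathlib import Path


def tail_twistd_log(basedir, lines=20):
    """Return the last `lines` lines of twistd.log in a worker (or master) directory."""
    log_path = Path(basedir) / "twistd.log"
    if not log_path.is_file():
        return []  # nothing was ever started here, or this is the wrong directory
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


if __name__ == "__main__":
    for line in tail_twistd_log("/home/weblatetest/buildbot-weblatetest"):
        print(line)
```

The same helper should work for the master's log, given the master directory instead.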
1) buildmaster had trouble re-loading its configuration because of the python3.7/3.8 migration
2) your config had buildbot.taler.net, but the master.cfg only binds to loopback (127.0.0.1) for security reasons;
I've fixed those. However, https://bugzilla.redhat.com/show_bug.cgi?id=1557687 also affects Debian, so when restarting the buildmaster I had to disable the Web interface. This should ideally be fixed in Debian, but as a work-around we could pip install buildbot locally under firstname.lastname@example.org. I've added Buck's SSH key to that account so he can read the logs there and bring back the Web interface.
2) excellent. I only used the hostname because of your instructions, and I was concerned about the difference between this and the current docs (https://docs.buildbot.net/latest/tutorial/firstrun.html#creating-a-worker). Probably you built on an older version before localhost was the correct protocol. One question answered.
Regarding buildbot-www, do I need the web interface? I was getting the impression it was not necessary from your earlier comments.
The worker still does not load. I do not expect you to debug but if you understand the error please tell me. Also I've posted a follow-up to a similar existing buildbot bug report. Here is the twistd.log output:
2020-06-04 08:15:42+0200 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2020-06-04 08:15:42+0200 [Broker,client] While trying to connect:
Traceback from remote host -- builtins.RuntimeError: rejecting duplicate worker
2020-06-04 08:15:42+0200 [-] Stopping factory <buildbot_worker.pb.BotFactory object at 0x7f6016beac40>
2020-06-04 08:15:42+0200 [-] Main loop terminated.
2020-06-04 08:15:42+0200 [-] Server Shut Down.
2020-06-04 08:23:34+0200 [-] Loading buildbot.tac...
2020-06-04 08:23:34+0200 [-] Loaded.
2020-06-04 08:23:34+0200 [-] twistd 18.9.0 (/usr/bin/python3 3.8.3) starting up.
2020-06-04 08:23:34+0200 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2020-06-04 08:23:34+0200 [-] Starting Worker -- version: 2.7.0
2020-06-04 08:23:34+0200 [-] recording hostname in twistd.hostname
2020-06-04 08:23:34+0200 [-] Starting factory <buildbot_worker.pb.BotFactory object at 0x7f778f699ca0>
2020-06-04 08:23:34+0200 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2020-06-04 08:23:34+0200 [Broker,client] While trying to connect:
Traceback from remote host -- builtins.RuntimeError: rejecting duplicate worker
2020-06-04 08:23:34+0200 [-] Stopping factory <buildbot_worker.pb.BotFactory object at 0x7f778f699ca0>
2020-06-04 08:23:34+0200 [-] Main loop terminated.
2020-06-04 08:23:34+0200 [-] Server Shut Down.
Eh, I had already started the slave under the weblatetest account. So you're getting this message because somehow you are trying to start/connect a *second* one (with the same credentials).
The Web interface might have made this easier for you to understand, that's why it is "needed". It's not strictly needed for us to use the buildbot (and certainly not to configure it), but it can be convenient to simply see which workers are running without having to read logs...
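Without the Web interface, the same diagnosis can be read off the worker's twistd.log. The sketch below is only an illustration: it looks for the "rejecting duplicate worker" message that appears in the log excerpt above, and the function name is hypothetical.

```python
DUPLICATE_MESSAGE = "rejecting duplicate worker"


def rejected_as_duplicate(log_lines):
    """True if the master refused the login because a worker with that name is already connected."""
    return any(DUPLICATE_MESSAGE in line for line in log_lines)


if __name__ == "__main__":
    sample = [
        "2020-06-04 08:23:34+0200 [Broker,client] While trying to connect:",
        "Traceback from remote host -- builtins.RuntimeError: rejecting duplicate worker",
    ]
    if rejected_as_duplicate(sample):
        print("A worker with these credentials is already connected to the master.")
```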
> Eh, I had already started the slave under the weblatetest account.
How did you do that? Is it automatic or did you type the same command I typed?
Can you disable it? Should you disable it? I'm trying to learn how the pieces fit together. If it's normal for you to start it, great. Although I'm not sure how I would control it then when I want to make changes?
I don't know which command you used, I used:
weblatetest@gv:~$ cd buildbot-weblatetest/
weblatetest@gv:~/buildbot-weblatetest$ buildbot-worker start
Following twistd.log until startup finished..
The buildbot-worker appears to have (re)started correctly.
You can disable it with 'stop'. Integration with systemd was not done (yet), as this is just for testing.
Now, once it is running, you don't usually ever do anything with the buildslave (modulo integration with systemd to start it on system boot); all actions are controlled via the buildmaster (via deployment.git).
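Whether a worker is already running from a given directory can also be checked before typing 'start' a second time. The sketch below assumes the `twistd.pid` file that a running `buildbot-worker` keeps in its base directory; the function name is hypothetical and the check is POSIX-only.

```python
import os
from pathlib import Path


def running_worker_pid(basedir):
    """Return the PID of the worker running from `basedir`, or None if there is none."""
    pid_file = Path(basedir) / "twistd.pid"
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no pid file, or unreadable contents: treat as not running
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks that the process exists
    except ProcessLookupError:
        return None  # stale pid file left behind by a dead process
    except PermissionError:
        pass  # the process exists but belongs to another user
    return pid


if __name__ == "__main__":
    pid = running_worker_pid(os.path.expanduser("~/buildbot-weblatetest"))
    print("worker running, pid %d" % pid if pid else "no worker running here")
```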
||Okay great. What I'm asking though is why you started it instead of me, since I was testing it. That means that there is some other process/step (whatever you were involved in) that I did not know about, and should know about, to proceed. Of course I can stop | start it but why did this happen in the first place, and will it happen again? And should it, or was it a mistake, etc.?|
I've just run "pip install buildbot-www" and "buildbot.www" (not the full buildbot, just the -www and .www packages) on the buildbot-master user account. I did not do "pip install buildbot" as you suggested because I think that will create two buildbot executables (the Debian one and the pip one), which we probably want to avoid. Please correct me if I am wrong.
Lines 1019-1029 ("www" definition) of master.cfg uncommented and re-committed to deployment.git.
I do not see the web interface at https://buildbot.taler.net/ yet, and netstat does not show use of the port in master.cfg. But I assume it will restart with a fresh buildbot cycle automatically? I still do not understand our config, including where and how often the automatic reads/runs are configured, but I think this is correct.
There is also the possibility that the webUI fails because we are using python2.7: https://bugs.gnunet.org/view.php?id=6369
I did decide to take the risk and run "buildbot restart" but the error was:
buildbot-master@gv:~$ buildbot restart
error reading '/home/buildbot-master/buildbot.tac': No such file or directory
invalid buildmaster directory '/home/buildbot-master'
Maybe I could symlink buildbot-master to master/ (or just specify "master" as the command line option) and re-run the restart but that is way too far to go without checking first that it is ok. But if you confirm OK I will restart buildbot.
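The error above indicates that `buildbot restart` looked for `buildbot.tac` in /home/buildbot-master, i.e. the directory the command was run from, rather than in the master directory. A small pre-flight check could look like the sketch below; the helper name and the second example path are assumptions based on the messages quoted above.

```python
from pathlib import Path


def is_buildbot_basedir(path):
    """True if `path` contains a buildbot.tac, which buildbot needs in order to (re)start."""
    return (Path(path) / "buildbot.tac").is_file()


if __name__ == "__main__":
    # Hypothetical candidates: the home directory and a "master" subdirectory.
    for candidate in ("/home/buildbot-master", "/home/buildbot-master/master"):
        print(candidate, is_buildbot_basedir(candidate))
```

Only a directory for which this returns True is worth passing to `buildbot restart`.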
||weblatetest job starts | stops fine. I'm not sure yet where to add scripts/instructions but we have cleared the recent problems (with the weblatetest build worker)|
||(also did pip install buildbot-waterfall-view buildbot-console-view)|
||Update: after hours of waiting for webUI to show up, I realized there must be a problem with the master.cfg reloading. So I tried it manually ("buildbot restart master") and saw that the webUI authentication was also commented out. Now the master reloads, and the webUI is working: https://buildbot.taler.net/#/|
||The original task (and also the task of setting up the WebUI) has been completed.|
https://bugs.gnunet.org/view.php?id=6230#c16239 => it's just your playground worker, so I think that one we can leave for experiments without too much documentation or scripting.
For 'real' jobs, we should have the worker launched as a systemd user service and create additional, worker-specific accounts. That said, you surely could use that worker to test out new CI/CD jobs.
You already got your first ones: Web site link checking and generating alerts when there are warnings building the docs.git Sphinx documentation (which you may want to directly integrate with the existing doc-builder).
|2020-05-04 09:04||buckE||New Issue|
|2020-05-04 09:04||buckE||Assigned To||=> buckE|
|2020-05-04 09:04||buckE||Status||new => assigned|
|2020-05-06 08:43||buckE||Note Added: 0015859|
|2020-05-06 12:21||Christian Grothoff||Note Added: 0015860|
|2020-05-11 05:14||buckE||Note Added: 0015876|
|2020-05-11 08:27||buckE||Note Added: 0015877|
|2020-05-11 09:48||Christian Grothoff||Note Added: 0015881|
|2020-05-26 08:22||buckE||Note Added: 0015963|
|2020-05-27 09:11||buckE||Note Added: 0015972|
|2020-05-29 06:43||buckE||Note Added: 0015978|
|2020-05-29 09:58||buckE||Note Added: 0015982|
|2020-05-30 12:09||Christian Grothoff||Note Added: 0015984|
|2020-06-01 09:17||buckE||Note Added: 0016217|
|2020-06-01 10:54||buckE||Note Added: 0016218|
|2020-06-02 11:08||buckE||Note Added: 0016224|
|2020-06-03 08:47||buckE||Note Added: 0016225|
|2020-06-03 13:06||Christian Grothoff||Note Added: 0016227|
|2020-06-04 08:57||buckE||Note Added: 0016228|
|2020-06-04 09:58||Christian Grothoff||Note Added: 0016229|
|2020-06-05 06:56||buckE||Note Added: 0016232|
|2020-06-05 17:29||Christian Grothoff||Note Added: 0016233|
|2020-06-08 08:33||buckE||Note Added: 0016236|
|2020-06-09 04:09||buckE||Note Added: 0016237|
|2020-06-09 04:48||buckE||Note Edited: 0016237|
|2020-06-09 05:37||buckE||Note Edited: 0016237|
|2020-06-09 05:52||buckE||Note Added: 0016238|
|2020-06-09 05:57||buckE||Note Added: 0016239|
|2020-06-09 06:16||buckE||Note Added: 0016240|
|2020-06-09 07:37||buckE||Note Edited: 0016237|
|2020-06-09 08:04||buckE||Note Added: 0016241|
|2020-06-09 08:05||buckE||Status||assigned => resolved|
|2020-06-09 08:05||buckE||Resolution||open => fixed|
|2020-06-09 08:05||buckE||Note Added: 0016242|
|2020-06-09 11:56||Christian Grothoff||Note Added: 0016245|
|2020-06-09 11:57||Christian Grothoff||Note Edited: 0016245|
|2020-07-24 11:56||Christian Grothoff||Target Version||=> 0.8|
|2020-07-24 11:56||Christian Grothoff||Fixed in Version||=> 0.8|
|2021-08-24 16:23||Christian Grothoff||Status||resolved => closed|