Results collection interrupted due to disk capacity limits #629

Closed
jugglinmike opened this issue Nov 13, 2018 · 2 comments

@jugglinmike
Collaborator

commented Nov 13, 2018

After seven months of operation, the Buildbot database has grown beyond the 100 gigabytes we initially provisioned to store it. This prevented the build master from scheduling collection from Chrome and Firefox on 2018-11-11 and 2018-11-12.

I'd like to retain that data for now, so I've doubled the size of the Elastic Block Store (EBS) volume that backs the database. I then manually resized the filesystem over SSH:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            991M     0  991M   0% /dev
tmpfs           200M   21M  180M  11% /run
/dev/xvda1      7.8G  4.8G  2.7G  65% /
tmpfs          1000M     0 1000M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
/dev/xvdf        99G   93G  952M 100% /mnt/buildmaster-data
tmpfs           200M     0  200M   0% /run/user/1000

$ sudo resize2fs /dev/xvdf
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/xvdf is mounted on /mnt/buildmaster-data; on-line resizing required
old_desc_blocks = 7, new_desc_blocks = 13
The filesystem on /dev/xvdf is now 52428800 (4k) blocks long.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            991M     0  991M   0% /dev
tmpfs           200M   21M  180M  11% /run
/dev/xvda1      7.8G  4.8G  2.7G  65% /
tmpfs          1000M     0 1000M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
/dev/xvdf       197G   93G   96G  50% /mnt/buildmaster-data
tmpfs           200M     0  200M   0% /run/user/1000

(Although we typically prefer to express system configuration via Ansible, a new deployment would never need this operation, so there is little value in encoding it as an Ansible task.)
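
As a sanity check on the transcript above, the block count reported by resize2fs can be converted to a size. This is a minimal sketch, assuming a shell with 64-bit integer arithmetic; the variable names are illustrative only:

```shell
# Block count and block size as reported by resize2fs in the transcript.
blocks=52428800
block_size=4096   # "4k" blocks

# Convert blocks to bytes, then bytes to GiB, using integer arithmetic.
bytes=$(( blocks * block_size ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "${gib} GiB"
```

The result is 200 GiB, which matches the doubled volume. The second df listing shows a slightly smaller 197G, presumably because the filesystem sets aside some space for its own metadata.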

@jugglinmike

Collaborator Author

commented Jan 8, 2019

When we began collecting results from Safari, we deployed an independent Buildbot "master" to manage the sole Mac Mini reserved for this purpose. Although that separation has limited the scope of failures, it has also bifurcated maintenance efforts.

In this case, I increased the disk space reserved for the original Buildbot master (see above), but I did not do the same for the macOS-dedicated master. Over the weekend, that machine reached disk capacity and became incapable of managing the Mac Mini worker.

The machine should be provisioned with dedicated EBS storage, and the current database should be transferred into place.
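
A periodic check along the following lines might flag a filling disk on either master before it interrupts collection. This is only a sketch, not something the project is known to run: the helper names usage_percent and over_threshold are hypothetical, the 90% threshold is an assumed value, and the path defaults to / (on the original build master it would be /mnt/buildmaster-data).

```shell
# Hypothetical helper: print the usage percentage (without the % sign)
# of the filesystem that contains the given path.
usage_percent() {
    # -P selects the POSIX one-line-per-filesystem format; field 5 is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when usage ($1) is at or above the threshold ($2).
over_threshold() {
    [ "$1" -ge "$2" ]
}

mount_point="${1:-/}"   # path to check; defaults to the root filesystem
threshold=90            # assumed warning level, not taken from the issue

used=$(usage_percent "$mount_point")
if over_threshold "$used" "$threshold"; then
    echo "WARNING: $mount_point is ${used}% full" >&2
fi
```

Run from cron or a similar scheduler, the warning on standard error could be routed to whatever alerting channel is already in place.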

@jugglinmike self-assigned this Jan 30, 2019

@jugglinmike

Collaborator Author

commented Feb 4, 2019

About a week ago, I provisioned additional storage as described above and manually mounted it. We have successfully collected results from both releases of Apple Safari every day since then without fail.

@jugglinmike closed this Feb 4, 2019
