Storage Cluster problems

SURFsara strives to be transparent about incidents on the HPC Cloud infrastructure. This report explains an issue that affected the storage cluster from Sunday 19 through Monday 20 February 2017.

Summary

The HPC Cloud Ceph storage cluster experienced a major incident. The cluster has since returned to normal production, and no data was lost.

Virtual Machines mounting Ceph datablocks may have been affected.

The story

Issues identified

Lessons learned

Impact

What to do (as a user)

Timeline