Database performance
Incident Report for Kickserv
Postmortem

Over the weekend, Kickserv performed planned maintenance to upgrade its database server software to Amazon Aurora for increased performance and stability.

Unfortunately, Aurora behaves a bit differently than advertised. It sets aside a fixed amount of memory for temporary use by certain queries, and that amount depends on the size of the virtual server instance: a smaller instance gets a smaller allocation. This was not apparent in the Aurora documentation.

This morning, under full load conditions, that allocated memory ran out, causing database errors in the web application. After consulting Amazon Web Services support, we upgraded our database servers to use a larger instance size. This solved the memory allocation issue and gives Kickserv more powerful database hardware. We also updated our internal alerting, so we can keep an eye on database memory usage during normal operations, and catch any future memory shortages before they affect application availability.
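
For readers curious what a memory alert of this kind might look like, here is a minimal sketch. It is not Kickserv's actual alerting: the function name, the `MemoryAlert` type, and the 25% and 10% thresholds are all assumptions made for illustration. It takes a few recent free-memory readings for a database instance and classifies the lowest one against a warning and a critical threshold, so a shortage can be flagged before queries start failing.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MemoryAlert:
    """Result of one check (hypothetical type, for illustration only)."""
    level: str              # "ok", "warning", or "critical"
    lowest_free_pct: float  # lowest free-memory percentage in the window


def check_free_memory(
    free_bytes_samples: Sequence[int],
    total_bytes: int,
    warn_pct: float = 25.0,   # assumed threshold, not from the report
    crit_pct: float = 10.0,   # assumed threshold, not from the report
) -> MemoryAlert:
    """Classify recent free-memory samples against two thresholds."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not free_bytes_samples:
        raise ValueError("need at least one sample")

    # Use the worst (lowest) reading in the window so short dips are not missed.
    lowest_free_pct = 100.0 * min(free_bytes_samples) / total_bytes

    if lowest_free_pct < crit_pct:
        level = "critical"
    elif lowest_free_pct < warn_pct:
        level = "warning"
    else:
        level = "ok"
    return MemoryAlert(level, lowest_free_pct)
```

In practice the samples would come from whatever monitoring system reports the instance's free memory, and a "warning" result would notify the team well before the "critical" level is reached.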

Posted Sep 10, 2018 - 15:54 CDT

Resolved
This incident has been resolved.
Posted Sep 10, 2018 - 15:30 CDT
Monitoring
The Kickserv database is back up and running and we're keeping a close watch on performance. We'll have a more detailed update later today, but everything should be back to normal. Thanks for your patience!
Posted Sep 10, 2018 - 09:41 CDT
Identified
We've identified the issue and are installing a fix now. Stand by.
Posted Sep 10, 2018 - 09:28 CDT
Investigating
You may experience some page load errors this morning while we investigate an unexpected issue with our new database infrastructure. We'll keep you posted.
Posted Sep 10, 2018 - 09:09 CDT
This incident affected: Web Application (app.kickserv.com).