Over the weekend, Kickserv underwent planned maintenance to move its database servers to Amazon Aurora for improved performance and stability.
Unfortunately, Aurora behaves a bit differently than advertised. It allocates a fixed amount of memory for temporary use by certain queries, and that amount depends on the size of the virtual server instance. This was not apparent in the Aurora documentation.
This morning, under full load, that memory ran out, causing database errors in the web application. After consulting Amazon Web Services support, we moved our database servers to a larger instance size. This resolved the memory allocation issue and gives Kickserv more powerful database hardware. We also updated our internal alerting so we can monitor database memory usage during normal operations and catch any future memory shortages before they affect application availability.