Database server out of temporary disk space causing certain sites' recommendations to fail
Incident Report for Retention Science
Resolved
We found a configuration setting that did not carry over from the old version of the database to the new one. We added this setting on Friday around 6pm Pacific, which fixed the problem with big data operations on our Subscription service database for all clients.

We monitored over the weekend, and there were no further errors. This issue has been resolved.

We have taken steps to make sure all of our databases have this configuration, and we will be monitoring for similar issues on our other databases going forward.
Posted Apr 22, 2024 - 11:25 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Apr 19, 2024 - 15:32 PDT
Update
We are continuing to investigate. The issue seems to have started on April 14th, when we applied an AWS-required MySQL 5.7 => 8 upgrade to our Subscription service database. This has apparently caused some unforeseen performance issues when running multiple sites' machine learning jobs.

We will be upgrading the database instance size in order to sidestep the space issue temporarily. Our hypothesis is that this will buy us some time and (hopefully) allow our big jobs to continue running.

Meanwhile, we will be investigating how to make the disk usage more efficient, or how to resolve the issue altogether.
Posted Apr 19, 2024 - 15:32 PDT
Investigating
We are currently investigating, but it seems one of our databases has run out of the temporary disk space it needs to unload large tables to our machine learning algorithm. Not all sites are affected; the issue seems to primarily affect larger sites (many millions of users).

We are working to remediate this, and we are also trying to find out why it suddenly started happening even though we haven't changed much on the database server side. We will update here with more findings as we have them.
Posted Apr 19, 2024 - 14:20 PDT
This incident affected: Cortex Application (Main Dashboard) and Recommendations API.