Identified - The issue has been identified and a fix is being implemented.
Apr 19, 2024 - 15:32 PDT
Update - We are continuing to investigate. The issue seems to have started on April 14th, when we applied an AWS-required MySQL 5.7 => 8 upgrade to our Subscription service database. This appears to have caused unforeseen performance issues when running machine learning jobs for multiple sites.
We will be upgrading the database instance size in order to sidestep the space issue temporarily. Our hypothesis is that this will buy us some time and (hopefully) allow our big jobs to continue running.
Meanwhile, we will be investigating how to make disk usage more efficient, or how to resolve the issue altogether.
Apr 19, 2024 - 15:32 PDT
Investigating - We are currently investigating, but it seems one of our databases has run out of the temporary disk space it uses to unload large tables for our machine learning algorithm. Not all sites are affected; it seems to primarily be an issue for larger sites (many millions of users).
We are looking to remediate this, and we are also trying to find out why it started so suddenly, given that we haven't changed much on the database server side. We will update here with more findings as we have them.
Apr 19, 2024 - 14:20 PDT
This is the ReSci status page, where you can always find updated information on how our systems are doing. We will post here if there are interruptions to service.
As always, if you are experiencing any issues, don't hesitate to get in touch with us at http://help.retentionscience.com and we'll get back to you as soon as we can.
Resolved -
Date Started: March 27th
Date Resolved: April 11th
Cause of Incident:
A migration/upgrade to MySQL v8, per Amazon AWS requirements, occurred on March 26th. At the time of the migration there was a small bug in the code/query around NULL date values, which caused rows in our AI predictions output to go missing. Any row with a date or timestamp that was 0, NULL, or outside the normal range was removed entirely from the output. The new version of MySQL is sensitive to these values in a way the old version was not.
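To make the failure mode above concrete, here is a minimal sketch of a pre-export check that sets aside rows with a suspect date instead of letting them vanish silently. It is an illustration only, not our production code: the names `is_suspect_date` and `split_rows`, the row layout, and the 1970 to 2038 window (loosely modeled on the range MySQL's TIMESTAMP type supports) are all assumptions, since this report does not define the "normal range".

```python
from datetime import date, datetime

# Assumed bounds for a "normal" date; the incident report does not define them.
MIN_OK = datetime(1970, 1, 1)
MAX_OK = datetime(2038, 1, 19)


def is_suspect_date(value):
    """Return True for NULL, zero, unparseable, or out-of-range date values."""
    if value is None or value == 0:
        return True
    if isinstance(value, str):
        # MySQL-style zero dates such as '0000-00-00 00:00:00'.
        if value.startswith("0000-00-00"):
            return True
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            return True
    # Promote plain dates to datetimes so they can be compared.
    if isinstance(value, date) and not isinstance(value, datetime):
        value = datetime(value.year, value.month, value.day)
    if not isinstance(value, datetime):
        return True
    # Drop any timezone so the comparison with the naive bounds is valid.
    value = value.replace(tzinfo=None)
    return not (MIN_OK <= value <= MAX_OK)


def split_rows(rows, date_key):
    """Split rows into (ok, suspect) so suspect rows are reported, not dropped."""
    ok, suspect = [], []
    for row in rows:
        (suspect if is_suspect_date(row.get(date_key)) else ok).append(row)
    return ok, suspect
```

Counting the `suspect` list before and after a migration would show whether rows like these are still present in the output or have disappeared.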
How we will prevent this from happening in future:
A database migration is not a frequent occurrence; it happens once every 4 years or so. We will do a better job of QA by involving the Client Success team, who will QA a few of their largest clients to make sure no anomalies are occurring with the stages and engagement.
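One way the QA step above could be supported is with an automated comparison of output row counts per client before and after a migration. The sketch below is hypothetical: the function name `find_row_count_anomalies`, the dictionary inputs keyed by client, and the 1% threshold are assumptions for illustration, not a description of our tooling.

```python
def find_row_count_anomalies(before, after, max_drop=0.01):
    """Return {client: fractional_drop} for clients whose row count fell too far.

    `before` and `after` map a client identifier to its output row count.
    `max_drop` is an assumed tolerance (1% by default).
    """
    anomalies = {}
    for client, old_count in before.items():
        if old_count == 0:
            continue  # nothing to compare against
        new_count = after.get(client, 0)  # a missing client counts as zero rows
        drop = (old_count - new_count) / old_count
        if drop > max_drop:
            anomalies[client] = drop
    return anomalies
```

Any client returned by this check would then be a natural candidate for manual review by the Client Success team.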
Apr 11, 11:00 PDT