This past weekend (April 27, 2019), NetDocuments held a pre-scheduled maintenance window in the US Service. Over the past year, we have been implementing a Couchbase Directory Service as part of ongoing updates to our global infrastructure. During the maintenance window, our engineering teams migrated Repositories, Groups, Cabinets and User Data to the new Couchbase Service. The same work had already been completed in our Australian and EU-based data centers. The US migration was completed successfully, and the US Service was in a normal state following the updates.
At approximately 9:50am EDT on Monday, April 29th, we began to see performance issues with the US Service. Our teams were alerted immediately and began investigating, guided by Couchbase Support. Based on input from Couchbase Support, the Couchbase Directory capacity was increased twice between 10:15am EDT and 12:15pm EDT. During the same period, we conducted a code review to vet the updates that had been put into place over the weekend.
At approximately 1:30pm EDT, the review identified a potential code issue: a query was running far more frequently than it does in normal operation. After reviewing the issue, we determined that a code patch reducing the frequency of the query could be developed and deployed safely. At approximately 3:10pm EDT, the engineering team began deploying the patch across the server pools. Performance began to improve as the patch was installed. The Service returned to a normal state and the issue was considered resolved.
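For readers who want a concrete picture of how a query running at an unusual rate can be detected, the following is a minimal sketch only. It does not represent NetDocuments' code or any Couchbase API. The class name, the query name, the baseline figures and the threshold are all hypothetical, chosen purely for illustration. The sketch counts calls to each named query in a sliding time window and flags a query when the count exceeds a multiple of its expected baseline.

```python
from collections import defaultdict, deque


class QueryRateMonitor:
    """Hypothetical sliding-window monitor (illustration only).

    Flags a query when its call count inside the window exceeds
    `threshold` times the expected baseline for that query.
    """

    def __init__(self, baselines, window_seconds=60.0, threshold=3.0):
        # baselines: query name -> expected number of calls per window (assumed values)
        self.baselines = dict(baselines)
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # query name -> timestamps in window

    def record(self, query_name, timestamp):
        """Record one call at `timestamp` (seconds) and report whether the rate is anomalous."""
        events = self._events[query_name]
        events.append(timestamp)
        # Discard calls that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return self.is_anomalous(query_name)

    def is_anomalous(self, query_name):
        """True if the query's current window count exceeds threshold * baseline."""
        baseline = self.baselines.get(query_name)
        if not baseline:
            return False  # no baseline known, so nothing to compare against
        return len(self._events.get(query_name, ())) > self.threshold * baseline


if __name__ == "__main__":
    # Hypothetical query name and baseline: 10 calls per 60-second window.
    monitor = QueryRateMonitor({"directory_lookup": 10})
    for second in range(40):
        if monitor.record("directory_lookup", float(second)):
            print(f"directory_lookup exceeded its expected rate at t={second}s")
            break
```

A fixed multiple of a static baseline is the simplest possible rule. A production system would more likely derive baselines from historical metrics and feed the result into existing alerting.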
No data or inter-process anomalies were identified, and NetDocuments meticulously maintained its change control procedures throughout. The Couchbase Directory Service deployment will increase service scalability and availability. It is part of our continuous drive toward software-based datacenters, which has included the successful migration to object storage with erasure coding, HSM-based cryptography, hyper-converged technology, and the Solr platform. Once the Couchbase service was optimized and fully running in the US Service, our metrics showed measurable and significant improvements in service scalability. To help prevent future issues, we will increase the verbosity of our internal logs to further improve our ability to detect even very minor changes in query behavior. We apologize for this incident.