The Netflix media empire is built upon its streaming service, which runs in Amazon’s cloud. This cloud infrastructure offers Netflix many upsides, but it’s still not a perfect system, and the company’s engineers have developed a lot of tools to maintain, stress test, and even just wreak havoc on their cloud space. These tools are about to go open source.
Because demand for Netflix varies wildly over the course of a week, a problem that might not matter on a Monday morning could seriously affect customers on a Friday night. Netflix’s Simian Army helps catch such problems early by emulating instance failures and general outages. Its best-known member randomly takes down running instances to make sure that the service keeps running regardless.
This tool is named Chaos Monkey, and according to Netflix: “The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption.”
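To make the idea concrete, here is a toy sketch of that kind of test. It is not Netflix’s code and does not touch any real cloud API: the fleet is just a list of made-up instance names, and `chaos_round`, `service_is_up`, `kill_probability` and `min_healthy` are hypothetical names chosen for illustration. The assumption is simply that the service counts as available while enough instances survive a round of random terminations.

```python
import random


def chaos_round(instances, kill_probability, rng):
    """Return the instances that survive one round of random termination.

    Each instance is independently removed with probability kill_probability.
    """
    return [name for name in instances if rng.random() >= kill_probability]


def service_is_up(instances, min_healthy):
    """Toy availability rule: the service is up while enough instances remain."""
    return len(instances) >= min_healthy


# Hypothetical fleet of ten instances; a fixed seed keeps the run repeatable.
rng = random.Random(42)
fleet = [f"web-{i}" for i in range(10)]

survivors = chaos_round(fleet, kill_probability=0.2, rng=rng)
terminated = len(fleet) - len(survivors)
print(f"{terminated} instance(s) terminated, "
      f"service up: {service_is_up(survivors, min_healthy=5)}")
```

A real tool would terminate actual instances through the cloud provider and watch live traffic, but the principle is the same: break things at random and check that customers would not notice.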
There are other Monkey tools, like Doctor Monkey, which runs health checks, and Janitor Monkey, which frees up unused resources. Going open source lets Netflix gather more outside input on these tools, and lets other companies get their hands on them to maintain their own systems.
“The big objective for us in going out and talking about this was, we like to hire the very best people in the industry,” commented Adrian Cockcroft, Director of Cloud Architecture, in the story on Wired. “People have to know that you’re doing interesting stuff.”