Greetings, fellow chaos engineers and SREs!
I am currently working on adding capabilities to Service Fabric's chaos engineering and experimentation service, and I'm looking for feedback and insights from folks who are already using Service Fabric (but, of course, if you're not, please do check out our offering!).
Service Fabric is a general-purpose distributed systems infrastructure and orchestrator (VMs, containers, etc.). It powers much of Microsoft Azure today, including services like Cosmos DB and SQL Azure.
Service Fabric was originally designed for workloads internal to Azure, but it has been publicly available for a few years now, and we recently open-sourced it.
Service Fabric has a built-in (first-class!) chaos experimentation subsystem as part of its Fault Analysis Service. SF Chaos is geared toward injecting the kinds of faults that typically happen in clustered distributed systems designed for extreme availability (node failure, quorum loss, replication failure, application-level starts/stops, etc.).
If you want to deploy your SF application to a cloud environment without having an Azure account, you can experiment with Service Fabric clusters in the cloud by using the free Party Clusters service.
I'd like to learn more about who is using our Chaos service today (we know many services are, but we don't know who they are) so I can gather your feedback and insights, which will help us extend the system's capabilities. Also, if you have any questions, just ask. You can always reach me by emailing ctorre at microsoft... If you didn't know that Service Fabric has first-class chaos experimentation built in, now you do :slightly_smiling_face: Please let me know if you are using it and want to chat.
Cheers and happy chaos!