Task 03 - Identifying bottlenecks, stress tests, and resilience testing
The following high-level notes cover step 1.
For a primer on identifying bottlenecks using Azure Load Testing, this article covers the process.
Because there is no database associated with this application, identifying areas of slowdown will require code analysis.
Learners should notice that two operations are consistently slower than everything else: the initial loading of messages and posting a new message. In both cases, the root cause is a thread sleep loop. After removing these sleep loops, learners should see a drastic performance improvement for both operations.
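As a rough illustration of the pattern learners are looking for, the following Python sketch compares a handler that contains an artificial sleep loop with one that does not. It is not taken from the sample application: the function names, the per-item delay, and the message list are all hypothetical and exist only to show how much time such a loop can add.

```python
import time


def load_messages_slow(messages, delay_per_item=0.01):
    """Hypothetical handler that sleeps once per message (the bottleneck)."""
    result = []
    for message in messages:
        time.sleep(delay_per_item)  # artificial delay, similar in spirit to a thread sleep loop
        result.append(message)
    return result


def load_messages_fast(messages):
    """The same hypothetical handler with the sleep loop removed."""
    return list(messages)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start


if __name__ == "__main__":
    sample = [f"message {i}" for i in range(20)]
    _, slow_seconds = timed(load_messages_slow, sample)
    _, fast_seconds = timed(load_messages_fast, sample)
    print(f"with sleep loop: {slow_seconds:.3f}s, without: {fast_seconds:.6f}s")
```

Both functions return the same data; only the elapsed time differs, which is the signature to look for when comparing load test results before and after the fix.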
The following high-level notes cover steps 2-4. There is also a sample JMeter script in the solution directory.
The following high-level notes cover steps 5-8.
Chaos Studio does not offer a GitHub Action at this time, so learners will not be able to integrate this into their CI/CD pipelines.
Azure Chaos Studio supports only a limited set of services, such as VMs, AKS, App Service, Cosmos DB, and networking. Because our sample app contains only App Service resources, we will use App Service faults.
Before creating a chaos experiment, learners should scale out the App Service plan to multiple instances (for example, three).
Create Azure Chaos Studio/Experiment
Register Chaos Studio Provider
Go to your subscription
On the left-hand side, select “Resource providers”
In the list of providers, search for “Microsoft.Chaos”
Click on the provider and select “Register”
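For learners who prefer to script this step, registration can also be requested through the Azure Resource Manager REST API. The sketch below only builds the request URL and does not send anything. The `api-version` value is an assumption that should be checked against the current REST reference, and the helper name `provider_register_url` is hypothetical.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-04-01"  # assumption: verify against the current ARM REST reference


def provider_register_url(subscription_id, namespace="Microsoft.Chaos"):
    """Build the URL for a POST request that registers a resource provider."""
    return (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id, safe='')}"
        f"/providers/{quote(namespace, safe='')}/register"
        f"?api-version={API_VERSION}"
    )


if __name__ == "__main__":
    # Placeholder subscription ID; replace it with a real one before use.
    print(provider_register_url("00000000-0000-0000-0000-000000000000"))
```

The request itself would be a POST to this URL with a valid bearer token for the subscription, which is outside the scope of this sketch.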
Go to Azure Chaos Studio
Navigate to the Targets menu on the left-hand side and then select the App Service you will test. From there, select “Enable Targets” then “Enable service-direct targets” to complete enablement.
On the left-hand side menu for Chaos Studio, select the “Experiments” option.
Select “+ Create” and then choose “New experiment” from the dropdown.
Fill in your subscription, resource group, location, and a name for this experiment. Make a note of the experiment name, because a managed identity with that name will be created for you and you will need it when assigning permissions.
Go to the experiment designer on the next tab. Change the name of the step or branch if you wish.
Select “Add Action” and then “Add Fault” to create a new fault.
Select “Stop App Service” as the fault. You can choose the duration; the minimum of 5 minutes should be enough. From the target resources tab, choose the App Service you wish to test and then select “Add” to add the fault.
Save your experiment by selecting “Review + create” and then choosing the “Create” option.
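The experiment built in the designer corresponds roughly to a JSON definition. The following Python sketch assembles one such definition for a single “Stop App Service” fault. The field names, the fault URN, and the target ID suffix reflect one reading of the Chaos Studio experiment format and should be treated as assumptions to verify against the current documentation. The function name and the selector, step, and branch names are hypothetical.

```python
import json


def build_stop_app_service_experiment(app_service_id, location, duration="PT5M"):
    """Return a dict sketching a one-step experiment that stops an App Service.

    app_service_id is the full Azure resource ID of the App Service.
    duration is an ISO 8601 duration; PT5M means five minutes.
    """
    # Assumption: enabled targets are addressed as a child resource of the App Service.
    target_id = f"{app_service_id}/providers/Microsoft.Chaos/targets/Microsoft-AppService"
    return {
        "identity": {"type": "SystemAssigned"},
        "location": location,
        "properties": {
            "selectors": [
                {
                    "type": "List",
                    "id": "Selector1",
                    "targets": [{"type": "ChaosTarget", "id": target_id}],
                }
            ],
            "steps": [
                {
                    "name": "Step 1",
                    "branches": [
                        {
                            "name": "Branch 1",
                            "actions": [
                                {
                                    "type": "continuous",
                                    # Assumption: URN of the Stop App Service fault.
                                    "name": "urn:csci:microsoft:appService:stop/1.0",
                                    "duration": duration,
                                    "parameters": [],
                                    "selectorId": "Selector1",
                                }
                            ],
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder resource ID; replace every segment with real values.
    resource_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg/providers/Microsoft.Web/sites/example-app"
    )
    print(json.dumps(build_stop_app_service_experiment(resource_id, "eastus"), indent=2))
```

Seeing the definition in this form makes it clearer why the experiment needs its own identity and permissions on the App Service: the fault acts on the target resource directly.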
Update App Service Permissions
In the appropriate App Service, select “Access control (IAM)” from the left-hand menu.
Select “Add” followed by “Add role assignment”.
Select the Contributor role, then select the “Members” tab. Choose the “+ Select members” link.
Search for the name of the experiment from the earlier step and then click “Select”.
Select “Review + assign” to grant the role to the experiment.
Run load test + experiment
Start the load test first, then start the chaos experiment while the load test is still running. You should see the application return 403 errors while the web app is stopped.
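To watch the effect from outside the load test, learners could poll the application while the experiment runs and tally the status codes. The following is a minimal sketch using only the Python standard library. The function names, sample count, and interval are hypothetical, and the URL must be supplied by the learner.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx and 5xx responses; the code is still what we want.
        return error.code
    except OSError:
        # Covers connection failures and timeouts (URLError is an OSError subclass).
        return None


def summarize(statuses):
    """Count how often each status code (or a missing response) was observed."""
    return Counter("no response" if status is None else str(status) for status in statuses)


def watch(url, samples=60, interval=5.0):
    """Probe url repeatedly and return a summary of the observed status codes."""
    statuses = []
    for _ in range(samples):
        statuses.append(probe(url))
        time.sleep(interval)
    return summarize(statuses)
```

Calling `watch` with the application URL shortly before starting the experiment should produce a tally that shifts from 200 to 403 responses once the fault takes effect, assuming the application normally returns 200.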