Task 03 - Identifying bottlenecks, stress tests, and resilience testing

The following high-level notes cover step 1.
- For a primer on identifying bottlenecks using Azure Load Testing, this article covers the process.
- Because there is no database associated with this application, identifying areas of slowdown will require code analysis.
- Learners should notice that there are two operations which consistently lag behind everything else in terms of performance: initial loading of messages and posting a new message. In both cases, the root cause is a thread sleep loop. After removing these “speed loops,” learners should notice a drastic performance improvement when performing those operations.
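
As a rough illustration of the pattern described above, the following Python sketch compares a handler that contains an artificial sleep loop with one that does not. The function names, the delay, and the iteration count are hypothetical and are not taken from the sample application, which may be written in a different language.

```python
import time


def load_messages_slow(messages, delay_s=0.01, iterations=5):
    """Hypothetical handler that contains an artificial sleep loop."""
    for _ in range(iterations):
        time.sleep(delay_s)  # pure delay; no useful work happens here
    return list(messages)


def load_messages_fast(messages):
    """The same hypothetical handler with the sleep loop removed."""
    return list(messages)


def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    msgs = ["hello", "world"]
    _, slow = time_call(load_messages_slow, msgs)
    _, fast = time_call(load_messages_fast, msgs)
    print(f"with sleep loop: {slow * 1000:.1f} ms, without: {fast * 1000:.1f} ms")
```

Both versions return the same data; only the elapsed time differs, which is the kind of gap learners should look for when comparing load test results before and after the fix.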
The following high-level notes cover steps 2-4. There is also a sample JMeter script in the solution directory.
The following high-level notes cover steps 5-8.
- Chaos Studio does not offer a GitHub Action at this time, so learners will not be able to integrate this into their CI/CD pipelines.
- Azure Chaos Studio supports only a limited number of services, such as VMs, AKS, App Service, Cosmos DB, and networking. Because our sample app contains only App Service resources, we will use App Service faults.
- Prior to creating a chaos experiment, learners will want to scale out the App Service Plan to multiple instances (for example, 3).
- Create Azure Chaos Studio/Experiment
  - Register Chaos Studio Provider
    - Go to your subscription
    - On the left-hand side, select “Resource providers”
    - In the list of providers, search for “Microsoft.Chaos”
    - Click on the provider and select “Register”
  - Go to Azure Chaos Studio
  - Navigate to the Targets menu on the left-hand side and then select the App Service you will test. From there, select “Enable Targets” then “Enable service-direct targets” to complete enablement.
  - On the left-hand side menu for Chaos Studio, select the “Experiments” option.
  - Select “+ Create” and then choose “New experiment” from the dropdown.
  - Fill in your subscription, resource group, location, and name for this experiment. Keep track of the experiment name, because a managed identity with that name will be created for the experiment.
  - Go to the experiment designer on the next tab. Change the name of the step or branch if you wish.
  - Select “Add Action” and then “Add Fault” to create a new fault.
  - Select “Stop App Service” as the Fault. You can choose the duration; the minimum is 5 minutes, which should be enough. From the target resources tab, choose the App Service you wish to test and then select “Add” to complete the addition.
  - Save your experiment by selecting “Review + create” and then choosing the “Create” option.
- Update App Service Permissions
  - In the appropriate App Service, select “Access control (IAM)” from the left-hand menu.
  - Select “Add” followed by “Add Role Assignment”
  - Select the Contributor role, then select the “Members” tab. Choose the “+ Select members” link.
  - Search for the name of the experiment from the earlier step and then click “Select”.
  - Review and assign the permissions, which grants the role to the experiment.
- Run load test + experiment
  - Start the load test first and then, while it is running, kick off the chaos experiment. You should notice that the application returns a 403 error while the web app is stopped.
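
To make the effect of the fault easier to see alongside the load test results, learners could poll the application while the experiment runs and tally the status codes returned. The following is a minimal sketch using only the Python standard library; the function names `probe` and `watch`, the sample count, and the polling interval are assumptions for illustration and are not part of the lab solution.

```python
import sys
import time
import urllib.error
import urllib.request
from collections import Counter


def probe(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        return None  # DNS failure, refused connection, timeout, and similar


def watch(url, samples=10, interval_s=1.0, probe_fn=probe, sleep_fn=time.sleep):
    """Poll url `samples` times and count how often each status code appears."""
    counts = Counter()
    for i in range(samples):
        counts[probe_fn(url)] += 1
        if i < samples - 1:
            sleep_fn(interval_s)
    return counts


if __name__ == "__main__":
    # Pass the App Service URL as the first command-line argument.
    print(watch(sys.argv[1], samples=30, interval_s=2.0))
```

If the fault behaves as described above, the tally should shift from 200 responses to 403 responses once the experiment stops the web app, and back again after the fault duration ends.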