
Task 03 - Identifying bottlenecks, stress tests, and resilience testing

  • The following high-level notes cover step 1.
    • This article provides a primer on identifying bottlenecks using Azure Load Testing.
    • Because there is no database associated with this application, identifying areas of slowdown will require code analysis.
    • Learners should notice two operations that consistently lag behind everything else: the initial loading of messages and posting a new message. In both cases, the root cause is a thread sleep loop in the code. After removing these “speed loops,” learners should see a drastic performance improvement for both operations.
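To illustrate why a sleep loop shows up as a bottleneck and how timing each operation exposes it, here is a minimal sketch. The operations are hypothetical stand-ins (`load_messages_with_speed_loop`, `load_messages_fixed`, and the in-memory `MESSAGES` list are not from the sample app); they only reproduce the pattern described above, where a loop of `sleep` calls adds latency without doing any work.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the app's operations; the real app's code differs.
MESSAGES: List[str] = ["hello", "world"]


def load_messages_with_speed_loop(iterations: int = 5, delay_s: float = 0.02) -> List[str]:
    """Returns the messages after sleeping in a loop, mimicking a "speed loop"."""
    for _ in range(iterations):
        time.sleep(delay_s)  # pure waiting: no useful work happens here
    return list(MESSAGES)


def load_messages_fixed() -> List[str]:
    """Returns the same result with the sleep loop removed."""
    return list(MESSAGES)


def mean_latency_ms(operation: Callable[[], object], runs: int = 3) -> float:
    """Times an operation several times and returns the mean wall-clock latency in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        total += time.perf_counter() - start
    return total / runs * 1000.0


def rank_by_latency(operations: Dict[str, Callable[[], object]]) -> List[Tuple[str, float]]:
    """Returns (name, mean_ms) pairs, slowest operation first."""
    results = [(name, mean_latency_ms(op)) for name, op in operations.items()]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_by_latency({
        "load (with speed loop)": load_messages_with_speed_loop,
        "load (fixed)": load_messages_fixed,
    })
    for name, ms in ranking:
        print(f"{name}: {ms:.1f} ms")
```

Both versions return identical data, so removing the loop changes only latency, which is why the fix is safe and the improvement is so large.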
  • For steps 2-4, a sample JMeter script is available in the solution directory.
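When comparing load-test runs, it can help to summarize response times per request. The sketch below assumes the results are in JMeter's CSV format with a header row that includes the `label` (sampler name) and `elapsed` (milliseconds) columns; the function names `summarize` and `slowest_first` are hypothetical helpers, and the sample rows in the usage block are made up purely to show the input shape.

```python
import csv
import io
import math
from collections import defaultdict
from typing import Dict, List, TextIO


def percentile(sorted_values: List[int], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return float(sorted_values[rank - 1])


def summarize(results: TextIO) -> Dict[str, Dict[str, float]]:
    """Groups samples by 'label' and reports count, mean, and p90 of 'elapsed' (ms)."""
    elapsed_by_label: Dict[str, List[int]] = defaultdict(list)
    for row in csv.DictReader(results):
        elapsed_by_label[row["label"]].append(int(row["elapsed"]))
    summary: Dict[str, Dict[str, float]] = {}
    for label, values in elapsed_by_label.items():
        values.sort()
        summary[label] = {
            "count": len(values),
            "mean_ms": sum(values) / len(values),
            "p90_ms": percentile(values, 90),
        }
    return summary


def slowest_first(summary: Dict[str, Dict[str, float]]) -> List[str]:
    """Returns the labels ordered by mean latency, slowest first."""
    return sorted(summary, key=lambda label: summary[label]["mean_ms"], reverse=True)


if __name__ == "__main__":
    # Illustrative rows only; a real file would be a JMeter results CSV.
    sample = io.StringIO(
        "timeStamp,elapsed,label,responseCode,success\n"
        "1,120,Load messages,200,true\n"
        "2,95,Home page,200,true\n"
    )
    summary = summarize(sample)
    for label in slowest_first(summary):
        print(label, summary[label])
```

Reporting p90 alongside the mean makes it easier to spot requests that are only occasionally slow, which a mean alone can hide.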
  • The following high-level notes cover steps 5-8.
    • Chaos Studio does not offer a GitHub Action at this time, so learners will not be able to integrate chaos experiments into their CI/CD pipelines.
    • Azure Chaos Studio supports only a limited set of services, such as VMs, AKS, App Services, Cosmos DB, and networking. Because our sample app contains only App Services, we will use App Service faults.
    • Before creating a chaos experiment, learners should scale out the App Service Plan to multiple instances (for example, 3).
    • Create the Azure Chaos Studio experiment
      • Register Chaos Studio Provider
        • Go to your subscription
        • On the left-hand side, select “Resource providers”
        • In the list of providers, search for “Microsoft.Chaos”
        • Select the provider and then select “Register”
      • Go to Azure Chaos Studio
      • Navigate to the Targets menu on the left-hand side and then select the App Service you will test. From there, select “Enable Targets” then “Enable service-direct targets” to complete enablement.
      • On the left-hand side menu for Chaos Studio, select the “Experiments” option.
      • Select “+ Create” and then choose “New experiment” from the dropdown.
      • Fill in your subscription, resource group, location, and name for this experiment. Note the experiment name: Chaos Studio creates a managed identity with the same name for the experiment, and you will need to find it when assigning permissions later.
      • Go to the experiment designer on the next tab. Change the name of the step or branch if you wish.
      • Select “Add Action” and then “Add Fault” to create a new fault.
      • Select “Stop App Service” as the fault. You can choose the duration; the minimum is 5 minutes, which should be enough. On the “Target resources” tab, choose the App Service you wish to test and then select “Add” to complete the addition.
      • Save your experiment by selecting “Review + create” and then choosing the “Create” option.
    • Update App Service Permissions
      • In the appropriate App Service, select “Access control (IAM)” from the left-hand menu.
      • Select “Add” followed by “Add Role Assignment”
      • Select the Contributor role, then select the “Members” tab. Choose the “+ Select members” link.
      • Search for the name of the experiment from the earlier step and then click “Select”.
      • Select “Review + assign” to grant the role to the experiment's managed identity.
    • Run load test + experiment
      • Start the load test first, then start the chaos experiment while the load test is still running. The web app should stop, and the application should return HTTP 403 errors.
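For reference, the experiment that the portal steps create can also be described as a resource body with one step, one branch, and one App Service Stop action. The sketch below assumes the Chaos Studio experiment schema of API version `2023-11-01`, the fault name `urn:csci:microsoft:appService:stop/1.0`, and the target type `Microsoft-AppService`; verify these against the current Chaos Studio REST and fault-library documentation before use. The function name `app_service_stop_experiment`, the selector ID, and the step and branch names are hypothetical.

```python
import json
from typing import Any, Dict


def app_service_stop_experiment(location: str, app_service_id: str,
                                duration: str = "PT5M") -> Dict[str, Any]:
    """Builds an experiment body with a single App Service Stop fault.

    Assumes the 2023-11-01 experiment schema; check the REST reference.
    """
    selector_id = "Selector1"  # hypothetical name linking the action to its targets
    # Assumed target ID format: the App Service ID plus the enabled Chaos target.
    target_id = f"{app_service_id}/providers/Microsoft.Chaos/targets/Microsoft-AppService"
    return {
        "location": location,
        # The experiment's own identity; this is what needs the Contributor role.
        "identity": {"type": "SystemAssigned"},
        "properties": {
            "selectors": [{
                "id": selector_id,
                "type": "List",
                "targets": [{"type": "ChaosTarget", "id": target_id}],
            }],
            "steps": [{
                "name": "Step 1",
                "branches": [{
                    "name": "Branch 1",
                    "actions": [{
                        "type": "continuous",
                        "name": "urn:csci:microsoft:appService:stop/1.0",
                        "duration": duration,  # ISO 8601 duration
                        "parameters": [],
                        "selectorId": selector_id,
                    }],
                }],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder resource ID; substitute your own subscription, group, and app.
    app_id = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
              "/providers/Microsoft.Web/sites/<app-name>")
    print(json.dumps(app_service_stop_experiment("eastus", app_id), indent=2))
```

Such a body could, for example, be sent with `az rest --method put` to the experiment's resource ID; the portal flow above remains the path this task expects.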
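To confirm that the 403 errors observed during the combined run line up with the chaos experiment, you can locate the failure windows in the load-test samples. This is a minimal sketch assuming each sample is a `(timestamp_ms, response_code)` pair, such as the `timeStamp` and `responseCode` columns of a JMeter results CSV; `failure_windows` is a hypothetical helper, and `403` is the default only because it is the error described above.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[int, str]  # (timestamp in ms, HTTP response code as text)


def failure_windows(samples: Iterable[Sample],
                    failure_codes: Tuple[str, ...] = ("403",)) -> List[Tuple[int, int]]:
    """Merges consecutive failing samples, in timestamp order, into (start_ms, end_ms) windows."""
    windows: List[Tuple[int, int]] = []
    current: Optional[List[int]] = None  # [start, end] of the open window, if any
    for ts, code in sorted(samples):
        if code in failure_codes:
            if current is None:
                current = [ts, ts]  # first failure opens a new window
            else:
                current[1] = ts  # extend the open window
        elif current is not None:
            windows.append((current[0], current[1]))  # a success closes the window
            current = None
    if current is not None:
        windows.append((current[0], current[1]))
    return windows
```

If the app stays up on the remaining instances or recovers as expected, you should see a single window that starts shortly after the experiment begins and ends around the fault's configured duration.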