Speed up CI through impact analysis

July 28, 2020 2 minute read

Definition

What does impact analysis actually mean in terms of CI?

In brief:

It’s a time-saving technique: we determine what a change impacted and skip unrelated actions (e.g. building or testing).

In practice:

We change production code, analyze what that change affected, and then decide what to build and which tests to run based on that analysis.

Surface impact analysis based on filters and triggers

Here we put keywords into the TRIGGER and IGNORE environment variables and run tests that have been pre-filtered via tags/classes. This can be pretty helpful in a monorepo world.

Create a script (check_changes.rb) that determines which of the changed files count as impacted:

 require 'json'

 # Comma-separated keywords; a missing variable is treated as an empty list.
 required_keys = ENV.fetch('TRIGGER', '').split(',').map(&:strip)
 ignored_keys = ENV.fetch('IGNORE', '').split(',').map(&:strip)
 # Ask the GitHub API which files the pull request changed
 # (the endpoint is paginated and returns 30 files per page by default).
 response = JSON.parse(`curl -s -H "authorization: Bearer #{ENV['GITHUB_TOKEN']}" -X GET -G #{ENV['PULL_REQUEST']}/files`)
 impacted_files = response.map { |file| file['filename'] }
 # Keep a path only if it matches no ignored keyword and at least one required keyword.
 impacted_files.select! do |path|
   ignored_keys.none? { |key| path.include?(key) } &&
     required_keys.any? { |key| path.include?(key) }
 end

 # The pipeline reads this count from stdout.
 puts impacted_files.size
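
To see how TRIGGER and IGNORE interact without calling the GitHub API, here is a minimal offline sketch of the same rule. The helper name impacted_paths and the sample paths are hypothetical and exist only for illustration; a path counts when it contains at least one trigger keyword and no ignore keyword.

```ruby
# Hypothetical helper mirroring the filter in check_changes.rb:
# keep a path if it contains any trigger keyword and no ignore keyword.
def impacted_paths(paths, triggers, ignores)
  paths.select do |path|
    next false if ignores.any? { |key| path.include?(key) }

    triggers.any? { |key| path.include?(key) }
  end
end

# Sample data, made up for illustration.
changed = [
  'FirstProjectRelatedFolder/Sources/Login.swift',
  'FirstProjectRelatedFolder/README.md',
  'SecondProjectRelatedFolder/Sources/Cart.swift',
  'Fastfile'
]
triggers = ['FirstProjectRelatedFolder/', 'VeryImportantFolder/', 'Fastfile']
ignores  = ['FirstProjectFolder/tests/', '.md']

puts impacted_paths(changed, triggers, ignores).size # prints 2
```

Only Login.swift and Fastfile survive: the README is dropped by the ".md" ignore keyword, and the second project's file matches no trigger. Because matching is a plain substring check, a keyword such as "Fastfile" also matches any longer path that contains it.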

Create a CI pipeline that passes the required info to the script above and acts on its output:

 name: Sample

 on: [pull_request]

 jobs:
   first_project:
     name: FirstProjectTests
     runs-on: [macos-latest]
     steps:
       - uses: actions/checkout@v2
       - name: Runner
         env:
           PULL_REQUEST: ${{ github.event.pull_request._links.self.href }}
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           TRIGGER: "FirstProjectRelatedFolder/, VeryImportantFolder/, Fastfile"
           IGNORE: "FirstProjectFolder/tests/, .md"
         run: |
           if [ "$(ruby ./.github/scripts/check_changes.rb)" = 0 ]; then
             echo "Required files were not touched"
           else
             fastlane first_project_tests
           fi

   second_project:
     name: SecondProjectTests
     runs-on: [macos-latest]
     steps:
       - uses: actions/checkout@v2
       - name: Runner
         env:
           PULL_REQUEST: ${{ github.event.pull_request._links.self.href }}
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           TRIGGER: "SecondProjectRelatedFolder/, VeryImportantFolder/, Fastfile"
           IGNORE: "SecondProjectRelatedFolder/tests/, .md"
         run: |
           if [ "$(ruby ./.github/scripts/check_changes.rb)" = 0 ]; then
             echo "Required files were not touched"
           else
             fastlane second_project_tests
           fi

Deep impact analysis based on code coverage tools

Here we use code coverage tools to generate an impact map and run only the tests that match the impacted files.

Nothing has changed since Paul Hammant described it so breathtakingly, so I’ll just leave a link here:

The Rise of Test Impact Analysis

And here is a quick guide to action:

1. Configure code coverage tool (e.g.: iOS example)
2. Run each test one by one

2.1. Collect code-coverage report for each test

2.2. Extract involved source files from code-coverage report

Create a map (impact_map.json), where test names are the keys and arrays of source files are the values, for example:

 {
   "SampleSuite/SampleClass/testMethod_1": [
     "path/to/a.swift",
     "path/to/b.swift",
     "path/to/c.swift"
   ],
   "SampleSuite/SampleClass/testMethod_2": [
     "path/to/c.swift",
     "path/to/d.swift",
     "path/to/e.swift"
   ]
 }
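
One way such a map could be assembled from the per-test coverage reports is sketched below. The report shape (a "targets" list whose "files" entries carry "path" and "coveredLines") is an assumption loosely modelled on JSON coverage exports, and the helper names, the /repo/ root and the sample data are hypothetical; adjust the keys to whatever your coverage tool emits.

```ruby
require 'json'

# Returns the source files that a single test actually executed.
# ASSUMPTION: `report` looks like
#   { "targets" => [ { "files" => [ { "path" => "...", "coveredLines" => 3 } ] } ] }
def covered_files(report, repo_root)
  report.fetch('targets', []).flat_map do |target|
    target.fetch('files', [])
          .select { |file| file.fetch('coveredLines', 0).positive? }
          .map { |file| file['path'].delete_prefix(repo_root) }
  end.uniq.sort
end

# Builds the impact map: test name => source files it touched.
def build_impact_map(reports_by_test, repo_root)
  reports_by_test.transform_values { |report| covered_files(report, repo_root) }
end

# Made-up input: one parsed coverage report per test.
reports_by_test = {
  'SampleSuite/SampleClass/testMethod_1' => {
    'targets' => [{ 'files' => [
      { 'path' => '/repo/path/to/a.swift', 'coveredLines' => 12 },
      { 'path' => '/repo/path/to/z.swift', 'coveredLines' => 0 }
    ] }]
  }
}

puts JSON.pretty_generate(build_impact_map(reports_by_test, '/repo/'))
```

Files with zero covered lines are left out, so z.swift does not end up in the map. Stripping the repository root keeps the paths comparable with the repo-relative filenames that the pull request file list reports.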

Create a script (check_changes.rb) that decides which tests to run:

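 require 'json'

 # Map of test name => source files that the test covers.
 impact_map = JSON.parse(File.read('impact_map.json'))
 # Files changed by the pull request, as reported by the GitHub API.
 response = JSON.parse(`curl -s -H "authorization: Bearer #{ENV['GITHUB_TOKEN']}" -X GET -G #{ENV['PULL_REQUEST']}/files`)
 impacted_files = response.map { |file| file['filename'] }

 # Keep only the tests that cover at least one changed file.
 impact_map.select! do |_test, related_source_files|
   related_source_files.any? { |file| impacted_files.include?(file) }
 end

 # Comma-separated test names for the pipeline to consume.
 puts impact_map.keys.join(',')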
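
The selection rule can be tried offline against the sample map from earlier. This is a minimal sketch: the helper name tests_to_run is hypothetical, and the changed-file lists are made up in place of the GitHub API response.

```ruby
require 'json'

# Hypothetical helper: names of the tests whose covered files
# overlap the set of changed files.
def tests_to_run(impact_map, changed_files)
  impact_map.select { |_test, sources| (sources & changed_files).any? }.keys
end

# The sample impact map, inlined so the sketch is self-contained.
impact_map = JSON.parse(<<~MAP)
  {
    "SampleSuite/SampleClass/testMethod_1": ["path/to/a.swift", "path/to/b.swift", "path/to/c.swift"],
    "SampleSuite/SampleClass/testMethod_2": ["path/to/c.swift", "path/to/d.swift", "path/to/e.swift"]
  }
MAP

# a.swift is covered by the first test only.
puts tests_to_run(impact_map, ['path/to/a.swift']).join(',')
# c.swift is covered by both tests.
puts tests_to_run(impact_map, ['path/to/c.swift']).join(',')
```

Note that a changed file which no test has covered yet, such as a newly added one, selects nothing under this rule, so the output can be an empty string.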

Create a CI pipeline that passes the required info to the script above and runs the tests it outputs:

 name: Sample

 on: [pull_request]

 jobs:
   sample:
     name: Tests
     runs-on: [macos-latest]
     steps:
       - uses: actions/checkout@v2
       - name: Runner
         env:
           PULL_REQUEST: ${{ github.event.pull_request._links.self.href }}
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
           essential_tests=$(ruby ./.github/scripts/check_changes.rb)
           if [ -z "$essential_tests" ]; then
             echo "No tests were impacted"
           else
             fastlane test only_testing:"$essential_tests"
           fi

Conclusion

This was just a superficial analysis and a rough idea of how we can influence the speed of the CI process. Keep it in mind and build your own pipeline as fast as you wish!

See ya (: