
Running iOS Performance Testing on CI

Intro

The proof of concept we’ll discuss today goes a bit beyond the boundaries of what XCTest Performance offers out of the box. And that’s only because the standard tooling has limitations, which we’ll try to detail shortly.

Although, if you have a self-hosted runner and a farm of physical devices that you’re going to use for Performance testing, then you’re all covered: just follow the official docs.

Otherwise, grab some 🍿 and enjoy reading.

How it’s supposed to work

  1. Write the test
  2. Run it the first time to capture the baseline
  3. Run it again at some point to compare the current result with the baseline

Existing limitations

The idea behind XCTMetric is great and the API is really gorgeous, but as I mentioned, there are quite a few limitations. And the most frustrating thing of all is how little information there is about these boundaries.

After reading the docs, you’re left with the impression that it’s all rosy: just write these lightweight tests and run them wherever and whenever you wish. But, unfortunately, that’s not entirely true.

  • Target device model

    You can’t compare test results against a baseline that was recorded on a different device model.

  • Running machine

    You can’t compare test results against a baseline that was recorded on a different machine.

  • Metrics availability

    Which metrics are available depends drastically on whether the test runs on a simulator or on a physical device.

    1. Available metrics on iOS Simulator:

      • Duration (a completely useless metric)
    2. Available metrics on Physical Device:

      • Duration
      • Frame Count
      • Frame Rate
      • Hitch Time Ratio
      • Hitches Total Duration
      • Number of Hitches
  • To sum up

    You have to run performance tests on the same machine and on the same physical device.

    Sounds reasonable, doesn’t it? But it pretty much cuts off at the root any possibility of running these tests on CI. Unless, as I mentioned at the very beginning, you have a self-hosted runner with a physical device connected to it.

Push the boundaries

I assume you already have some Performance tests. So what I propose is to:

  1. Forget about the native Xcode baseline and create a JSON file with a custom baseline
    • e.g.: {"duration":3,"frame_rate":75,"hitches":0,...}
  2. Set up a Firebase TestLab account
    • it lets you use Physical Devices at no cost, up to 30 min/day or 5 runs/day depending on the plan (which I think is grand for our task)
  3. Choose one device and always run the performance tests on it
  4. Parse the logs and extract the metric values
    • the file name is xcodebuild_output.log
  5. Compare the result with the custom baseline from the first step
    • keep in mind that Xcode’s default baseline threshold is 10%, so adjust yours accordingly


This way:

  • Your CI will wait for the tests to complete
  • The tests will always pass on Firebase TestLab due to the absence of a baseline
  • And then you do your own math and fail the job if needed (a sketch of this comparison follows the fastlane example below)


Here is an example of what it might look like (within fastlane):

lane :run_xctmetric do
  # Get your custom baseline
  expected_performance = JSON.parse(File.read('YOUR_BASELINE_CONFIG.json'))

  # Build the app and xctestrun
  scan(
    project: 'YOUR_PROJECT_PATH',
    scheme: 'YOUR_SCHEME_NAME',
    testplan: 'YOUR_PERFORMANCE_TESTPLAN_NAME',
    result_bundle: true,
    derived_data_path: 'derived_data/',
    sdk: 'iphoneos',
    skip_detect_devices: true,
    build_for_testing: true
  )

  Dir.chdir('../derived_data/Build/Products') do
    # Zip all the test-related schtuff
    sh('zip -r MyTests.zip .')

    # Upload the tests to Firebase TestLab and wait for the completion
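    # (tip: run "gcloud firebase test ios models list" to see the available device models and OS versions)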
    sh('gcloud firebase test ios run --test MyTests.zip --timeout 7m --results-dir test_output --device "model=iphone14pro,version=16.6,orientation=portrait"')

    # Download the logs from Google Cloud
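    # (the subfolder name comes from the device spec: model-version-locale-orientation)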
    sh('gsutil cp -r gs://YOUR_FIREBASE_TESTLAB_BUCKET_ID/test_output/iphone14pro-16.6-en-portrait/xcodebuild_output.log xcodebuild_output.log')

    # Parse the logs and extract the performance metrics
    actual_performance = extract_xctmetric_result(log: File.read('xcodebuild_output.log'))

    # Compare the result with the custom baseline
    success = do_your_own_math(expected_performance, actual_performance)

    if success
      UI.success('🟢 Performance benchmark passed.')
    else
      UI.user_error!('🔴 Performance benchmark failed.')
    end
  end
end

private_lane :extract_xctmetric_result do |options|
  metrics = {}

  # Collect the metrics for all performance tests one by one
  ['test_YourFirstTestName', 'test_YourSecondTestName'].each do |test_name|
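    # Note: 'Scroll_DraggingAndDeceleration' is the metric name these tests report in the logs;
    # replace it with whatever identifier your own measure block produces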
    hitches_total_duration = options[:log].match(/#{test_name}\]' measured \[Hitches Total Duration \(Scroll_DraggingAndDeceleration\), ms\] average: (\d+\.\d+)/)
    duration = options[:log].match(/#{test_name}\]' measured \[Duration \(Scroll_DraggingAndDeceleration\), s\] average: (\d+\.\d+)/)
    hitch_time_ratio = options[:log].match(/#{test_name}\]' measured \[Hitch Time Ratio \(Scroll_DraggingAndDeceleration\), ms per s\] average: (\d+\.\d+)/)
    frame_rate = options[:log].match(/#{test_name}\]' measured \[Frame Rate \(Scroll_DraggingAndDeceleration\), fps\] average: (\d+\.\d+)/)
    number_of_hitches = options[:log].match(/#{test_name}\]' measured \[Number of Hitches \(Scroll_DraggingAndDeceleration\), hitches\] average: (\d+\.\d+)/)

    metrics[test_name] = {
      'hitches_total_duration' => {
        'value' => hitches_total_duration[1].to_f.round(2),
        'ext' => 'ms'
      },
      'duration' => {
        'value' => duration[1].to_f.round(2),
        'ext' => 's'
      },
      'hitch_time_ratio' => {
        'value' => hitch_time_ratio[1].to_f.round(2),
        'ext' => 'ms per s'
      },
      'frame_rate' => {
        'value' => frame_rate[1].to_f.round(2),
        'ext' => 'fps'
      },
      'number_of_hitches' => {
        'value' => number_of_hitches[1].to_f.round(2),
        'ext' => ''
      }
    }
  end

  metrics
end
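
The do_your_own_math part is deliberately left to you, since everyone’s pass/fail rules are different. Below is a minimal sketch of what it could look like: a plain Ruby helper defined in the same Fastfile (so it can take positional arguments, as in the call above), assuming a flat 10% tolerance mirroring Xcode’s default threshold and treating frame_rate as the only “higher is better” metric.

def do_your_own_math(expected, actual, tolerance = 0.1)
  failures = []

  expected.each do |test_name, metrics|
    metrics.each do |metric_name, baseline|
      measured = actual.dig(test_name, metric_name, 'value')
      next if measured.nil? # the test or metric is missing from the parsed logs

      if metric_name == 'frame_rate'
        # Higher is better: fail if the frame rate dropped below the allowed minimum
        minimum = baseline['value'] * (1 - tolerance)
        failures << "#{test_name}/#{metric_name}: #{measured} < #{minimum.round(2)} #{baseline['ext']}" if measured < minimum
      else
        # Lower is better: fail if the metric exceeded the baseline by more than the tolerance
        maximum = baseline['value'] * (1 + tolerance)
        failures << "#{test_name}/#{metric_name}: #{measured} > #{maximum.round(2)} #{baseline['ext']}" if measured > maximum
      end
    end
  end

  failures.each { |failure| UI.error(failure) }
  failures.empty?
end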


And here is an example of the custom baseline JSON file:

{
  "test_YourFirstTestName": {
    "hitches_total_duration": {
      "value": 10,
      "ext": "ms"
    },
    "duration": {
      "value": 2.6,
      "ext": "s"
    },
    "hitch_time_ratio": {
      "value": 4,
      "ext": "ms per s"
    },
    "frame_rate": {
      "value": 75,
      "ext": "fps"
    },
    "number_of_hitches": {
      "value": 1,
      "ext": ""
    }
  },
  "test_YourSecondTestName": {
    "hitches_total_duration": {
      "value": 10,
      "ext": "ms"
    },
    "duration": {
      "value": 2.6,
      "ext": "s"
    },
    "hitch_time_ratio": {
      "value": 4,
      "ext": "ms per s"
    },
    "frame_rate": {
      "value": 75,
      "ext": "fps"
    },
    "number_of_hitches": {
      "value": 1,
      "ext": ""
    }
  }
}
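
If you’d rather not hand-craft this file, one option is to seed it from a run you trust: reuse the parsing lane above on that run’s xcodebuild_output.log, dump the result to disk, and then tweak the numbers by hand. A hypothetical helper lane (the name record_xctmetric_baseline and the log location are my assumptions) could look like this:

lane :record_xctmetric_baseline do
  # Assumes xcodebuild_output.log from a trusted run has already been downloaded
  # next to the Fastfile (e.g. with the same gsutil command as in run_xctmetric)
  trusted_performance = extract_xctmetric_result(log: File.read('xcodebuild_output.log'))

  # Write the parsed metrics out as the new baseline
  File.write('YOUR_BASELINE_CONFIG.json', JSON.pretty_generate(trusted_performance))
end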
