FSL CI infrastructure

FSL conda packages are automatically built and published using GitLab CI pipeline rules implemented in the fsl/conda/fsl-ci-rules repository. FSL releases are built and published using CI rules implemented in the fsl/conda/manifest-rules repository. This page describes the infrastructure on which these jobs are executed.

GitLab runners

macOS jobs are executed on physical, manually managed Apple hardware. Linux and platform-agnostic jobs are executed on automatically managed AWS infrastructure.

All GitLab CI jobs are executed on the following infrastructure:

Linux and platform-agnostic jobs are executed on AWS EC2 infrastructure, using a GitLab runner configured for auto-scaling.
Intel macOS jobs are executed on GitLab runners installed on old MacBooks, managed by Paul.
M1 macOS jobs are executed on a GitLab runner installed on a Mac mini, managed by Duncan.
Package publishing jobs are executed on the channel server, using a GitLab runner managed by Duncan.

These runners, and any newly added runners, MUST have the following tags:
- Intel macOS runners have the tags fsl-ci,macOS-64
- Apple Silicon macOS runners have the tags fsl-ci,macOS-M1
- Linux x86/64 runners have the tags fsl-ci,linux,linux-x64
- Linux aarch64 runners have the tags fsl-ci,linux,linux-aarch64
- The package publishing runner has the tag fslconda-channel-host
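
The tag requirements listed above can be captured in a small lookup, which may be handy when scripting the registration of a new runner. This is only a minimal sketch: the function name fsl_runner_tags and the runner-kind labels are hypothetical and are not part of fsl-ci-rules; only the tag values come from this page.

```shell
# Print the comma-separated tag list required for a given kind of runner.
# The kind labels (macos-intel, macos-m1, ...) are hypothetical names
# chosen for this sketch; the tag values are those required on this page.
fsl_runner_tags() {
    case "$1" in
        macos-intel)   echo "fsl-ci,macOS-64" ;;
        macos-m1)      echo "fsl-ci,macOS-M1" ;;
        linux-x64)     echo "fsl-ci,linux,linux-x64" ;;
        linux-aarch64) echo "fsl-ci,linux,linux-aarch64" ;;
        channel-host)  echo "fslconda-channel-host" ;;
        *)
            echo "unknown runner kind: $1" >&2
            return 1
            ;;
    esac
}
```

For example, the output of `fsl_runner_tags macos-m1` could be entered in the tags field of the GitLab web UI, or passed to the --tag-list option of gitlab-runner register.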

The Linux runners are automatically configured with Terraform - see below.

More runners (with the same tags as above) can be added as needed. For macOS runners, it is assumed that:
- git and git-lfs are installed
- A conda environment is installed at ~/micromamba/ (for whichever user the gitlab-runner is running under), which has conda-build and the fsl-ci-rules dependencies installed.
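
Before registering a new macOS runner, the assumptions above could be checked with something like the following. This is a hedged sketch rather than an official tool: the function name check_macos_runner is hypothetical, and looking for a conda-build executable under bin/ in the environment prefix is an assumption about how the environment is laid out. It does not check the other fsl-ci-rules dependencies.

```shell
# Report any missing prerequisites for a macOS runner.
# $1: conda environment prefix (defaults to ~/micromamba).
# Looking for <prefix>/bin/conda-build is an assumption made for this
# sketch about the layout of the environment.
check_macos_runner() {
    local prefix tool rc=0
    prefix="${1:-$HOME/micromamba}"
    for tool in git git-lfs; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    if [ ! -d "$prefix" ]; then
        echo "missing: conda environment at $prefix"
        rc=1
    elif [ ! -x "$prefix/bin/conda-build" ]; then
        echo "missing: conda-build in $prefix"
        rc=1
    fi
    return "$rc"
}
```

Run it as the user that gitlab-runner runs under, so that ~ and PATH resolve as they would inside a CI job. It prints one line per missing item and returns a non-zero status if anything is missing.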

When new Linux/platform-agnostic jobs are scheduled, the auto-scaling runner uses docker+machine to create one or more EC2 instances (provisioned with docker), and dispatches the jobs to those instances. When there are no more jobs to execute, the EC2 instances are destroyed.

Docker images used for the jobs are hosted on the Amazon Elastic Container Registry, at https://gallery.ecr.aws/fsl/. These images are built by CI jobs in the fsl/conda/fsl-ci-rules repository, from Dockerfiles kept in the same repository.

Jobs submitted to the Linux runners use the fsldevelopment/fsl-almalinux-64 Docker image. This is a multi-platform Docker image, so it can be used to run platform-agnostic jobs in addition to arm64/amd64 jobs.

CUDA projects are built using the fsldevelopment/fsl-almalinux-64-cuda-11.0 docker image (also a multi-platform image).

AWS infrastructure

Linux and platform-agnostic CI jobs are executed on AWS infrastructure, which is defined and managed using a Terraform configuration located in the fsl/conda/fsl-ci-rules repository. All of the infrastructure runs within the eu-west-2 region.

An AWS IAM user account called fsldevelopment has been created specifically for use within this system. This account has the following permissions:

AmazonS3FullAccess for the gitlab runner S3 cache
AmazonEC2FullAccess for creating/managing EC2 instances
AmazonEC2ContainerRegistryFullAccess for pulling from / pushing to private ECR repositories (not presently used)
AmazonElasticContainerRegistryPublicFullAccess for pushing to public ECR repositories.

An S3 bucket called fsldevelopment-bucket is used for storing Terraform state, and is also used by the GitLab runners.

Follow these steps whenever the infrastructure needs to be re-configured:

Make sure you have Terraform installed on your local machine.
Register two new GitLab runners on the fsl/ group via the GitLab web UI, giving them these tags:
- fsl-ci,linux,linux-x64
- fsl-ci,linux,linux-aarch64

Make a note of the runner registration tokens.

Delete the old GitLab runners with the same tags via the GitLab web UI.

Set environment variables containing the AWS credentials for the fsldevelopment IAM user:

export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<access-key>"
export TF_VAR_aws_access_key_id="${AWS_ACCESS_KEY_ID}"
export TF_VAR_aws_secret_access_key="${AWS_SECRET_ACCESS_KEY}"
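
Because a missing credential tends to surface only part-way through a Terraform run, it can help to confirm that all four variables are set first. The following is a minimal sketch; the function name check_aws_env is hypothetical and not part of fsl-ci-rules.

```shell
# Return non-zero, and name the offenders on stderr, if any of the
# variables exported above is unset or empty.
check_aws_env() {
    local name value rc=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
                TF_VAR_aws_access_key_id TF_VAR_aws_secret_access_key; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

The check only reports variable names and never prints their values, so its output is safe to paste into a terminal log.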

Clone the fsl-ci-rules repository, and change into the terraform directory:

git clone https://git.fmrib.ox.ac.uk/fsl/conda/fsl-ci-rules/
cd fsl-ci-rules/terraform

Run terraform init (edit backend.config if you need to change the AWS region or S3 bucket name):

terraform init -backend-config=backend.config

Generate an SSH key pair. This will be used to connect to the new runner manager instance. One method of generating a key pair is:
```
ssh-keygen -t ed25519 -f fsldevelopment_key -N ""
```
Fill in the necessary values in configuration.tfvars:
The GitLab runner registration tokens noted down in the runner registration step above.
Paths to your private and public SSH key files.
AMI IDs if you need to change the host OSes used for running jobs.

Destroy the existing infrastructure by running:

terraform destroy -var-file=configuration.tfvars

Re-create the infrastructure by running:

terraform apply -var-file=configuration.tfvars
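
The Terraform commands above (init, destroy, apply) can be chained so that a failure in one step stops the later ones. This is a sketch under stated assumptions: the function names run_step and rebuild_infrastructure and the DRY_RUN variable are hypothetical, introduced here for illustration; the terraform commands themselves are the ones given on this page.

```shell
# Run a command, or just print it when DRY_RUN=1.
# DRY_RUN is a hypothetical switch introduced for this sketch.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run init, destroy and apply in order from the terraform directory.
# The && chain stops at the first command that fails.
rebuild_infrastructure() {
    run_step terraform init -backend-config=backend.config &&
    run_step terraform destroy -var-file=configuration.tfvars &&
    run_step terraform apply -var-file=configuration.tfvars
}
```

Setting DRY_RUN=1 before calling rebuild_infrastructure prints the three commands without executing them. In a real run, terraform destroy and terraform apply still ask for interactive confirmation, since the sketch deliberately does not bypass that prompt.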

Channel host gitlab runner

In order to facilitate automatic deployment of built conda packages:

The GitLab CI runner which is used to run the deployment jobs must be running on a server which has access to the conda channel directories.
The runner must have the fslconda-channel-host tag.
In the .gitlab-ci.yml file of the fsl/conda/fsl-ci-rules repository, the following variables must be set, denoting the URLs, and locally accessible directories, of the conda channels:
- FSLCONDA_PUBLIC_CHANNEL_URL: https:// URL of the public channel
- FSLCONDA_DEVELOPMENT_CHANNEL_URL: https:// URL of the development channel
- FSLCONDA_INTERNAL_CHANNEL_URL: https:// URL of the internal channel
- FSLCONDA_PUBLIC_CHANNEL_DIRECTORY: Locally accessible directory containing the public channel
- FSLCONDA_DEVELOPMENT_CHANNEL_DIRECTORY: Locally accessible directory containing the development channel
- FSLCONDA_INTERNAL_CHANNEL_DIRECTORY: Locally accessible directory containing the internal channel
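
Variables defined in .gitlab-ci.yml are exposed to jobs as environment variables, so a deployment job on the channel host could validate them before publishing anything. The following is a minimal sketch, assuming it runs in a shell on the channel host; the function name check_channel_vars is hypothetical and not part of fsl-ci-rules.

```shell
# Check that each channel URL variable starts with https:// and that
# each channel directory variable names an existing directory.
# Prints one line per problem and returns non-zero if any was found.
check_channel_vars() {
    local channel name value rc=0
    for channel in PUBLIC DEVELOPMENT INTERNAL; do
        name="FSLCONDA_${channel}_CHANNEL_URL"
        eval "value=\${$name:-}"
        case "$value" in
            https://?*) ;;
            *) echo "bad or missing URL: $name"; rc=1 ;;
        esac
        name="FSLCONDA_${channel}_CHANNEL_DIRECTORY"
        eval "value=\${$name:-}"
        if [ ! -d "$value" ]; then
            echo "bad or missing directory: $name"
            rc=1
        fi
    done
    return "$rc"
}
```

The check confirms only that the URLs are well-formed and the directories exist locally; it does not verify that a URL actually serves the channel stored in the corresponding directory.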