BIANCA

Research overview

BIANCA (Brain Intensity AbNormality Classification Algorithm) is a fully automated, supervised method for the detection of white matter hyperintensities (WMH), based on the k-nearest neighbour (k-NN) algorithm. BIANCA classifies each voxel of the image based on its intensity and other features, and the output image gives, for each voxel, the probability of being WMH. BIANCA is flexible in the choice of MRI modalities. It offers options for including and weighting spatial information, for local spatial intensity averaging, and for choosing the number and location of training points.
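
The following minimal Python sketch illustrates how a k-NN classifier can turn labelled training voxels into a per-voxel probability. It is not BIANCA's implementation. It assumes plain Euclidean distance on feature vectors that are already on comparable scales and uses toy (FLAIR, T1) intensity values. The function name knn_lesion_probability is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def knn_lesion_probability(
    train: List[Tuple[Sequence[float], int]],
    query: Sequence[float],
    k: int,
) -> float:
    """Fraction of the k nearest training vectors that are labelled lesion (1)."""
    # Rank training vectors by Euclidean distance to the query feature vector
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    neighbours = ranked[:k]
    # The "probability" is the proportion of lesion labels among the neighbours
    return sum(label for _, label in neighbours) / len(neighbours)

# Toy training set: ((FLAIR, T1) intensities, label), with 1 = lesion, 0 = non-lesion
training = [
    ((0.90, 0.30), 1),
    ((0.85, 0.35), 1),
    ((0.80, 0.40), 1),
    ((0.40, 0.70), 0),
    ((0.35, 0.75), 0),
    ((0.30, 0.80), 0),
]

print(knn_lesion_probability(training, (0.82, 0.38), k=3))  # 1.0
print(knn_lesion_probability(training, (0.82, 0.38), k=5))  # 0.6
```

The value is the fraction of lesion-labelled neighbours. For this reason the output is a continuous map that still needs thresholding (see Post-processing).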

Citation

If you use BIANCA, please make sure to cite the following reference in your publications:

L. Griffanti, G. Zamboni, A. Khan, L. Li, G. Bonifacio, V. Sundaresan, U. G. Schulz, W. Kuker, M. Battaglini, P. M. Rothwell, M. Jenkinson (2016) BIANCA (Brain Intensity AbNormality Classification Algorithm): a new tool for automated segmentation of white matter hyperintensities. NeuroImage. 141:191-205

User guide

Data Preparation

Images preparation

BIANCA works in each subject's native space. All the MRI modalities need to be registered to a common base image (e.g. FLAIR) and have the same dimensions (resolution and FOV). We also recommend running bias field correction on T1 and FLAIR (e.g. using FAST). If you want BIANCA to use spatial information, you will also need to calculate the (linear) transformation from the base image (in the subject's native space) to MNI standard space.

Before running BIANCA you need to:
- Choose your base image (e.g. FLAIR). This will be the reference space for your input and output images.
- Perform brain extraction on (at least) one modality. BIANCA will use this to derive a brain mask within which to detect lesions. To further restrict the area where lesions will be detected (and reduce false positives), consider pre-masking as described below (Masking section).
- Register all the other modalities (e.g. T1) to your base image for each subject.
- (optional) If you want BIANCA to use spatial information, linearly register the base image from the subject's native space to MNI space and save the transformation matrix (.mat file). This will be used to calculate spatial features from MNI coordinates.

Training dataset preparation

The algorithm requires a training set with pre-classified voxels (i.e. manually segmented images) that is used to create a set of feature vectors for lesion and non-lesion classes.

The lesion masks to be used as part of the training dataset need to be:
- binary (1=lesion; 0=non-lesion)
- in NIfTI (nii.gz) format
- in the same space as your base image. If the manual segmentation was done on an image other than the base image, the lesion mask needs to be registered to the base image space (and binarised again if interpolation was applied).
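
A lesion mask that was interpolated during registration no longer contains only 0s and 1s. The Python sketch below checks whether a mask is binary and re-binarises it. The 0.5 cut-off is an assumption, roughly matching what fslmaths with -thr 0.5 -bin would do. The voxel values are a flattened toy example, and the function names are illustrative.

```python
from typing import Iterable, List

def is_binary(values: Iterable[float]) -> bool:
    """True if every voxel value is exactly 0 or 1."""
    return all(v in (0, 1) for v in values)

def rebinarise(values: Iterable[float], cutoff: float = 0.5) -> List[int]:
    """Set values >= cutoff to 1 and everything else to 0 (cutoff is an assumption)."""
    return [1 if v >= cutoff else 0 for v in values]

# Flattened toy lesion mask after interpolation into the base image space
interpolated = [0.0, 0.12, 0.49, 0.5, 0.87, 1.0]

print(is_binary(interpolated))              # False
print(rebinarise(interpolated))             # [0, 0, 0, 1, 1, 1]
print(is_binary(rebinarise(interpolated)))  # True
```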

Pre-Trained datasets

We aim to provide pre-trained datasets within future versions of FSL. Meanwhile, here are some links to training datasets available from other sources:
- UK Biobank. The training files generated and used in the UK Biobank project are available here (bianca_class_data and bianca_class_data_labels files). Example usage below and here.
- Mixed dataset from the UK Biobank and Whitehall II imaging studies. Details in this publication. Training files and usage instructions are available here.

Master file preparation

The master file is a text file that contains one row per subject (training or query) and on each row a list of all files needed for that subject:

- The images you want to use for classification (e.g. T1 and FLAIR). These must all be coregistered to the same base space, with at least one of them brain extracted (see the image preparation section above).
- The binary manual lesion mask, coregistered to the base space (if needed). For query subjects, use any placeholder name so that the column order matches the training subjects.
- (optional) The transformation matrix from subject space to standard space, needed to calculate spatial features (from MNI coordinates).

These can be in any (consistent) order, as the options in the BIANCA command line call will specify the meaning of each column.

Here is an example masterfile (e.g. masterfile.txt):

subj01/FLAIR_brain.nii.gz subj01/T1_to_FLAIR.nii.gz subj01/FLAIR_to_MNI.mat subj01/WMHmask.nii.gz
subj02/FLAIR_brain.nii.gz subj02/T1_to_FLAIR.nii.gz subj02/FLAIR_to_MNI.mat subj02/WMHmask.nii.gz
...
subj<N>/FLAIR_brain.nii.gz subj<N>/T1_to_FLAIR.nii.gz subj<N>/FLAIR_to_MNI.mat subj<N>/WMHmask.nii.gz
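
For larger studies it can be convenient to generate the master file programmatically. The minimal Python sketch below assumes the subject directory naming (subj01, subj02, ...) and the file names of the example above. Adapt the column order to your own data, and keep it consistent with the column numbers passed to bianca.

```python
from typing import List

def masterfile_lines(n_subjects: int) -> List[str]:
    """Build one master-file row per subject, in a fixed column order."""
    rows = []
    for i in range(1, n_subjects + 1):
        subj = f"subj{i:02d}"  # hypothetical directory naming: subj01, subj02, ...
        columns = [
            f"{subj}/FLAIR_brain.nii.gz",  # column 1: brain-extracted base image
            f"{subj}/T1_to_FLAIR.nii.gz",  # column 2: T1 registered to FLAIR
            f"{subj}/FLAIR_to_MNI.mat",    # column 3: base-to-MNI linear transform
            f"{subj}/WMHmask.nii.gz",      # column 4: manual lesion mask
        ]
        rows.append(" ".join(columns))
    return rows

if __name__ == "__main__":
    # Print the rows; redirect stdout to a file to create the master file
    print("\n".join(masterfile_lines(10)))
```

For example, you could save this as a (hypothetical) make_masterfile.py and run python make_masterfile.py > masterfile.txt. This would produce a ten-row master file with the same layout as the one above.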

Running BIANCA

Compulsory arguments

--singlefile=<masterfile> name of the master file (e.g. masterfile.txt)
--querysubjectnum=<num> row number in the master file of the query subject (the one to be segmented)
--brainmaskfeaturenum=<num> column number in the master file containing the name of the image to derive non-zero (brain) mask from (1 in the example above). Note that this does not need to be a binary/mask image - it only needs to have zeros outside the brain (or ROI) and non-zeros inside.
Training dataset specification:
If the training subjects to use are listed in the master file, the following arguments need to be specified:
- --labelfeaturenum=<num> column number in the master file containing the name of the manual lesion mask files (labelled images; 4 in the example above) and
- --trainingnums=<val> subjects to be used in training: a list of row numbers (comma separated, no spaces), or all to use all the subjects in the master file. If the query subject is also a training subject, it is automatically excluded from the training dataset, and the lesions are estimated from the remaining training subjects.
Alternatively, load the training data from file (previously saved with --saveclassifierdata, see below):
- --loadclassifierdata=<name> load training data (and labels) from file
Note that all row and column numbers start counting from 1 (not zero).
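
The training-subject rules above are 1-based row numbers, the all keyword, and automatic exclusion of the query subject. The Python sketch below mimics them in a few lines, to check which rows will actually be used for a given query subject. It is a hypothetical helper for illustration, not part of BIANCA.

```python
from typing import List

def effective_training_rows(trainingnums: str, query_row: int, n_rows: int) -> List[int]:
    """Master-file rows (1-based) used to train the classifier for one query subject."""
    if trainingnums == "all":
        rows = list(range(1, n_rows + 1))
    else:
        # Comma-separated row numbers, no spaces, counting from 1
        rows = [int(tok) for tok in trainingnums.split(",")]
    # A query subject that is also a training subject is left out of its own training set
    return [r for r in rows if r != query_row]

print(effective_training_rows("1,2,3,4,5,6,7,8,9,10", query_row=1, n_rows=10))
# [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(effective_training_rows("all", query_row=4, n_rows=4))
# [1, 2, 3]
```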

Optional arguments

-o output (base) file name (default: bianca_output)
--featuresubset=<num>,<num>,... list of column numbers (comma separated, no spaces) in the master file containing the names of the images to use as intensity features (1,2 in the example above to use FLAIR and T1) (default: use all modalities as features). The image used to specify the non-zero (brain) mask (--brainmaskfeaturenum option) must be part of the feature subset.
--matfeaturenum=<num> column number in the master file containing the matrix files (linear transformation matrix from the base space to MNI space). Needed to extract spatial features (MNI coordinates; 3 in the example above)
--spatialweight=<value> weighting for spatial coordinates (default = 1, i.e. variance-normalised MNI coordinates). Requires --matfeaturenum to be specified. If set to 0 the spatial coordinates will be ignored (and no need to specify --matfeaturenum). Higher value for spatial weighting leads to the neighbouring feature vectors being more likely to come from similar spatial locations (effectively making the training data more local).
--patchsizes=<num>,<num>,... list of patch sizes in voxels (comma separated, no spaces) for local averaging.
--patch3D use 3D patches (default is 2D)
--selectpts=<val> where to select the non-lesion points from the training dataset. Options: any (anywhere outside the lesion, default), noborder (exclude voxels within 3 voxels of the lesion's edge), surround (preferably select points within 5 voxels of the lesion's edge)
--trainingpts=<val> maximum number of lesion points to use per training subject, or equalpoints to select all lesion points and an equal number of non-lesion points (default: 2000)
--nonlespts=<val> maximum number of non-lesion points to use per training subject. If not specified, it is set to the same number as the lesion points (--trainingpts)
--saveclassifierdata=<name> save training data to file. Two files will be saved: <name> and <name>_labels. When loading the training dataset with --loadclassifierdata, just specify <name> and both files will be loaded.
-v use verbose mode
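
The following example shows why --spatialweight makes the training data more local. It builds feature vectors from intensities plus MNI coordinates that are divided by their standard deviation and multiplied by the weight. The Python sketch rests on several assumptions and is not BIANCA's code. The normalisation is computed over the toy points shown, intensities are assumed to be on a 0-1 scale, and all names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def build_features(
    intensities: List[Sequence[float]], coords: List[Coord], spatial_weight: float
) -> List[List[float]]:
    """Intensity features followed by weighted, variance-normalised MNI coordinates."""
    # Standard deviation of each axis over the points given (1.0 if an axis has no spread)
    sds = [statistics.pstdev(axis) or 1.0 for axis in zip(*coords)]
    features = []
    for inten, xyz in zip(intensities, coords):
        # Divide each coordinate by its SD, then scale by the spatial weight
        spatial = [spatial_weight * c / sd for c, sd in zip(xyz, sds)]
        features.append(list(inten) + spatial)
    return features

def nearest_training_point(
    intensities: List[Sequence[float]], coords: List[Coord], spatial_weight: float
) -> int:
    """Index of the training point closest to the last point (the query)."""
    *train, query = build_features(intensities, coords, spatial_weight)
    return min(range(len(train)), key=lambda i: math.dist(train[i], query))

# Point 0: same FLAIR intensity as the query but spatially far
# Point 1: different intensity but spatially close
# Point 2: the query voxel
intensities = [(0.80,), (0.60,), (0.80,)]
coords = [(40.0, 20.0, 10.0), (12.0, -2.0, 4.0), (10.0, 0.0, 5.0)]

print(nearest_training_point(intensities, coords, spatial_weight=0.0))  # 0
print(nearest_training_point(intensities, coords, spatial_weight=1.0))  # 1
```

With a weight of 0, the nearest training point is chosen on intensity alone. With a weight of 1, the spatially close point wins, even though its intensity differs.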

Examples of BIANCA calls

Using manual masks for training

# Run BIANCA using the example masterfile generated above
bianca --singlefile=masterfile.txt --labelfeaturenum=4 --brainmaskfeaturenum=1 --querysubjectnum=1 --trainingnums=1,2,3,4,5,6,7,8,9,10 --featuresubset=1,2 --matfeaturenum=3 --trainingpts=2000 --nonlespts=10000 --selectpts=noborder -o sub001_bianca_output -v

With this command BIANCA will use data from masterfile.txt. It will look for the pre-labelled images (manual masks) in the 4th column of the master file. It will limit the search to the brain mask derived from the image in the 1st column. The subject to segment is the first subject of the master file (first row). Since this subject is also one of the training subjects, BIANCA will use only the remaining 9 for training, as in a leave-one-out approach, to avoid bias and overfitting. BIANCA will use the images in the 1st and 2nd columns of the master file as intensity features. It will also extract spatial features (MNI coordinates) using the transformation matrix listed in the 3rd column of the master file. For each training subject, BIANCA will use (up to) 2000 points among the voxels labelled as lesion. It will also use (up to) 10000 points among the non-lesion voxels, excluding voxels close to the lesion's edge. The output image will be called sub001_bianca_output. Verbose mode is on.
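
The Python sketch below illustrates the training-point selection described above. For each subject it takes up to N lesion points and up to M non-lesion points, skipping non-lesion voxels near the lesion edge. The distance used for noborder is assumed here to be the Chebyshev (chessboard) distance in voxels. BIANCA's exact definition of the border region may differ, and the function names are hypothetical.

```python
import random
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]

def chebyshev_to_set(v: Voxel, targets: Set[Voxel]) -> int:
    """Smallest chessboard distance (in voxels) from v to any voxel in targets."""
    return min(max(abs(a - b) for a, b in zip(v, t)) for t in targets)

def select_training_points(
    lesion: Set[Voxel], brain: Set[Voxel], n_les: int, n_nonles: int,
    border: int = 3, seed: int = 0,
) -> Tuple[List[Voxel], List[Voxel]]:
    """Sample up to n_les lesion voxels and up to n_nonles non-lesion voxels away from lesion edges."""
    rng = random.Random(seed)
    # Non-lesion candidates lying more than `border` voxels from every lesion voxel
    candidates = sorted(v for v in brain - lesion if chebyshev_to_set(v, lesion) > border)
    lesion_pts = rng.sample(sorted(lesion), min(n_les, len(lesion)))
    nonlesion_pts = rng.sample(candidates, min(n_nonles, len(candidates)))
    return lesion_pts, nonlesion_pts

# Toy single-slice "brain" of 12 x 12 voxels with a two-voxel lesion
brain = {(x, y, 0) for x in range(12) for y in range(12)}
lesion = {(4, 4, 0), (5, 4, 0)}

les_pts, non_pts = select_training_points(lesion, brain, n_les=2000, n_nonles=10000)
print(len(les_pts), len(non_pts))  # 2 88
```

The toy subject uses the same caps as the example command (2000 lesion and 10000 non-lesion points), so it contributes all of its eligible voxels. Smaller caps would draw a random subset.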

Using a pre-trained dataset (e.g. from UK Biobank)

# Generate the masterfile to run BIANCA on a query subject using UK Biobank training dataset. Note that:
# - the order of files needs to be the same as the order used to generate the training file
# - in this case T1 is the base space
echo querysubj01/T1_unbiased_brain.nii.gz querysubj01/T2_FLAIR_unbiased_to_T1.nii.gz querysubj01/T1_to_MNI_linear.mat > masterfile_forUKB.txt;

# Run BIANCA
$FSLDIR/bin/bianca --singlefile=masterfile_forUKB.txt --querysubjectnum=1 --brainmaskfeaturenum=1 --loadclassifierdata=bianca_class_data --matfeaturenum=3 --featuresubset=1,2 -o querysubj01_bianca_mask

NOTE: The output from BIANCA will depend critically on the choice of options and on the quality of the training data and manual segmentations. The examples provided here are mainly meant to illustrate the command line and can be used as a starting point. We recommend carefully checking the results and adjusting options as needed.

Post-processing

Threshold and binarise

BIANCA’s output is a 'probability' map, giving for each voxel the probability of being classified as lesion. To obtain a binary mask, a thresholding and binarisation step is needed. This can easily be done with fslmaths (e.g. to threshold at 0.9):

fslmaths sub001_bianca_output -thr 0.9 -bin sub001_bianca_output_thr09bin

Check your own data to establish the best threshold (e.g. by evaluating the overlap with the manual mask on test data; see the Performance evaluation section for more details).
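
One simple way to pick a threshold is to sweep candidate values on test subjects with manual masks. You then keep the threshold with the highest Dice similarity index. The Python sketch below works on flattened toy voxel lists. threshold_bin mirrors the fslmaths -thr/-bin combination used above: values below the threshold are zeroed, and the remaining non-zero values become 1. The function names are illustrative.

```python
from typing import List, Sequence

def threshold_bin(prob: Sequence[float], thr: float) -> List[int]:
    """Mimic fslmaths -thr <thr> -bin: zero values below thr, set remaining non-zeros to 1."""
    return [1 if p >= thr and p > 0 else 0 for p in prob]

def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice similarity index of two binary voxel lists (1.0 if both are empty, by convention here)."""
    inter = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * inter / total if total else 1.0

# Flattened toy BIANCA output and manual mask for one test subject
prob = [0.95, 0.91, 0.85, 0.60, 0.40, 0.10, 0.92, 0.05]
manual = [1, 1, 1, 1, 0, 0, 0, 0]

candidates = [0.5, 0.7, 0.9]
scores = {t: dice(threshold_bin(prob, t), manual) for t in candidates}
best = max(candidates, key=scores.get)
print(scores)
print(best)  # 0.5
```

On real data, average the overlap over several test subjects rather than relying on a single one.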

As a potential alternative, LOCATE (LOCally Adaptive Thresholds Estimation) is a supervised method to automatically determine local thresholds in different regions of the brain (details in this publication). LOCATE takes into account the variability in lesion characteristics in different locations. Currently, a beta version of LOCATE is implemented in MATLAB. Details, code and user manual are available here.

Masking

If you see false positives in the output lesion mask in specific locations, it might be useful to apply a mask to exclude the affected region(s). For example, BIANCA is not optimised for the segmentation of (juxta)cortical, cerebellar and subcortical lesions, so masking out these areas will likely reduce false positives.

Creating the mask

For example, the script below creates a mask from T1 images that excludes cortical grey matter (GM) and the following structures: putamen, globus pallidus, nucleus accumbens, thalamus, brainstem, cerebellum, hippocampus and amygdala. To exclude the cortical GM from the brain mask, the script extracts the cortical CSF from the subject's CSF partial volume estimate (pve) map (from FAST). It then dilates this CSF to reach the cortical GM and excludes these areas. The other structures are identified in MNI space, non-linearly registered to the subject's images, and removed from the brain mask.

make_bianca_mask <structural_image> <CSF pve> <warp_file_MNI2structural> <keep_intermediate_files>

The first input is the basename of the structural image (e.g. T1_biascorr). The script assumes that the brain-extracted image is called <structural image>_brain.nii.gz. The second input is the CSF pve map (e.g. output from FAST). The third input is the non-linear transformation warp file from standard space to the structural image. If you ran fsl_anat, you can use the file named MNI_to_T1_nonlin_field.nii.gz in the fsl_anat output directory. If you have the warp file from structural to MNI space, you can calculate the inverse with invwarp (invwarp -w warpvol -o invwarpvol -r refvol). If you use 1 for the last command line argument (keep_intermediate_files), the folder containing temporary files will not be deleted.

Main output: <structural image>_bianca_mask.nii.gz is a binary mask with 0 for regions to exclude and 1 for regions to include. If T1 is not your base space, you need to register the mask to the base space.

Applying the mask

This mask can be applied to the BIANCA output (either before or after thresholding):

fslmaths sub001_bianca_output -mas T1_bianca_mask_to_FLAIR sub001_bianca_output_masked

Alternatively, this can be applied to the input image, creating a tighter brain mask:

fslmaths FLAIR -mas T1_bianca_mask_to_FLAIR FLAIR_masked

where FLAIR_masked.nii.gz can be used instead of FLAIR_brain.nii.gz in the master file and used for the --brainmaskfeaturenum option.
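
Masking and thresholding both act on each voxel independently. As a result, applying the mask before or after thresholding gives the same binary result. The small Python check below uses toy values. apply_mask mirrors fslmaths -mas, setting voxels where the mask is not positive to zero. The function names are illustrative.

```python
from typing import List, Sequence

def apply_mask(img: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Mimic fslmaths -mas: keep voxels where the mask is positive, zero the rest."""
    return [v if m > 0 else 0 for v, m in zip(img, mask)]

def threshold_bin(img: Sequence[float], thr: float) -> List[int]:
    """Mimic fslmaths -thr <thr> -bin."""
    return [1 if v >= thr and v > 0 else 0 for v in img]

prob = [0.95, 0.40, 0.92, 0.99, 0.10]  # flattened toy BIANCA output
mask = [1, 1, 0, 1, 0]                 # 0 = excluded region

masked_then_thr = threshold_bin(apply_mask(prob, mask), 0.9)
thr_then_masked = apply_mask(threshold_bin(prob, 0.9), mask)
print(masked_then_thr)                     # [1, 0, 0, 1, 0]
print(masked_then_thr == thr_then_masked)  # True
```

This equivalence does not hold for cluster-based steps such as a minimum cluster size, because masking can split or shrink clusters.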

Additional output: the file called <structural image>_vent.nii.gz is a binary mask of segmented ventricles. This can be used to extract periventricular lesions (see the Volume Calculation section for details).

Volume Calculation

The script below can be used to calculate the number of clusters (lesions) and volume of lesions in any BIANCA output image.

bianca_cluster_stats <bianca_output_map> <threshold> <min_cluster_size> [<mask>]

This will output the total number of clusters and the total lesion volume after thresholding at <threshold>, including only clusters bigger than <min_cluster_size>. The size is expressed in number of voxels. If you have already thresholded and binarised the lesion mask, you can simply put 0 for <threshold>.

If the optional <mask> is specified, it will also calculate the number of clusters and the lesion volume within the specified mask. The mask needs to be in the same space as <bianca_output_map>.
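
The cluster count and volume can be understood as connected-component labelling of the thresholded map. The Python sketch below makes three assumptions: 26-connectivity, a strict "bigger than" comparison with the minimum cluster size, and a known voxel volume in cubic millimetres. bianca_cluster_stats itself may use different conventions, and the function names are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
# All 26 neighbours of a voxel (assumed connectivity)
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def clusters(voxels: Set[Voxel]) -> List[Set[Voxel]]:
    """Split a set of lesion voxels into 26-connected components (breadth-first search)."""
    remaining = set(voxels)
    components = []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    queue.append(n)
        components.append(component)
    return components

def cluster_stats(
    prob: Dict[Voxel, float], thr: float, min_size: int, voxel_mm3: float
) -> Tuple[int, float]:
    """Number of clusters and total volume (mm^3) after thresholding and size filtering."""
    lesion = {v for v, p in prob.items() if p >= thr and p > 0}
    kept = [c for c in clusters(lesion) if len(c) > min_size]
    return len(kept), sum(len(c) for c in kept) * voxel_mm3

# Sparse toy probability map: voxel -> BIANCA probability
prob = {
    (0, 0, 0): 0.95, (1, 0, 0): 0.97, (1, 1, 0): 0.93,  # one 3-voxel cluster
    (5, 5, 5): 0.99,                                     # an isolated voxel
    (9, 9, 9): 0.30,                                     # below threshold
}
print(cluster_stats(prob, thr=0.9, min_size=0, voxel_mm3=3.0))  # (2, 12.0)
print(cluster_stats(prob, thr=0.9, min_size=1, voxel_mm3=3.0))  # (1, 9.0)
```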

Periventricular vs deep WMH volumes

The script below separates the (thresholded and binarised) BIANCA output into periventricular and deep WMHs. It saves two separate binary images (perivent_map and deepwm_map) and calculates the total, periventricular and deep WMH volumes. It uses the 10 mm distance rule: a lesion within 10 mm (inclusive) of the ventricles is classified as periventricular, otherwise as deep (see this publication for further details).

bianca_perivent_deep <thresholded_binarised_WMH_map> <ventricles_mask> <minclustersize> <do_stats 0 1 2> <outputdir>

where:
- <thresholded_binarised_WMH_map> is the BIANCA output thresholded at the desired threshold and binarised.
- <ventricles_mask> is a binary mask of the ventricles. If you used make_bianca_mask to create an exclusion mask for the BIANCA output, the ventricle mask to use is the file ventmask.nii.gz. If T1 and FLAIR are not in the same space, the ventricle mask needs to be registered to FLAIR (and binarised).
- <minclustersize> is the minimum cluster size (in voxels) to consider (use 0 for no cluster thresholding).
- <do_stats> selects the output format:
  - if 0, it will only produce the images and not calculate volumes
  - if 1, it will calculate volumes for total, periventricular and deep WMH and display them on the screen
  - if 2, it will calculate volumes and save them in the file WMH_tot_pvent_deep_10mm.txt
- <outputdir> is the directory where the output will be saved.
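
The 10 mm rule can be illustrated in Python by grouping the WMH voxels into clusters and measuring the distance in millimetres from each cluster to the ventricles. This sketch assumes that a cluster is periventricular if any of its voxels lies within 10 mm (inclusive) of a ventricle voxel. It also uses 26-connectivity and distances between voxel centres. It is not the bianca_perivent_deep implementation, and the function names are hypothetical.

```python
import math
from collections import deque
from itertools import product
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]  # 26-connectivity

def clusters(voxels: Set[Voxel]) -> List[Set[Voxel]]:
    """Connected components of a voxel set (breadth-first search)."""
    remaining, components = set(voxels), []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    queue.append(n)
        components.append(component)
    return components

def to_mm(v: Voxel, voxdim: Tuple[float, float, float]) -> List[float]:
    """Voxel indices to millimetre coordinates (no origin offset needed for distances)."""
    return [i * d for i, d in zip(v, voxdim)]

def split_perivent_deep(wmh: Set[Voxel], vent: Set[Voxel], voxdim, cutoff_mm: float = 10.0):
    """Label each WMH cluster periventricular (any voxel within cutoff_mm of a ventricle) or deep."""
    vent_mm = [to_mm(w, voxdim) for w in vent]
    peri, deep = set(), set()
    for c in clusters(wmh):
        # Smallest centre-to-centre distance between the cluster and the ventricles, in mm
        d = min(math.dist(to_mm(v, voxdim), w) for v in c for w in vent_mm)
        (peri if d <= cutoff_mm else deep).update(c)
    return peri, deep

voxdim = (1.0, 1.0, 3.0)  # anisotropic voxels (mm)
vent = {(0, 0, 0), (1, 0, 0)}
wmh = {(4, 0, 0), (5, 0, 0), (1, 0, 3), (11, 0, 0), (1, 6, 4)}

peri, deep = split_perivent_deep(wmh, vent, voxdim)
voxel_mm3 = voxdim[0] * voxdim[1] * voxdim[2]
print(len(peri) * voxel_mm3, len(deep) * voxel_mm3)  # 12.0 3.0
```

Converting to millimetres matters for anisotropic voxels. The voxel at (1, 0, 3) is only 3 slices away but 9 mm from the ventricles. The voxel at (11, 0, 0) sits exactly at 10 mm and still counts as periventricular.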

Performance evaluation

The script below can be used to evaluate BIANCA performance against a manual (reference) segmentation:

bianca_overlap_measures <lesionmask> <threshold> <manualmask> <saveoutput>

It extracts the following overlap measures (see reference paper for details):

Dice Similarity Index (SI): calculated as 2*(voxels in the intersection of manual and BIANCA masks)/(manual mask lesion voxels + BIANCA lesion voxels)
Voxel-level false discovery rate (FDR): number of voxels incorrectly labelled as lesion (false positives, FP) divided by the total number of voxels labelled as lesion by BIANCA (positive voxels)
Voxel-level false negative ratio (FNR): number of voxels incorrectly labelled as non-lesion (false negatives, FN) divided by the total number of voxels labelled as lesion in the manual mask (true voxels)
Cluster-level FDR: number of clusters incorrectly labelled as lesion (FP) divided by the total number of clusters found by BIANCA (positive clusters)
Cluster-level FNR: number of clusters incorrectly labelled as non-lesion (FN) divided by the total number of lesions in the manual mask (true clusters)
Mean Total Area (MTA): average number of voxels in the manual mask and BIANCA output (true voxels + positive voxels)/2
Detection error rate (DER): sum of voxels belonging to FP or FN clusters, divided by MTA
Outline error rate (OER): sum of voxels belonging to true positive clusters (WMH clusters detected by both manual and BIANCA segmentation), excluding the overlapping voxels, divided by MTA

In addition, it calculates:
- Volume of BIANCA segmentation (after applying the specified threshold)
- Volume of manual mask

The first input is the lesion mask calculated by BIANCA (e.g. sub001_bianca_output.nii.gz). The second input is the threshold that will be applied to <lesionmask> before calculating the overlap measures. If you have already thresholded and binarised the lesion mask, you can simply put 0. The third input is the manual mask, used as reference to calculate the overlap measures. If <saveoutput> is set to 0, it will output the measures' names and values on the screen in the following order: SI, FDR, FNR, FDR(cluster-level), FNR(cluster-level), DER, OER, MTA, lesion mask's volume, manual mask's volume. If <saveoutput> is set to 1, it will save only the values (in the same order) in a file called Overlap_and_Volumes_<lesionmask>_<threshold>.txt in the same folder as the lesion mask.
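
The voxel- and cluster-level measures defined above can be computed from two binary segmentations, represented as sets of voxel coordinates. This Python sketch assumes 26-connectivity and treats a cluster as a true positive if it overlaps the other segmentation by at least one voxel. It reads the OER numerator as the voxels of TP clusters from either mask, minus their intersection. bianca_overlap_measures may differ in these details, and the function names are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]  # 26-connectivity

def clusters(voxels: Set[Voxel]) -> List[Set[Voxel]]:
    """Connected components of a voxel set (breadth-first search)."""
    remaining, components = set(voxels), []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    queue.append(n)
        components.append(component)
    return components

def overlap_measures(bianca: Set[Voxel], manual: Set[Voxel]) -> Dict[str, float]:
    """Voxel- and cluster-level overlap between binary BIANCA and manual masks (both non-empty)."""
    inter = bianca & manual
    mta = (len(bianca) + len(manual)) / 2
    b_cl, m_cl = clusters(bianca), clusters(manual)
    # A cluster is a true positive if it shares at least one voxel with the other mask
    fp_cl = [c for c in b_cl if not c & manual]
    fn_cl = [c for c in m_cl if not c & bianca]
    tp_vox = set().union(*[c for c in b_cl if c & manual], *[c for c in m_cl if c & bianca])
    return {
        "SI": 2 * len(inter) / (len(bianca) + len(manual)),
        "FDR": len(bianca - manual) / len(bianca),
        "FNR": len(manual - bianca) / len(manual),
        "FDR_cluster": len(fp_cl) / len(b_cl),
        "FNR_cluster": len(fn_cl) / len(m_cl),
        "DER": (sum(map(len, fp_cl)) + sum(map(len, fn_cl))) / mta,
        "OER": len(tp_vox - inter) / mta,  # TP-cluster voxels outside the overlap
        "MTA": mta,
    }

manual = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0), (10, 10, 10)}
bianca = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (20, 20, 20)}
for name, value in overlap_measures(bianca, manual).items():
    print(f"{name}: {value:.3f}")
```

The identity SI = 1 - (DER + OER)/2 provides a useful sanity check, and it follows from the definitions. Consider the FP-cluster voxels, the FN-cluster voxels and the non-overlapping TP-cluster voxels. Together they account for all voxels of both masks except twice the intersection.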