BIANCA
Research overview
BIANCA (Brain Intensity AbNormality Classification Algorithm) is a fully automated, supervised method for white matter hyperintensities (WMH) detection, based on the k-nearest neighbour (k-NN) algorithm. BIANCA classifies the image’s voxels based on their intensity and other features, and the output image represents the probability per voxel of being WMH. BIANCA is very flexible in terms of MRI modalities to use and offers different options for including and weighting spatial information, local spatial intensity averaging, and different options for the choice of the number and location of training points.
Citation
If you use BIANCA, please make sure to cite the following reference in your publications:
User guide
Data Preparation
Images preparation
BIANCA works in single subject's space, but all the MRI modalities need to be registered to a common base image (e.g. FLAIR) and have the same dimension (resolution and FOV). We also recommend running bias field correction on T1 and FLAIR (e.g. using FAST). If you want BIANCA to use spatial information, you will also need to calculate the (linear) transformation from the base image (in single subject's space) to MNI standard space.
Before running BIANCA you need to:
- Choose your base image. This will be your reference space for your input and output images (e.g. FLAIR).
- Perform brain extraction on (at least) one modality. BIANCA will use this to derive a brain mask within which to detect lesions. If you want to further restrict the area where lesions will be detected (and reduce false positives), you can consider pre-masking as described below (Masking section).
- Register all the other modalities (e.g. T1) to your base image for each subject.
- (optional) If you want BIANCA to use spatial information, linearly register the base image from single subject's space to MNI and save the transformation matrix (.mat file). This will be used to calculate spatial features from MNI coordinates.
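The preparation steps above can be sketched with standard FSL tools. This is only an illustration, not the authors' pipeline: the helper name prepare_subject, the file names FLAIR and T1 inside each subject folder, the use of FLAIR as base image, the 6 degrees of freedom for the within-subject registration and the 1 mm MNI template are all assumptions.

```shell
# Hypothetical helper: prepare one subject folder for BIANCA with FLAIR as base image.
# Usage: prepare_subject <subject_dir>
prepare_subject() {
    # Brain-extract the base image (BIANCA derives its brain mask from this file)
    bet "$1/FLAIR" "$1/FLAIR_brain"
    # Rigid-body registration of T1 to the base image (same subject, so 6 DOF is assumed)
    flirt -in "$1/T1" -ref "$1/FLAIR" -dof 6 \
          -out "$1/T1_to_FLAIR" -omat "$1/T1_to_FLAIR.mat"
    # Optional: linear transformation from base space to MNI, saved as a .mat file
    flirt -in "$1/FLAIR_brain" \
          -ref "${FSLDIR}/data/standard/MNI152_T1_1mm_brain" \
          -omat "$1/FLAIR_to_MNI.mat"
}

# Example call (subject folder name is illustrative):
# prepare_subject subj01
```

Registration quality should be checked visually for every subject, since errors at this stage propagate into the spatial features.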
Training dataset preparation
The algorithm requires a training set with pre-classified voxels (i.e. manually segmented images) that is used to create a set of feature vectors for lesion and non-lesion classes.
The lesion masks to be used as part of the training dataset need to be:
- binary (1=lesion; 0=non-lesion)
- in nifti (nii.gz) format
- in the same space as your base image. If the manual segmentation was done on an image that was not the base image, the lesion mask needs to be registered to the base image space (and binarised if you applied interpolation).
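When the manual mask was drawn in a different space, the registration and re-binarisation described above might look like the following sketch. The helper name mask_to_base and the 0.5 cut-off after interpolation are assumptions, not part of BIANCA. The transformation matrix is expected to map the mask's native space to the base image.

```shell
# Hypothetical helper: bring a manual lesion mask into base-image space.
# Usage: mask_to_base <manual_mask> <base_image> <mask_to_base.mat> <output_mask>
mask_to_base() {
    # Apply an existing linear transformation (no new registration is estimated)
    flirt -in "$1" -ref "$2" -applyxfm -init "$3" -out "$4"
    # Interpolation produces fractional values: threshold (0.5 assumed) and binarise again
    fslmaths "$4" -thr 0.5 -bin "$4"
}

# Example call (file names are illustrative):
# mask_to_base subj01/WMHmask_T1space subj01/FLAIR subj01/T1_to_FLAIR.mat subj01/WMHmask
```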
Pre-Trained datasets
We aim to provide pre-trained datasets within future versions of FSL, but here are some links to training datasets available from other sources:
- UK Biobank. The training files generated and used in the UK Biobank project are available here (bianca_class_data and bianca_class_data_labels files). Example usage below and here.
- Mixed dataset from UK Biobank and Whitehall II imaging studies. Details in this publication. Training files and usage instructions are available here.
Master file preparation
The master file is a text file that contains one row per subject (training or query) and on each row a list of all files needed for that subject:
- The images you want to use for classification (e.g. T1 and FLAIR), all coregistered to the same base space. At least one of them needs to be brain extracted (see Images preparation section).
- The binary manual lesion mask, coregistered to the base space (if needed). For query subjects, use any placeholder name to keep the same column order as the training subjects.
- (optional) The transformation matrix from subject space to standard space. Needed to calculate spatial features (from MNI coordinates).
These can be in any (consistent) order, as the options in the BIANCA command line call will specify the meaning of each column.
Here is an example master file (e.g. masterfile.txt):
subj01/FLAIR_brain.nii.gz subj01/T1_to_FLAIR.nii.gz subj01/FLAIR_to_MNI.mat subj01/WMHmask.nii.gz
subj02/FLAIR_brain.nii.gz subj02/T1_to_FLAIR.nii.gz subj02/FLAIR_to_MNI.mat subj02/WMHmask.nii.gz
...
subj<N>/FLAIR_brain.nii.gz subj<N>/T1_to_FLAIR.nii.gz subj<N>/FLAIR_to_MNI.mat subj<N>/WMHmask.nii.gz
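A master file with this layout can be generated with a short loop rather than typed by hand. The sketch below assumes three subject folders named subj01 to subj03 and the file names used in the example above. It only writes text, so the files do not need to exist yet.

```shell
# Write one row per subject: FLAIR, T1, .mat, manual mask (columns 1 to 4)
masterfile=masterfile.txt
: > "$masterfile"    # create or empty the file
for subj in subj01 subj02 subj03; do
    printf '%s/FLAIR_brain.nii.gz %s/T1_to_FLAIR.nii.gz %s/FLAIR_to_MNI.mat %s/WMHmask.nii.gz\n' \
        "$subj" "$subj" "$subj" "$subj" >> "$masterfile"
done
```

Keeping the column order identical on every row matters, because the BIANCA options refer to files by column number.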
Running BIANCA
Compulsory arguments
--singlefile=<masterfile>
name of the master file (e.g. masterfile.txt)
--querysubjectnum=<num>
row number in the master file of the query subject (the one to be segmented)
--brainmaskfeaturenum=<num>
column number in the master file containing the name of the image to derive non-zero (brain) mask from (1 in the example above). Note that this does not need to be a binary/mask image - it only needs to have zeros outside the brain (or ROI) and non-zeros inside.
Training dataset specification:
- If the training subjects to use are listed in the master file, the following arguments need to be specified:
--labelfeaturenum=<num>
column number in the master file containing the name of the manual lesion mask files (labelled images; 4 in the example above)
--trainingnums=<val>
subjects to be used in training. List of row numbers (comma separated, no spaces) or all to use all the subjects in the master file. If the query subject is also a training subject, it is automatically excluded from the training dataset and the lesions are estimated from the remaining training subjects.
- Alternatively, load from file (previously saved with --saveclassifierdata, see below):
--loadclassifierdata=<name>
load training data (and labels) from file
Note that all row and column numbers start counting from 1 (not zero).
Optional arguments
-o
output (base) file name (default: bianca_output)
--featuresubset=<num>,<num>,...
list of column numbers (comma separated, no spaces) in the master file containing the name of the images to use as intensity features (1,2 in the example above to use FLAIR and T1) (default: use all modalities as features). The image used to specify the non-zero (brain) mask (--brainmaskfeaturenum option) must be part of the features subset.
--matfeaturenum=<num>
column number in the master file of the matrix files (linear transformation matrix from the base space to the MNI space). Needed to extract spatial features (MNI coordinates; 3 in the example above)
--spatialweight=<value>
weighting for spatial coordinates (default = 1, i.e. variance-normalised MNI coordinates). Requires --matfeaturenum to be specified. If set to 0, the spatial coordinates will be ignored (and there is no need to specify --matfeaturenum). A higher value for spatial weighting leads to the neighbouring feature vectors being more likely to come from similar spatial locations (effectively making the training data more local).
--patchsizes=<num>,<num>,...
list of patch sizes in voxels (comma separated, no spaces) for local averaging.
--patch3D
use 3D patches (default is 2D)
--selectpts=<val>
where to select the non-lesion points from the training dataset. Options: any (anywhere outside the lesion - default), noborder (exclude 3 voxels close to the lesion’s edge), surround (preferably within 5 voxels close to the lesion’s edge)
--trainingpts=<val>
number (max) of (lesion) points to use (per training subject) or equalpoints to select all lesion points and equal number of non-lesion points (default: 2000)
--nonlespts=<val>
number (max) of non-lesion points to use (per training subject). If not specified, it will be set to the same number as the lesion points (specified in --trainingpts)
--saveclassifierdata=<name>
save training data to file. Two files will be saved: <name> and <name>_labels. When loading the training dataset with --loadclassifierdata, just specify <name> and both files will be loaded.
-v
use verbose mode
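As an illustration of how --saveclassifierdata and --loadclassifierdata fit together, the sketch below wraps two BIANCA calls in hypothetical helper functions (train_and_save and segment_with_saved are not part of FSL). It assumes the four-column master file layout from the example above and uses only options listed in this section.

```shell
# Hypothetical helper: segment one query row and save the training data for later reuse.
# Usage: train_and_save <masterfile> <query_row> <training_rows> <classifier_name> <output>
train_and_save() {
    bianca --singlefile="$1" --querysubjectnum="$2" --brainmaskfeaturenum=1 \
           --labelfeaturenum=4 --trainingnums="$3" --featuresubset=1,2 \
           --matfeaturenum=3 --saveclassifierdata="$4" -o "$5"
}

# Hypothetical helper: segment a query row using previously saved training data.
# Usage: segment_with_saved <masterfile> <query_row> <classifier_name> <output>
segment_with_saved() {
    bianca --singlefile="$1" --querysubjectnum="$2" --brainmaskfeaturenum=1 \
           --loadclassifierdata="$3" --featuresubset=1,2 --matfeaturenum=3 -o "$4"
}

# Example calls (row numbers and names are illustrative):
# train_and_save masterfile.txt 1 1,2,3,4,5 mytraining sub001_bianca_output
# segment_with_saved masterfile.txt 6 mytraining sub006_bianca_output
```

The features and their column order at load time need to match those used when the training data were saved.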
Examples of BIANCA calls
Using manual masks for training
# Run BIANCA using the example masterfile generated above
bianca --singlefile=masterfile.txt --labelfeaturenum=4 --brainmaskfeaturenum=1 --querysubjectnum=1 --trainingnums=1,2,3,4,5,6,7,8,9,10 --featuresubset=1,2 --matfeaturenum=3 --trainingpts=2000 --nonlespts=10000 --selectpts=noborder -o sub001_bianca_output -v
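Because a query subject that is also a training subject is automatically excluded from training, a leave-one-out run over a fully labelled master file can be sketched as a simple loop. The helper name leave_one_out and the output naming are assumptions. The column numbers follow the example master file.

```shell
# Hypothetical helper: segment every row in turn, training on all the other rows.
# Usage: leave_one_out <masterfile>
leave_one_out() {
    n=$(grep -c '' "$1")    # number of rows (subjects) in the master file
    i=1
    while [ "$i" -le "$n" ]; do
        # --trainingnums=all: BIANCA itself drops the query row from the training set
        bianca --singlefile="$1" --querysubjectnum="$i" --brainmaskfeaturenum=1 \
               --labelfeaturenum=4 --trainingnums=all --featuresubset=1,2 \
               --matfeaturenum=3 -o "bianca_output_row${i}"
        i=$((i + 1))
    done
}

# Example call:
# leave_one_out masterfile.txt
```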
Using a pre-trained dataset (e.g. from UK Biobank)
# Generate the masterfile to run BIANCA on a query subject using UK Biobank training dataset. Note that:
# - the order of files needs to be the same as the order used to generate the training file
# - in this case T1 is the base space
echo querysubj01/T1_unbiased_brain.nii.gz querysubj01/T2_FLAIR_unbiased_to_T1.nii.gz querysubj01/T1_to_MNI_linear.mat > masterfile_forUKB.txt;
# Run BIANCA
$FSLDIR/bin/bianca --singlefile=masterfile_forUKB.txt --querysubjectnum=1 --brainmaskfeaturenum=1 --loadclassifierdata=bianca_class_data --matfeaturenum=3 --featuresubset=1,2 -o querysubj01_bianca_mask
NOTE: The output from BIANCA will depend critically on the choice of options and the quality of the training data and manual segmentations. The examples provided here are mainly to illustrate the command line and can be used as a starting point, but we recommend carefully checking the results and adjusting options as needed.
Post-processing
Threshold and binarise
BIANCA’s output is a 'probability' map giving, for each voxel, the probability of being classified as lesion. To obtain a binary mask, a thresholding and binarisation step is needed. This can be done with fslmaths, using -thr to threshold and -bin to binarise (e.g. to threshold at 0.9).
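A minimal sketch of this step, wrapped in a hypothetical helper (threshold_bianca is not an FSL command, and the file names in the example call are illustrative):

```shell
# Hypothetical helper: threshold a BIANCA probability map and binarise the result.
# Usage: threshold_bianca <bianca_output> <threshold> <binary_output>
threshold_bianca() {
    # -thr zeroes voxels below the threshold; -bin sets all remaining non-zero voxels to 1
    fslmaths "$1" -thr "$2" -bin "$3"
}

# Example call:
# threshold_bianca sub001_bianca_output 0.9 sub001_bianca_output_bin
```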
Check your own data to establish the best threshold (e.g. by evaluating the overlap with the manual mask on test data; see the Performance evaluation section for more details).
As a potential alternative, LOCATE (LOCally Adaptive Thresholds Estimation) is a supervised method to automatically determine local thresholds in different regions of the brain (details in this publication). LOCATE takes into account the variability in lesion characteristics in different locations. Currently, a beta version of LOCATE is implemented in MATLAB. Details, code and user manual are available here.
Masking
If you see false positives in the output lesion mask in specific locations, it might be useful to apply a mask to exclude the affected region(s). For example, note that BIANCA is not optimized for segmentation of (juxta)cortical, cerebellar and subcortical lesions, hence masking out these areas will likely reduce false positives.
Creating the mask
The script below for example creates a mask from T1 images, which excludes cortical grey matter (GM) and the following structures: putamen, globus pallidus, nucleus accumbens, thalamus, brainstem, cerebellum, hippocampus, amygdala. The cortical GM is excluded from the brain mask by extracting the cortical CSF from single-subject’s CSF pve map (using FAST), dilating it to reach the cortical GM, and excluding these areas. The other structures are identified in MNI space, non-linearly registered to the single-subjects’ images, and removed from the brain mask.
The first input is the basename of the structural image (e.g. T1_biascorr). The script works under the assumption that the brain extracted image would be called <structural image>_brain.nii.gz. The second input is the CSF pve map (e.g. output from FAST). The third input is the non-linear transformation warp file from standard space to structural image. If you ran fsl_anat, you can use the file named MNI_to_T1_nonlin_field.nii.gz in the fsl_anat output directory. If you have the warp file from structural to MNI, you can calculate the inverse with the command invwarp (invwarp -w warpvol -o invwarpvol -r refvol). If you use 1 for the last command line argument (keep_intermediate_files), the folder containing temporary files will not be deleted.
Main output: <structural image>_bianca_mask.nii.gz is a binary mask with 0 for regions to exclude and 1 to include.
In case T1 is not your base space, you need to register the mask to the base space.
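As an illustration of the inputs described above, the sketch below calls make_bianca_mask from inside an fsl_anat output directory. The wrapper name build_exclusion_mask is hypothetical, and the use of T1_fast_pve_0 as the CSF pve map assumes the usual fsl_anat file naming. The last argument (0) means intermediate files are not kept.

```shell
# Hypothetical wrapper: create the exclusion mask from fsl_anat outputs.
# Usage: build_exclusion_mask <fsl_anat_output_dir>
build_exclusion_mask() {
    # Run inside the directory so that the basenames resolve; the subshell keeps
    # the caller's working directory unchanged.
    ( cd "$1" && make_bianca_mask T1_biascorr T1_fast_pve_0 MNI_to_T1_nonlin_field.nii.gz 0 )
}

# Example call (directory name is illustrative):
# build_exclusion_mask subj01/struct.anat
```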
Applying the mask
This mask can be applied to the BIANCA output (either before or after thresholding), for example with the -mas option of fslmaths.
Alternatively, it can be applied to the input image, creating a tighter brain mask. The masked image (e.g. FLAIR_masked.nii.gz) can then be used instead of FLAIR_brain.nii.gz in the master file and for the --brainmaskfeaturenum option.
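Both ways of applying the mask come down to the same fslmaths operation. The sketch below uses a hypothetical helper (apply_mask), and the file names in the example calls are illustrative. It assumes the mask is already in the same space as the image it is applied to.

```shell
# Hypothetical helper: keep only the voxels of an image that fall inside a mask.
# Usage: apply_mask <input_image> <mask> <output_image>
apply_mask() {
    # -mas zeroes every voxel where the mask is zero
    fslmaths "$1" -mas "$2" "$3"
}

# Example calls:
# 1) on the BIANCA output
# apply_mask sub001_bianca_output T1_biascorr_bianca_mask sub001_bianca_output_masked
# 2) on the input image, to obtain a tighter brain mask
# apply_mask FLAIR_brain.nii.gz T1_biascorr_bianca_mask.nii.gz FLAIR_masked.nii.gz
```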
Additional output: the file called <structural image>_vent.nii.gz is a binary mask of segmented ventricles. This can be used to extract periventricular lesions (see Volume Calculation section for details).
Volume Calculation
The script below can be used to calculate the number of clusters (lesions) and volume of lesions in any BIANCA output image.
This will output the total number of clusters and the total lesion volume after applying
If the optional
Periventricular vs deep WMH volumes
The script below separates the (thresholded and binarised) BIANCA output into periventricular and deep WMHs, saves two separate binary images (perivent_map and deepwm_map) and calculates the volume of total and separate WMHs. It uses the 10 mm distance rule: a lesion within 10 mm (included) from the ventricles is classified as periventricular, otherwise as deep (see this publication for further details).
bianca_perivent_deep <thresholded_binarised_WMH_map> <ventricles_mask> <minclustersize> <do_stats 0 1 2> <outputdir>
If make_bianca_mask was used to create an exclusion mask for BIANCA output, the ventricle mask to use is the file ventmask.nii.gz. If T1 and FLAIR were not in the same space, the ventricle mask needs to be registered to FLAIR (and binarised).
Performance evaluation
The script below can be used to evaluate BIANCA performance against a manual (reference) segmentation:
It extracts the following overlap measures (see reference paper for details):
- Dice Similarity Index (SI): calculated as 2*(voxels in the intersection of manual and BIANCA masks)/(manual mask lesion voxels + BIANCA lesion voxels)
- Voxel-level false discovery rate (FDR): number of voxels incorrectly labelled as lesion (false positives, FP) divided by the total number of voxels labelled as lesion by BIANCA (positive voxels)
- Voxel-level false negative ratio (FNR): number of voxels incorrectly labelled as non-lesion (false negatives, FN) divided by the total number of voxels labelled as lesion in the manual mask (true voxels)
- Cluster-level FDR: number of clusters incorrectly labelled as lesion (FP) divided by the total number of clusters found by BIANCA (positive clusters)
- Cluster-level FNR: number of clusters incorrectly labelled as non-lesion (FN) divided by the total number of lesions in the manual mask (true clusters)
- Mean Total Area (MTA): average number of voxels in the manual mask and BIANCA output (true voxels + positive voxels)/2
- Detection error rate (DER): sum of voxels belonging to FP or FN clusters, divided by MTA
- Outline error rate (OER): sum of voxels belonging to true positive clusters (WMH clusters detected by both manual and BIANCA segmentation), excluding the overlapping voxels, divided by MTA
In addition it calculates:
- Volume of BIANCA segmentation (after applying the specified threshold)
- Volume of manual mask
The first input is the lesion mask calculated by BIANCA (e.g. sub001_bianca_output.nii.gz), the second input is the threshold that will be applied to Overlap_and_Volumes_<lesionmask>_<threshold>.txt
in the same folder where the lesion mask is.
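To make the voxel-level definitions above concrete, the sketch below computes SI, FDR, FNR and MTA from three voxel counts. It is a worked illustration of the formulas only, not the evaluation script itself. The helper name overlap_measures and the example counts are hypothetical.

```shell
# Hypothetical helper: voxel-level overlap measures from raw counts.
# Usage: overlap_measures <true_positives> <false_positives> <false_negatives>
overlap_measures() {
    awk -v tp="$1" -v fpos="$2" -v fneg="$3" 'BEGIN {
        manual = tp + fneg    # lesion voxels in the manual mask (true voxels)
        bianca = tp + fpos    # lesion voxels in the BIANCA mask (positive voxels)
        printf "SI=%.4f FDR=%.4f FNR=%.4f MTA=%.1f\n",
               2 * tp / (manual + bianca), fpos / bianca, fneg / manual,
               (manual + bianca) / 2
    }'
}

# Example: 80 overlapping voxels, 20 false positives, 40 false negatives
overlap_measures 80 20 40
# prints: SI=0.7273 FDR=0.2000 FNR=0.3333 MTA=110.0
```

Here the manual mask has 120 voxels and the BIANCA mask 100, so SI = 160/220, FDR = 20/100 and FNR = 40/120.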