PANDORA UKBv1
Population Archive of Neuroimaging Data Organised for Rapid Analysis
Aslan Abivardi1,2 Matthew Webster1 Paul McCarthy1 Fidel Alfaro-Almagro1 Lav Radosavljevic3
Karla Miller1 Saad Jbabdi1 Mark Woolrich4 Weikang Gong5 Christian Beckmann6 Lloyd Elliott7
Thomas Nichols2,1 Stephen Smith1
1 FMRIB, OxCIN, NDCN, Oxford University, UK
2 Department of Psychiatry, Psychotherapy and Psychosomatics, University of Zurich, Switzerland
3 Big Data Institute, Oxford University, UK
4 OHBA, OxCIN, Psychiatry, Oxford University, UK
5 School of Data Science, Fudan University, China
6 Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
7 Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
This is the central documentation for the new PANDORA data resource, which UK Biobank have now made available on RAP.
Queries on PANDORA: please email aslan.abivardi@ndcn.ox.ac.uk and stephen.smith@ndcn.ox.ac.uk
Preprint on PANDORA: https://www.medrxiv.org/content/10.64898/2026.01.05.26343425v1
Contents
Sub-modalities available in PANDORA
Extracting non-imaging variables to regress into the PANDORA imaging data
Running regressions using PANDORA imaging data
Genetic associations - extracting data from a SNP to regress into PANDORA
PANDORA UKBv1 is a massive archive of the UK Biobank brain imaging data, containing all subjects’ images output by the core UKB brain image processing pipeline. For each output from each modality (e.g., FA from dMRI), the images are collated into one large subjects × voxels matrix. PANDORA includes a tool for easy voxelwise cross-subject regression against variables such as genetics and lifestyle factors. It also includes a highly efficient supervoxel version of the data, which is much smaller and faster to work with but in general loses no signal or spatial detail, and which also denoises the data.
PANDORA is easy and quick to use on the UK Biobank RAP (Research Analysis Platform). Downloading a local copy of one sub-modality’s data from the central data store takes a few minutes, and then running a regression across 80K subjects and 2M voxels takes between a few seconds and a couple of hours, depending on the modality, the number of subjects, and the size of the regression design matrix.
See PANDORA.pdf overview slides for more details and examples.
There are two ways to use PANDORA, and we recommend the first:
Option 1: using the brain imaging docker
We recommend using the imaging-friendly Docker for PANDORA analyses and visualisation, so you should start by getting that running inside a sufficiently powerful RAP compute instance. This takes about 10 minutes to install. See setup instructions at: Imaging-friendly docker for UK Biobank RAP
Then download the PANDORA files. The globals.tar and subjectIDs_union.sample files are needed for all PANDORA analyses; you can then select one or more imaging sub-modalities to download (T1_VBM is used as the example here):
mkdir PANDORA ; cd PANDORA ; PANDORA=`pwd`
tar xvf /mnt/project/Bulk/Brain\ MRI/PANDORA/globals.tar
cp /mnt/project/Bulk/Brain\ MRI/PANDORA/subjectIDs_union.sample globals
tar xvf /mnt/project/Bulk/Brain\ MRI/PANDORA/T1_VBM.tar
cd ..
Option 2: using your own RAP compute instance
Alternatively, if you want to get going faster and don’t need image visualisation, you can just install the PANDORA regression tool from a Unix shell in any running RAP compute instance:
conda create -c https://fsl.fmrib.ox.ac.uk/fsldownloads/fslconda/public/ \
-c conda-forge -y -p ./fsl fsl-melodic blas=*=*mkl
source activate ./fsl ; export FSLDIR=$(pwd)/fsl/ ; source $FSLDIR/etc/fslconf/fsl.sh
You will now need to install the PANDORA data files from the central UKB data store on RAP. The globals.tar and subjectIDs_union.sample files are needed for all PANDORA analyses; you can then select one or more imaging sub-modalities to download (T1_VBM is used as the example here):
mkdir PANDORA ; cd PANDORA ; PANDORA=`pwd`
dx download Bulk/Brain\ MRI/PANDORA/globals.tar
tar xvf globals.tar
cd globals
dx download Bulk/Brain\ MRI/PANDORA/subjectIDs_union.sample
cd ..
dx download Bulk/Brain\ MRI/PANDORA/T1_VBM.tar
tar xvf T1_VBM.tar ; rm *.tar
cd ..
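Whichever option you used, a quick check that the expected files are in place can save a failed run later. The sketch below is not part of PANDORA: the function name check_pandora_layout is hypothetical, and it assumes that each sub-modality tar file unpacks into a directory of the same name under $PANDORA (as the later use of $PANDORA/$PANDORA_MODALITY suggests).

```shell
# Hypothetical helper (not shipped with PANDORA): verify that the files the
# download instructions fetch are present.
# Usage: check_pandora_layout <PANDORA dir> <sub-modality name>
#   e.g. check_pandora_layout "$PANDORA" T1_VBM && echo "files look complete"
check_pandora_layout() {
    pandora_dir=$1
    modality=$2
    status=0
    # globals/ and the subject ID list are needed for all analyses;
    # the sub-modality directory name is an assumption (see text above)
    for path in "$pandora_dir/globals" \
                "$pandora_dir/globals/subjectIDs_union.sample" \
                "$pandora_dir/$modality"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return $status
}
```

The function returns a non-zero status and lists each missing path on stderr, so it can be chained with && before starting a long regression.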
Key: 1mm voxels - 2mm voxels - CIFTI
T1 | T1 (normalised image intensities), T1_VBM (local GM volume/density), |
T2_FLAIR | T2_FLAIR (normalised image intensities), T2_FLAIR_lesions* (white matter hyperintensities) |
SWI | SWI_T2star, SWI_QSM |
dMRI | DTI_FA, DTI_MO, DTI_MD, DTI_L1-3, |
tfMRI | tfMRI_zstat1 (shapes), tfMRI_zstat2 (faces), tfMRI_zstat5 (faces-shapes), |
rfMRI | { RSN_d25/1, … , RSN_d25/25 }, { RSN_d50/1, … , RSN_d50/50 } |
* T2_FLAIR_lesions is an unusual modality in that the original data is binary. Hence regression statistics should be interpreted with caution, as the residuals may be highly non-Gaussian.
These instructions describe one method of extracting non-imaging UK Biobank data fields for use in an analysis:
The contrasts file should have one column for each regressor (i.e., each column in design.txt). Hence, if you have added confound regressors, they would normally have a “0” in the relevant place in the contrasts file. If you want multiple contrasts, add one row per contrast. For example, if you have one regressor of interest followed by 5 confounds, and you want to test both positive and negative associations with the regressor of interest, your contrasts file would have two rows and would look like:
1 0 0 0 0 0
-1 0 0 0 0 0
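For designs with many confound columns, the contrasts file can be generated rather than typed by hand. This is a minimal sketch, assuming the regressor of interest is the first column of design.txt and all remaining columns are confounds; the variable N_CONF is introduced here for illustration only.

```shell
# Illustrative only: write a two-row contrasts file (positive and negative
# association) for one regressor of interest followed by N_CONF confounds.
N_CONF=5

# Build the string " 0 0 ... 0" with one zero per confound column
zeros=""
i=0
while [ "$i" -lt "$N_CONF" ]; do
    zeros="$zeros 0"
    i=$((i + 1))
done

# Row 1 tests the positive association, row 2 the negative one
printf '1%s\n-1%s\n' "$zeros" "$zeros" > contrasts.txt
cat contrasts.txt
```

With N_CONF=5 this reproduces the two rows shown above.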
Choose:
Then call fsl_glm. The command below uses the experiment-specific set of regressors and associated files described above (design.txt, contrasts.txt, subjects.txt), uses the maximum number of compute cores available (--pandora_njobs=-1), and uses the full set of confounds (--pandora_confs=all). Make sure that the PANDORA shell variable (set above) points to where you put the PANDORA files.
PANDORA_MODALITY=warpfield_jacobian
PANDORA_REPRESENTATION=ICA1K
fsl_glm -i $PANDORA/$PANDORA_MODALITY --demean --pandora_njobs=-1 \
--pandora_mode=$PANDORA_REPRESENTATION --pandora_confs=all \
-d design.txt -c contrasts.txt --pandora_subs=subjects.txt \
--out_t=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_T
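A mismatch between the three input files is an easy mistake to make before a run like the one above. The sketch below checks their dimensions first. It assumes that design.txt and contrasts.txt are plain whitespace-separated text matrices and that subjects.txt holds one subject ID per line, one for each row of design.txt; the latter is an assumption about the file layout, so adjust the check if your files differ. The function name check_glm_inputs is hypothetical and is not part of fsl_glm.

```shell
# Hypothetical pre-flight check (not part of fsl_glm).
# Usage: check_glm_inputs design.txt contrasts.txt subjects.txt
check_glm_inputs() {
    design=$1
    contrasts=$2
    subjects=$3
    # Column count, taken from the first non-empty row of each matrix
    design_cols=$(awk 'NF { print NF; exit }' "$design")
    contrast_cols=$(awk 'NF { print NF; exit }' "$contrasts")
    # Count of non-empty rows
    design_rows=$(awk 'NF { n++ } END { print n + 0 }' "$design")
    subject_rows=$(awk 'NF { n++ } END { print n + 0 }' "$subjects")
    status=0
    # One contrast column per regressor
    if [ "${design_cols:-0}" -ne "${contrast_cols:-0}" ]; then
        echo "contrasts has ${contrast_cols:-0} columns, design has ${design_cols:-0}" >&2
        status=1
    fi
    # Assumed: one design row per listed subject
    if [ "$design_rows" -ne "$subject_rows" ]; then
        echo "design has $design_rows rows, subjects has $subject_rows" >&2
        status=1
    fi
    return $status
}
```

Calling check_glm_inputs design.txt contrasts.txt subjects.txt before fsl_glm catches these mismatches in well under a second, rather than partway through a long job.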
The T-statistics are output in a single image file, with one timepoint per regression contrast. You can also output the associated P-values (in -log10(P) form), an F-statistic (across all contrasts), and the corresponding P-value for the F-statistic, by adding one or more of the following options:
--out_p=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_P
--out_f=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_F
--out_pf=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_PF
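Because the P-value images store -log10(P), larger values are more significant, and any significance threshold has to be converted to the same scale. As a worked example, the sketch below converts a Bonferroni-corrected threshold into -log10 form. Bonferroni correction is used here only as a simple illustration, not as a recommendation from the PANDORA authors, and ALPHA and N_TESTS are illustrative values (2M matches the voxel count quoted earlier).

```shell
ALPHA=0.05          # family-wise error rate (illustrative value)
N_TESTS=2000000     # number of voxels tested (illustrative value)

# -log10(ALPHA / N_TESTS); awk's log() is the natural logarithm,
# so divide by log(10) to change base
THRESH=$(awk -v a="$ALPHA" -v n="$N_TESTS" \
    'BEGIN { printf "%.3f", -log(a / n) / log(10) }')
echo "$THRESH"      # prints 7.602
```

Under these assumptions, voxels with a value above about 7.6 in the -log10(P) image would pass the corrected threshold.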
The outputs from sub-modalities that use NIFTI files (see above) can be viewed with fsleyes, and CIFTI outputs can be viewed using wb_view. To aid in interpretation of results you may also want to load up the population average image (for any given sub-modality), which can be found in $PANDORA/$PANDORA_MODALITY/stats
[will enter this information shortly]