PANDORA UKBv1

Population Archive of Neuroimaging Data Organised for Rapid Analysis

Aslan Abivardi1,2 Matthew Webster1 Paul McCarthy1 Fidel Alfaro-Almagro1 Lav Radosavljevic3
Karla Miller1 Saad Jbabdi1 Mark Woolrich4 Weikang Gong5 Christian Beckmann6 Lloyd Elliott7
Thomas Nichols2,1 Stephen Smith1

1 FMRIB, OxCIN, NDCN, Oxford University, UK

2 Department of Psychiatry, Psychotherapy and Psychosomatics, University of Zurich, Switzerland

3 Big Data Institute, Oxford University, UK

4 OHBA, OxCIN, Psychiatry, Oxford University, UK

5 School of Data Science, Fudan University, China

6 Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands

7 Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada

This is the central documentation for the new PANDORA data resource, which UK Biobank have now made available on RAP.

Queries on PANDORA: please email aslan.abivardi@ndcn.ox.ac.uk and stephen.smith@ndcn.ox.ac.uk
Preprint on PANDORA: https://www.medrxiv.org/content/10.64898/2026.01.05.26343425v1
Git repository with the Python code to create and use PANDORA: https://git.fmrib.ox.ac.uk/PANDORA/pandora_proc
The repository includes PANDORA_howtoload.py, an example of how to use PANDORA in Python, which also explains the individual files.

Contents

Introduction

Using PANDORA on RAP

Sub-modalities available in PANDORA

Extracting non-imaging variables to regress into the PANDORA imaging data

Running regressions using PANDORA imaging data

Genetic associations - extracting data from a SNP to regress into PANDORA

Introduction

PANDORA UKBv1 is a massive archive of the UK Biobank brain imaging data, containing all subjects’ images output by the core UKB brain image processing pipeline. For each output from each modality (e.g., FA from dMRI), the images are collated into one large subjects × voxels matrix. PANDORA includes a tool for easy voxelwise cross-subject regression against variables such as genetics and lifestyle factors. It also includes a highly efficient supervoxel version of the data, which is much smaller and faster to work with, generally loses no signal or spatial detail, and also denoises the data.

PANDORA is easy and quick to use on the UK Biobank RAP (Research Analysis Platform). Downloading a local copy of one sub-modality’s data from the central data store takes a few minutes, and then running a regression across 80K subjects and 2M voxels takes between a few seconds and a couple of hours, depending on the modality, the number of subjects, and the size of the regression design matrix.

See PANDORA.pdf overview slides for more details and examples.

Using PANDORA on RAP

There are two ways to use PANDORA, and we recommend the first:

1. Using the pre-built imaging-friendly Docker (easiest option - this includes a graphical desktop environment, and pre-installed analysis and visualisation tools).
2. Inside any other RAP compute instance that you have started (can be quicker to start up, but will not include all the tools that come with the Docker).

Option 1: using the brain imaging docker

We recommend using the imaging-friendly Docker for PANDORA analyses and visualisation, so you should start by getting that running inside a sufficiently powerful RAP compute instance. This takes about 10 minutes to install. See setup instructions at: Imaging-friendly docker for UK Biobank RAP

Then download the PANDORA files. The globals.tar and subjectIDs_union.sample files are needed for all PANDORA analyses, and then you can select one or more imaging sub-modalities to download (e.g., T1_VBM is included here):
mkdir PANDORA ; cd PANDORA ; PANDORA=`pwd`
tar xvf /mnt/project/Bulk/Brain\ MRI/PANDORA/globals.tar
cp /mnt/project/Bulk/Brain\ MRI/PANDORA/subjectIDs_union.sample globals
tar xvf /mnt/project/Bulk/Brain\ MRI/PANDORA/T1_VBM.tar
cd ..
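
Before starting an analysis it can be worth confirming that the files landed where the later commands expect them. The following is a minimal sketch, not part of PANDORA: check_pandora_dir is a hypothetical helper, and it assumes that each tar file unpacks into a directory named after the archive (as the globals and $PANDORA/$PANDORA_MODALITY paths used elsewhere on this page suggest).

```shell
# Hypothetical helper (not part of PANDORA): report missing PANDORA files.
# Usage: check_pandora_dir <pandora_dir> <modality> [<modality> ...]
# Assumption: each tar file unpacks into a directory named after the archive.
check_pandora_dir() {
    cpd_root=$1
    shift
    cpd_status=0
    # Files needed by every analysis (names taken from the download steps).
    if [ ! -d "$cpd_root/globals" ]; then
        echo "missing: $cpd_root/globals"
        cpd_status=1
    fi
    if [ ! -f "$cpd_root/globals/subjectIDs_union.sample" ]; then
        echo "missing: $cpd_root/globals/subjectIDs_union.sample"
        cpd_status=1
    fi
    # One directory per downloaded sub-modality.
    for cpd_modality in "$@"; do
        if [ ! -d "$cpd_root/$cpd_modality" ]; then
            echo "missing: $cpd_root/$cpd_modality"
            cpd_status=1
        fi
    done
    if [ "$cpd_status" -eq 0 ]; then
        echo "PANDORA layout looks complete"
    fi
    return "$cpd_status"
}

# Example call, once $PANDORA has been set by the commands above:
# check_pandora_dir "$PANDORA" T1_VBM
```

The function returns a non-zero status if anything is missing, so it can also be used as a guard at the top of an analysis script.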

Option 2: using your own RAP compute instance

Alternatively, if you want to get going faster and don’t need image visualisation, you can just install the PANDORA regression tool into any running RAP compute instance unix shell:

conda create -c https://fsl.fmrib.ox.ac.uk/fsldownloads/fslconda/public/ \
  -c conda-forge -y -p ./fsl fsl-melodic 'blas=*=*mkl'
source activate ./fsl ; export FSLDIR=$(pwd)/fsl/ ; source $FSLDIR/etc/fslconf/fsl.sh

You will now need to install PANDORA data files from the central UKB data store on RAP. The globals.tar and subjectIDs_union.sample files are needed for all PANDORA analyses, and then you can select one or more imaging sub-modalities to download (e.g., T1_VBM is included here):
mkdir PANDORA ; cd PANDORA ; PANDORA=`pwd`
dx download Bulk/Brain\ MRI/PANDORA/globals.tar
tar xvf globals.tar
cd globals
dx download Bulk/Brain\ MRI/PANDORA/subjectIDs_union.sample
cd ..
dx download Bulk/Brain\ MRI/PANDORA/T1_VBM.tar
tar xvf T1_VBM.tar ; rm *.tar
cd ..
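
If you need several sub-modalities, the download and unpack steps can be wrapped in a loop. This is a sketch under stated assumptions: pandora_fetch and the DRY_RUN variable are hypothetical names introduced here, and the sketch assumes every sub-modality is stored as <name>.tar in the same folder as T1_VBM.tar, which this page only confirms for T1_VBM.

```shell
# Hypothetical helper (not part of PANDORA): fetch and unpack sub-modalities.
# Assumption: each sub-modality is stored as <name>.tar next to T1_VBM.tar.
# Set DRY_RUN=1 to print the commands instead of running them.
pandora_fetch() {
    for pf_modality in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "dx download \"Bulk/Brain MRI/PANDORA/$pf_modality.tar\""
            echo "tar xf $pf_modality.tar && rm $pf_modality.tar"
        else
            # Stop at the first failure so a partial download is noticed.
            dx download "Bulk/Brain MRI/PANDORA/$pf_modality.tar" &&
                tar xf "$pf_modality.tar" &&
                rm "$pf_modality.tar" || return 1
        fi
    done
}

# Example (run from inside the PANDORA directory):
# pandora_fetch T1_VBM DTI_FA warpfield_jacobian
```

Running with DRY_RUN=1 first shows exactly which archives would be requested, which is a cheap way to catch a mistyped sub-modality name before using any download time.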

Sub-modalities available in PANDORA

Key: 1mm voxels - 2mm voxels - CIFTI

T1	T1 (normalised image intensities), T1_VBM (local GM volume/density), warpfield_jacobian (expansion/compression from warp of T1 to standard space)
T2_FLAIR	T2_FLAIR (normalised image intensities), T2_FLAIR_lesions* (white matter hyperintensities)
SWI	SWI_T2star, SWI_QSM
dMRI	DTI_FA, DTI_MO, DTI_MD, DTI_L1-3, NODDI_ICVF, NODDI_ISOVF, NODDI_OD, dMRI_tracts (probabilistic tractography summed across multiple tracts)
tfMRI	tfMRI_zstat1 (shapes), tfMRI_zstat2 (faces), tfMRI_zstat5 (faces-shapes), tfMRI_cope1 (shapes), tfMRI_cope2 (faces), tfMRI_cope5 (faces-shapes)
rfMRI	{ RSN_d25/1, … , RSN_d25/25 }, { RSN_d50/1, … , RSN_d50/50 }

* T2_FLAIR_lesions is an unusual modality in that the original data is binary. Hence regression statistics should be interpreted with caution, as the residuals may be highly non-Gaussian.

Extracting non-imaging variables to regress into the PANDORA imaging data

These instructions describe one method of extracting non-imaging UK Biobank data fields for use in an analysis:

1. Navigate to your RAP project home page.
2. Open the Cohort Browser by clicking on the .dataset file in your project workspace.
3. If you want to restrict the data to a subset of participants, click on Add Filter, and search for the field(s) you wish to filter by. For example, you can select participants with imaging data by creating a filter where Volume of grey matter (instance 2) is between 0 and 1000000000 (this field will be NaN for participants without imaging data).
4. Click on the DATA PREVIEW tab, then click the + Add Column button to add the data fields you are interested in.
5. When you are happy, click the floppy disk icon at the top right to save your cohort. The cohort will be saved to your project workspace.
6. Now click on the TOOLS menu at the top, navigate to Tools Library -> Table exporter (note this might not appear on page one of the tools list), and click the Run button at the top.
7. Set the Output to field to the desired RAP project, select a location in your project workspace to save the output, then click Next.
8. Set the Dataset or Cohort or Dashboard to the cohort you created above. Leave File containing Field Names blank.
9. Under OPTIONS set the output file name and format, and set Coding Option to RAW, and Header Style to UKB-FORMAT. Leave the other options at their default settings.
10. Click Start Analysis at the top right and then Launch Analysis.
11. Wait for the job to complete. Once it's done, you will have a CSV file containing the selected data in your project workspace, which you can download within an analysis session.
Extract the relevant parts of the CSV. In a compute instance, first select the columns you need and exclude rows (subjects) with missing data. In this example, column 1 holds the subject IDs and column 5 holds the variable of interest (e.g. blood pressure):
mkdir BP; cd BP
# Skip the header row, drop rows where either field is empty, and write "subjectID value"
awk -F , 'NR > 1 && $1 != "" && $5 != "" {print $1, $5}' Cohort1.csv > BP.txt
Now split this into a subject list, the regressor of interest, and a contrast file:
cat BP.txt | awk '{print $1}' > subjects.txt
cat BP.txt | awk '{print $2}' > design.txt
echo "1" > contrasts.txt
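
Because subjects.txt and design.txt are matched row by row, a quick consistency check can catch extraction mistakes early. The following is a minimal sketch; check_design_rows is a hypothetical helper introduced here, not part of PANDORA or FSL.

```shell
# Hypothetical helper (not part of PANDORA or FSL): check that subjects.txt
# and design.txt have the same number of non-empty rows, and that every
# design row has the same number of columns.
check_design_rows() {
    cdr_subs=$1
    cdr_design=$2
    # Count non-empty rows in each file.
    cdr_nsubs=$(awk 'NF > 0 {n++} END {print n + 0}' "$cdr_subs")
    cdr_nrows=$(awk 'NF > 0 {n++} END {print n + 0}' "$cdr_design")
    # Count how many distinct row widths the design has (should be exactly 1).
    cdr_nwidths=$(awk 'NF > 0 {w[NF] = 1} END {n = 0; for (k in w) n++; print n}' "$cdr_design")
    if [ "$cdr_nsubs" -ne "$cdr_nrows" ]; then
        echo "row mismatch: $cdr_nsubs subjects vs $cdr_nrows design rows"
        return 1
    fi
    if [ "$cdr_nwidths" -ne 1 ]; then
        echo "design rows do not all have the same number of columns"
        return 1
    fi
    echo "OK: $cdr_nrows rows"
}

# Example:
# check_design_rows subjects.txt design.txt
```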

The contrasts file should have one column for each regressor (i.e., each column in design.txt). Hence, if you have added confound regressors, they would normally have a “0” in the relevant place in the contrasts file. If you want multiple contrasts, put additional rows into the contrasts file. For example, if you have one regressor of interest followed by 5 confounds, and you want to test both positive and negative associations with the regressor of interest, your contrasts file would have two rows and would look like:
1 0 0 0 0 0
-1 0 0 0 0 0
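
For designs with many confound columns, the contrasts can be generated from the design itself rather than typed by hand. This sketch assumes the regressor of interest is the first column of design.txt and all remaining columns are confounds; make_contrasts is a hypothetical helper name.

```shell
# Hypothetical helper: write a +1 and a -1 contrast on the first regressor,
# with zeros for every remaining (confound) column of the design.
# Assumption: the regressor of interest is column 1 of the design file.
make_contrasts() {
    awk 'NF > 0 {
        pos = "1"
        neg = "-1"
        # One zero per confound column.
        for (i = 2; i <= NF; i++) {
            pos = pos " 0"
            neg = neg " 0"
        }
        print pos
        print neg
        exit    # the first non-empty row is enough to learn the column count
    }' "$1"
}

# Example:
# make_contrasts design.txt > contrasts.txt
```

For a design with one regressor of interest and 5 confounds, this reproduces the two rows shown above.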

Running regressions using PANDORA imaging data

Choose:

The PANDORA sub-modality of interest, e.g. T1 or warpfield_jacobian (see the list above)
The PANDORA representation: voxel or ICA1K or ICA10K

Then call fsl_glm. The command below uses the experiment-specific set of regressors and associated files described above (design.txt, contrasts.txt, subjects.txt), uses the maximum number of compute cores available (--pandora_njobs=-1), and uses the full set of confounds (--pandora_confs=all). Make sure that the PANDORA shell variable (set above) points to where you put the PANDORA files.

PANDORA_MODALITY=warpfield_jacobian
PANDORA_REPRESENTATION=ICA1K
fsl_glm -i $PANDORA/$PANDORA_MODALITY --demean --pandora_njobs=-1 \
  --pandora_mode=$PANDORA_REPRESENTATION --pandora_confs=all \
  -d design.txt -c contrasts.txt --pandora_subs=subjects.txt \
  --out_t=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_T
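
To run the same design against several sub-modalities, the call can be wrapped in a loop. The sketch below uses only the fsl_glm options shown above; pandora_glm_all is a hypothetical wrapper name, and it assumes each listed sub-modality has already been downloaded into $PANDORA.

```shell
# Hypothetical wrapper: run the same regression for several sub-modalities.
# Usage: pandora_glm_all <representation> <modality> [<modality> ...]
# Assumptions: $PANDORA is set, each sub-modality is already downloaded, and
# design.txt, contrasts.txt and subjects.txt are in the current directory.
pandora_glm_all() {
    pga_representation=$1
    shift
    for pga_modality in "$@"; do
        fsl_glm -i "$PANDORA/$pga_modality" --demean --pandora_njobs=-1 \
            --pandora_mode="$pga_representation" --pandora_confs=all \
            -d design.txt -c contrasts.txt --pandora_subs=subjects.txt \
            --out_t="${pga_modality}_${pga_representation}_T" || return 1
    done
}

# Example:
# pandora_glm_all ICA1K T1_VBM warpfield_jacobian
```

Each run writes its own output file, named after the sub-modality and representation, so results from different sub-modalities do not overwrite each other.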

The T-stats will be output in a single image file, with one timepoint per regression contrast. You can also output the associated P-values (in -log10(P) form), an F-statistic (across all contrasts), and the corresponding P-value for the F-statistic, by adding one or more of the following options:

--out_p=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_P

--out_f=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_F

--out_pf=${PANDORA_MODALITY}_${PANDORA_REPRESENTATION}_PF
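
Since the P-value outputs are stored as -log10(P), it can help to convert thresholds in both directions when viewing the images. The helper names below are hypothetical and the conversion is plain arithmetic, independent of PANDORA: a value of x in the image corresponds to P = 10^(-x).

```shell
# Hypothetical helpers: convert between P and -log10(P) using awk.
# awk only provides the natural log, so divide by log(10).
p_to_mlog10() {
    awk -v p="$1" 'BEGIN { printf "%.3f\n", -log(p) / log(10) }'
}
mlog10_to_p() {
    awk -v x="$1" 'BEGIN { printf "%.3g\n", exp(-x * log(10)) }'
}

# Examples:
# p_to_mlog10 0.05    # prints 1.301
# mlog10_to_p 3       # prints 0.001
```

For instance, displaying only voxels above 1.301 in a -log10(P) image corresponds to an uncorrected threshold of P < 0.05; any correction for multiple comparisons is a separate choice left to the analyst.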

The outputs from sub-modalities that use NIFTI files (see above) can be viewed with fsleyes, and CIFTI outputs can be viewed using wb_view. To aid in interpretation of results you may also want to load up the population average image (for any given sub-modality), which can be found in $PANDORA/$PANDORA_MODALITY/stats

Genetic associations - extracting data from a SNP to regress into PANDORA

[will enter this information shortly]