Using FUNPACK on the UK Biobank Research Analysis Platform (RAP)

If you are working with UK Biobank data, there is a good chance that you are working on the Research Analysis Platform (RAP). To use FUNPACK on the RAP, you need to:

Prepare your input data.
Install and use FUNPACK in a RAP session.

Prepare your input data

FUNPACK requires your input data to be stored in a .csv (or .tsv) file. There are several ways of creating such a file on the RAP; two options are described here.

If you are planning to work on a specific group of participants, your first step should be to use the RAP Cohort Browser. For example, you can follow these steps to define a cohort of participants with brain imaging data:

Open the Cohort Browser by clicking on the appropriate .dataset file in your project workspace.
Add a filter to restrict the cohort to participants with imaging data:
- Click on + Add Filter at the top.
- Search for Volume of grey matter (instance 2).
- Click on Add Cohort Filter.
- Set the filter to IS NOT NULL.
- Click Apply Filter.
Click the floppy disk icon at the top right to save your cohort. Give your cohort a meaningful name, e.g. brain_imaging_cohort - it will be saved to your project workspace under that name.

Now you can create a CSV file containing data for your cohort in one of two ways:

Using the RAP Table Exporter.

Using dx extract_dataset.

Using the RAP Table Exporter

You can use the RAP Table Exporter tool to create your CSV file. Before using the Table Exporter, you need to create a text file which contains the list of all data-fields that you wish to extract. To do this, start a JupyterLab terminal via the Tools ⇒ JupyterLab menu item in the RAP web interface.

Once your JupyterLab session has started, click on the Terminal icon to open a new terminal, and run the following command, replacing brain_imaging_cohort with the name of your cohort or .dataset file:

dx extract_dataset brain_imaging_cohort \
    --entities participant              \
    --list-fields                     | \
  cut -f 1                            | \
  cut -d . -f 2 > datafields.txt

This will produce a file called datafields.txt which contains a list of all data-fields available under your UK Biobank application. Save this file to your RAP project workspace:

dx upload datafields.txt

Note

It is possible to create a datafields.txt file by hand, but getting the format correct can be difficult. If you are only interested in a sub-set of data-fields, a good option is to use the above command to generate the full list, then to edit the datafields.txt file, removing the data-fields that you are not interested in.
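As a rough sketch of the editing step described in this note, the full list can be filtered against a second file that names only the data-fields you want. The keep_fields function and the wanted.txt file name are hypothetical, and the IDs in the usage comment are placeholders, assuming that entries in datafields.txt look like eid or p31 after the cut commands above.

```shell
# keep_fields FULL_LIST WANTED
# Print the entries of FULL_LIST that exactly match a line of WANTED.
#   -x  match whole lines only (so p31 does not also match p3159)
#   -F  treat each wanted entry as a fixed string, not a regular expression
#   -f  read the wanted entries from a file
keep_fields() {
    grep -x -F -f "$2" "$1"
}

# Example usage (placeholder IDs; assumes both files exist):
#   printf 'eid\np31\n' > wanted.txt
#   keep_fields datafields.txt wanted.txt > datafields_subset.txt
```

The resulting datafields_subset.txt can then be uploaded with dx upload and used in place of the full list.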

Now you are ready to run the Table Exporter:

Click on TOOLS ⇒ Tools Library at the top, find and open the Table exporter, then click the Run button at the top.
Set the Output to field to your RAP project, select a location in your project workspace to save the output, then click Next.
Change the following settings:
- Set the Dataset or Cohort or Dashboard to your cohort (e.g. brain_imaging_cohort). To extract data for all participants instead, set it to your .dataset file.
- Set File containing Field Names to the datafields.txt file you created above.
- Under OPTIONS:
  - Enter an output file name in the Output Prefix section.
  - Set Coding Option to RAW.
  - Leave all of the other options at their default settings.
- Under ADVANCED OPTIONS, enter participant in the Entity section.
Click Start Analysis at the top right and then Launch Analysis.
Wait for the job to complete. Once it’s done, you will have a CSV file containing the selected data in your project workspace, which you can pass to fmrib_unpack.

Using dx extract_dataset

The dx extract_dataset command can also be used directly to extract data into a CSV file. However, it often fails when extracting large amounts of data, so the RAP Table Exporter is the more reliable option. It is described here for completeness.

To use the dx extract_dataset command-line tool to extract data for your cohort, start a JupyterLab session via Tools ⇒ JupyterLab and, when it has started, click on the Terminal icon to open a new terminal.

First, run this command to generate a list of all UK Biobank data-field IDs that are available to you:

dx extract_dataset brain_imaging_cohort \
    --entities participant              \
    --list-fields                     | \
  cut -f 1  > datafields.txt

Note

The format of the datafields.txt file here is slightly different from that required by the RAP Table Exporter above: each entry keeps the text before the dot, which the Table Exporter command strips with cut -d . -f 2.

Then you can run dx extract_dataset again to create a CSV file containing the data for all of those data-fields for your cohort (or .dataset):

dx extract_dataset brain_imaging_cohort \
    --entities participant              \
    --fields-file datafields.txt        \
    --out brain_imaging_cohort_data.csv

This will produce a file called brain_imaging_cohort_data.csv, which you can then pass to fmrib_unpack.

You may wish to save these files to your RAP project workspace, so you can use them in multiple sessions:

dx upload datafields.txt brain_imaging_cohort_data.csv
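Because dx extract_dataset can fail on large extractions, it may be worth checking the size of the CSV file before passing it to fmrib_unpack. The following is a minimal sketch; the csv_shape function is hypothetical, and it assumes a single header line, no quoted commas in the header, and no line breaks inside field values.

```shell
# csv_shape FILE
# Print the number of data rows and columns in a comma-separated file.
# Assumes one header line, and no quoted commas in that header.
csv_shape() {
    # Count comma-separated fields on the header line.
    ncols=$(head -n 1 "$1" | awk -F ',' '{ print NF }')
    # Count all lines, then subtract one for the header.
    nrows=$(awk 'END { print NR - 1 }' "$1")
    echo "$nrows rows, $ncols columns"
}

# Example usage:
#   csv_shape brain_imaging_cohort_data.csv
```

The row count should match the number of participants in your cohort, and the column count should match the number of entries in datafields.txt.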

Install and use FUNPACK

The easiest way to use FUNPACK on the RAP is via a JupyterLab Terminal. You can start a JupyterLab session via the Tools ⇒ JupyterLab menu item in the RAP interface and, when it has started, click on the Terminal icon to open a new terminal.

Make sure that you choose an instance type with enough CPU and RAM to run FUNPACK. You may need up to 60 GB of RAM if you are running FUNPACK on a full dataset (e.g. 500k participants and 25k data-fields).
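Once the session has started, you can confirm what the instance actually provides from the terminal. This is a small sketch that assumes a Linux environment where /proc/meminfo and the nproc command are available.

```shell
# Total RAM of the current instance in whole GB, read from /proc/meminfo
# (Linux only; MemTotal is reported in kB).
total_ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)

echo "Total RAM: ${total_ram_gb} GB"
echo "CPU cores: $(nproc)"
```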

The RAP JupyterLab environment has conda pre-installed, so you can use it to install FUNPACK by running this command:

conda install -y -c conda-forge fmrib-unpack

Once this command has finished, you should be able to use the fmrib_unpack command on the CSV file you created above.
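A minimal sketch of that final step follows, using the input file name from the dx extract_dataset example. The output file name is a placeholder. fmrib_unpack takes the output file as its first positional argument, followed by the input file.

```shell
# Placeholder file names; adjust to match your own export.
input_csv=brain_imaging_cohort_data.csv
output_tsv=brain_imaging_cohort_unpacked.tsv

# Only run if the command is installed and the input file exists.
if command -v fmrib_unpack >/dev/null 2>&1 && [ -f "$input_csv" ]; then
    # Positional arguments: output file first, then the input file.
    fmrib_unpack "$output_tsv" "$input_csv"
else
    echo "fmrib_unpack or $input_csv not found; nothing was run" >&2
fi
```

Running fmrib_unpack --help lists the available options.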
