.. |right_arrow| unicode:: U+21D2
Using FUNPACK on the UK Biobank Research Analysis Platform (RAP)
================================================================
If you are working with UK Biobank data, there is a good chance that you are
working on the `Research Analysis Platform (RAP)
`_.
To use FUNPACK on the RAP, you need to:
1. :ref:`Prepare your input data <rap_prepare>`.
2. :ref:`Install and use FUNPACK in a RAP session <rap_installation>`.
.. _rap_prepare:
Prepare your input data
-----------------------
To use FUNPACK, your input data needs to be stored in a ``.csv`` (or
``.tsv``) file. There are several ways of creating such a file on the RAP;
two options are described below.
If you are planning to work on a specific group of participants, your first
step should be to use the RAP Cohort Browser. For example, you can follow
these steps to define a cohort of participants with brain imaging data:
1. Open the **Cohort Browser** by clicking on the appropriate ``.dataset``
file in your project workspace.
2. Add a filter to restrict the cohort to participants with imaging data:
- Click on **+ Add Filter** at the top.
- Search for *Volume of grey matter (instance 2)*.
- Click on **Add Cohort Filter**.
- Set the filter to **IS NOT NULL**.
- Click **Apply Filter**.
3. Click the floppy disk icon at the top right to save your cohort. Give your
cohort a meaningful name, e.g. ``brain_imaging_cohort`` - it will be saved
to your project workspace under that name.
Now you can create a CSV file containing data for your cohort in one of two
ways:
- :ref:`rap_table_exporter`.
- |dx_extract_dataset|_.
.. _rap_table_exporter:
Using the RAP Table Exporter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can use the *RAP Table Exporter* tool to create your CSV file. Before
using the Table Exporter, you need to create a text file which contains the
list of all data fields that you wish to extract. To do this, start a
JupyterLab session via the *Tools* |right_arrow| *JupyterLab* menu item in
the RAP web interface.
Once your JupyterLab session has started, click on the *Terminal* icon to open
a new terminal, and run the following command, replacing
``brain_imaging_cohort`` with the name of your cohort or ``.dataset`` file::
dx extract_dataset brain_imaging_cohort \
--entities participant \
--list-fields | \
cut -f 1 | \
cut -d . -f 2 > datafields.txt
This will produce a file called ``datafields.txt`` which contains a list of
all data fields available under your UK Biobank application. Save this file to
your RAP project workspace::
dx upload datafields.txt
.. note:: It is possible to create a ``datafields.txt`` file by hand, but
getting the format correct can be difficult. If you are only
interested in a subset of data fields, a good option is to use the
above command to generate the full list, then to edit the
``datafields.txt`` file, removing the data fields that you are not
interested in.
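One way to trim the full list is sketched below. It assumes that each line of the file holds a single field name such as ``p31`` or ``p25005_i2`` (a field ID, optionally followed by an instance or array suffix), and that the participant ID column is called ``eid``. The field IDs and the file names ``datafields_demo.txt`` and ``datafields_subset.txt`` are placeholders chosen for illustration, so check them against your own file.

```shell
# Small stand-in for the full field list (the real file is much longer).
cat > datafields_demo.txt <<'EOF'
eid
p31
p21022
p25005_i2
p25005_i3
p25006_i2
EOF

# Keep the participant ID plus every instance of fields 31 and 25005.
# The pattern is anchored at both ends, so p25005 does not match p250050.
grep -E '^(eid|p(31|25005)(_.*)?)$' datafields_demo.txt > datafields_subset.txt

cat datafields_subset.txt
```

To apply this to real data, point ``grep`` at your own ``datafields.txt`` instead of the demo file, and upload the filtered file to your project workspace.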
Now you are ready to run the Table Exporter:
1. Click on *TOOLS* |right_arrow| *Tools Library* at the top, find and open
the `Table exporter
`_, then click
the **Run** button at the top.
2. Set the *Output to* field to your RAP project, select a location in your
project workspace to save the output, then click **Next**.
3. Change the following settings:
- Set the *Dataset or Cohort or Dashboard* to your cohort
(e.g. ``brain_imaging_cohort``). Or, if you want to extract data for all
participants, set it to your ``.dataset`` file.
- Set *File containing Field Names* to the ``datafields.txt`` file you
created above.
- Under **OPTIONS**:
- Enter an output file name in the *Output Prefix* section.
- Set *Coding Option* to **RAW**.
- Leave all of the other options at their default settings.
- Under **ADVANCED OPTIONS**, enter ``participant`` in the *Entity*
section.
4. Click **Start Analysis** at the top right and then **Launch Analysis**.
5. Wait for the job to complete. Once it's done, you will have a CSV file
containing the selected data in your project workspace, which you can
pass to ``fmrib_unpack``.
.. |dx_extract_dataset| replace:: Using ``dx extract_dataset``
.. _dx_extract_dataset:
Using ``dx extract_dataset``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``dx extract_dataset`` command can be used directly to extract data into a
CSV file. However, it often fails when extracting large amounts of data, so
the *RAP Table Exporter* is the more reliable option. The command is
described here for completeness.
To use the ``dx extract_dataset`` command-line tool to extract data for your
cohort, start a JupyterLab session via *Tools* |right_arrow| *JupyterLab* and,
when it has started, click on the *Terminal* icon to open a new terminal.
First, run this command to generate a list of all UK Biobank data field IDs
that are available to you::
dx extract_dataset brain_imaging_cohort \
--entities participant \
--list-fields | \
cut -f 1 > datafields.txt
.. note:: The format of the ``datafields.txt`` file here is slightly
different to that required by the RAP Table Exporter above: each field name
keeps its ``participant.`` entity prefix.
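To illustrate the difference, the sketch below converts a field list between the two layouts. It assumes that the only difference is the ``participant.`` entity prefix, which the Table Exporter pipeline strips with ``cut -d . -f 2``. The demo file names and field names are placeholders.

```shell
# Stand-in for a field list in the layout produced by the command above.
printf 'participant.eid\nparticipant.p31\nparticipant.p25005_i2\n' > fields_prefixed_demo.txt

# Strip the entity prefix to get the layout used for the Table Exporter.
cut -d . -f 2 fields_prefixed_demo.txt > fields_plain_demo.txt

# Add the prefix back to get the layout used in this section.
sed 's/^/participant./' fields_plain_demo.txt > fields_roundtrip_demo.txt

# The round trip should reproduce the original list exactly.
cmp fields_prefixed_demo.txt fields_roundtrip_demo.txt && echo "round trip OK"
```

This means that a hand-edited list prepared for one tool can be reused with the other without regenerating it.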
Then you can run ``dx extract_dataset`` again to create a CSV file containing
all of those data fields for your cohort (or ``.dataset``)::
dx extract_dataset brain_imaging_cohort \
--entities participant \
--fields-file datafields.txt \
--out brain_imaging_cohort_data.csv
This will produce a file called ``brain_imaging_cohort_data.csv``, which you
can then pass to ``fmrib_unpack``.
You may wish to save these files to your RAP project workspace, so you can use
them in multiple sessions::
dx upload datafields.txt brain_imaging_cohort_data.csv
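Before passing the CSV file to ``fmrib_unpack``, it can be worth checking its dimensions. The sketch below counts rows and columns with standard tools. It assumes a comma-separated file with a single header line and no commas inside quoted header names, and it uses a tiny stand-in file (``demo_data.csv``, with made-up values and assumed column names) rather than real data.

```shell
# Tiny stand-in for an extracted CSV file (column names and values are made up).
printf 'participant.eid,participant.p31,participant.p25005_i2\n1,0,600000\n2,1,\n' > demo_data.csv

# Number of columns: split the header line on commas and count the pieces.
n_cols=$(head -n 1 demo_data.csv | tr ',' '\n' | wc -l | tr -d ' ')

# Number of participants: every line except the header.
n_rows=$(tail -n +2 demo_data.csv | wc -l | tr -d ' ')

echo "columns: $n_cols, rows: $n_rows"
```

If the column count is far from the number of lines in ``datafields.txt``, or the row count is far from the size of your cohort, the extraction may have been incomplete.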
.. _rap_installation:
Install and use FUNPACK
-----------------------
The easiest way to use FUNPACK on the RAP is via a JupyterLab Terminal. You
can start a JupyterLab session via the *Tools* |right_arrow| *JupyterLab* menu
item in the RAP interface and, when it has started, click on the *Terminal*
icon to open a new terminal.
Make sure that you choose an instance type with enough CPU/RAM to run
FUNPACK - you may need up to 60 GB of RAM if you are running FUNPACK on a full
dataset (e.g. 500k participants and 25k data fields).
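To confirm what the running instance actually provides, you can query it from the terminal. This sketch assumes a Linux session, where ``nproc`` and ``/proc/meminfo`` are available. It does not rely on any RAP-specific tooling.

```shell
# Number of CPU cores visible to this session.
n_cpus=$(nproc)

# Total RAM in GiB, read from the kernel's memory summary (reported in kB).
mem_gib=$(awk '/^MemTotal:/ {printf "%.1f", $2 / 1024 / 1024}' /proc/meminfo)

echo "CPUs: $n_cpus, RAM: $mem_gib GiB"
```

If the reported memory is well below what your dataset is likely to need, it is usually simpler to restart the session on a larger instance than to work around the limit.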
The RAP JupyterLab environment has ``conda`` pre-installed, so you can use it
to install FUNPACK by running this command::
conda install -y -c conda-forge fmrib-unpack
Once this command has finished, you should be able to use the ``fmrib_unpack``
command on the CSV file you created above.
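A basic invocation takes the output file followed by one or more input files. The sketch below wraps it in a check so that it only runs when both the command and the input file exist. The file names are placeholders (the input name matches the ``dx extract_dataset`` example above), and any further options should be taken from ``fmrib_unpack --help`` rather than from this example.

```shell
# Placeholder file names; adjust these to match your own project.
in_csv="brain_imaging_cohort_data.csv"
out_tsv="brain_imaging_cohort_unpacked.tsv"

if command -v fmrib_unpack >/dev/null 2>&1 && [ -f "$in_csv" ]; then
    # Basic form: output file first, then the input file.
    fmrib_unpack "$out_tsv" "$in_csv"
    status="done"
else
    echo "fmrib_unpack or $in_csv not found; nothing was run" >&2
    status="skipped"
fi

echo "FUNPACK run: $status"
```

As with the extracted CSV, you may wish to upload the output file to your project workspace with ``dx upload`` so that it persists after the session ends.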