.. |right_arrow| unicode:: U+21D2

Using FUNPACK on the UK Biobank Research Analysis Platform (RAP)
================================================================


If you are working with UK Biobank data, there is a good chance that you are
working on the `Research Analysis Platform (RAP)
<https://www.ukbiobank.ac.uk/use-our-data/research-analysis-platform/>`_.
To use FUNPACK on the RAP, you need to:


1. :ref:`Prepare your input data <rap_prepare>`.
2. :ref:`Install and use FUNPACK in a RAP session <rap_installation>`.


.. _rap_prepare:

Prepare your input data
-----------------------


In order to use FUNPACK, your input data needs to be stored in a ``.csv`` (or
``.tsv``) file. There are several ways of creating such a file on the RAP - a
couple of options are described here.


If you are planning to work on a specific group of participants, your first
step should be to use the RAP Cohort Browser. For example, you can follow
these steps to define a cohort of participants with brain imaging data:


1. Open the **Cohort Browser** by clicking on the appropriate ``.dataset``
   file in your project workspace.

2. Add a filter to restrict the cohort to participants with imaging data:

    - Click on **+ Add Filter** at the top.
    - Search for *Volume of grey matter (instance 2)*.
    - Click on **Add Cohort Filter**.
    - Set the filter to **IS NOT NULL**.
    - Click **Apply Filter**.

3. Click the floppy disk icon at the top right to save your cohort. Give your
   cohort a meaningful name, e.g. ``brain_imaging_cohort`` - it will be saved
   to your project workspace under that name.


Now you can create a CSV file containing data for your cohort in one of two
ways:


 - :ref:`rap_table_exporter`.
 - |dx_extract_dataset|_.


.. _rap_table_exporter:

Using the RAP Table Exporter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^


You can use the *RAP Table Exporter* tool to create your CSV file. Before
using the Table Exporter, you need to create a text file which contains the
list of all data fields that you wish to extract. To do this, start a
JupyterLab session via the *Tools* |right_arrow| *JupyterLab* menu item in
the RAP web interface.


Once your JupyterLab session has started, click on the *Terminal* icon to open
a new terminal, and run the following command, replacing
``brain_imaging_cohort`` with the name of your cohort or ``.dataset`` file::

    dx extract_dataset brain_imaging_cohort \
        --entities participant              \
        --list-fields                     | \
      cut -f 1                            | \
      cut -d . -f 2 > datafields.txt


This will produce a file called ``datafields.txt`` which contains a list of
all data fields available under your UK Biobank application. Save this file to
your RAP project workspace::

    dx upload datafields.txt
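

The two ``cut`` stages of the ``dx extract_dataset`` pipeline shown earlier
can be tried out on their own. The sketch below assumes that ``--list-fields``
prints one field per line in the form ``<entity>.<field name>``, followed by a
tab and the field title. The two sample lines and the ``example_fields.txt``
file name are made up for illustration::

    # Hypothetical sample of --list-fields output (illustration only)
    printf 'participant.eid\tParticipant ID\nparticipant.p31\tSex\n' | \
      cut -f 1                                                       | \
      cut -d . -f 2 > example_fields.txt

    cat example_fields.txt


The first ``cut`` keeps the text before the tab (``participant.eid``), and the
second keeps the second ``.``-separated component (``eid``), so each line of
the output holds a bare field name.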


.. note:: It is possible to create a ``datafields.txt`` file by hand, but
          getting the format correct can be difficult. If you are only
          interested in a subset of data fields, a good option is to use the
          ``dx extract_dataset`` command above to generate the full list, and
          then edit the ``datafields.txt`` file to remove the data fields that
          you are not interested in before uploading it.
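

One way to trim the list, sketched here on a made-up sample, is to filter it
with ``grep``. The field names below, and the ``example_fields.txt`` and
``example_subset.txt`` file names, are assumptions for illustration; check
them against your own ``datafields.txt`` before adapting the pattern::

    # Hypothetical sample of a field list (illustration only)
    printf 'eid\np31\np34\np25006_i2\np25006_i3\n' > example_fields.txt

    # Keep the participant ID, plus all instances of fields 31 and 25006
    grep -E '^(eid|p(31|25006)(_.*)?)$' example_fields.txt > example_subset.txt

    cat example_subset.txt


Anchoring the pattern with ``^`` and ``$`` avoids accidentally matching other
fields whose names merely start with the same digits.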


Now you are ready to run the Table Exporter:

1. Click on *TOOLS* |right_arrow| *Tools Library* at the top, find and open
   the `Table exporter
   <https://ukbiobank.dnanexus.com/panx/tool/app/table-exporter>`_, then click
   the **Run** button at the top.

2. Set the *Output to* field to your RAP project, select a location in your
   project workspace to save the output, then click **Next**.

3. Change the following settings:

    - Set the *Dataset or Cohort or Dashboard* to your cohort
      (e.g. ``brain_imaging_cohort``). Or, if you want to extract data for all
      participants, set it to your ``.dataset`` file.

    - Set *File containing Field Names* to the ``datafields.txt`` file you
      created above.

    - Under **OPTIONS**:

       - Enter an output file name in the *Output Prefix* section.
       - Set *Coding Option* to **RAW**.
       - Leave all of the other options at their default settings.

    - Under **ADVANCED OPTIONS**, enter ``participant`` in the *Entity*
      section.

4. Click **Start Analysis** at the top right and then **Launch Analysis**.

5. Wait for the job to complete. Once it's done, you will have a CSV file
   containing the selected data in your project workspace, which you can
   pass to ``fmrib_unpack``.


.. |dx_extract_dataset| replace:: Using ``dx extract_dataset``
.. _dx_extract_dataset:

Using ``dx extract_dataset``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^


The ``dx extract_dataset`` command can also be used directly to extract data
into a CSV file. However, it often fails when extracting large amounts of
data, so the *RAP Table Exporter* is the more reliable option. The
command-line approach is described here for completeness.


To use the ``dx extract_dataset`` command-line tool to extract data for your
cohort, start a JupyterLab session via *Tools* |right_arrow| *JupyterLab* and,
when it has started, click on the *Terminal* icon to open a new terminal.


First, run this command to generate a list of all UK Biobank data field IDs
that are available to you::

    dx extract_dataset brain_imaging_cohort \
        --entities participant              \
        --list-fields                     | \
      cut -f 1  > datafields.txt


.. note:: The format of the ``datafields.txt`` file here is slightly different
          to that required by the RAP Table Exporter above: the second ``cut``
          step is omitted, so each entry retains its entity prefix.
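

If you have already created a field list for the Table Exporter, it can be
converted rather than regenerated. This is a sketch which assumes that the
only difference between the two formats is the ``participant.`` entity
prefix; the sample field names and file names are made up for illustration::

    # Hypothetical Table Exporter style list (illustration only)
    printf 'eid\np31\np25006_i2\n' > example_exporter_fields.txt

    # Add the entity prefix to the start of every line
    sed 's/^/participant./' example_exporter_fields.txt > example_dx_fields.txt

    cat example_dx_fields.txt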


Then you can run ``dx extract_dataset`` again to create a CSV file containing
all of those data fields for your cohort (or ``.dataset``)::

    dx extract_dataset brain_imaging_cohort \
        --entities participant              \
        --fields-file datafields.txt        \
        --out brain_imaging_cohort_data.csv


This will produce a file called ``brain_imaging_cohort_data.csv``, which you
can then pass to ``fmrib_unpack``.


You may wish to save these files to your RAP project workspace, so you can use
them in multiple sessions::

    dx upload datafields.txt brain_imaging_cohort_data.csv
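

Before running FUNPACK, it can be worth checking that the extracted file has
the shape you expect. The sketch below runs on a made-up two-participant file
(the file name, column names and values are assumptions for illustration), and
assumes a comma-separated file with a single header row and no commas or line
breaks inside values. Substitute your own file name when adapting it::

    # Hypothetical extract (illustration only)
    printf 'participant.eid,participant.p31\n1000001,0\n1000002,1\n' \
        > example_data.csv

    # Number of columns, counted from the header row
    head -n 1 example_data.csv | tr ',' '\n' | wc -l

    # Number of participants (all rows except the header)
    tail -n +2 example_data.csv | wc -l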


.. _rap_installation:

Install and use FUNPACK
-----------------------


The easiest way to use FUNPACK on the RAP is via a JupyterLab Terminal. You
can start a JupyterLab session via the *Tools* |right_arrow| *JupyterLab* menu
item in the RAP interface and, when it has started, click on the *Terminal*
icon to open a new terminal.

Make sure that you choose an instance type with enough CPU/RAM to run
FUNPACK - you may need up to 60 GB of RAM if you are running FUNPACK on a full
dataset (e.g. 500k participants and 25k data fields).
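

To check what the current session provides, you can run the following in the
terminal. This is a sketch which assumes a Linux environment where
``/proc/meminfo`` is available::

    # Number of CPU cores available
    nproc

    # Approximate total RAM in GB (MemTotal is reported in kB)
    awk '/^MemTotal/ {printf "%.1f GB\n", $2 / 1024 / 1024}' /proc/meminfo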


The RAP JupyterLab environment has ``conda`` pre-installed, so you can use it
to install FUNPACK by running this command::

    conda install -y -c conda-forge fmrib-unpack


Once this command has finished, you should be able to use the ``fmrib_unpack``
command on the CSV file you created above.