.. |right_arrow| unicode:: U+21D2

Using FUNPACK on the UK Biobank Research Analysis Platform (RAP)
================================================================

If you are working with UK Biobank data, there is a good chance that you are
working on the `Research Analysis Platform (RAP) `_.

To use FUNPACK on the RAP, you need to:

1. :ref:`Prepare your input data <rap_prepare>`.
2. :ref:`Install and use FUNPACK in a RAP session <rap_installation>`.


.. _rap_prepare:

Prepare your input data
-----------------------

In order to use FUNPACK, your input data needs to be stored in a ``.csv`` (or
``.tsv``) file. There are several ways of creating such a file on the RAP - a
couple of options are described here.

If you are planning to work on a specific group of participants, your first
step should be to use the RAP Cohort Browser. For example, you can follow
these steps to define a cohort of participants with brain imaging data:

1. Open the **Cohort Browser** by clicking on the appropriate ``.dataset``
   file in your project workspace.

2. Add a filter to restrict the cohort to participants with imaging data:

   - Click on **+ Add Filter** at the top.
   - Search for *Volume of grey matter (instance 2)*.
   - Click on **Add Cohort Filter**.
   - Set the filter to **IS NOT NULL**.
   - Click **Apply Filter**.

3. Click the floppy disk icon at the top right to save your cohort. Give your
   cohort a meaningful name, e.g. ``brain_imaging_cohort`` - it will be saved
   to your project workspace under that name.

Now you can create a CSV file containing data for your cohort in one of two
ways:

- :ref:`rap_table_exporter`.
- |dx_extract_dataset|_.


.. _rap_table_exporter:

Using the RAP Table Exporter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can use the *RAP Table Exporter* tool to create your CSV file. Before
using the Table Exporter, you need to create a text file which contains the
list of all data fields that you wish to extract.
To do this, start a JupyterLab terminal via the *Tools* |right_arrow|
*JupyterLab* menu item in the RAP web interface. Once your JupyterLab session
has started, click on the *Terminal* icon to open a new terminal, and run the
following command, replacing ``brain_imaging_cohort`` with the name of your
cohort or ``.dataset`` file::

  dx extract_dataset brain_imaging_cohort \
    --entities participant \
    --list-fields | \
    cut -f 1 | \
    cut -d . -f 2 > datafields.txt

This will produce a file called ``datafields.txt`` which contains a list of
all data fields available under your UK Biobank application. Save this file to
your RAP project workspace::

  dx upload datafields.txt

.. note:: It is possible to create a ``datafields.txt`` file by hand, but
          getting the format correct can be difficult. If you are only
          interested in a sub-set of data fields, a good option is to use the
          above command to generate the full list, then to edit the
          ``datafields.txt`` file, removing the data fields that you are not
          interested in.

Now you are ready to run the Table Exporter:

1. Click on *TOOLS* |right_arrow| *Tools Library* at the top, find and open
   the `Table exporter `_, then click the **Run** button at the top.

2. Set the *Output to* field to your RAP project, select a location in your
   project workspace to save the output, then click **Next**.

3. Change the following settings:

   - Set the *Dataset or Cohort or Dashboard* to your cohort (e.g.
     ``brain_imaging_cohort``). Or, if you want to extract data for all
     participants, set it to your ``.dataset`` file.
   - Set *File containing Field Names* to the ``datafields.txt`` file you
     created above.
   - Under **OPTIONS**:

     - Enter an output file name in the *Output Prefix* section.
     - Set *Coding Option* to **RAW**.
     - Leave all of the other options at their default settings.

   - Under **ADVANCED OPTIONS**, enter ``participant`` in the *Entity*
     section.

4. Click **Start Analysis** at the top right and then **Launch Analysis**.

5. Wait for the job to complete.
   Once it's done, you will have a CSV file containing the selected data in
   your project workspace, which you can pass to ``fmrib_unpack``.


.. |dx_extract_dataset| replace:: Using ``dx extract_dataset``
.. _dx_extract_dataset:

Using ``dx extract_dataset``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``dx extract_dataset`` command can be used directly to extract data into a
CSV file. However, it often fails when extracting large amounts of data, so
the *RAP Table Exporter* is a more reliable option. It is described here for
completeness.

To use the ``dx extract_dataset`` command-line tool to extract data for your
cohort, start a JupyterLab session via *Tools* |right_arrow| *JupyterLab* and,
when it has started, click on the *Terminal* icon to open a new terminal.

First, run this command to generate a list of all UK Biobank data field IDs
that are available to you::

  dx extract_dataset brain_imaging_cohort \
    --entities participant \
    --list-fields | \
    cut -f 1 > datafields.txt

.. note:: The format of the ``datafields.txt`` file here is slightly
          different from that required by the RAP Table Exporter above: the
          second ``cut`` step is omitted, so each field name keeps the prefix
          before the ``.``.

Then you can run ``dx extract_dataset`` again to create a CSV file containing
all of those data fields for your cohort (or ``.dataset``)::

  dx extract_dataset brain_imaging_cohort \
    --entities participant \
    --fields-file datafields.txt \
    --out brain_imaging_cohort_data.csv

This will produce a file called ``brain_imaging_cohort_data.csv``, which you
can then pass to ``fmrib_unpack``.

You may wish to save these files to your RAP project workspace, so you can use
them in multiple sessions::

  dx upload datafields.txt brain_imaging_cohort_data.csv


.. _rap_installation:

Install and use FUNPACK
-----------------------

The easiest way to use FUNPACK on the RAP is via a JupyterLab Terminal. You
can start a JupyterLab session via the *Tools* |right_arrow| *JupyterLab* menu
item in the RAP interface and, when it has started, click on the *Terminal*
icon to open a new terminal.
Make sure that you choose an instance type with enough CPU/RAM to run
FUNPACK - you may need up to 60 GB of RAM if you are running FUNPACK on a full
dataset (e.g. 500k participants, and 25k data fields).

The RAP JupyterLab environment has ``conda`` pre-installed, so you can use it
to install FUNPACK by running this command::

  conda install -y -c conda-forge fmrib-unpack

Once this command has finished, you should be able to use the ``fmrib_unpack``
command on the CSV file you created above.
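
As a rough sketch of that last step, the commands below check that
``fmrib_unpack`` and the input CSV are both present before running it. This is
an illustration under stated assumptions, not a prescribed workflow: the
argument order (output file first, then input file) is assumed from FUNPACK's
basic usage, the input name assumes the file produced in the
``dx extract_dataset`` section, and the output name
``brain_imaging_cohort_unpacked.tsv`` is a placeholder chosen for this example.

```shell
# Input CSV created earlier; adjust if you used the Table Exporter instead.
input=brain_imaging_cohort_data.csv
# Hypothetical output file name, chosen for this example only.
output=brain_imaging_cohort_unpacked.tsv

if command -v fmrib_unpack >/dev/null 2>&1 && [ -f "$input" ]; then
    # Assumed basic usage: output file first, then one or more input files.
    fmrib_unpack "$output" "$input"
    # Save the result to the project workspace for use in later sessions.
    dx upload "$output"
    status="done"
else
    echo "fmrib_unpack or $input not found; nothing to do" >&2
    status="skipped"
fi
```

If the message on the ``else`` branch is printed, check that the ``conda
install`` command completed and that the CSV file is in your current
directory.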