Saturation mutagenesis of the TERT promoter

This example runs the saturation mutagenesis workflow on TERT promoter data from Kircher et al. 2019. The same saturation mutagenesis library was used in four different experiments. This workflow uses the experiments in HEK293T and in glioblastoma SF7996 (GBM) cells to compare variant effects between the two cell lines (conditions).

Prerequisites

This example depends on the following data and software:

Installation of MPRAflow

Please install conda and the MPRAflow environment, and clone the current MPRAflow master branch. You will find more help under Installation.

Assignment file

This is a tab-separated file that assigns variants to barcodes. We will create a new working folder and download the file into it:

mkdir -p SatMut_TERT/data
cd SatMut_TERT/data
wget https://github.com/shendurelab/MPRAflow/raw/master/examples/saturationMutagenesis/TERT.variants.txt.gz
cd ..

It is also possible to generate this file with the association workflow; see :ref:`Association for Saturation mutagenesis of TERT example`.

Count tables

We need the count tables produced by the count workflow. Please run the example Count for Saturation Mutagenesis of the TERT promoter first. Afterwards, copy the count tables into the data folder or use symbolic links:

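# Absolute link targets: a relative target would be resolved from inside data/ and dangle
ln -s "$PWD"/../Count_TERT/output/TERT-GBM/*/TERT-GBM_{1,2,3}_counts.tsv.gz data/
ln -s "$PWD"/../Count_TERT/output/TERT-HEK/*/TERT-HEK_{1,2,3}_counts.tsv.gz data/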

Now the data folder should have the following files:

tree data
data
├── TERT-GBM_1_counts.tsv.gz
├── TERT-GBM_2_counts.tsv.gz
├── TERT-GBM_3_counts.tsv.gz
├── TERT-HEK_1_counts.tsv.gz
├── TERT-HEK_2_counts.tsv.gz
├── TERT-HEK_3_counts.tsv.gz
└── TERT.variants.txt.gz

0 directories, 7 files
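
As an optional check, the listing above can be verified with a small shell function. This is only a sketch: the function name check_data_files is hypothetical and not part of MPRAflow, and the file names are taken from the tree output above. The -e test follows symbolic links, so a dangling link is reported as missing.

```shell
# Hypothetical helper (not part of MPRAflow): report expected input files
# that are missing from the given data folder.
check_data_files() {
    dir="$1"
    missing=0
    # Build the list of expected files: the assignment file plus six count tables.
    files="TERT.variants.txt.gz"
    for condition in TERT-GBM TERT-HEK; do
        for replicate in 1 2 3; do
            files="$files ${condition}_${replicate}_counts.tsv.gz"
        done
    done
    # -e follows symbolic links, so dangling links count as missing.
    for f in $files; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing is missing.
    [ "$missing" -eq 0 ]
}
```

Called as check_data_files data from the SatMut_TERT folder, it prints one line per missing file and returns a non-zero exit status if any file is absent.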

MPRAflow

Now we are almost ready to start MPRAflow and estimate individual variant effects. But first we need to generate an experiment.csv file that tells Nextflow the conditions, the replicates and the count files.

Create experiment.csv

Our experiment file looks exactly like this:

Condition,Replicate,COUNTS
TERT-GBM,1,TERT-GBM_1_counts.tsv.gz
TERT-GBM,2,TERT-GBM_2_counts.tsv.gz
TERT-GBM,3,TERT-GBM_3_counts.tsv.gz
TERT-HEK,1,TERT-HEK_1_counts.tsv.gz
TERT-HEK,2,TERT-HEK_2_counts.tsv.gz
TERT-HEK,3,TERT-HEK_3_counts.tsv.gz

Save it as experiment.csv in the SatMut_TERT/data folder.
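
Because the table follows a regular pattern, it can also be generated instead of typed by hand. The following is a minimal sketch; the function name make_experiment_csv is hypothetical and not part of MPRAflow, and it assumes the count files are named exactly as in the table above.

```shell
# Hypothetical helper (not part of MPRAflow): print the experiment table
# to standard output, one line per condition and replicate.
# Usage (from the SatMut_TERT folder): make_experiment_csv > data/experiment.csv
make_experiment_csv() {
    echo "Condition,Replicate,COUNTS"
    for condition in TERT-GBM TERT-HEK; do
        for replicate in 1 2 3; do
            echo "${condition},${replicate},${condition}_${replicate}_counts.tsv.gz"
        done
    done
}
```

Writing to standard output keeps the function free of side effects, so the result can be inspected before it is redirected into the data folder.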

Run nextflow

Now we have everything at hand to run the saturation mutagenesis MPRAflow pipeline. The command must be run from the cloned MPRAflow folder, but we set the working and output directories to the SatMut_TERT folder. The MPRAflow saturation mutagenesis command is:

cd <path/to/MPRAflow>/MPRAflow
conda activate MPRAflow
nextflow run -resume -w <path/to/TERT>/SatMut_TERT/work  saturationMutagenesis.nf --experiment-file "<path/to/TERT>/SatMut_TERT/data/experiment.csv" --assignment "<path/to/TERT>/SatMut_TERT/data/TERT.variants.txt.gz" --dir "<path/to/TERT>/SatMut_TERT/data" --outdir "<path/to/TERT>/SatMut_TERT/output"
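
Before launching the command above, it can help to confirm that every count file named in experiment.csv is present in the folder passed to --dir. The sketch below assumes, based on the bare file names in the table, that the COUNTS column is resolved relative to that folder; the function name check_experiment_file is hypothetical and not part of MPRAflow.

```shell
# Hypothetical helper (not part of MPRAflow): check that every file in the
# COUNTS column of the experiment file exists in the data folder.
# Arguments: <experiment.csv> <data folder>
check_experiment_file() {
    csv="$1"
    data_dir="$2"
    status=0
    {
        # Discard the header line (Condition,Replicate,COUNTS).
        read -r header
        # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
        while IFS=, read -r condition replicate counts || [ -n "$condition" ]; do
            # Skip empty or incomplete lines.
            [ -n "$counts" ] || continue
            if [ ! -e "$data_dir/$counts" ]; then
                echo "missing count file for $condition replicate $replicate: $data_dir/$counts" >&2
                status=1
            fi
        done
    } < "$csv"
    return "$status"
}
```

For this example it would be called as check_experiment_file data/experiment.csv data from the SatMut_TERT folder; a non-zero exit status means at least one listed count file was not found.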

Note

Please check that your conf/cluster.config file is correctly configured (e.g. with your SGE cluster commands).

If everything works fine, the following 11 processes will run: calc_assign_variantMatrix, calc_assign_variantMatrixWith1bpDel, fitModel, summarizeVariantMatrix, statsWithCoefficient, plotCorrelation, plotStatsWithCoefficient, fitModelCombined, combinedStats, statsWithCoefficientCombined, and plotStatsWithCoefficientCombined.

[3c/835d00] process > calc_assign_variantMatrix (1)           [100%] 6 of 6 ✔
[7a/887135] process > calc_assign_variantMatrixWith1bpDel (1) [100%] 6 of 6 ✔
[ca/a90b00] process > fitModel (8)                            [100%] 12 of 12 ✔
[67/3a3e8a] process > summarizeVariantMatrix (12)             [100%] 12 of 12 ✔
[56/846670] process > statsWithCoefficient (12)               [100%] 12 of 12 ✔
[74/466bfb] process > plotCorrelation (1)                     [100%] 12 of 12 ✔
[a5/baf1ef] process > plotStatsWithCoefficient (12)           [100%] 12 of 12 ✔
[ac/d38378] process > fitModelCombined (3)                    [100%] 4 of 4 ✔
[0b/600d8b] process > combinedStats (2)                       [100%] 4 of 4 ✔
[32/80f6a6] process > statsWithCoefficientCombined (2)        [100%] 4 of 4 ✔
[2f/817e76] process > plotStatsWithCoefficientCombined (1)    [100%] 4 of 4 ✔
Completed at: 07-Jan-2020 11:31:00
Duration    : 22m 41s
CPU hours   : 1.0
Succeeded   : 88

Results

All output files are written to the SatMut_TERT/output folder.