Association for Saturation mutagenesis of TERT example

This example runs the association saturation mutagenesis workflow of the TERT promoter from Kircher et al. 2019. The same saturation mutagenesis library was used in four different experiments.

Prerequisites

This example depends on the following data and software:

Installation of MPRAflow

Please install conda and the MPRAflow environment, and clone the current MPRAflow master branch. You will find more help under Installation.

Reference file

Mapping requires a reference sequence. Here we download the TERT reference sequence that was used. It was generated by Sanger sequencing of the original template (before error-prone PCR).

mkdir -p satMut_assoc/data
cd satMut_assoc/data
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3604nnn/GSM3604154/suppl/GSM3604154%5FTERT%2Efa%2Egz
gzip -dc GSM3604154_TERT.fa.gz > TERT.fa
cd ..
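As a quick sanity check of the decompressed reference, the following sketch counts the FASTA records and the total number of bases. The function name fasta_summary is a hypothetical helper, not part of MPRAflow, and the expected length of the TERT reference is not stated here, so no expected value is given.

```shell
# Print the number of records and the total sequence length of a FASTA file.
# Hypothetical helper for illustration; not part of MPRAflow.
fasta_summary() {
    awk '
        /^>/ { records++; next }                      # header line: one record
        { gsub(/[ \t\r]/, ""); bases += length($0) }  # sequence line: count bases
        END { printf "records=%d bases=%d\n", records, bases }
    ' "$1"
}

# Example usage, run inside satMut_assoc after the download:
# fasta_summary data/TERT.fa
```

An empty file or a records=0 result would indicate that the download or decompression did not work as intended.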

Reads

There is one set of association sequencing data for this example, consisting of forward, reverse, and index (barcode) reads. The forward and reverse reads contain the reference sequence with mutations. The index read contains the barcode associated with that sequence. These data must be downloaded. All data are publicly available on the Sequence Read Archive (SRA). We will use the SRA Toolkit to obtain the data.

Note

You need 2 GB of disk space to download the data!
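The free space can be checked before starting the download. The sketch below is one possible way to do this with the POSIX df options; the function name has_free_gb is a hypothetical helper, and it assumes the filesystem name reported by df contains no spaces.

```shell
# Succeed if the filesystem holding DIR has at least MIN_GB gigabytes free.
# Usage: has_free_gb MIN_GB [DIR]   (hypothetical helper, DIR defaults to ".")
has_free_gb() {
    min_gb=$1
    dir=${2:-.}
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example usage:
# has_free_gb 2 . || echo "less than 2 GB free in the current directory" >&2
```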

conda install sra-tools
cd data
prefetch SRR8646911
fastq-dump --gzip --split-files SRR8646911
cd ..

Note

Please make sure that all files were downloaded completely and without errors! Depending on your internet connection, this can take a while.

With

tree data

the folder should look like this:

data/
├── SRR8646911_1.fastq.gz
├── SRR8646911_2.fastq.gz
├── SRR8646911_3.fastq.gz
└── TERT.fa
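Beyond the file listing, completeness of the downloads can be checked roughly as follows. This is a minimal sketch: it tests each gzip archive for integrity and compares read counts, under the assumption that every spot contributes one read to each of the three files. The function names count_fastq_reads and same_read_count are hypothetical helpers.

```shell
# Count the reads in a gzipped FASTQ file (4 lines per read).
# Fails if the archive is corrupt or the line count is not a multiple of 4.
count_fastq_reads() {
    gzip -t "$1" || return 1
    gzip -dc "$1" | awk 'END { if (NR % 4) exit 1; printf "%d\n", NR / 4 }'
}

# Succeed and print the shared read count only if all given files agree.
same_read_count() {
    first=""
    for f in "$@"; do
        n=$(count_fastq_reads "$f") || return 1
        if [ -z "$first" ]; then
            first=$n
        elif [ "$n" != "$first" ]; then
            echo "read count mismatch: $f has $n, expected $first" >&2
            return 1
        fi
    done
    echo "$first"
}

# Example usage, run inside satMut_assoc:
# same_read_count data/SRR8646911_1.fastq.gz \
#                 data/SRR8646911_2.fastq.gz \
#                 data/SRR8646911_3.fastq.gz
```

A non-zero exit status from either function suggests a truncated or corrupted download, in which case the affected file should be fetched again.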

MPRAflow

Now we are ready to run MPRAflow to map the reads, call variants, and associate them with barcodes.

Run nextflow

Now we have everything at hand to run the association saturation mutagenesis MPRAflow pipeline. The command must be run from the cloned MPRAflow folder, but we set the working and output directories to the satMut_assoc folder. The MPRAflow association saturation mutagenesis command is:

cd <path/to/MPRAflow>/MPRAflow
conda activate MPRAflow
nextflow run -resume -w <absolute/path>/satMut_assoc/work association_saturationMutagenesis.nf \
    --fastq-insert <absolute/path>/satMut_assoc/data/SRR8646911_1.fastq.gz \
    --fastq-insertPE <absolute/path>/satMut_assoc/data/SRR8646911_2.fastq.gz \
    --fastq-bc <absolute/path>/satMut_assoc/data/SRR8646911_3.fastq.gz \
    --design <absolute/path>/satMut_assoc/data/TERT.fa \
    --name TERT \
    --outdir <absolute/path>/satMut_assoc/output \
    --split 200000 --bc-length 20
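Because the same absolute path appears several times in the command, it can help to assemble the command from a single base directory and inspect it before running. The sketch below only prints the command and uses only the options shown above. The function name mpraflow_assoc_cmd is a hypothetical helper, and it assumes the base path contains no spaces.

```shell
# Print the nextflow command for a given absolute satMut_assoc directory.
# Usage: mpraflow_assoc_cmd /absolute/path/to/satMut_assoc
# Hypothetical helper for illustration; it does not start nextflow itself.
mpraflow_assoc_cmd() {
    base=$1
    case $base in
        /*) ;;  # absolute path: accepted
        *) echo "base directory must be an absolute path: $base" >&2; return 1 ;;
    esac
    data=$base/data
    # printf repeats the format for every argument, joining them with spaces.
    printf '%s ' nextflow run -resume -w "$base/work" \
        association_saturationMutagenesis.nf \
        --fastq-insert "$data/SRR8646911_1.fastq.gz" \
        --fastq-insertPE "$data/SRR8646911_2.fastq.gz" \
        --fastq-bc "$data/SRR8646911_3.fastq.gz" \
        --design "$data/TERT.fa" \
        --name TERT \
        --outdir "$base/output" \
        --split 200000 --bc-length 20
    printf '\n'
}

# Example usage, from the cloned MPRAflow folder with the environment active:
# mpraflow_assoc_cmd /absolute/path/to/satMut_assoc          # inspect first
# eval "$(mpraflow_assoc_cmd /absolute/path/to/satMut_assoc)" # then run
```

Printing first keeps the run itself a deliberate second step, which makes typos in the path easier to catch.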

Note

Please check that your conf/cluster.config file is correctly configured (e.g. with your SGE cluster commands).

If everything works fine, the following 10 processes will run: clean_design, create_BWA_ref, get_name, create_BAM, collect_chunks, PE_mapping, get_count, extract_reads, call_variants, and combine_variants.

[e4/cf3353] process > clean_design (count)      [100%] 1 of 1, cached: 1 ✔
[70/31c1b2] process > create_BWA_ref (make ref) [100%] 1 of 1, cached: 1 ✔
[83/a75010] process > get_name                  [100%] 1 of 1, cached: 1 ✔
[4e/8bd490] process > create_BAM (28)           [100%] 28 of 28, cached: 28 ✔
[0b/ff7cd0] process > collect_chunks            [100%] 1 of 1, cached: 1 ✔
[7c/64c374] process > PE_mapping (align)        [100%] 1 of 1, cached: 1 ✔
[f3/fa5fed] process > get_count                 [100%] 1 of 1, cached: 1 ✔
[a5/d3aac2] process > extract_reads             [100%] 1 of 1, cached: 1 ✔
[67/70c6e0] process > call_variants (1024)      [100%] 1024 of 1024, cached: 1024 ✔
[85/2cb7af] process > combine_variants          [100%] 1 of 1 ✔
Completed at: 26-März-2021 08:43:10
Duration    : 22m 53s
CPU hours   : 43.0 (100% cached)
Succeeded   : 1
Cached      : 1'059

Results

All needed output files will be in the satMut_assoc/output/TERT/ folder.
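A simple way to confirm that the run produced output is to check that this folder exists and is not empty. The sketch below makes no assumption about the names of the output files. The function name dir_has_files is a hypothetical helper.

```shell
# Succeed if the directory exists and contains at least one entry.
# Hypothetical helper for illustration; not part of MPRAflow.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Example usage, run from the directory that contains satMut_assoc:
# dir_has_files satMut_assoc/output/TERT && ls -lh satMut_assoc/output/TERT
```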