Association for Saturation mutagenesis of TERT example¶
This example runs the association saturation mutagenesis workflow of the TERT promoter from Kircher et al. 2019. The same saturation mutagenesis library was used in four different experiments.
Prerequisites¶
This example depends on the following data and software:
Installation of MPRAflow¶
Please install conda and the MPRAflow environment, and clone the current MPRAflow master branch. You will find more help under Installation.
Reference file¶
To map the reads, we need a reference sequence. Here we download the TERT reference sequence that was used. It was generated by Sanger sequencing of the original template (before error-prone PCR).
mkdir -p satMut_assoc/data
cd satMut_assoc/data
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3604nnn/GSM3604154/suppl/GSM3604154%5FTERT%2Efa%2Egz
gzip -dc GSM3604154_TERT.fa.gz > TERT.fa
cd ..
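As a quick sanity check on the decompressed reference, the sketch below counts FASTA records and bases. The helper name `fasta_summary` is hypothetical and not part of MPRAflow; it only assumes a standard `awk`.

```shell
# Hypothetical helper (not part of MPRAflow): summarise a FASTA file.
# Prints the number of header lines and the total number of sequence characters.
fasta_summary() {
  awk '
    /^>/ { records++; next }                       # header line starts a new record
    { gsub(/[ \t\r]/, ""); bases += length($0) }   # sequence line, whitespace removed
    END { printf "records=%d bases=%d\n", records, bases }
  ' "$1"
}

# Usage (from the satMut_assoc folder):
#   fasta_summary data/TERT.fa
```

An output with zero records or zero bases would suggest that the download or decompression did not complete.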
Reads¶
There is one set of association sequencing data, which contains forward, reverse, and index (barcode) reads. The forward and reverse reads contain the reference sequence with mutations. The index read is the barcode associated with that sequence. These data must be downloaded. All data are publicly available on the Sequence Read Archive (SRA). We will use the SRA Toolkit to obtain the data.
Note
You need 2 GB disk space to download the data!
conda install sra-tools
cd data
prefetch SRR8646911
fastq-dump --gzip --split-files SRR8646911
cd ..
Note
Please be sure that all files are downloaded completely without errors! Depending on your internet connection this can take a while.
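One way to check that the compressed files are complete is `gzip -t`, which tests the integrity of a gzip archive without writing the decompressed data. The sketch below wraps it in a hypothetical helper, `check_gzip`, which is not part of MPRAflow or the SRA Toolkit.

```shell
# Hypothetical helper: return 0 only if every given file passes `gzip -t`.
# Each failing file is named on stderr.
check_gzip() {
  gz_status=0
  for gz_file in "$@"; do
    if ! gzip -t "$gz_file" 2>/dev/null; then
      echo "corrupt or incomplete: $gz_file" >&2
      gz_status=1
    fi
  done
  return "$gz_status"
}

# Usage (from the satMut_assoc folder):
#   check_gzip data/SRR8646911_*.fastq.gz && echo "all files OK"
```

A passing test only shows that each file is a complete gzip stream; it does not confirm that the SRA download itself contained all reads.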
With
tree data
the folder should look like this:
data/
├── SRR8646911_1.fastq.gz
├── SRR8646911_2.fastq.gz
├── SRR8646911_3.fastq.gz
└── TERT.fa
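Because the forward, reverse, and index files come from the same run, one would normally expect them to hold the same number of reads. The following is a minimal sketch for comparing them; the helper names `fastq_records` and `same_read_count` are hypothetical, and the check assumes the usual four-line FASTQ layout.

```shell
# Hypothetical helper: number of records in a gzipped FASTQ file,
# assuming four lines per record.
fastq_records() {
  echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Hypothetical helper: return 0 if all given files have the same record count.
same_read_count() {
  first_count=$(fastq_records "$1")
  for fq_file in "$@"; do
    if [ "$(fastq_records "$fq_file")" != "$first_count" ]; then
      echo "read count mismatch: $fq_file" >&2
      return 1
    fi
  done
  return 0
}

# Usage (from the satMut_assoc folder; decompresses every file once or twice,
# so it can take a few minutes):
#   same_read_count data/SRR8646911_1.fastq.gz data/SRR8646911_2.fastq.gz data/SRR8646911_3.fastq.gz
```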
MPRAflow¶
Now we are ready to run MPRAflow for mapping reads, calling variants, and associating them with barcodes.
Run nextflow¶
Now we have everything at hand to run the association saturation mutagenesis MPRAflow pipeline. To do so, we have to be in the cloned MPRAflow folder, but we will set the working and output directories to the satMut_assoc folder. The MPRAflow association saturation mutagenesis command is:
cd <path/to/MPRAflow>/MPRAflow
conda activate MPRAflow
nextflow run -resume -w <absolute/path>/satMut_assoc/work association_saturationMutagenesis.nf \
    --fastq-insert <absolute/path>/satMut_assoc/data/SRR8646911_1.fastq.gz \
    --fastq-insertPE <absolute/path>/satMut_assoc/data/SRR8646911_2.fastq.gz \
    --fastq-bc <absolute/path>/satMut_assoc/data/SRR8646911_3.fastq.gz \
    --design <absolute/path>/satMut_assoc/data/TERT.fa \
    --name TERT \
    --outdir <absolute/path>/satMut_assoc/output \
    --split 200000 \
    --bc-length 20
Note
Please check that your conf/cluster.config file is correctly configured (e.g. with your SGE cluster commands).
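Since the command above takes absolute paths to four input files, a short pre-flight check before launching the pipeline can catch a mistyped path early. This is only a sketch: the helper `check_inputs` is hypothetical and not part of MPRAflow, and it assumes the file layout created in the steps above.

```shell
# Hypothetical helper: verify that the satMut_assoc folder is given as an
# absolute path and that the four input files exist and are non-empty.
check_inputs() {
  in_base="$1"
  case "$in_base" in
    /*) ;;                                            # absolute path, as required
    *) echo "not an absolute path: $in_base" >&2; return 1 ;;
  esac
  in_status=0
  for in_rel in data/SRR8646911_1.fastq.gz data/SRR8646911_2.fastq.gz \
                data/SRR8646911_3.fastq.gz data/TERT.fa; do
    if [ ! -s "$in_base/$in_rel" ]; then
      echo "missing or empty: $in_base/$in_rel" >&2
      in_status=1
    fi
  done
  return "$in_status"
}

# Usage (replace the placeholder with your own path before running nextflow):
#   check_inputs <absolute/path>/satMut_assoc && echo "inputs look fine"
```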
If everything works fine, the following 10 processes will run: clean_design, create_BWA_ref, get_name, create_BAM, collect_chunks, PE_mapping, get_count, extract_reads, call_variants, and combine_variants.
[e4/cf3353] process > clean_design (count) [100%] 1 of 1, cached: 1 ✔
[70/31c1b2] process > create_BWA_ref (make ref) [100%] 1 of 1, cached: 1 ✔
[83/a75010] process > get_name [100%] 1 of 1, cached: 1 ✔
[4e/8bd490] process > create_BAM (28) [100%] 28 of 28, cached: 28 ✔
[0b/ff7cd0] process > collect_chunks [100%] 1 of 1, cached: 1 ✔
[7c/64c374] process > PE_mapping (align) [100%] 1 of 1, cached: 1 ✔
[f3/fa5fed] process > get_count [100%] 1 of 1, cached: 1 ✔
[a5/d3aac2] process > extract_reads [100%] 1 of 1, cached: 1 ✔
[67/70c6e0] process > call_variants (1024) [100%] 1024 of 1024, cached: 1024 ✔
[85/2cb7af] process > combine_variants [100%] 1 of 1 ✔
Completed at: 26-März-2021 08:43:10
Duration : 22m 53s
CPU hours : 43.0 (100% cached)
Succeeded : 1
Cached : 1'059
Results¶
All required output files will be in the satMut_assoc/output/TERT/ folder.