Association saturation mutagenesis

This workflow is about assocation variant calls with barcodes. Variants are introduced by an error-prone PCR. The workflow takes the sequencing of the region, with barcodes in index read and the reference sequence and maps the reads to the reference, calls variants and associates them with the corresponding barcode.

Input files

Fastq Files

3 Fastq files from library association sequencing –Reference sequencing, 1 forward and 1 reverse read –Barcode sequence, 1 read covering the barcode

Reference file

Fasta file of the referencesequence describing the mutated sequence

Example file:

>TERT
GATCTGCGATCTAAGTAAGCCCAGGACCGCGCTTCCCACGTGGCGGAGGGACTGGGGACCCGGGCACCCGTCCTGCCCCT
TCACCTTCCAGCTCCGCCTCCTCCGCGCGGACCCCGCCCCGTCCCGACCCCTCCCGGGTCCCCGGCCCAGCCCCCTCCGG
GCCCTCCCAGCCCCTCCCCTTCCTTTCCGCGGCCCCGCCCTCTCCTCGCGGCGCGAGTTTCAGGCAGCGCTGCGTCCTGC
TGCGCACGTGGGAAGCCCTGGCCCCGGCCACCCCCGCGAAAGCTTGCATGCCCTGCAGG

association_saturationMutagenesis.nf

Options

With --help or --h you can see the help message.

Usage:

Mandatory arguments:
--fastq-insert Full path to library association fastq for insert
--fastq-insertPE
 Full path to library association fastq for read2
--fastq-bc Full path to library association fastq for bc
--design Full path to fasta of reference sequence (only one reference sequence)
--name Name of the association. Files will be named after this.
Optional:
--bc-length Barcode length (default 15)
--clipping-penalty
 bwa mem clipping penalty (default 80)
--min-ireads minimum number gapped reads for indel candidates (default: 3)
--split Split up the fastq read into chunks with max limit of reads (default: 2000000)
--outdir The output directory where the results will be saved and what will be used as a prefix (default outs)
--h, --help Print the help message

Processes

Processes run by nextflow in the Association Saturation Mutagenesis Utility.

clean_design
Removes any illegal characters (defined by Piccard) in the reference file.
create_BWA_ref
Creates a BWA reference based on the design file
get_name
Recieves the name written in the header of the reference fasta file
create_BAM
Merges the forward and reverse reads and stores them in BAM format. This is run for each chunk.
collect_chunks
combine the chunks (merge bams)
PE_mapping
Map the reads to the reference.
get_count
Get the barcodes and count for each barcode. Filter them
extract_reads
Create a BAM file for each barcode with the corresponding reads in it. This create a large number of files!
call_variants:
Call variants for each barcode seperately.
combine_variants:
Combine barcode/variant calls to one final output file.

Output

The output can be found in the folder defined by the option --outdir and the subfolder definedby option --name. It is structured in folders of the condition as

Files

design_rmIllegalChars.fa
Reference file with remove illegal characters
${datasetID}.variants.txt.gz
Barcode to variant association. Named by the header in the reference.
counts_${datasetID}.filtered.tsv.gz
Filtered barcode counts. Named by the header in the reference.
counts_${datasetID}.tsv.gz
All barcode counts. Named by the header in the reference.