Association

_images/Association_util.png

Input files

Fastq Files

2-3 Fastq files from library association sequencing –Candidate regulatory sequence (CRS) sequencing, 1 forawrd read and an optional reverse read if paired end sequencing was used –Barcode sequence, 1 read covering the barcode

Design File

Fasta file of of CRS sequences with unique headers describing each tested sequence

Example file:

>CRS1
GACGGGAACGTTTGAGCGAGATCGAGGATAGGAGGAGCGGA
>CRS2
GGGCTCTCTTATATTAAGGGGGTGTGTGAACGCTCGCGATT
>CRS3
GGCGCGCTTTTTCGAAGAAACCCGCCGGAGAATATAAGGGA
>CRS4
TTAGACCGCCCTTTACCCCGAGAAAACTCAGCTACACACTC

Label File (Optional)

Tab separated file (TSV) of desired labels for each tested sequence

Example file:

CRS1  Positive_Control
CRS2  Negative_Control
CRS3  Test
CRS4  Positive_Control

Note

If you provide a label file, the first column of the label file must exactly match the FASTA file or the files will not merge properly in the pipeline.

association.nf

Options

With --help or --h you can see the help message.

Mandatory arguments:
--fastq-insert Full path to library association fastq for insert (must be surrounded with quotes)
--fastq-bc Full path to library association fastq for bc (must be surrounded with quotes)
--design Full path to fasta of ordered oligo sequences (must be surrounded with quotes)
--name Name of the association. Files will be named after this.
Optional:
--fastq-insertPE
 Full path to library association fastq for read2 if the library is paired end (must be surrounded with quotes)
--min-cov minimum coverage of bc to count it (default 3)
--min-frac minimum fraction of bc map to single insert (default 0.5)
--mapq map quality (default 30)
--baseq base quality (default 30)
--cigar require exact match ex: 200M (default none)
--outdir The output directory where the results will be saved and what will be used as a prefix (default outs)
--split Number read entries per fastq chunk for faster processing (default: 2000000)
--labels tsv with the oligo pool fasta and a group label (ex: positive_control) if no labels desired a file will be automatically generated

Processes

Processes run by nextflow in the Association Utility. Some Processes will be run only if certain options used and are marked below.

count_bc or count_bc_nolab (if no label file is provided)
Removes any illegal characters (defined by Piccard) in the label file and design file. Counts the number of reads in the fastq file.
create_BWA_ref
Creates a BWA reference based on the design file
PE_merge (if paired end fastq files provided)
Merges the forward and reverse reads covering the CRS using fastq-join
align_BWA_PE or align_BWA_S (if single end mode)
Uses BWA to align the CRS fastq files to the reference created from the Design File. This will be done for each fastq file chunk based on the split option.
collect_chunks
merges all bamfiles from each separate alignment
map_element_barcodes
Assign barcodes to CRS and filters barcodes by user defined parameters for coverage and mapping percentage
filter_barcodes
Visualize results

Output

The output can be found in the folder defined by the option --outdir. It is structured in folders of the condition as

Files

count_fastq.txt
number of barcode reads
count_merged.txt
number of aligned CRS reads
design_rmIllegalChars.fa
Design file with illegal characters removed
label_rmIllegalChars.txt
Label file with illegal characters removed
s_merged.bam
sorted bamfile for CRS alignment
${name}_coords_to_barcodes.pickle
pickle file containing a python dictionary of CRS/barcode mappings
*.png
Visualization of number of barcodes mapping to enhancers