Association¶

Input files¶

Fastq Files¶

2-3 Fastq files from library association sequencing:

Candidate regulatory sequence (CRS) sequencing: 1 forward read and an optional reverse read if paired-end sequencing was used
Barcode sequence: 1 read covering the barcode

Design File¶

Fasta file of CRS sequences with unique headers describing each tested sequence

Example file:

>CRS1
GACGGGAACGTTTGAGCGAGATCGAGGATAGGAGGAGCGGA
>CRS2
GGGCTCTCTTATATTAAGGGGGTGTGTGAACGCTCGCGATT
>CRS3
GGCGCGCTTTTTCGAAGAAACCCGCCGGAGAATATAAGGGA
>CRS4
TTAGACCGCCCTTTACCCCGAGAAAACTCAGCTACACACTC
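
Because the design file needs unique headers, it can help to check for duplicates before starting the pipeline. The following is a minimal sketch and not part of the pipeline; the helper names `read_fasta_headers` and `duplicated_headers` are hypothetical, and the example sequences are the ones shown above.

```python
from collections import Counter

# The example design file from above, as an in-memory string.
EXAMPLE_DESIGN = """>CRS1
GACGGGAACGTTTGAGCGAGATCGAGGATAGGAGGAGCGGA
>CRS2
GGGCTCTCTTATATTAAGGGGGTGTGTGAACGCTCGCGATT
>CRS3
GGCGCGCTTTTTCGAAGAAACCCGCCGGAGAATATAAGGGA
>CRS4
TTAGACCGCCCTTTACCCCGAGAAAACTCAGCTACACACTC
"""


def read_fasta_headers(lines):
    """Return the header names (text after '>') in file order."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def duplicated_headers(headers):
    """Return the header names that occur more than once, sorted."""
    counts = Counter(headers)
    return sorted(name for name, n in counts.items() if n > 1)


headers = read_fasta_headers(EXAMPLE_DESIGN.splitlines())
duplicates = duplicated_headers(headers)
print(f"{len(headers)} headers, duplicates: {duplicates}")
```

For a file on disk, the same functions can be applied to the lines of an opened file handle instead of `EXAMPLE_DESIGN.splitlines()`.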

Label File (Optional)¶

Tab separated file (TSV) of desired labels for each tested sequence

Example file:

CRS1  Positive_Control
CRS2  Negative_Control
CRS3  Test
CRS4  Positive_Control

Note

If you provide a label file, the first column of the label file must exactly match the headers of the FASTA design file, or the files will not merge properly in the pipeline.
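
The matching requirement in the note can be checked ahead of time. This is a sketch under the assumption that the label file has no header row and uses a tab as separator, as in the example above; `read_label_ids` and `compare_ids` are hypothetical helper names, not pipeline functions.

```python
def read_label_ids(lines):
    """Return the first tab-separated column of each non-empty line."""
    return [line.split("\t")[0].strip() for line in lines if line.strip()]


def compare_ids(fasta_headers, label_ids):
    """Return (headers missing from the label file, label IDs not in the FASTA)."""
    fasta_set, label_set = set(fasta_headers), set(label_ids)
    return sorted(fasta_set - label_set), sorted(label_set - fasta_set)


# Headers from the example design file and rows from the example label file.
fasta_headers = ["CRS1", "CRS2", "CRS3", "CRS4"]
label_lines = [
    "CRS1\tPositive_Control",
    "CRS2\tNegative_Control",
    "CRS3\tTest",
    "CRS4\tPositive_Control",
]

missing_labels, unknown_labels = compare_ids(fasta_headers, read_label_ids(label_lines))
if missing_labels or unknown_labels:
    print("Mismatch:", missing_labels, unknown_labels)
else:
    print("Label file and design file agree.")
```

The comparison is case-sensitive and exact, so a label row such as `crs1` or `CRS1 ` with extra characters inside the name would be reported as a mismatch.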

association.nf¶

Options¶

With --help or --h you can see the help message.

Mandatory arguments:

`--fastq-insert`	Full path to library association fastq for insert (must be surrounded with quotes)
`--fastq-bc`	Full path to library association fastq for bc (must be surrounded with quotes)
`--design`	Full path to fasta of ordered oligo sequences (must be surrounded with quotes)
`--name`	Name of the association. Files will be named after this.

Optional:

`--fastq-insertPE`
	Full path to library association fastq for read2 if the library is paired end (must be surrounded with quotes)
`--min-cov`	minimum coverage of a bc for it to be counted (default 3)
`--min-frac`	minimum fraction of a bc's reads that map to a single insert (default 0.5)
`--mapq`	map quality (default 30)
`--baseq`	base quality (default 30)
`--cigar`	require exact match ex: 200M (default none)
`--outdir`	The output directory where the results will be saved and what will be used as a prefix (default outs)
`--split`	Number of read entries per fastq chunk for faster processing (default: 2000000)
`--labels`	TSV with the oligo pool fasta headers and a group label (ex: positive_control). If no labels are desired, a file will be automatically generated
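
To show how the options above fit together, here is a small sketch that assembles a command line from them. It assumes the script is started with `nextflow run association.nf`; the function `build_association_command` and all file paths are hypothetical placeholders, and only flags from the list above are used.

```python
def build_association_command(fastq_insert, fastq_bc, design, name, **optional):
    """Return a shell command string for association.nf.

    Path arguments are wrapped in double quotes, as the option list asks.
    Optional keyword names use underscores in place of dashes,
    e.g. min_cov becomes --min-cov.
    """
    parts = [
        "nextflow run association.nf",
        f'--fastq-insert "{fastq_insert}"',
        f'--fastq-bc "{fastq_bc}"',
        f'--design "{design}"',
        f"--name {name}",
    ]
    quoted_options = {"fastq_insertPE"}  # documented as needing quotes
    for key, value in optional.items():
        flag = "--" + key.replace("_", "-")
        if key in quoted_options:
            parts.append(f'{flag} "{value}"')
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


# Placeholder paths; replace them with real files.
command = build_association_command(
    "/data/assoc_R1.fastq.gz",
    "/data/assoc_BC.fastq.gz",
    "/data/design.fa",
    "example_assoc",
    min_cov=3,
    min_frac=0.5,
)
print(command)
```

The function only builds the string; it does not start Nextflow. The printed command can be pasted into a shell.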

Processes¶

Processes run by nextflow in the Association Utility. Some processes run only if certain options are used; these are marked below.

count_bc or count_bc_nolab (if no label file is provided): Removes any illegal characters (as defined by Picard) from the label file and design file. Counts the number of reads in the fastq file.
create_BWA_ref: Creates a BWA reference based on the design file
PE_merge (if paired end fastq files provided): Merges the forward and reverse reads covering the CRS using fastq-join
align_BWA_PE or align_BWA_S (if single end mode): Uses BWA to align the CRS fastq files to the reference created from the Design File. This will be done for each fastq file chunk based on the split option.
collect_chunks: Merges the BAM files from all separate alignments
map_element_barcodes: Assigns barcodes to CRS and filters barcodes by the user-defined parameters for coverage and mapping percentage
filter_barcodes: Visualizes results
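
The filtering described for map_element_barcodes can be illustrated with a small sketch. This is not the pipeline's code: `assign_barcodes` is a hypothetical function, the input layout (barcode mapped to one CRS name per aligned read) is assumed, and treating both thresholds as inclusive is also an assumption. The defaults mirror `--min-cov` and `--min-frac`.

```python
from collections import Counter


def assign_barcodes(observations, min_cov=3, min_frac=0.5):
    """Assign each barcode to its most frequent CRS if it passes both filters.

    observations: dict mapping barcode -> list of CRS names, one per read.
    A barcode is kept when it has at least min_cov reads and at least
    min_frac of those reads point to the same CRS (assumed inclusive).
    """
    assigned = {}
    for barcode, crs_hits in observations.items():
        coverage = len(crs_hits)
        if coverage < min_cov:
            continue  # too few reads support this barcode
        top_crs, top_count = Counter(crs_hits).most_common(1)[0]
        if top_count / coverage >= min_frac:
            assigned[barcode] = top_crs
    return assigned


# Toy data, invented for illustration.
observations = {
    "AAAC": ["CRS1", "CRS1", "CRS1", "CRS2"],  # coverage 4, 75% CRS1
    "GGTT": ["CRS3", "CRS3"],                  # coverage 2, below min_cov
    "TTAG": ["CRS1", "CRS2", "CRS3", "CRS4"],  # coverage 4, best 25%
}
assigned = assign_barcodes(observations)
print(assigned)
```

With the default thresholds, only `AAAC` is kept: `GGTT` fails the coverage filter and `TTAG` fails the fraction filter.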

Output¶

The output can be found in the folder defined by the option --outdir.

Files¶

count_fastq.txt: number of barcode reads
count_merged.txt: number of aligned CRS reads
design_rmIllegalChars.fa: Design file with illegal characters removed
label_rmIllegalChars.txt: Label file with illegal characters removed
s_merged.bam: sorted bamfile for CRS alignment
${name}_coords_to_barcodes.pickle: pickle file containing a python dictionary of CRS/barcode mappings
*.png: Visualization of number of barcodes mapping to enhancers
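
The pickle file can be opened with Python's standard `pickle` module. The sketch below only inspects the dictionary, because the exact key and value layout is not described here; `load_mapping` and `describe` are hypothetical helpers, and the toy dictionary is invented so that the example runs without pipeline output. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_mapping(path):
    """Load the dictionary stored in a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(mapping):
    """Return (number of entries, key type name, value type name)."""
    if not mapping:
        return (0, None, None)
    key, value = next(iter(mapping.items()))
    return (len(mapping), type(key).__name__, type(value).__name__)


# Round trip with toy data; the real layout may differ from this one.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_coords_to_barcodes.pickle")
with open(demo_path, "wb") as handle:
    pickle.dump({"CRS1": ["AAAC", "GGTA"]}, handle)

demo = load_mapping(demo_path)
summary = describe(demo)
print(summary)
```

For real output, pass the path of `${name}_coords_to_barcodes.pickle` inside the `--outdir` folder to `load_mapping` and use `describe` to see how the entries are organized.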