Welcome to MPRAflow’s documentation!

This pipeline processes sequencing data from Massively Parallel Reporter Assays (MPRA) to create count tables for candidate sequences tested in the experiment.

Installation & Getting Started
Instructions for installing the program and some examples to get you started.
MPRAflow Workflows
An overview of how MPRAflow works and documentation for the MPRAflow sub-workflows.
MPRAflow Examples
Multiple examples from the literature are listed for every sub-workflow in MPRAflow.
Project Information
More information on the project, including the changelog, list of contributing authors, and contribution instructions.

Quick Example

nextflow run count.nf --dir "fastQFolder" --experiment-file "experiment.csv" --design "design.fa" --association "bc_map.pickle" --m 1 -resume --mpranalyze --outdir "output"


Association: This utility takes library association sequencing data (FASTQ) and a design file (FASTA) and assigns barcodes to the corresponding elements tested. Functionality includes filtering for quality and coverage of barcodes. This utility must be run before the COUNT utility.
Count: This utility processes sequence data (FASTQ) of barcodes from the DNA and RNA fractions of the MPRA experiment and outputs count tables labeled with the element tested and a label provided in the design file. This utility can process multiple replicates and conditions in parallel. Depending on a user-specified flag, the pipeline either outputs normalized activity for each tested sequence or combines the results into a single count matrix compatible with MPRAnalyze.
Association Saturation mutagenesis:
 This workflow associates variant calls with barcodes. Variants are introduced by error-prone PCR. The workflow takes the sequencing data of the region (with barcodes in the index read) and the reference sequence, maps the reads to the reference, calls variants, and associates them with the corresponding barcodes.
Saturation mutagenesis:
 This workflow estimates single-variant effects from a target with multiple mutations generated by error-prone PCR. The workflow takes counts (e.g. from the count workflow), combines them with an association file (variants to barcodes, e.g. from the saturation mutagenesis association workflow), and uses a generalized linear model to detect single-variant effects.
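
As a rough illustration of how the Association and Count utilities described above fit together, the following Python sketch sums per-barcode counts into per-element counts using a barcode-to-element map, then computes a log2 RNA/DNA ratio per element. This is not MPRAflow's implementation: the function names, the in-memory dictionaries, the library-size normalization, and the pseudocount are all assumptions made for illustration, and the pipeline's actual normalization may differ.

```python
import math
from collections import defaultdict


def aggregate_counts(barcode_counts, barcode_to_element):
    """Sum per-barcode counts into per-element counts.

    Barcodes missing from the association map are dropped
    (hypothetical behaviour chosen for this sketch).
    """
    totals = defaultdict(int)
    for barcode, count in barcode_counts.items():
        element = barcode_to_element.get(barcode)
        if element is not None:
            totals[element] += count
    return dict(totals)


def log2_activity(dna_counts, rna_counts, pseudocount=1.0):
    """Return log2(RNA/DNA) per element after library-size normalization.

    The pseudocount and the normalization scheme are assumptions,
    not the formula used by MPRAflow.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    if dna_total == 0 or rna_total == 0:
        raise ValueError("DNA and RNA libraries must both contain counts")
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)
        dna_norm = (dna + pseudocount) / dna_total
        rna_norm = (rna + pseudocount) / rna_total
        activity[element] = math.log2(rna_norm / dna_norm)
    return activity


# Toy data (invented for this example): two elements, three assigned barcodes.
barcode_to_element = {"AAAA": "enh1", "CCCC": "enh1", "GGGG": "enh2"}
dna = aggregate_counts({"AAAA": 10, "CCCC": 10, "GGGG": 20, "TTTT": 5}, barcode_to_element)
rna = aggregate_counts({"AAAA": 30, "CCCC": 10, "GGGG": 20}, barcode_to_element)
print(dna, rna, log2_activity(dna, rna))
```

In this toy data the barcode "TTTT" has no entry in the association map, so it contributes to neither element, which is why the association step has to be completed before counting.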


The best place to leave feedback, ask questions, and report bugs is the MPRAflow Issue Tracker.
