The simulate_mm utility provides a simple way to generate simulated mutation map (MM) files for transcripts forming any number of conformations, mixed at arbitrary stoichiometries.

Usage

To list the available parameters, type:

$ simulate_mm -h
Parameter               Type     Description
-o or --outProfiles     string   Output file with structure profiles updated according to the simulation
-p or --stoichiometry   string   Comma-separated list of conformation stoichiometries (in %)
                                 Note #1: the stoichiometries must sum to approx. 100 (tolerance: 97-103)
                                 Note #2: when no stoichiometry is specified, the conformations are assumed to be equimolar
-c or --meanCoverage    int      Mean sequencing depth (coverage) per base
                                 Note: mutually exclusive with --nReads
-n or --nReads          int      Number of reads mapping to each transcript
                                 Note: mutually exclusive with --meanCoverage
--probability           float    Probability p of the binomial distribution used to generate mutations (default: 0.01927)
                                 Note: the default value was learnt empirically from Homan et al., 2014
-s or --readLen         int      Length (in bp) of the simulated reads
-t or --text            string   Output file for a human-readable version of the MM file
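The stoichiometry and depth options above interact in a way that can be sketched as follows. This is a minimal illustration, not simulate_mm's code: the helper names (parse_stoichiometry, reads_per_transcript, split_reads) are hypothetical, and it assumes that one stoichiometry is given per conformation, that one of --meanCoverage or --nReads is supplied, and that coverage maps to reads through the standard estimate coverage ≈ reads × read length / transcript length.

```python
import math
from typing import List, Optional


def parse_stoichiometry(arg: Optional[str], n_conformations: int) -> List[float]:
    """Parse a --stoichiometry string such as "60,40" (hypothetical helper)."""
    if not arg:
        # No stoichiometry given: treat the conformations as equimolar
        return [100.0 / n_conformations] * n_conformations
    values = [float(v) for v in arg.split(",")]
    # Assumption: exactly one value per conformation
    if len(values) != n_conformations:
        raise ValueError(f"expected {n_conformations} values, got {len(values)}")
    total = sum(values)
    # Documented tolerance: the values must sum to 97-103
    if not 97 <= total <= 103:
        raise ValueError(f"stoichiometries sum to {total}, outside 97-103")
    return values


def reads_per_transcript(mean_coverage: Optional[int], n_reads: Optional[int],
                         transcript_len: int, read_len: int) -> int:
    """Resolve --meanCoverage / --nReads into a read count (hypothetical helper)."""
    if mean_coverage is not None and n_reads is not None:
        raise ValueError("--meanCoverage and --nReads are mutually exclusive")
    if n_reads is not None:
        return n_reads
    if mean_coverage is None:
        # Assumption: this sketch requires one of the two options
        raise ValueError("specify --meanCoverage or --nReads")
    # Standard estimate: mean coverage ~ reads * read length / transcript length
    return math.ceil(mean_coverage * transcript_len / read_len)


def split_reads(n_reads: int, stoichiometry: List[float]) -> List[int]:
    """Allocate reads to conformations in proportion to their % abundance."""
    total = sum(stoichiometry)
    return [round(n_reads * s / total) for s in stoichiometry]
```

Under this estimate, a mean coverage of 100 on a 1,000 nt transcript with 150 bp reads corresponds to about 667 reads, which a 70,30 stoichiometry would split into 467 and 200 reads.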


Input structure profile file

The RNAs to be simulated are provided in the form of structure profile files, such as the following:

TCTATTCTACATTGATAGAAC...ACCGCTAGAGCACTCGGTGATTGCA
x.xxxxxxxx.xxxxx.xxxx...xxxxxx.xxx.xxxx.xxx.xxx.x
xxx.xx.x.xxxxx.xxx.x....xxxxxx.x.xx.xxxxxxxxxxxx.
0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,0
0,0,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,...,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1


TATCTTATCACTTGCTCGCCA...CAAGATCGCGACATAGGTGCTTGAC
(((.)).).(((((.(((.(....)))))).).)).xxxxxxxxxxxx.
(.((((((((.(((((.((((...)))))).))).)))).))).))).)
0,0,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,...,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,0


Each entry in these files is composed of four parts:

  1. The transcript's sequence
  2. One or more textual representations of the structures (conformations) formed by the RNA, one per line
  3. One numeric profile per structure listed in 2, in the same order
  4. An empty line, marking the end of the entry
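The entry layout described above can be read with a short parser. This is a hypothetical sketch (parse_structure_profiles and ProfileEntry are illustrative names, not part of simulate_mm). It assumes that structure lines contain only ".", "x", "(" and ")", that every other non-sequence line is a comma-separated numeric profile, and that lines are complete rather than elided with "..." as in the sample above.

```python
from dataclasses import dataclass, field
from typing import List

# Characters allowed in the textual structure lines
STRUCTURE_CHARS = set(".x()")


@dataclass
class ProfileEntry:
    sequence: str
    structures: List[str] = field(default_factory=list)
    profiles: List[List[float]] = field(default_factory=list)


def parse_structure_profiles(text: str) -> List[ProfileEntry]:
    """Split a structure profile file into entries (hypothetical helper)."""
    entries: List[ProfileEntry] = []
    block: List[str] = []
    # The appended "" acts as a final empty line, flushing the last entry
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line:
            block.append(line)
            continue
        if not block:
            continue  # extra empty lines between entries
        entry = ProfileEntry(sequence=block[0])
        for row in block[1:]:
            if set(row) <= STRUCTURE_CHARS:
                entry.structures.append(row)  # dot-bracket or "x" notation
            else:
                entry.profiles.append([float(v) for v in row.split(",")])
        # Each structure must have exactly one numeric profile
        if len(entry.structures) != len(entry.profiles):
            raise ValueError(f"{entry.sequence[:20]}: structure/profile count mismatch")
        entries.append(entry)
        block = []
    return entries
```

Classifying lines by their characters, rather than by position, keeps the parser independent of how many conformations each entry declares.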

The textual representation uses dots (".") to represent unpaired bases, and "x" or parentheses ("(" and ")") to represent paired bases. No check is made on the proper balancing of parentheses in dot-bracket structures.
The numeric profile must match the textual representation of the structure: 0 indicates a paired base, while any value ≥1 represents an unpaired base (the specific value is irrelevant to the simulation).
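The matching rule between a structure line and its numeric profile can be expressed as a small check. The function names below are hypothetical and only illustrate the rule as documented: "." pairs with any value ≥1, "x", "(" and ")" pair with 0, and parentheses are not checked for balancing.

```python
from typing import List, Sequence

PAIRED_CHARS = set("x()")


def profile_from_structure(structure: str) -> List[int]:
    """Build a numeric profile: 1 for "." (unpaired), 0 otherwise (paired).

    Assumes the structure only uses the documented alphabet.
    """
    return [1 if c == "." else 0 for c in structure]


def profile_matches(structure: str, profile: Sequence[float]) -> bool:
    """Check a numeric profile against its textual structure.

    Parentheses are treated like "x"; their balancing is not checked.
    """
    if len(structure) != len(profile):
        return False
    for char, value in zip(structure, profile):
        if char == ".":
            ok = value >= 1  # unpaired: any value >= 1
        elif char in PAIRED_CHARS:
            ok = value == 0  # paired: exactly 0
        else:
            ok = False  # character outside the documented alphabet
        if not ok:
            return False
    return True
```

For instance, the structure "(.x.)" accepts the profile 0,1,0,3,0 but rejects 0,0,0,1,0, since its second base is unpaired.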

Besides the MM file, simulate_mm also generates a new structure profile file (set via the -o or --outProfiles parameter). This file is identical to the input one, except that the numeric profiles are replaced by the actual mutation counts from the simulation.
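How simulated reads could turn into the per-base mutation counts written to the updated profiles is illustrated below with a deliberately simple toy model. This is an assumption, not simulate_mm's documented algorithm: each read is assigned to a conformation with probability proportional to its stoichiometry, starts at a uniformly random position, and mutates each unpaired base it covers independently with probability p. Under this model the number of mutations per read is Binomial(n, p), with n the number of unpaired bases covered. Paired bases and sequencing errors are not modelled, and simulate_counts is a hypothetical name.

```python
import random
from typing import List, Sequence


def simulate_counts(structures: Sequence[str], stoichiometry: Sequence[float],
                    n_reads: int, read_len: int, p: float = 0.01927,
                    seed: int = 0) -> List[List[int]]:
    """Toy model: per-conformation, per-base mutation counts (hypothetical)."""
    rng = random.Random(seed)  # seeded for reproducible simulations
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structures must have the transcript's length")
    read_len = min(read_len, length)  # reads cannot exceed the transcript
    counts = [[0] * length for _ in structures]
    for _ in range(n_reads):
        # Pick the conformation this read comes from, weighted by stoichiometry
        k = rng.choices(range(len(structures)), weights=stoichiometry)[0]
        start = rng.randrange(length - read_len + 1)
        for i in range(start, start + read_len):
            # Only unpaired bases are mutated, each with probability p
            if structures[k][i] == "." and rng.random() < p:
                counts[k][i] += 1
    return counts
```

With p set to 1.0 every covered unpaired base is mutated, which makes the model easy to sanity-check: 10 full-length reads on "x..x" yield the counts 0,10,10,0.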