The simulate_mm utility provides a simple way to generate simulated mutation map (MM) files for transcripts forming any number of conformations, mixed at arbitrary stoichiometries.
Usage
To list the required parameters, simply type:
$ simulate_mm -h
Parameter | Type | Description |
---|---|---|
-o or --outProfiles | string | Output file with structure profiles updated according to the simulation |
-p or --stoichiometry | string | Comma-separated list of conformation stoichiometries, expressed as percentages. Note #1: the stoichiometries must sum to approximately 100 (tolerance: 97-103). Note #2: when no stoichiometry is specified, the conformations are assumed to be equimolar |
-c or --meanCoverage | int | Mean sequencing depth (coverage) per base. Note: this parameter and --nReads are mutually exclusive |
-n or --nReads | int | Number of reads mapping to each transcript. Note: this parameter and --meanCoverage are mutually exclusive |
--probability | float | Probability p of the binomial distribution used to generate mutations (Default: 0.01927). Note: the default value has been learnt empirically from Homan et al., 2014 |
-s or --readLen | int | Length (in bp) of the simulated reads |
-t or --text | string | Output MM file's "human-readable" version |
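
The rules stated for -p (percentages summing to roughly 100, with an equimolar default) can be illustrated with a minimal Python sketch. The function name parse_stoichiometry is hypothetical and not part of simulate_mm. The requirement of one value per conformation is an assumption, not something the parameter table states.

```python
def parse_stoichiometry(spec, n_conformations):
    """Return one percentage per conformation, following the -p rules above.

    Hypothetical helper for illustration; not the actual simulate_mm code.
    """
    if not spec:
        # No stoichiometry specified: conformations are assumed equimolar
        return [100.0 / n_conformations] * n_conformations

    values = [float(v) for v in spec.split(",")]

    # Assumption: exactly one percentage is expected per conformation
    if len(values) != n_conformations:
        raise ValueError("expected one stoichiometry per conformation")

    # Documented tolerance: the sum must fall between 97 and 103
    total = sum(values)
    if not 97 <= total <= 103:
        raise ValueError(f"stoichiometries sum to {total}, outside 97-103")

    return values
```

For example, parse_stoichiometry("30,70", 2) returns the two percentages unchanged, while "50,40" is rejected because it sums to 90.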
Input structure profile file
The RNAs to be simulated are provided in the form of structure profile files.
TCTATTCTACATTGATAGAAC...ACCGCTAGAGCACTCGGTGATTGCA
x.xxxxxxxx.xxxxx.xxxx...xxxxxx.xxx.xxxx.xxx.xxx.x
xxx.xx.x.xxxxx.xxx.x....xxxxxx.x.xx.xxxxxxxxxxxx.
0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,0
0,0,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,...,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
TATCTTATCACTTGCTCGCCA...CAAGATCGCGACATAGGTGCTTGAC
(((.)).).(((((.(((.(....)))))).).)).xxxxxxxxxxxx.
(.((((((((.(((((.((((...)))))).))).)))).))).))).)
0,0,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,...,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,0
These files contain entries composed of four parts:
- The transcript's sequence
- A textual representation of the possible structures formed by the RNA
- A numeric profile representing each of the structures listed in the previous item
- An empty line, marking the end of the entry
The textual representation uses dots (".") to represent unpaired bases, and "x" or parentheses ("(" and ")") to represent paired bases. No check is made on the proper balancing of parentheses in dot-bracket structures.
The numeric profile must match the textual representation of the structure. In these profiles, 0 indicates a paired base, while any value ≥1 represents an unpaired base (the actual numeric value is not relevant to the simulation).
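
The entry layout and the matching rule above can be checked with a short Python sketch. The names parse_profiles and profile_matches are hypothetical, and the toy entry is made up for illustration. The sketch assumes that numeric lines start with a digit and that profiles are listed in the same order as the structures.

```python
import re


def parse_profiles(text):
    """Split a structure profile file into (sequence, structures, profiles)."""
    entries = []
    # Entries are separated by an empty line
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        sequence, rest = lines[0], lines[1:]
        # Assumption: numeric profile lines start with a digit
        structures = [ln for ln in rest if not ln[0].isdigit()]
        profiles = [[int(v) for v in ln.split(",")]
                    for ln in rest if ln[0].isdigit()]
        entries.append((sequence, structures, profiles))
    return entries


def profile_matches(structure, profile):
    """True if 0 marks every paired base and a value >= 1 every unpaired one."""
    if len(structure) != len(profile):
        return False
    # "." is unpaired; "x", "(" and ")" are paired (balancing is not checked)
    return all((char == ".") == (value >= 1)
               for char, value in zip(structure, profile))


# Toy entry (made up): one sequence, two structures, two profiles
example = """ACGUACGU
((....))
.xx..xx.
0,0,1,1,1,1,0,0
1,0,0,1,1,0,0,1
"""

for sequence, structures, profiles in parse_profiles(example):
    for structure, profile in zip(structures, profiles):
        assert profile_matches(structure, profile)
```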
Besides generating an MM file, a new structure profile file will also be generated (controlled via the -o or --outProfiles parameter), identical to the one provided as input, but with the numeric profiles updated to the actual mutation counts from the simulation.
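
To give an intuition of how binary profiles could turn into mutation counts, here is a rough Python sketch. It is one plausible reading of the description above, not the actual simulate_mm algorithm. It assumes that each conformation receives a share of the coverage proportional to its stoichiometry, and that only unpaired bases mutate, each with probability p per read. The function name simulate_counts is hypothetical, and this toy version sums counts per position, whereas the tool writes updated numeric profiles.

```python
import random


def simulate_counts(profiles, percentages, coverage, p=0.01927, seed=None):
    """Toy simulation of per-base mutation counts for a mix of conformations.

    Illustrative only; the assumptions are listed in the text above.
    """
    rng = random.Random(seed)
    counts = [0] * len(profiles[0])
    for profile, pct in zip(profiles, percentages):
        # Assumption: depth is split according to the stoichiometry
        depth = round(coverage * pct / 100)
        for i, value in enumerate(profile):
            if value >= 1:  # unpaired base in this conformation
                # Binomial(depth, p) draw, as a sum of Bernoulli trials
                counts[i] += sum(rng.random() < p for _ in range(depth))
    return counts


counts = simulate_counts(
    profiles=[[0, 0, 1, 1, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0, 0, 1]],
    percentages=[50, 50],
    coverage=2000,
    seed=1,
)
```

In this toy model, positions that are paired in every conformation (here the second and seventh bases) always end up with a count of 0.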