DRACO uses a sliding window approach to analyze short reads tiling long target transcripts. By default, the window size equals 90% of the median read length, and the window is slid with an offset equal to 5% of its size.
For each window, the following analysis steps are performed:

1. Reads covering the entire window are used to build a graph. Each base is a vertex; when two bases are observed to co-mutate within the same read, they are connected by an edge. Each edge is weighted according to the co-mutation frequency of the two bases (essentially, the number of reads in which both bases are simultaneously mutated).
2. From the adjacency matrix of the graph, the normalized Laplacian matrix is calculated.
3. Spectral clustering is applied to this matrix to identify the number of coexisting structural conformations (clusters) for the window being analyzed.
4. GraphCut is then used to weight each vertex in the graph according to its affinity to each conformation.
5. Reads are assigned to each conformation.
6. The window is slid by the chosen offset, and steps 1–5 are repeated until the entire transcript has been analyzed.
7. Consecutive windows forming the same number of conformations are grouped into window sets and merged.
8. For each window set, the relative conformation stoichiometries and the individual reactivity profiles are reconstructed. Window sets found to form different numbers of conformations are reported separately.
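The graph-building and Laplacian steps above can be sketched as follows. This is an illustrative sketch, not DRACO's actual implementation: the binary `reads` matrix (reads × bases, 1 marking a mutated base) is simulated here.

```python
import numpy as np

# Simulated reads x bases mutation matrix (stand-in for real MM data)
rng = np.random.default_rng(0)
reads = (rng.random((1000, 50)) < 0.05).astype(int)

# Edge weight = number of reads in which both bases are mutated together
adjacency = reads.T @ reads
np.fill_diagonal(adjacency, 0)

# Symmetric normalized Laplacian: L = I - D^(-1/2) A D^(-1/2)
degree = adjacency.sum(axis=1).astype(float)
d_inv_sqrt = np.zeros_like(degree)
nonzero = degree > 0
d_inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
laplacian = np.eye(50) - d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]

# Sorted eigenvalues; consecutive differences are the eigengaps used
# downstream by the spectral analysis
eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))
eigengaps = np.diff(eigenvalues)
```

The eigenvalues of a normalized Laplacian lie in [0, 2], which makes the eigengaps directly comparable across windows.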
When analyzing a transcript, DRACO keeps in memory all the reads mapping to that transcript. At extreme sequencing depths (>500,000X), memory consumption can become prohibitive on some systems. If your system does not have sufficient RAM, reads can be randomly downsampled along the transcript to achieve a lower mean coverage. Usually, a coverage of 20,000X is sufficient for DRACO to efficiently deconvolve the underlying structural heterogeneity.
For additional information concerning downsampling, please consult the documentation for the rf-count module of the RNA Framework.
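Conceptually, coverage-based downsampling amounts to keeping each read with probability target/current coverage. The sketch below is a hypothetical illustration (rf-count implements the actual behavior); reads are modeled as (start, end) intervals.

```python
import random

def downsample(reads, transcript_len, target_coverage, seed=42):
    """Randomly keep reads so the mean coverage approaches the target.

    reads: list of (start, end) mapped intervals (0-based, end-exclusive).
    """
    current = sum(end - start for start, end in reads) / transcript_len
    if current <= target_coverage:
        return reads  # already below target: keep everything
    keep_p = target_coverage / current
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < keep_p]
```

For example, 100,000 reads of 150 bp on a 1,000 nt transcript give a mean coverage of 15,000X; downsampling to 1,000X keeps roughly 1 read in 15.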
## Usage
To list the available parameters, simply type:
$ draco --help
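For example, a minimal run could look like the following (hypothetical file names; the long-option syntax is assumed):

```bash
$ draco --mm sample.mm --output sample_deconvolved.json --processors 4
```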
| Parameter | Type | Description |
|-----------|------|-------------|
| `--mm` | string | Input mutation map (MM) file |
| `--output` | string | Output JSON file (Default: `draco_deconvolved.json`) |
| `--processors` | int | Number of processors to use (Default: 0). Note #1: when set to 0, all available processors will be used. Note #2: only the analysis of multiple transcripts can be parallelized; if analyzing a single transcript, setting this parameter to a value > 1 will not speed up the execution |
| `--whitelist` | string | A whitelist file containing the IDs of the transcripts to be analyzed, one per row (Default: all transcripts will be analyzed) |
| `--shape` | | Enables spectral analysis on all four bases (default is A/C bases only). Note: this feature is highly experimental |

**Mutation filtering options**

| Parameter | Type | Description |
|-----------|------|-------------|
| `--minBaseCoverage` | int | Minimum coverage per base (Default: 100) |
| `--minBaseMutations` | int | Minimum mutations per base (Default: 2) |
| `--minMutationFreq` | float | Bases below this mutation frequency will be discarded as noise (Default: 0.005) |
| `--minReadMutations` | int | Reads with fewer mutations than this will be discarded as non-informative (Default: 2) |
**Spectral deconvolution options**

| Parameter | Type | Description |
|-----------|------|-------------|
| `--maxClusters` | int | Maximum allowed number of clusters/conformations (Default: unlimited) |
| `--minFilteredReads` | int | Minimum number of reads (post-filtering) to perform spectral analysis (Default: 5) |
| `--minPermutations` | int | Minimum number of permutations performed to build the null model (Default: 8) |
| `--maxPermutations` | int | Maximum number of permutations performed to build the null model (Default: 400) |
| `--firstEigengapShift` | float | How much the mean of the first eigengap's null model should be shifted by (Default: 0.9) |
| `--eigengapCumRelThresh` | float | Minimum relative difference between the eigengap and the null model, as a fraction of the cumulative difference between the previous eigengaps and their respective null models (Default: 0.1). Note: this does not apply to the first eigengap |
| `--eigengapAbsThresh` | float | Minimum absolute difference between the eigengap and the null model (Default: 0.03) |
| `--alpha` | float | Below this p-value, the null hypothesis is rejected and the eigengap is marked as informative (Default: 0.01) |
| `--beta` | float | Above this p-value, the alternative hypothesis is rejected and the eigengap is marked as non-informative (Default: 0.2). Note: this threshold does not apply to the first eigengap |
| `--firstEigengapBeta` | float | Beta p-value threshold for the first eigengap (Default: 0.4) |
| `--minNullStdev` | float | Minimum standard deviation for the null model (Default: 0.025). Note: when this threshold is not met, `extraPermutations` additional permutations will be performed |
| `--extraPermutations` | int | Additional permutations to perform when the standard deviation of the null model is < `minNullStdev` (Default: 50) |
| `--minWinBases` | int | Minimum number of bases in window (post-filtering) to perform spectral analysis (Default: 10) |
| `--lookaheadEigengaps` | int | Number of eigengaps to look ahead after a non-informative eigengap is encountered (Default: 3) |
| `--saveEigengapData` | | Saves eigengap data for plotting (Default: off) |
| `--eigengapDataOut` | string | Eigengap data output folder (Default: ./eigengap_data) |
**GraphCut options**

| Parameter | Type | Description |
|-----------|------|-------------|
| `--minClusterFraction` | float | Minimum fraction of reads assigned to each cluster/conformation (Default: 0.05). Note: if this threshold is not met, the number of clusters is automatically decreased |
| `--softClusteringInits` | int | Number of iterations for the initialization step of the GraphCut (Default: 500). Note: the initialization with the lowest score is picked |
| `--softClusteringIters` | int | Number of iterations performed on the GraphCut (Default: 20). Note: the cut with the lowest score is picked |
| `--softClusteringWeightModule` | float | The module of the weight used to change the cluster weights in order to find the lowest score (Default: 0.005) |
**Windowed analysis options**

| Parameter | Type | Description |
|-----------|------|-------------|
| `--winLenFraction` | float | Length of the window as a fraction of the median read length (Default: 0.9). Note: this parameter and `absWinLen` are mutually exclusive |
| `--absWinLen` | int | Absolute length of the window (Default: 0). Note: this parameter and `winLenFraction` are mutually exclusive |
| `--winOffsetFraction` | float | Slide offset as a fraction of the size of the window (Default: 0.05). Note: this parameter and `absWinOffset` are mutually exclusive |
| `--absWinOffset` | int | Absolute slide offset (Default: 0). Note: this parameter and `winOffsetFraction` are mutually exclusive |
| `--maxIgnoreWins` | int | Maximum number of internal windows with a different number of clusters to ignore when merging two external sets of windows (Default: 0) |
| `--minExtWins` | int | Minimum number of external windows, having the same number of clusters, needed to trigger merging (Default: 6) |
| `--nonInformativeToSurround` | | Non-informative windows (windows with 0 detected clusters) are set to the same number of clusters as the surrounding windows (Default: off) |
| `--allNonInformativeToOne` | | If all windows in the transcript are non-informative (0 clusters), the number of clusters is forced to 1 (Default: off) |
| `--reportNonInformative` | | Also reports non-informative windows in the output JSON file |
## Understanding the algorithm
While it is advisable for most users to run DRACO with its default parameters, as these are the result of a careful and thorough calibration, it might sometimes be useful to adjust the analysis on a case-by-case basis.
DRACO analysis is performed in windows. The size of the window is by default `winLenFraction` × the median read length; an arbitrary length (in bp) for the window can also be used by setting `absWinLen`. Similarly, the slide offset is by default `winOffsetFraction` × the length of the window, but an arbitrary offset (in bp) can be specified via `absWinOffset`. When long reads are used, reducing the size of the window might allow the identification of small, highly dynamic regions of the target transcript. For instance, if 300 bp-long reads are being used and the structurally-dynamic region within the target transcript is only 90 nt-long, the spectral analysis might not be able to detect it (especially when probing with DMS, as only ~50% of the bases will be informative); in these situations, reducing the size of the window can increase the sensitivity of the analysis.
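As a worked example of the default window geometry, assuming 300 bp reads:

```python
# Default window geometry with 300 bp reads:
# winLenFraction = 0.9, winOffsetFraction = 0.05
median_read_len = 300
win_len = int(median_read_len * 0.9)      # 270 nt window
win_offset = max(1, int(win_len * 0.05))  # 13 nt slide offset
```

A 90 nt dynamic region would span only a third of such a 270 nt window, which is why shrinking the window (e.g. via `absWinLen`) can concentrate the signal.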
The spectral deconvolution represents the most critical step of the algorithm, as it is responsible for determining the number of conformations formed by each window. During this step, the eigengaps, representing the distances between consecutive eigenvalues, are compared to a null model built by performing random permutations over the original data matrix. The distribution of the null model is well approximated by a Weibull distribution. The number of permutations performed varies for each eigengap, ranging between `minPermutations` and `maxPermutations`. As long as the standard deviation (σ) of the null model is < `minNullStdev`, additional permutations are performed, in increments of `extraPermutations`.
Starting with the first eigengap, DRACO compares each eigengap to its respective null model. The aim of this comparison is to determine which eigengaps can be considered informative, as this directly translates into the number of conformations/clusters formed by the analyzed window. For instance, if the first two eigengaps are informative, two conformations (clusters) are present; if the first three eigengaps are informative, three conformations are present, and so on.
In order to be considered informative, each eigengap must fulfill a number of criteria:

1. The distance (d) between the mean (μ) of the null model and the eigengap must be ≥ `eigengapAbsThresh`.
2. The distance between μ and the eigengap must be ≥ `eigengapCumRelThresh` × the cumulative sum of the distances for the previous eigengaps. For instance, d₄, the distance between the 4ᵗʰ eigengap and its null model, must be ≥ `eigengapCumRelThresh` × (d₁ + d₂ + d₃). This, of course, does not apply to the first eigengap.
3. A t-test is used to assess whether the distance between the eigengap and the null model is significant. If the p-value is ≤ `alpha`, the null hypothesis is rejected and the eigengap is marked as informative. If the p-value is ≥ `beta` (or, in the case of the first eigengap, ≥ `firstEigengapBeta`), the alternative hypothesis is rejected and the eigengap is marked as non-informative. If the p-value falls between these two thresholds, neither hypothesis can be rejected, so additional permutations are performed. If, after having performed `maxPermutations` permutations, the p-value still falls between the two thresholds, the alternative hypothesis is rejected and the eigengap is marked as non-informative.
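These criteria can be condensed into a small decision function. This is a hypothetical sketch (the p-value is assumed to come from the t-test against the permutation null model); `None` stands for the undecided case that triggers additional permutations:

```python
def eigengap_informative(gap, null_mean, prev_distances, p_value,
                         abs_thresh=0.03, cum_rel_thresh=0.1,
                         alpha=0.01, beta=0.2, first_beta=0.4):
    """Return True (informative), False (non-informative) or None (undecided)."""
    d = gap - null_mean
    if d < abs_thresh:          # criterion 1: absolute distance
        return False
    is_first = not prev_distances
    if not is_first and d < cum_rel_thresh * sum(prev_distances):
        return False            # criterion 2: relative to previous eigengaps
    if p_value <= alpha:        # criterion 3: significance test
        return True
    if p_value >= (first_beta if is_first else beta):
        return False
    return None                 # neither hypothesis rejected: permute more
```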
The analysis terminates when a non-informative eigengap is encountered as, normally, the subsequent eigengaps are non-informative as well. In some particular cases, however, eigengaps have been observed to behave unexpectedly: although an eigengap is marked as non-informative, one or more eigengaps immediately downstream of it are again informative. To account for these situations, whenever a non-informative eigengap is encountered, DRACO can perform a look-ahead evaluation of the `lookaheadEigengaps` subsequent eigengaps. If one of these is marked as informative, the analysis of the eigengaps is allowed to continue.
Once the number of conformations has been determined, the algorithm uses a normalized GraphCut approach to weight each base in the window according to its affinity to each of the different conformations. This approach tries to find the best way to partition the previously built graph. It is composed of two steps:

1. **Initialization.** During this step, each base of each cluster is assigned a random weight. The step is repeated `softClusteringInits` times and the solution with the lowest score is picked.
2. **Refinement.** During this step, the weights of the bases are dynamically altered by `softClusteringWeightModule` in an attempt to further minimize the score. This step is repeated `softClusteringIters` times and the solution with the lowest score is picked.
Note: in the original implementation of DRACO, this step was carried out only once. While significantly faster, the downside was a higher risk of finding suboptimal graph partitioning solutions, corresponding to local score minima. Lowering `softClusteringWeightModule` and increasing `softClusteringIters` can slow down the analysis, but it significantly increases the likelihood that the true optimal solution is identified, further ensuring the reproducibility of the analysis results.
Following identification of the optimal graph partitioning, reads are assigned to each conformation. If the fraction of reads assigned to a conformation is < `minClusterFraction`, the number of conformations for the window is decreased by one, and the step is repeated.
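Schematically (hypothetical `assign_reads` callback returning the relative cluster abundances for a given number of clusters):

```python
def deconvolve_window(assign_reads, n_clusters, min_cluster_fraction=0.05):
    """Decrease the cluster count until every cluster holds at least
    min_cluster_fraction of the reads."""
    while n_clusters > 1:
        fractions = assign_reads(n_clusters)
        if min(fractions) >= min_cluster_fraction:
            return n_clusters, fractions
        n_clusters -= 1  # under-populated cluster: retry with one fewer
    return 1, [1.0]
```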
The final step of the analysis involves merging consecutive windows, found to form the same number of conformations, into window sets. Consider, however, the following case:
In this situation, two sets of windows found to form two conformations are interrupted by one window forming a single conformation. This often happens with short reads (<100 bp), as the analyzed window might not contain enough information to identify the coexisting conformations. By default (left), DRACO would report three window sets. It is possible to account for these cases by allowing DRACO to ignore at most `maxIgnoreWins` internal windows forming a discordant number of conformations. For the merging to occur, the total number of external windows on either side of the discordant internal windows must be ≥ `minExtWins`.
It is, however, possible for the windows on the left and right sides of the discordant internal windows to form a different number of conformations:
In such a case, the internal discordant windows will be merged into a single set with the windows on one side only if the number of windows on that side is greater than the number of windows on the other side, as well as ≥ `minExtWins`.
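The simpler concordant-flanks case can be illustrated as follows (a hypothetical sketch operating on the per-window cluster counts, not DRACO's implementation):

```python
from itertools import groupby

def merge_windows(clusters, max_ignore_wins=0, min_ext_wins=6):
    """Merge runs of windows with equal cluster counts, absorbing short
    discordant internal runs flanked by enough concordant windows."""
    runs = [(k, len(list(g))) for k, g in groupby(clusters)]
    merged = True
    while merged:
        merged = False
        for i in range(1, len(runs) - 1):
            (lk, ln), (_, iln), (rk, rn) = runs[i - 1], runs[i], runs[i + 1]
            if lk == rk and iln <= max_ignore_wins and ln + rn >= min_ext_wins:
                runs[i - 1:i + 2] = [(lk, ln + iln + rn)]  # absorb the run
                merged = True
                break
    return runs
```

With the defaults (`maxIgnoreWins = 0`), the discordant window splits the region into three sets; raising `maxIgnoreWins` to 1 lets the two flanking sets merge.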
## Output JSON files
DRACO produces a JSON file with the following structure:

```json
{
  "filename": "sample.mm",
  "transcripts": [
    {
      "id": "RNA_1",
      "sequence": "TCTATTCTACATTGATAGAACAACTCT...TCTGTCTATCACCGCTAGAGCACTCGGTGATTGCA",
      "nReads": 20000,
      "windows": [
        {
          "start": 0,
          "end": 294,
          "stoichiometries": [ 0.388, 0.304, 0.308 ],
          "counts": [
            [ 0, 1, 0, 23, 0, 0, 33, 0, 54, 0, 5, 0, 0, ..., 301, 0, 334, 0, 3, 1, 0, 0, 0, 0, 0, 0 ],
            [ 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 2, ..., 0, 0, 0, 0, 0, 0, 134, 0, 86, 0, 0, 62, 63 ],
            [ 0, 1, 0, 17, 0, 0, 57, 0, 0, 0, 75, 0, 11, ..., 181, 0, 0, 0, 0, 0, 0, 0, 0, 118, 0, 0, 0, 0 ]
          ],
          "coverage": [
            [ 40, 69, 121, 157, 194, 245, 297, 337, 378, 416, 465, ..., 521, 482, 462, 430, 404, 359, 311, 287, 250, 211 ],
            [ 37, 73, 108, 135, 163, 198, 238, 279, 317, 340, 369, ..., 534, 512, 481, 463, 440, 414, 380, 240, 204, 173 ],
            [ 35, 66, 107, 141, 168, 206, 250, 292, 337, 380, 415, ..., 428, 394, 364, 336, 309, 271, 240, 199, 181, 111 ]
          ],
          "weights": [
            [ 0.000, 1.000, 0.000, 0.086, 0.000, 0.055, 0.000, 0.000, ..., 0.110, 0.110, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000 ],
            [ 0.039, 0.035, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, ..., 0.000, 0.922, 0.000, 0.000, 0.035, 0.925, 0.000, 0.000 ],
            [ 0.165, 0.000, 0.000, 0.855, 0.000, 0.020, 0.000, 0.855, ..., 0.000, 0.000, 0.000, 0.055, 0.000, 0.055, 0.000, 0.835 ]
          ],
          "preCoverage": [ 23979, 24261, 24506, 24751, 24996, 25257, 25539, ..., 25813, 26078, 26340, 26616, 26889, 27182, 27441, 27719, 28009 ]
        }
      ]
    }
  ]
}
```
The `transcripts` array contains all the transcripts processed by DRACO. Each transcript includes the following keys:

| Key | Description |
|-----|-------------|
| `id` | Transcript's ID |
| `sequence` | Transcript's sequence |
| `nReads` | Total number of reads mapping to the transcript |
| `windows` | An array containing the deconvolution results for each window set |
The `windows` array is further structured as follows:

| Key | Description |
|-----|-------------|
| `start` | Start position of the window (0-based) |
| `end` | End position of the window (1-based) |
| `stoichiometries` | An array containing the relative abundances of the reconstructed conformations |
| `counts` | An array of arrays, each containing the number of mutations per base for each of the reconstructed conformations |
| `coverage` | An array of arrays, each containing the coverage per base for each of the reconstructed conformations, calculated using all the assigned reads post-filtering |
| `weights` | An array of arrays, each containing the per-base weights from the GraphCut analysis |
| `preCoverage` | An array containing the total coverage per base along the window, calculated using only the pre-filtering reads covering the entire window that have been used to perform the spectral analysis (therefore, it can be lower than the sum of the `coverage` arrays) |
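As a minimal example of working with these keys by hand (rf-json2rc, described below, is the supported route), per-base mutation frequencies can be derived as counts/coverage for each conformation:

```python
import json

def mutation_frequencies(json_text):
    """Per-base mutation frequency (counts/coverage) for each conformation,
    keyed by (transcript id, window start, window end)."""
    data = json.loads(json_text)
    result = {}
    for transcript in data["transcripts"]:
        for window in transcript["windows"]:
            key = (transcript["id"], window["start"], window["end"])
            result[key] = [
                [m / c if c else 0.0 for m, c in zip(counts, coverage)]
                for counts, coverage in zip(window["counts"],
                                            window["coverage"])
            ]
    return result
```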
## Post-processing of DRACO output
DRACO JSON output files can be somewhat hard to interpret and process. To facilitate this operation, we developed the `rf-json2rc` module, which has been included in the RNA Framework. The module allows converting the reactivity profiles deconvolved by DRACO into RNA Count (RC) files, which can be subsequently normalized with the `rf-norm` module and then fed into the `rf-fold` module for secondary structure modeling. For additional details, please refer to the `rf-json2rc` docs.