I. Installation

1) RDDpred does not generally require any installation.
2) However, on some systems the external packages (samtools/bcftools and bamtools) need to be re-compiled.
3) We provide a "" script, which includes test code that runs on the test data.
4) Run this script first, then resolve any errors that arise (see IV. Troubleshooting).

II. Usage

Usage: python [arguments]

A. Essential Arguments:
-rbl, --RNA_Bam_List [FILE] List of sorted BAM files. Make sure they all have identical BAM headers.
-rsf, --Reference_Sequence_Fasta [FILE] Fasta file used for alignments
-tdp, --Tools_Dir_Path [DIR] Tools directory path
-ops, --Out_Prefix [STR] Prefix string for intermediate files (ex: [prefix].RDD.RawList.txt)
-psl, --Positive_Sites_List [FILE] Published true-sites (see README)
-nsl, --Negative_Sites_List [FILE] Calculated false-sites (see README)

B. Optional Arguments:
-gsl, --Genomic_SNPs_List [FILE] List of genomic SNPs to be excluded
-pni, --Process_Num [INT] Number of processes to use (default:1)
-tml, --Train_Max_Limit [INT] Upper limit on training data size (default:100000)
-mul, --Memory_Usage_Limit [GB] Maximum memory usage (default:10G)
-sud, --Storage_Usage_Degree [FLOAT] Storage to use, in multiples of the BAM file size (ex: -sud 1 uses storage equal to 1 BAM file) (default:10)
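
The argument list above can be assembled programmatically. The following is a minimal sketch, not part of RDDpred itself: the helper name build_rddpred_command, the placeholder script path, and the file names bam_list.txt, hg19.fa, and MySample are all hypothetical and must be replaced with your own values.

```python
import shlex


def build_rddpred_command(script_path, bam_list, reference_fasta, tools_dir,
                          out_prefix, positive_sites, negative_sites,
                          snp_list=None, processes=None):
    """Assemble an argument vector from the flags documented above.

    This helper is hypothetical; it only arranges the documented flags.
    """
    cmd = [
        "python", script_path,
        "-rbl", bam_list,
        "-rsf", reference_fasta,
        "-tdp", tools_dir,
        "-ops", out_prefix,
        "-psl", positive_sites,
        "-nsl", negative_sites,
    ]
    # Optional arguments are appended only when the caller sets them.
    if snp_list is not None:
        cmd += ["-gsl", snp_list]
    if processes is not None:
        cmd += ["-pni", str(processes)]
    return cmd


cmd = build_rddpred_command(
    script_path="PATH/TO/RDDPRED_SCRIPT.py",  # placeholder, not a real file name
    bam_list="bam_list.txt",                  # hypothetical
    reference_fasta="hg19.fa",                # hypothetical
    tools_dir="ToolBox.Dir",
    out_prefix="MySample",                    # hypothetical
    positive_sites="PriorData/hg19.PublicSites.txt",
    negative_sites="PriorData/hg19.MES_Sites.txt",
    processes=4,
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.call once the placeholder paths point to real files.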

III. Input/Output

1. Input
a) RNA_Bam_List(-rbl): A list of paths to sorted BAM files, which RDDpred treats as replicates

b) Reference_Sequence_Fasta(-rsf): FASTA file of the reference genome used for alignment

c) Tools_Dir_Path(-tdp): Path to the "ToolBox.Dir" directory, which contains the external software packages

d) Positive_Sites_List(-psl): A list of previously known sites (for hg19, provided in "PriorData/hg19.PublicSites.txt")
chr10 100814530 T C
chr10 100814536 T C
chr10 100814559 T C

e) Negative_Sites_List(-nsl): A list of MES-sites (for hg19, provided in "PriorData/hg19.MES_Sites.txt")
chr10 100003848 A C
chr10 100003849 G A
chr10 100003850 A T

f) Genomic_SNPs_List(-gsl): A list of sites that the user wants to exclude from prediction (optional, same format as the other lists)
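
The site lists shown above can be checked before a run. The sketch below assumes, based only on the example rows, that each line holds four whitespace-separated columns: chromosome, position, reference base, and altered base. The Site tuple and the parse_site_lines helper are hypothetical names, not part of RDDpred.

```python
from collections import namedtuple

# Column meanings are inferred from the example rows (an assumption).
Site = namedtuple("Site", ["chrom", "pos", "ref", "alt"])


def parse_site_lines(lines):
    """Parse site-list lines into Site tuples, rejecting malformed rows."""
    sites = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(
                "line %d: expected 4 columns, got %d" % (line_no, len(fields))
            )
        chrom, pos, ref, alt = fields
        # int() raises ValueError if the position is not numeric.
        sites.append(Site(chrom, int(pos), ref.upper(), alt.upper()))
    return sites


example = """chr10 100814530 T C
chr10 100814536 T C
chr10 100814559 T C
"""
sites = parse_site_lines(example.splitlines())
print(sites[0])
```

To validate a file on disk, open it and pass the file object to parse_site_lines in place of example.splitlines().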

2. Output
a) Argument.Log: List of the arguments given by the user

b) Step5.Results.Summary.txt: Summary of prediction

c) Prediction.ResultList.txt: List of prediction results (for each candidate)

d) RDDpred.results_report.txt: List of editing ratios (for accepted candidates)

-------------- Definition of Prediction_Likelihood_Ratio ----------------
** If "Predicted_As" == "True_Editing"
"Prediction_Likelihood_Ratio" = p(true) / (p(true) + p(artefact))
** If "Predicted_As" == "Artefact"
"Prediction_Likelihood_Ratio" = p(artefact) / (p(true) + p(artefact))

#p(true) = likelihood that the candidate is actually a true editing site
#p(artefact) = likelihood that the candidate is actually an artefact

e) RDD.RawList.txt: List of raw candidates

f) Predictor.Training.Log: Training result of RDDpred classifier (Including 10-fold CV)

g) Attribute.Evaluation.Log: Evaluation result of variable importance (with InfoGain)

h) Model.Dir/: Contains the training data and the WEKA model file.
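
The definition of Prediction_Likelihood_Ratio given under output d) can be written as a small function. This is an illustrative sketch only: the function name is hypothetical, and p(true) and p(artefact) are passed in directly because this README does not describe how RDDpred computes them or how it chooses "Predicted_As".

```python
def prediction_likelihood_ratio(predicted_as, p_true, p_artefact):
    """Return the ratio defined above for the given predicted label."""
    total = float(p_true) + float(p_artefact)
    if total <= 0:
        raise ValueError("p(true) + p(artefact) must be positive")
    if predicted_as == "True_Editing":
        return p_true / total
    if predicted_as == "Artefact":
        return p_artefact / total
    raise ValueError("unknown label: %r" % (predicted_as,))


# With p(true) = 3.0 and p(artefact) = 1.0:
print(prediction_likelihood_ratio("True_Editing", 3.0, 1.0))  # 0.75
print(prediction_likelihood_ratio("Artefact", 3.0, 1.0))      # 0.25
```

For a fixed pair of likelihoods, the two possible values sum to 1, so the ratio is always the normalized share of the label that was reported.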

IV. Troubleshooting

To maximize reproducibility, RDDpred uses specific versions of external packages: samtools/bcftools (1.2.1), bamtools (2.4.0), and WEKA (3.6.2). On some systems, their source code may therefore need to be re-compiled. If you see any of the error messages listed below when running RDDpred, run the corresponding commands first.

Categories of Errors
1) [Error_1] Samtools is not working properly...
(in ToolBox.Dir)
rm -rf samtools-1.2/
tar xjf samtools-1.2.tar.bz2
cd samtools-1.2
make

2) [Error_2] Bcftools is not working properly...
(in ToolBox.Dir)
rm -rf bcftools-1.2/
tar xjf bcftools-1.2.tar.bz2
cd bcftools-1.2
make

3) [Error_3] Bamtools is not working properly...
(in ToolBox.Dir)
rm -rf bamtools-2.4.0/
tar xjf bamtools-2.4.0.tar.bz2
cd bamtools-2.4.0
mkdir build
cd build
cmake ..
make

4) [Error_4] Weka is not working properly...
=> Make sure that your system has JVM >1.7.

5) ImportError: No module named ...
=> Install the Python module specified in the error message
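
For [Error_4], the JVM requirement can be checked from Python. The sketch below is not part of RDDpred. It assumes that "java -version" prints a line containing version "X.Y..." (for example, java version "1.8.0_292" or openjdk version "11.0.2"), and it reads ">1.7" as strictly newer than 1.7. All function names are hypothetical.

```python
import re
import subprocess


def read_java_version_output():
    """Run `java -version` and return its text (it is written to stderr)."""
    proc = subprocess.Popen(["java", "-version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return output.decode("utf-8", "replace")


def parse_java_version(version_output):
    """Extract (major, minor) from the version banner, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2) or 0))


def is_newer_than_1_7(version_output):
    """True if the reported version is strictly greater than 1.7."""
    version = parse_java_version(version_output)
    return version is not None and version > (1, 7)


# Checks on sample banners (these strings are examples, not captured logs).
print(is_newer_than_1_7('java version "1.8.0_292"'))    # True
print(is_newer_than_1_7('java version "1.7.0_80"'))     # False
print(is_newer_than_1_7('openjdk version "11.0.2"'))    # True
```

On a real system, call is_newer_than_1_7(read_java_version_output()); an OSError from read_java_version_output means that no java executable was found on the PATH.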

V. Contact

1) If you have any other questions, please contact us by email:
2) Citation
=> Kim et al., "RDDpred: a condition-specific RNA-editing prediction model from RNA-seq data.", BMC Genomics 2016, 17(Suppl 1):5