README for RPMCMC motif sampler

The current version (0.0) of RPMCMC has the following limitations:
maximum number of input sequences is 100000
maximum length of an input sequence is 3000
maximum number of iterations is 10000

##Installation
GCC version 4.4 or above is required, since this software uses OpenMP v3.0.
On Windows, installing Linux in a virtual machine may be preferable.

% cd [download directory]
% tar xvfz rpmcmc-xxx.tar.gz
% cd ./rpmcmc-xxx/src
% make
% make install

##stack size
Before executing the application, increase the stack size with the following command:

% ulimit -s unlimited

##Start searching motifs
% rpmcmc-xxx/bin/multi_motif_sampler -d [input fasta file] -od [output directory]

##Option
% rpmcmc-xxx/bin/multi_motif_sampler [-key value]

od: output directory path
d: input fasta file path
ps: input position specific prior file path (optional; uniform prior by default)
p: order of Markov background model (0<=p<=3) [int] (optional; 0 by default)
ks: motif length min (ks<=kl<=30) [int] (optional; 6 by default)
kl: motif length max (ks<=kl<=30) [int] (optional; 14 by default)
n: number of replicas (1<=n<=100) [int] (optional; 20 by default)
r: strength of repulsive force (0<=r<=100) [float] (optional; 10.0 by default)
i: number of iterations (u<=i<=10000) [int] (optional; 520 by default)
u: burnin step(0<=u=0)[float] (optional; 0.3 by default)
l: threshold for clustering (0<=l<=1) [float] (optional; 0.3 by default)

##Sample data
test_sequence.fasta is a sample dataset placed in the src directory (rpmcmc-xxx/src/test_sequence.fasta).

##Example
% rpmcmc-xxx/bin/multi_motif_sampler -d rpmcmc-xxx/src/test_sequence.fasta -od ./ -n 10 -i 120

RPMCMC will start searching for motifs in the sample dataset with 10 replicas and 120 iterations.
Output files will be placed in the current directory (option key -od).

##Output
filename:pfm
This file shows the Position Frequency Matrix (PFM).
*Example
Motif1                 ## motif index
A    C    G    T       ## order of letters
335  500  73   1976    ## frequency of each letter
21   29   63   2771
16   7    26   2835
1    2801 58   24
31   2067 24   762
37   430  16   2401
Motif2                 ## next motif
A    C    G    T
121  2663 149  0
2620 294  4    15
8    2488 47   390
41   188  2591 113
103  190  47   2593
54   15   2863 1

filename:locate
This file shows where motifs are embedded.
*Example
Motif1
seq_no  start  end  direction  motifSeq
1       1      6    -          TTTCCC
2       378    383  -          GTTCTT
3       53     58   -          TTTCCT
4       34     39   -          TTTCCC
6       84     89   -          ATTCCT
7       389    394  -          CTTCCT
8       364    369  -          CTTCTT
9       73     78   +          ATTCCT
10      233    238  +          TTTCTT
11      408    413  -          TTTCCT
12      196    201  +          CTTCCT
13      455    460  -          CTTCCT
14      82     87   -          TTTCCT
15      111    116  -          TTTCCT
16      349    354  +          TTTCTT
17      64     69   -          TTTCCT

The first column is the index number of the sequence in the fasta sequence set.
The second and third columns are the start and end positions, respectively.
The fourth column is the direction ('+' and '-' mean that the motif lies in the 5' to 3' and 3' to 5' direction, respectively).
The fifth column shows the motif sequence found at that position.

###contact
ikebata.hisaki@ism.ac.jp
Feel free to ask if you do not understand how to use RPMCMC.
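
##Reading the pfm file (sketch)
The pfm layout described in the Output section can be read with a few lines of Python. The sketch below is not part of RPMCMC. It assumes that each motif starts with a line beginning with "Motif", followed by an "A C G T" header and one whitespace-separated row of four counts per motif position, and that any "##" annotation is a comment. The names parse_pfm and consensus are hypothetical helpers introduced only for illustration.

```python
def parse_pfm(text):
    """Parse PFM text into {motif_name: [[A, C, G, T], ...]}.

    Assumed layout (taken from the README example, not from the RPMCMC source):
    a 'MotifN' line, an 'A C G T' header, then one row of counts per position.
    """
    motifs = {}
    current = None
    for raw in text.splitlines():
        # Drop anything after '##', which the README uses for annotations.
        line = raw.split("##", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0].startswith("Motif"):
            current = fields[0]
            motifs[current] = []
        elif fields == ["A", "C", "G", "T"]:
            continue  # letter-order header, nothing to store
        elif current is not None:
            motifs[current].append([int(x) for x in fields])
    return motifs


def consensus(rows, letters="ACGT"):
    """Return the most frequent letter at each motif position."""
    return "".join(letters[row.index(max(row))] for row in rows)


# Motif1 from the README example.
example = """\
Motif1
A C G T
335 500 73 1976
21 29 63 2771
16 7 26 2835
1 2801 58 24
31 2067 24 762
37 430 16 2401
"""

pfm = parse_pfm(example)
print(consensus(pfm["Motif1"]))  # prints TTTCCT
```

To read a real result, pass the contents of the pfm file in the output directory (for example, open("pfm").read() if -od was ./) to parse_pfm instead of the inline example string.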