International Science Index


10007311

CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Abstract: The biological function of an RNA molecule depends on its structure. The objective of structural alignment is to find the homology between two or more RNA secondary structures. Knowing the functionalities common to two RNA structures allows a better understanding of them and the discovery of other relationships between them. In addition, identifying non-coding RNAs (RNAs that are not translated into a protein) is a popular application in which RNA structural alignment is the first step. A few methods for RNA structure-to-structure alignment have been developed. Most of these methods perform partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. The literature pays less attention to efficient RNA structure representations, and full structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N^2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given in a component-based representation and N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments on real and simulated datasets illustrate the efficiency of the CompPSA algorithm compared with other approaches. The algorithm provides an accurate similarity measure between components. It gives the user the flexibility to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm is shown to be scalable and efficient in both time and memory.
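The abstract does not give the scoring function or the recurrence, so the following is only a minimal sketch of how a weighted, component-level alignment could reach O(N^2) time. It assumes a Needleman-Wunsch-style global alignment over components. The `Component` fields, the `_closeness` measure, the default weights, and the gap penalty are all hypothetical choices made for illustration, not the published CompPSA definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical summary of one structural component (e.g. a stem-loop)."""
    position: int     # start index of the component in the sequence
    full_length: int  # number of bases spanned by the component
    stem_length: int  # number of base pairs in the stem


def _closeness(a: int, b: int) -> float:
    """Similarity in [0, 1] of two non-negative values (assumed form)."""
    largest = max(a, b)
    return 1.0 if largest == 0 else 1.0 - abs(a - b) / largest


def similarity(c1: Component, c2: Component, weights) -> float:
    """Weighted average of per-feature closeness; weights are user-chosen."""
    w_pos, w_full, w_stem = weights
    total = w_pos + w_full + w_stem
    if total <= 0:
        raise ValueError("at least one feature weight must be positive")
    score = (
        w_pos * _closeness(c1.position, c2.position)
        + w_full * _closeness(c1.full_length, c2.full_length)
        + w_stem * _closeness(c1.stem_length, c2.stem_length)
    )
    return score / total


def align(s1, s2, weights=(1.0, 1.0, 1.0), gap=-0.5) -> float:
    """Global alignment score of two component lists (assumed recurrence)."""
    n, m = len(s1), len(s2)
    # dp[i][j]: best score aligning the first i components of s1
    # with the first j components of s2.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + similarity(s1[i - 1], s2[j - 1], weights),
                dp[i - 1][j] + gap,  # component of s1 left unmatched
                dp[i][j - 1] + gap,  # component of s2 left unmatched
            )
    return dp[n][m]
```

Because the table has at most (N + 1) x (N + 1) cells and each cell takes constant time, the sketch runs in O(N^2) time, where N is the larger component count. Setting a weight to zero, for example `weights=(0.0, 0.0, 1.0)`, makes the comparison ignore that feature, which mirrors the user-selectable weighting the abstract describes.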