Enhanced fold recognition using efficient short fragment clustering

Evgeny Krissinel


The main structure aligner in the CCP4 Software Suite, SSM (Secondary Structure Matching), has limited applicability at intermediate stages of the structure solution process, when the secondary structure cannot be computed reliably because the model is incomplete or its main chain is fragmented. In this study, we describe a new algorithm for the alignment and comparison of protein structures in CCP4, designed to overcome SSM's limitations while retaining its quality and speed. The new algorithm, named GESAMT (General Efficient Structural Alignment of Macromolecular Targets), employs the long-standing idea of deriving global structure similarity from a promising set of locally similar short fragments, but uses a few technical solutions that make it considerably faster. A comparative sensitivity and selectivity analysis revealed an unexpected and significant improvement in the fold recognition properties of the new algorithm, which also makes it useful for applications in structural bioinformatics. The new tool is included in the CCP4 Software Suite starting from version 6.3.
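The idea of seeding a global alignment from locally similar short fragments can be illustrated with a minimal sketch. It is not GESAMT's implementation: the fragment score used here (distance RMSD between intra-fragment C-alpha distance matrices), the window length, the cutoff, and all function names are assumptions chosen for brevity.

```python
import math

def distance_matrix(coords):
    """Pairwise distances between the points of one fragment."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def fragment_drmsd(frag_a, frag_b):
    """Distance RMSD between two equal-length fragments.

    Compares internal distances, so no superposition is needed.
    (Assumed score for illustration; not taken from the paper.)
    """
    n = len(frag_a)
    if n != len(frag_b) or n < 2:
        raise ValueError("fragments must have equal length >= 2")
    da, db = distance_matrix(frag_a), distance_matrix(frag_b)
    sq = sum((da[i][j] - db[i][j]) ** 2
             for i in range(n) for j in range(i + 1, n))
    return math.sqrt(sq / (n * (n - 1) / 2))

def similar_fragment_pairs(chain_a, chain_b, length=5, cutoff=1.0):
    """Return (i, j, score) for every pair of windows scoring within cutoff.

    chain_a and chain_b are lists of (x, y, z) C-alpha coordinates;
    length and cutoff are hypothetical defaults.
    """
    seeds = []
    for i in range(len(chain_a) - length + 1):
        frag_a = chain_a[i:i + length]
        for j in range(len(chain_b) - length + 1):
            score = fragment_drmsd(frag_a, chain_b[j:j + length])
            if score <= cutoff:
                seeds.append((i, j, score))
    # Best-scoring fragment pairs first: candidates for seeding
    # a global superposition.
    return sorted(seeds, key=lambda s: s[2])
```

In a full aligner, the top-ranked pairs would serve as starting points for a global superposition and refinement. Note that a distance-based score cannot distinguish a fragment from its mirror image, so a practical method would need an additional check.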


structure alignment; protein fold recognition; structure superposition; GESAMT


Copyright © 2017 Lorem Ipsum Press