The Markov Cluster (MCL) algorithm has been applied to cluster protein sequences based on their pairwise distances (FASTA 3 tools). The resulting clusters are presented as expandable / collapsible lists *). The cluster view includes links to a cladogram and a table view of homologous protein sequences (see below). For each of the resulting ortholog clusters, the conserved sequence domains occurring in the protein sequences of the cluster are determined (Pfam 33.1).
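The clustering step can be illustrated with a minimal sketch of the MCL iteration (expansion followed by inflation). This is not the mcl program itself. The function names, the inflation value of 2.0, and the use of a similarity matrix as input are assumptions made for illustration; how the pairwise distances are converted to edge weights is not stated here.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric similarity matrix (hypothetical input format)."""
    n = len(similarity)
    m = [[float(v) for v in row] for row in similarity]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops, so no column is empty
    m = normalise_columns(m)
    for _ in range(max_iter):
        # Expansion: one squaring of the matrix spreads flow along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalise; strong flows win.
        inflated = normalise_columns([[v ** inflation for v in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep non-zero entries belong to attractors; their non-zero
    # columns are the members of one cluster.
    clusters = []
    for row in m:
        members = [j for j, v in enumerate(row) if v > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two groups of sequences with no similarity between the groups.
example = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(mcl(example))
```

A higher inflation value generally yields more and smaller clusters. The value used for the clusters described here is not given.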
Cladogram of homologous protein sequences
The protein sequences of a cluster are aligned (MUSCLE 3.6 multiple sequence alignment), conserved blocks are selected (Gblocks 0.91b), and a distance matrix is calculated (ProtDist from PHYLIP 3.6). The distances are fed to the neighbour-joining procedure BioNJ (Gascuel, 1997) to infer the relationships between the member proteins of the cluster. Finally, a cladogram is generated and displayed.
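How a distance matrix is turned into a tree topology can be sketched with classic neighbour joining (Saitou and Nei). Note that this is a substitute for illustration only: BioNJ is a variant of neighbour joining that additionally takes variance estimates of the distances into account, which is not reproduced here. The function name and the input format are assumptions. Branch lengths are omitted because a cladogram shows the topology only.

```python
def neighbour_joining(labels, matrix):
    """Return a Newick topology string for a symmetric distance matrix.

    labels must be unique; matrix[i][j] is the distance between
    labels[i] and labels[j].
    """
    nodes = list(labels)
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of every node from all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Replace the joined pair by a new internal node.
        new = "(%s,%s)" % (a, b)
        d[new] = {}
        for k in nodes:
            if k != a and k != b:
                dist = 0.5 * (d[a][k] + d[b][k] - d[a][b])
                d[new][k] = dist
                d[k][new] = dist
        nodes = [k for k in nodes if k != a and k != b] + [new]
    return "(%s,%s);" % (nodes[0], nodes[1])


# Four hypothetical proteins; A/B and C/D are the closest pairs.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
print(neighbour_joining(labels, matrix))  # prints ((A,B),(C,D));
```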
Table of homologous protein sequences
The protein sequences of a cluster are aligned pairwise with the Smith-Waterman algorithm (FASTA 3 tools, ssearch), yielding a measure of the pairwise distances. A list of related proteins is generated. Both the identity values (green for higher identity) and the coverage values (red to warn of low coverage) are shown.
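The two values shown in the table can be sketched as follows. The exact definitions used by the table are not stated here, so this sketch assumes that identity is the percentage of identical residue pairs among the aligned (gap-free) columns, and that coverage is the percentage of a full-length sequence that lies inside the local alignment. The function name and its parameters are hypothetical.

```python
def identity_and_coverage(aligned_a, aligned_b, full_len_a, full_len_b):
    """Return (identity %, coverage of A %, coverage of B %).

    aligned_a and aligned_b are the two rows of a pairwise local alignment,
    with '-' marking gaps. full_len_a and full_len_b are the lengths of the
    complete, unaligned sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have the same length")
    # Columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0, 0.0
    identical = sum(1 for a, b in pairs if a == b)
    identity = 100.0 * identical / len(pairs)
    # Residues of each sequence that take part in the local alignment.
    residues_a = sum(1 for a in aligned_a if a != "-")
    residues_b = sum(1 for b in aligned_b if b != "-")
    coverage_a = 100.0 * residues_a / full_len_a
    coverage_b = 100.0 * residues_b / full_len_b
    return identity, coverage_a, coverage_b


# A local alignment covering half of a 12-residue protein A
# and all of a 7-residue protein B.
print(identity_and_coverage("MKT-LLV", "MKSALLV", 12, 7))
```

A high identity combined with a low coverage indicates that only a short part of a protein is similar, which is the situation the red coverage warning points to.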
Multiple alignment of related membrane proteins
This feature performs a multiple protein sequence alignment (MUSCLE 3.6) on the user-selected proteins. After the corresponding consensus transmembrane (TM) segment predictions have been added, a multiple TM segment alignment is created.
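Adding TM segment predictions to an alignment requires translating residue positions into alignment columns, because gaps shift the positions. A minimal sketch of this translation is given below. The function name and the 0-based, inclusive (start, end) segment format are assumptions made for illustration.

```python
def map_segments_to_columns(aligned_seq, segments):
    """Translate residue-based segments into alignment-column segments.

    aligned_seq is one row of the multiple alignment with '-' for gaps.
    segments is a list of (start, end) residue indices, 0-based and
    inclusive, referring to the ungapped sequence.
    """
    # column_of[i] is the alignment column holding the i-th residue.
    column_of = [col for col, char in enumerate(aligned_seq) if char != "-"]
    return [(column_of[start], column_of[end]) for start, end in segments]


# Residues 1..4 and 5..6 of the ungapped sequence "MKLLVAG".
print(map_segments_to_columns("MK--LLVA-G", [(1, 4), (5, 6)]))
# prints [(1, 6), (7, 9)]
```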
Based on that alignment, a balance of all involved TM predictions can be calculated. The balance accumulates the scores of all consensus predictions for the selected proteins. The maximal sum is set to 100%. The balance diagram shows all sums above 25%. The center and edge positions of each balance TM segment are calculated from the average positions. A multiple alignment is limited to 10 sequences.
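The accumulation and scaling of the balance can be sketched as follows. The sketch assumes that every selected protein contributes one score per alignment column (0 where no TM segment is predicted or where the protein has a gap). The function name and this input format are hypothetical. The averaging of center and edge positions is not reproduced; the sketch only reports the contiguous column ranges that exceed the threshold.

```python
def tm_balance(score_rows, threshold=25.0):
    """Return (percent per column, list of (first, last) column ranges).

    score_rows holds one list of per-column TM scores for each protein;
    all lists must span the same alignment columns.
    """
    n_cols = len(score_rows[0])
    if any(len(row) != n_cols for row in score_rows):
        raise ValueError("all rows must span the same alignment columns")
    # Accumulate the scores of all proteins column by column.
    sums = [sum(row[col] for row in score_rows) for col in range(n_cols)]
    peak = max(sums)
    if peak == 0:
        return [0.0] * n_cols, []
    # The maximal sum is set to 100%.
    percent = [100.0 * s / peak for s in sums]
    # Collect contiguous runs of columns above the threshold.
    segments = []
    start = None
    for col, value in enumerate(percent):
        if value > threshold and start is None:
            start = col
        elif value <= threshold and start is not None:
            segments.append((start, col - 1))
            start = None
    if start is not None:
        segments.append((start, n_cols - 1))
    return percent, segments


# Five hypothetical proteins, score 1 inside a predicted TM segment.
rows = [
    [0, 1, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0],
]
percent, segments = tm_balance(rows)
print(segments)  # prints [(2, 3), (6, 7)]
```

In this example, columns 1 and 4 reach only 20% and are therefore dropped, while the second segment is kept although only some of the proteins support it.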