What can I do with the CNAnalysis web server?
The CNAnalysis web server allows performing flexibility and rigidity analysis of a protein at global and local levels of detail. Accordingly, the output from the server will be global indices characterizing the overall rigidity of a protein and local indices characterizing flexibility and rigidity characteristics at a residue level.
How does the CNAnalysis approach work?
CNAnalysis is a rigidity analysis approach that models biomolecules as a network of sites (atoms) and constraints (covalent bonds and non-covalent interactions). To characterize the rigidity of such a network, CNAnalysis performs a rigid cluster decomposition, i.e., it decomposes the network into clusters of atoms with no internal degrees of freedom and flexible links between these clusters. By successively removing hydrogen bond constraints from (and, if requested, adding hydrophobic tether constraints to) the network with increasing temperature, CNAnalysis simulates the melting of the protein. By carrying out the rigid cluster decomposition on all such intermediate networks, CNAnalysis identifies one (or multiple) phase transition(s) at which the network switches from an overall rigid state to a floppy one. Weak spots (unfolding nuclei), from which the unfolding starts, are then identified in the biomolecule. For the rigid cluster decomposition, CNAnalysis uses the FIRST program. FIRST has been developed by M. Thorpe and coworkers and builds on ideas by D. Jacobs, L. Kuhn, and M. Thorpe. For an introduction to how FIRST and CNAnalysis work, see Rathi, P.C., Pfleger, C., Fulle, S., Klein, D.L., Gohlke, H. Statics of biomacromolecules. in: "Modeling of Molecular Properties", P. Comba (ed.), pp. 281-299, Wiley-VCH, Weinheim, 2011.
Where do I get more detailed information about the CNA approach and what should I cite when reporting the use of this service?
For general information on CNA, the CNA web server, and the FIRST approach,
please refer to / cite:
- Pfleger, C., Rathi, P.C., Klein D.L., Radestock S., Gohlke H. J. Chem. Inf. Model. 2013,
doi: 10.1021/ci400044m.
- Krüger, D.M., Rathi, P.C., Pfleger, C., Gohlke, H. Nucleic Acids Res. 2013, doi: 10.1093/nar/gkt292.
- Jacobs, D., Rader, A.J., Kuhn, L.A., Thorpe, M.F. Proteins 2001, 44, 150-165.
For information on the global and local indices, please refer to / cite:
- Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J. Comput. Chem. 2012, DOI: 10.1002/jcc.23122.
For information on weak spots, please refer to / cite:
- Radestock, S., Gohlke, H. Eng. Life Science 2008, 8, 507-522.
- Rathi, P.C., Radestock, S., Gohlke, H. J. Biotechnology 2012, 159, 135-144.
For information on how to link structure, (thermo)stability and function, please refer to / cite:
- Radestock, S., Gohlke, H. Proteins 2011, 79, 1089-1108.
For information on ensemble-based CNA, please refer to / cite:
- Gohlke, H., Kuhn, L.A., Case, D.A. Proteins 2004, 56, 322-327.
- Rathi, P.C., Radestock, S., Gohlke, H. J. Biotechnology 2012, 159, 135-144.
- Pfleger, C., Gohlke, H. Structure 2013, 21, 1725-1734.
What do I need to perform a CNAnalysis run?
You need one (or multiple, see below) PDB file(s) (*.pdb) in standard format or a PDB-ID.
What structure preparations does CNAnalysis do on input?
If a PDB-ID is supplied, the protein structure is prepared by fixing incomplete residues, adding hydrogen atoms, and removing all hetero atoms including water molecules. In contrast, for a user-provided PDB file, no such preparation is done by CNAnalysis. Therefore, uploaded PDB files must conform to the PDB format without any missing atoms and with all hydrogen atoms added; hetero atoms including water molecules are retained. In the case of multi-model PDB structures, only the first model is considered. Uploaded PDB files with disordered atoms are not allowed. A guideline on adding hydrogen atoms and on keeping ligands, ions, and water molecules in a protein structure can be found here.
How can I perform a CNAnalysis run?
Upload one PDB file, or several as a .zip or .tgz archive, and choose the analysis type accordingly (please see the next FAQ for an explanation of the different analysis types). Selecting an analysis type will automatically fill all further required parameters with default values. If you want to change the default values, please have a look here for the meaning and allowed ranges of the parameters. Pressing the "Submit" button will start the simulation. A link to the results page will appear immediately. The results page updates itself every minute and shows the status of your job; the results will appear after completion of the calculations. Please note that this may take from minutes to hours depending on the load of the server, the size of the protein, the analysis type, and the chosen parameters. If you provide an (optional) email address, the link to the results page will be sent to this address; please check the junk folder of your email if you do not see the mail in your inbox.
Which analysis type to choose?
CNAnalysis can be run either on a single network or on an ensemble of networks. The first analysis type performs a thermal unfolding simulation on a single network generated from a single input structure. In contrast, the second and third analysis types are ensemble-based approaches. In the second type, the configurations of the network ensemble are generated from multiple input structures, e.g., from a structural ensemble generated by MD simulations, an NMR ensemble, or multiple crystal structures of one biomolecule. Alternatively, in the third type, the configurations of the network ensemble are generated from a single input structure using fuzzy-constraint definitions. The ensemble-based approaches have been developed because rigidity analyses can be sensitive to the input structures. Hence, in general, the ensemble-based approaches tend to give more consistent results with respect to different input structures than the single network approach.
What do the different parameters mean?
Please see here for a detailed description of the parameters.
I got an error message when I try to submit a job. What can I do?
All error messages come with a description that should help you to fix the problem. Please also check that your PDB file fulfills the input conditions (see PLEASE NOTE). However, if a problem cannot be solved, please do not hesitate to contact support[at]cnanalysis.de.
I got a logfile as a result. Where is the problem?
Please open the logfile and search for the keyword "ERROR". There you will find a detailed description of the problem. Please also check that your PDB file fulfills the input conditions (see PLEASE NOTE). However, if a problem cannot be solved, please do not hesitate to contact support[at]cnanalysis.de.
What cannot be done with the CNAnalysis web server?
The local flexibility and rigidity characteristics of residues of a protein
can be compared within the same protein, with a different conformation of
the same protein, with a mutant, or with a homolog. However, the local
flexibility and rigidity characteristics should not be compared between
evolutionarily too distant proteins as the proteins' architecture or fold
may influence the absolute local flexibility and rigidity characteristics.
The same holds for the global flexibility and rigidity characteristics.
Furthermore, CNAnalysis characterizes the statics of a biomolecule, which
means that the flexibility and rigidity characteristics indicate only which
parts can move. However, how and to what extent a biomolecule's parts do
move cannot be revealed. In order to simulate protein movements, please
refer to the NMSim web server.
Where do I find a description of the results obtained from a CNA run?
Please see below for a detailed description of the results. The following help sections are also available via the results page; please note the links "Click here for help".
1 Thermal unfolding simulation
CNA models biomolecules as body-and-bar networks. The atoms of the
biomolecules are modeled as bodies, while covalent and noncovalent bonds are
modeled as bars. A covalent bond is modeled as five bars, allowing for the
dihedral rotation about it. Peptide and double bonds are modeled with six
bars, disallowing any bond rotation. Considering that the mechanical
rigidity of a biomolecule is largely determined by non-covalent
interactions, there is also a need to include hydrogen bonds, salt bridges,
and hydrophobic interactions as constraints in the network. Stronger
interactions such as hydrogen bonds (and salt bridges) and hydrophobic
interactions are modeled as five bars and two bars, respectively. Weaker
interactions such as van der Waals interactions are not modeled as
constraints.
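The bar counts above can be tabulated and used for a rough degree-of-freedom estimate. The following is a minimal sketch, not the algorithm used by FIRST (which relies on a pebble game to handle redundant constraints): it applies plain Maxwell counting, 6N - 6 minus the number of bars, which ignores redundant bars and is therefore only a lower bound on the number of internal degrees of freedom. The dictionary keys and the function name are hypothetical.

```python
# Bars per constraint type, taken from the description above.
# The key names are illustrative, not CNA identifiers.
BARS = {
    "single_bond": 5,             # rotation about the bond is allowed
    "peptide_or_double_bond": 6,  # no rotation about the bond
    "hydrogen_bond": 5,
    "salt_bridge": 5,
    "hydrophobic": 2,
}


def maxwell_floppy_modes(n_bodies, constraints):
    """Naive Maxwell count: 6 DOF per body, minus 6 global DOF, minus all bars.

    Redundant bars are not detected, so the value can underestimate the true
    number of internal degrees of freedom (and can even become negative).
    """
    bars = sum(BARS[kind] for kind in constraints)
    return 6 * n_bodies - 6 - bars


# Three bodies linked by two single bonds keep two dihedral rotations.
print(maxwell_floppy_modes(3, ["single_bond", "single_bond"]))
```

Replacing one of the single bonds by a six-bar bond removes one of the two rotations in this count.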
CNA carries out the thermal unfolding simulations by gradually removing
noncovalent constraints from the initial network representation. That is,
for a given network state s = f(T), hydrogen bonds (including salt bridges)
with an energy EHB > Ecut,hb are removed from the network. This follows the
idea that stronger hydrogen bonds will break at higher temperatures than
weaker ones. The number of hydrophobic contacts is either kept constant
during the thermal unfolding or increased to treat hydrophobic interactions
in a temperature-dependent manner. Finally, a rigid cluster decomposition is
performed on each constraint network state.
[Text adapted from Rathi, P.C., Pfleger, C., Fulle, S., Klein, D.L., Gohlke,
H. Statics of biomacromolecules. in: "Modeling of Molecular Properties", P.
Comba (ed.), pp. 281-299, Wiley-VCH], click here for details.
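The removal rule for hydrogen bonds during the thermal unfolding simulation can be illustrated with a short sketch. It only shows the filtering step (bonds with EHB > Ecut,hb are dropped); the rigid cluster decomposition itself is not included. The function name, the data layout, and the example energies are assumptions made for illustration.

```python
def network_states(hbond_energies, e_cuts):
    """Return, for each cutoff, the hydrogen bonds that remain in the network.

    hbond_energies: dict mapping a bond label to its energy in kcal/mol
                    (more negative means stronger).
    e_cuts:         sequence of Ecut,hb values, typically decreasing, so that
                    each step mimics a higher temperature.
    """
    states = []
    for e_cut in e_cuts:
        # A bond is removed when its energy is above (weaker than) the cutoff.
        kept = sorted(b for b, e in hbond_energies.items() if e <= e_cut)
        states.append((e_cut, kept))
    return states


hbonds = {"a": -1.0, "b": -3.5, "c": -0.5}  # hypothetical energies
for e_cut, kept in network_states(hbonds, [-0.1, -2.0, -4.0]):
    print(e_cut, kept)
```

With these example values, all three bonds are present at the first cutoff, only the strongest bond survives the second, and none remains at the third.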
2 Global flexibility indices
Global flexibility indices monitor the degree of flexibility and rigidity
within the constraint network at a macroscopic level. During the thermal
unfolding simulation, these global indices are calculated for each network
state s. Global indices allow identification of phase transition points
(points when these indices change sharply during the thermal unfolding
simulation) that relate to the folded-unfolded transitions of proteins.
Plots for all global indices are displayed for a single network-based CNA
run. In the case of an ensemble-based run, a summary of the identified phase
transition points is presented instead. The following global indices are
calculated by the CNA web server:
2.1 Floppy mode density, Φ
Floppy modes F refer to the number of internal independent degrees of
freedom that are associated with dihedral rotations in a network.
Normalization of F by the number of (overall) internal degrees of freedom
associated with the N atoms results in a floppy mode density Φ.
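As a small worked example, the normalization can be written as follows. The sketch assumes that the overall number of internal degrees of freedom of N atoms in a body-and-bar network is 6N - 6 (six per body, minus the six global motions); this reading is an assumption, and the function name is hypothetical.

```python
def floppy_mode_density(floppy_modes, n_atoms):
    """Floppy mode density: F divided by the assumed 6N - 6 internal DOF."""
    if n_atoms < 2:
        raise ValueError("at least two atoms are needed")
    return floppy_modes / (6 * n_atoms - 6)


# Two floppy modes in a three-atom network: 2 / 12
print(floppy_mode_density(2, 3))
```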
2.2 Mean rigid cluster size, S
Originating from percolation theory, moments of the size distribution of rigid clusters (i.e., the microstructure of the network) can be used to analyze macroscopic properties of constraint networks. In this context, S denotes the mean rigid cluster size with the size of the largest rigid cluster always being excluded from the calculation. This leads to S being zero as long as one rigid cluster dominates the whole network or if all rigid clusters have vanished.
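A minimal sketch of such a calculation is given below. It assumes the usual percolation-theory definition of the mean cluster size, the sum of s² over all clusters divided by the sum of s, with one instance of the largest cluster left out; the exact expression used by the server is not stated here, so this formula and the function name are assumptions.

```python
def mean_rigid_cluster_size(cluster_sizes):
    """Mean rigid cluster size with the largest rigid cluster excluded.

    cluster_sizes: sizes (in atoms) of all rigid clusters in one network state.
    Returns 0.0 when no cluster is left after excluding the largest one.
    """
    rest = sorted(cluster_sizes)[:-1]  # drop one instance of the largest cluster
    total = sum(rest)
    if total == 0:
        return 0.0
    return sum(s * s for s in rest) / total


print(mean_rigid_cluster_size([10, 3, 2]))  # (9 + 4) / (3 + 2)
print(mean_rigid_cluster_size([10]))        # one dominating cluster
```

In this sketch, both a single dominating cluster and an empty cluster list give S = 0, in line with the behavior described above.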
2.3 Rigidity order parameter, P∞ type 1 and type 2
P∞ is another global index originating from percolation theory and is derived from the microstructure of a constraint network. P∞ denotes the fraction of the network belonging to the giant percolating cluster (type 1) or the actual largest rigid cluster (type 2). The giant percolating cluster is defined as the largest rigid cluster present at low temperature values with all constraints in place during the thermal unfolding simulation. As long as the network is in the rigid phase, it is dominated by one rigid cluster and, hence, P∞ is close to one. In the floppy phase, with a vanishing largest rigid cluster, P∞ is zero.
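For the type 2 variant, the definition reduces to a simple ratio, sketched below. The type 1 variant additionally requires tracking the giant percolating cluster across network states and is not covered. The function name and the input layout are assumptions.

```python
def rigidity_order_parameter_type2(cluster_sizes, n_atoms):
    """Fraction of the network that belongs to the actual largest rigid cluster."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    largest = max(cluster_sizes, default=0)  # 0 when no rigid cluster is left
    return largest / n_atoms


print(rigidity_order_parameter_type2([6, 3, 1], 10))  # 6 of 10 atoms
```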
2.4 Cluster configuration entropy, H type 1 and type 2
H was introduced by Andraud et al. (Physica A 1994, 207, 208) as a
morphological descriptor for heterogeneous materials. It is a particularly
useful index for characterizing macroscopic properties of a constraint
network in terms of its microstructure. H has been adapted from Shannon's
information theory and, thus, is a measure of the degree of disorder in the
realization of a given state.
It is defined as a function of the probability w_s that an atom is part of
a cluster of size s (s-cluster):
$$H = -\sum_{s} w_s \ln w_s$$
where
$$w_s = \frac{s^k n_s}{\sum_{s} s^k n_s}$$
with n_s being the number of clusters of size s normalized by the total
number of atoms (N).
For H type 1, k = 1 which corresponds to the original definition by Andraud
et al. A modified version (H type 2) uses k = 2. Using the modified H, later
phase transition points during the thermal unfolding simulation are
potentiated; these have been found to be related to the thermostability of
proteins. As long as the largest rigid cluster dominates the system, H is
zero because there is only one realization of the system possible. For the
same reason, H is zero if all atoms can move independently. In between, H is
nonzero because of multiple possible realizations of the system associated
with a heterogeneous cluster size distribution.
[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J.
Comput. Chem. 2012, DOI: 10.1002/jcc.23122], click here for details.
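The behavior of H described above can be reproduced with a short sketch. It assumes a Shannon-type entropy over the weights w_s = s^k n_s / Σ s^k n_s, with k = 1 for type 1 and k = 2 for type 2; since n_s appears in both numerator and denominator, its normalization by N cancels. The function name and the input layout (a list of cluster sizes, with independently moving atoms given as clusters of size 1) are assumptions.

```python
import math
from collections import Counter


def cluster_configuration_entropy(cluster_sizes, k=1):
    """Shannon-type entropy of the cluster size distribution.

    cluster_sizes: sizes of all clusters in one network state.
    k:             1 for H type 1, 2 for H type 2.
    """
    counts = Counter(cluster_sizes)  # cluster size s -> number of s-clusters
    weights = [(s ** k) * n for s, n in counts.items()]
    total = sum(weights)
    if total == 0:
        return 0.0
    entropy = 0.0
    for w in weights:
        p = w / total
        entropy -= p * math.log(p)
    return entropy


print(cluster_configuration_entropy([10]))          # one dominating cluster
print(cluster_configuration_entropy([1, 1, 1, 1]))  # all atoms independent
print(cluster_configuration_entropy([3, 1], k=1))
print(cluster_configuration_entropy([3, 1], k=2))
```

The first two calls give zero because only one cluster size occurs; the heterogeneous distribution gives a nonzero value that depends on k.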
3 Local indices
Local flexibility indices monitor the degree of flexibility and rigidity within the constraint network at a microscopic (residue) level. In general, the local index for a residue indicates the network state s during the thermal unfolding simulation at which this residue switches from rigid to flexible. For an ensemble-based CNA run, local index values are averaged over the entire ensemble. The following local indices are calculated by the CNA web server:
3.1 Percolation index, pi
pi is a local analog to the rigidity order parameter P∞ in that it monitors the percolation behavior of a biomolecule on a microscopic level. As such, it allows identifying the hierarchical organization of the giant percolating cluster during the thermal unfolding simulation. The index value pi is derived for each covalent bond i between two atoms Ai,{1,2} as the Ecut,hb value during the thermal unfolding simulation at which the bond segregates from the giant percolating cluster. For a Cα atom-based representation of a residue, the lower of the two pi values of the two backbone bonds is taken. pi = 0 then indicates that an atom has never been part of the giant percolating cluster, that is, the atom has always been in a flexible region of the biomolecule. In contrast, the lower pi, the longer a residue is part of the giant percolating cluster during the thermal unfolding simulation. The lowest pi thus highlights the most stable subcomponent in the network.
3.2 Rigidity index ri
As a generalization of the percolation index pi, the rigidity index ri is
defined for each covalent bond i between two atoms Ai,{1,2} as the Ecut,hb
value during the thermal unfolding simulation at which the bond changes from
rigid to flexible. Phrased differently, this index monitors when a bond
segregates from any rigid cluster of the set of rigid clusters.
For a Cα atom-based representation of a residue, the average of the two ri
values of the two backbone bonds is taken. Accordingly, ri = 0 indicates
that an atom has always been in a flexible region of the biomolecule. In
contrast, the lower ri, the longer a residue is part of a rigid cluster
during the thermal unfolding simulation and the more rigid the residue is.
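The two reductions from bond values to a residue value (lower of the two values for the percolation index, average of the two values for the rigidity index) can be written as a small sketch. The function names are hypothetical, and the example Ecut,hb values are made up.

```python
def residue_percolation_index(p_bond_1, p_bond_2):
    """Residue value of the percolation index: the lower of the two bond values."""
    return min(p_bond_1, p_bond_2)


def residue_rigidity_index(r_bond_1, r_bond_2):
    """Residue value of the rigidity index: the average of the two bond values."""
    return (r_bond_1 + r_bond_2) / 2


# Hypothetical Ecut,hb values (kcal/mol) for the two backbone bonds of a residue.
print(residue_percolation_index(-1.2, -3.0))
print(residue_rigidity_index(-1.0, -3.0))
```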
3.3 Stability maps
A stability map rcij is a two-dimensional itemization of the rigidity index
ri and is derived by identifying "rigid contacts" between two residues
R{i,j}, which are represented by their Cα atoms. A rigid contact exists if
two residues belong to the same rigid cluster. During the thermal unfolding
simulation, stability maps are then constructed in that, for each residue
pair, Ecut,hb is identified at which a rigid contact between two residues is
lost.
That way, a contact's stability relates to the microscopic stability in the
network and, taken together, the microscopic stabilities of all
residue-residue contacts result in a stability map. Thus, stability maps
denote the distribution of flexibility and rigidity within the system; they
identify regions that are flexibly or rigidly correlated across the
structure.
[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J.
Comput. Chem. 2012, DOI: 10.1002/jcc.23122], click here for details.
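The construction of a stability map can be sketched as follows, assuming the rigid clusters of every network state are already known as sets of residue numbers. For each residue pair that is in rigid contact in the first state, the sketch records the first Ecut,hb at which the two residues no longer share a rigid cluster. The function names and the data layout are assumptions, and contacts that form again later in the simulation are ignored.

```python
from itertools import combinations


def rigid_contacts(clusters):
    """All residue pairs (i, j), i < j, that share a rigid cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def stability_map(states):
    """Map each initially rigid residue pair to the Ecut,hb at which it is lost.

    states: list of (e_cut, clusters) in unfolding order, where clusters is a
            list of sets of residue numbers. Pairs that stay in contact until
            the last state do not appear in the result.
    """
    initial = rigid_contacts(states[0][1])
    lost_at = {}
    for e_cut, clusters in states[1:]:
        current = rigid_contacts(clusters)
        for pair in initial - current:
            if pair not in lost_at:  # keep only the first loss of the contact
                lost_at[pair] = e_cut
    return lost_at


states = [(-0.5, [{1, 2, 3}]), (-1.0, [{1, 2}, {3}]), (-2.0, [])]
print(stability_map(states))
```

In this toy example, residue 3 leaves the cluster first, so its contacts are lost at -1.0, while the contact between residues 1 and 2 persists until -2.0.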
4 Phase transition points
Phase transition points are identified based on the global indices during the thermal unfolding simulation; these points correspond to the Ecut,hb at which a global index changes rapidly, indicating a shift of the network from being largely rigid to being largely flexible. For a protein, such a shift can be related to the folded-unfolded transition. For this, Ecut,hb is converted to a temperature scale using a linear relation: T = -20 K / (kcal mol-1) x Ecut,hb + 300 K. Phase transition points are identified on four of the global indices: P∞ (type 1 and 2) and H (type 1 and 2). However, there may be cases where such a transition cannot be identified; these include a lack of a sharp transition or the presence of multiple comparable transitions. For an ensemble-based CNA run, a summary (mean and standard error of the mean) of these transition points is presented.
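The linear relation between Ecut,hb and temperature is easy to apply by hand; as a worked example, a minimal sketch is given below. The function name is hypothetical, and the input values are chosen for illustration only.

```python
def ecut_to_temperature(e_cut_hb):
    """Convert Ecut,hb (kcal/mol) to a temperature (K) with the linear relation
    T = -20 K/(kcal/mol) * Ecut,hb + 300 K."""
    return -20.0 * e_cut_hb + 300.0


for e_cut in (0.0, -1.0, -2.5):
    print(e_cut, ecut_to_temperature(e_cut))
```

An Ecut,hb of -1.0 kcal/mol thus corresponds to 320 K, and -2.5 kcal/mol to 350 K.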
5 Unfolding nuclei
Unfolding nuclei or weak spots are residues that belong to the giant rigid cluster until the phase transition point and segregate from the giant rigid cluster at this point during the thermal unfolding simulation. From the point of view of protein stability, unfolding of the giant rigid cluster begins from these residues and, hence, these residues are termed weak spots. Mutating these residues is therefore more likely to make a protein more thermostable. In the case of an ensemble-based CNA run, the frequency of being a weak spot across all individual structures of the ensemble is presented for all residues.
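For an ensemble, the weak spot frequency of a residue is the fraction of structures in which it was identified as a weak spot. A minimal sketch, assuming the weak spots of each structure are available as a set of residue numbers (the function name and the example data are hypothetical):

```python
from collections import Counter


def weak_spot_frequency(weak_spots_per_structure):
    """Fraction of ensemble structures in which each residue is a weak spot.

    weak_spots_per_structure: list with one set of residue numbers per structure.
    Residues that are never a weak spot do not appear in the result.
    """
    n_structures = len(weak_spots_per_structure)
    if n_structures == 0:
        return {}
    counts = Counter()
    for weak_spots in weak_spots_per_structure:
        counts.update(set(weak_spots))  # count each residue once per structure
    return {res: n / n_structures for res, n in counts.items()}


ensemble = [{10, 12}, {12}, {12, 40}]
print(weak_spot_frequency(ensemble))
```

Here residue 12 is a weak spot in every structure, whereas residues 10 and 40 each appear in one of three structures.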
6 Homo-multimeric proteins or proteins with internal symmetry
CNA considers each residue of a protein individually, that is, it does not consider related residues from different chains of homo-multimeric proteins (or symmetry-related residues) as identical and, hence, does not average index values over such related residues. Accordingly, the local index values for related residues can differ because of differences in the local environment of these residues (=> rigidity index, stability maps) or because of the very definition of the index, which considers the whole homo-multimer/symmetric protein as one unit and monitors the thermal unfolding simulation of this whole unit (=> percolation index). In the latter case, it may be that residues of one of the chains are identified as weak spots (because unfolding starts here) whereas those of the other chain(s) do not show any weak spot character.