FAQ

What can I do with the CNAnalysis web server?

The CNAnalysis web server allows performing flexibility and rigidity analysis of a protein at global and local levels of detail. Accordingly, the output from the server will be global indices characterizing the overall rigidity of a protein and local indices characterizing flexibility and rigidity characteristics at a residue level.

How does the CNAnalysis approach work?

CNAnalysis is a rigidity analysis approach that models biomolecules as a network of sites (atoms) and constraints (covalent bonds and non-covalent interactions). For characterizing the rigidity of such a network, CNAnalysis performs a rigid cluster decomposition, i.e., it decomposes the network into clusters of atoms with no internal degrees of freedom and flexible links in between such clusters. By consecutively removing hydrogen bond constraints from (and, if requested, adding hydrophobic tether constraints to) the network with increasing temperature, CNAnalysis simulates the melting of the protein. Carrying out the rigid cluster decomposition on all such intermediate networks, CNAnalysis identifies one (or multiple) phase transition(s) where the network switches from an overall rigid state to a floppy one. Then weak spots (unfolding nuclei), from which the unfolding starts, are identified in the biomolecule. For carrying out the rigid cluster decomposition, CNAnalysis uses the FIRST program. FIRST has been developed by M. Thorpe and coworkers and builds on ideas by D. Jacobs, L. Kuhn, and M. Thorpe. For an introduction to how FIRST and CNAnalysis work, see Rathi, P.C., Pfleger, C., Fulle, S., Klein, D.L., Gohlke, H. Statics of biomacromolecules. in: "Modeling of Molecular Properties", P. Comba (ed.), pp. 281-299, Wiley-VCH, Weinheim, 2011.

Where do I get more detailed information about the CNA approach and what should I cite when reporting the use of this service?

For general information on CNA, the CNA web server, and the FIRST approach, please refer to / cite:
- Pfleger, C., Rathi, P.C., Klein, D.L., Radestock, S., Gohlke, H. J. Chem. Inf. Model. 2013, doi: 10.1021/ci400044m.
- Krüger, D.M., Rathi, P.C., Pfleger, C., Gohlke, H. Nucleic Acids Res. 2013, doi: 10.1093/nar/gkt292.
- Jacobs, D., Rader, A.J., Kuhn, L.A., Thorpe, M.F. Proteins 2001, 44, 150-165.

For information on the global and local indices, please refer to / cite:
- Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J. Comput. Chem. 2012, DOI: 10.1002/jcc.23122.

For information on weak spots, please refer to / cite:
- Radestock, S., Gohlke, H. Eng. Life Sci. 2008, 8, 507-522.
- Rathi, P.C., Radestock, S., Gohlke, H. J. Biotechnol. 2012, 159, 135-144.

For information on how to link structure, (thermo)stability and function, please refer to / cite:
- Radestock, S., Gohlke, H. Proteins 2011, 79, 1089-1108.

For information on ensemble-based CNA, please refer to / cite:
- Gohlke, H., Kuhn, L.A., Case, D.A. Proteins 2004, 56, 322-327.
- Rathi, P.C., Radestock, S., Gohlke, H. J. Biotechnol. 2012, 159, 135-144.
- Pfleger, C., Gohlke, H. Structure 2013, 21, 1725-1734.

What do I need to perform a CNAnalysis run?

You need one (or multiple, see below) PDB file(s) (*.pdb) in standard format or a PDB-ID.

What structure preparations does CNAnalysis do on input?

If a PDB-ID is supplied, the protein structure is prepared by fixing incomplete residues, adding hydrogen atoms, and removing all hetero atoms including water molecules. In contrast, no such preparation is done for a user-provided PDB file. Uploaded PDB files must therefore conform to the PDB format, must not have missing atoms, and must already contain all hydrogen atoms; hetero atoms including water molecules are retained. For multi-model PDB structures, only the first model is considered. Uploaded PDB files with disordered atoms are not accepted. A guideline on adding hydrogen atoms and on keeping ligands, ions, and water molecules in a protein structure can be found here.
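As a rough pre-upload check, some of the requirements above can be tested with a few lines of Python. This is only a sketch and not part of CNAnalysis: the function name check_pdb_text is hypothetical, it relies on the fixed-column PDB layout (alternate location indicator in column 17, element symbol in columns 77-78), and the fallback that guesses the element from the atom name is a heuristic. It does not detect missing heavy atoms.

```python
def check_pdb_text(pdb_text):
    """Return a list of warnings for a PDB file passed in as one string."""
    warnings = []
    n_models = 0
    has_hydrogen = False
    has_altloc = False
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            n_models += 1
        elif record in ("ATOM", "HETATM"):
            # Column 17 (index 16) is the alternate location indicator;
            # a non-blank value marks a disordered atom.
            if len(line) > 16 and line[16] != " ":
                has_altloc = True
            # Columns 77-78 hold the element symbol (may be absent).
            element = line[76:78].strip().upper()
            if not element:
                # Heuristic fallback: first letter of the atom name.
                element = line[12:16].strip().lstrip("0123456789")[:1].upper()
            if element == "H":
                has_hydrogen = True
    if n_models > 1:
        warnings.append("multiple models found; only the first is considered")
    if has_altloc:
        warnings.append("alternate locations (disordered atoms) found")
    if not has_hydrogen:
        warnings.append("no hydrogen atoms found")
    return warnings
```

An empty list means that none of these three checks raised a concern; it does not guarantee that the server will accept the file.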

How can I perform a CNAnalysis run?

Upload one PDB file, or several PDB files bundled as a .zip or .tgz archive, and choose the analysis type accordingly (please see the next FAQ for an explanation of the different analysis types). Selecting an analysis type will automatically fill all further required parameters with default values. If you want to change the default values, please have a look here for the meaning and allowed ranges of the parameters. Pressing the "Submit" button will start the simulation. A link to the results page will appear immediately. The results page refreshes itself every minute and shows the status of your job; the results will appear after completion of the calculations. Please note that this may take from minutes to hours depending on the load of the server, the size of the protein, the analysis type, and the chosen parameters. If you provide an (optional) email address, the link to the results page will be sent to this address; please check the junk folder of your email if you do not see the mail in your inbox.
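For an ensemble run, the PDB files have to be bundled before upload. The following is a minimal sketch using Python's standard zipfile module; the function name bundle_pdb_files and the choice to store the files without directory prefixes are assumptions, since the expected archive layout is not specified here.

```python
import zipfile
from pathlib import Path


def bundle_pdb_files(pdb_dir, archive_path):
    """Write every *.pdb file found directly in pdb_dir into one .zip archive."""
    pdb_files = sorted(Path(pdb_dir).glob("*.pdb"))
    if not pdb_files:
        raise ValueError(f"no .pdb files found in {pdb_dir}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for pdb_file in pdb_files:
            # Store each file flat, without its directory prefix (assumption).
            archive.write(pdb_file, arcname=pdb_file.name)
    return [pdb_file.name for pdb_file in pdb_files]
```

The function returns the names of the bundled files so that the content of the archive can be checked before submission.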

Which analysis type to choose?

CNAnalysis can be run either on a single network or on an ensemble of networks. The first analysis type performs a thermal unfolding simulation on a single network generated from a single input structure. In contrast, the second and third analysis types are ensemble-based approaches. In the second type, the configurations of the network ensemble are generated from multiple input structures, e.g., from a structural ensemble generated by MD simulations, an NMR ensemble, or multiple crystal structures of one biomolecule. Alternatively, in the third type, the configurations of the network ensemble are generated from a single input structure using fuzzy-constraint definitions. The ensemble-based approaches have been developed because rigidity analyses can be sensitive to the input structures. Hence, in general, the ensemble-based approaches tend to give more consistent results with respect to different input structures than the single network approach.

What do the different parameters mean?

Please see here for a detailed description of the parameters.

I got an error message when I try to submit a job. What can I do?

All error messages come with a description that should help you fix the problem. Please also check that your PDB file fulfills the input conditions (see PLEASE NOTE). If the problem cannot be solved, please do not hesitate to contact support[at]cnanalysis.de.

I got a logfile as a result. Where is the problem?

Please open the logfile and search for the keyword "ERROR". There you will find a detailed description of the problem. Please also check that your PDB file fulfills the input conditions (see PLEASE NOTE). If the problem cannot be solved, please do not hesitate to contact support[at]cnanalysis.de.

What cannot be done with the CNAnalysis web server?

The local flexibility and rigidity characteristics of residues of a protein can be compared within the same protein, with a different conformation of the same protein, with a mutant, or with a homolog. However, the local flexibility and rigidity characteristics should not be compared between evolutionarily too distant proteins as the proteins' architecture or fold may influence the absolute local flexibility and rigidity characteristics. The same holds for the global flexibility and rigidity characteristics. Furthermore, CNAnalysis characterizes the statics of a biomolecule, which means that the flexibility and rigidity characteristics indicate only which parts can move. However, how and to what extent a biomolecule's parts do move cannot be revealed. In order to simulate protein movements, please refer to the NMSim web server.

Where do I find a description of the results obtained from a CNA run?

Please see below for a detailed description of the results. The following help sections are also available via the results page; please note the links "Click here for help".

1 Thermal unfolding simulation

CNA models biomolecules as body-and-bar networks. The atoms of the biomolecules are modeled as bodies, while covalent and noncovalent bonds are modeled as bars. A covalent bond is modeled as five bars, allowing for the dihedral rotation about it. Peptide and double bonds are modeled with six bars, disallowing any bond rotation. Considering that the mechanical rigidity of a biomolecule is largely determined by non-covalent interactions, there is also a need to include hydrogen bonds, salt bridges, and hydrophobic interactions as constraints in the network. Stronger interactions such as hydrogen bonds (and salt bridges) and hydrophobic interactions are modeled as five bars and two bars, respectively. Weaker interactions such as van der Waals interactions are not modeled as constraints.

CNA carries out the thermal unfolding simulations by gradually removing noncovalent constraints from the initial network representation. That is, for a given network state s = f(T), hydrogen bonds (including salt bridges) with an energy E_HB > E_cut,hb are removed from the network. This follows the idea that stronger hydrogen bonds will break at higher temperatures than weaker ones. The number of hydrophobic contacts is either kept constant during the thermal unfolding or increased to treat hydrophobic interactions in a temperature-dependent manner. Finally, a rigid cluster decomposition is performed on each constraint network state.

[Text adapted from Rathi, P.C., Pfleger, C., Fulle, S., Klein, D.L., Gohlke, H. Statics of biomacromolecules. in: "Modeling of Molecular Properties", P. Comba (ed.), pp. 281-299, Wiley-VCH], click here for details.
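The removal rule for hydrogen bonds described in this section can be illustrated with a short Python sketch. It is not the CNA implementation: the function name network_states is hypothetical, energies are assumed to be given in kcal/mol, and hydrophobic contacts as well as the rigid cluster decomposition itself are left out.

```python
def network_states(hbond_energies, e_cut_values):
    """For each cutoff, list the indices of hydrogen bonds kept in the network.

    A hydrogen bond with energy E_HB > E_cut,hb is removed, so lowering the
    cutoff (moving to higher temperature) dilutes the network step by step.
    """
    states = []
    for e_cut in e_cut_values:
        kept = [i for i, e_hb in enumerate(hbond_energies) if e_hb <= e_cut]
        states.append((e_cut, kept))
    return states


# Three hydrogen bonds, from strong (-5.0) to weak (-0.5).
for e_cut, kept in network_states([-5.0, -2.0, -0.5], [-0.1, -1.0, -3.0, -6.0]):
    print(e_cut, kept)
```

In this toy example the weakest bond is lost first and the strongest bond last, which mirrors the idea that stronger hydrogen bonds break at higher temperatures.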

2 Global flexibility indices

Global flexibility indices monitor the degree of flexibility and rigidity within the constraint network at a macroscopic level. During the thermal unfolding simulation, these global indices are calculated for each network state s. Global indices allow identification of phase transition points (points when these indices change sharply during the thermal unfolding simulation) that relate to the folded-unfolded transitions of proteins. Plots for all global indices are displayed for a single network-based CNA run. In the case of an ensemble-based run, a summary of the identified phase transition points is presented instead. The following global indices are calculated by the CNA web server:

2.1 Floppy mode density, Φ

Floppy modes F refer to the number of internal independent degrees of freedom that are associated with dihedral rotations in a network. Normalization of F by the number of (overall) internal degrees of freedom associated with the N atoms results in a floppy mode density Φ.
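As a small worked example, the normalization can be written as follows. The denominator 6N - 6 is an assumption made for this sketch (six degrees of freedom per atom modeled as a body, minus the six global rigid-body motions); it is not stated explicitly in the text above, and the function name is hypothetical.

```python
def floppy_mode_density(n_floppy_modes, n_atoms):
    """Floppy mode density, assuming normalization by 6N - 6."""
    if n_atoms < 2:
        raise ValueError("at least two atoms are needed")
    return n_floppy_modes / (6 * n_atoms - 6)
```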

2.2 Mean rigid cluster size, S

Originating from percolation theory, moments of the size distribution of rigid clusters (i.e., the microstructure of the network) can be used to analyze macroscopic properties of constraint networks. In this context, S denotes the mean rigid cluster size with the size of the largest rigid cluster always being excluded from the calculation. This leads to S being zero as long as one rigid cluster dominates the whole network or if all rigid clusters have vanished.

2.3 Rigidity order parameter, P_∞ type 1 and type 2

P_∞ is another global index originating from percolation theory and is derived from the microstructure of a constraint network. P_∞ denotes the fraction of the network belonging to the giant percolating cluster (type 1) or the actual largest rigid cluster (type 2). The giant percolating cluster is defined as the largest rigid cluster present at low temperature values with all constraints in place during the thermal unfolding simulation. As long as the network is in the rigid phase, it is dominated by one rigid cluster and, hence, P_∞ is close to one. In the floppy phase, with a vanishing largest rigid cluster, P_∞ is zero.
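For the type 2 variant, the definition reduces to a simple ratio, sketched below in Python. The function name and the input format (a list of rigid cluster sizes in atoms) are assumptions; an empty list stands for a network in which all rigid clusters have vanished. Type 1 additionally requires tracking the giant percolating cluster across network states and is not covered by this sketch.

```python
def rigidity_order_parameter_type2(cluster_sizes, n_atoms):
    """Fraction of the network that belongs to the current largest rigid cluster."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    # max(..., default=0) yields 0 when no rigid cluster is left.
    return max(cluster_sizes, default=0) / n_atoms
```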

2.4 Cluster configuration entropy, H type 1 and type 2

H was introduced by Andraud et al. (Physica A 1994, 207, 208) as a morphological descriptor for heterogeneous materials. It is a particularly useful index for characterizing macroscopic properties of a constraint network in terms of its microstructure. H has been adapted from Shannon's information theory and, thus, is a measure of the degree of disorder in the realization of a given state.

It is defined as a function of the probability w_s that an atom is part of a cluster of size s (s-cluster):

H = - Σ_s w_s ln w_s

where

w_s = s^k n_s / Σ_s s^k n_s

with n_s being the number of clusters of size s normalized by the total number of atoms N.

For H type 1, k = 1, which corresponds to the original definition by Andraud et al. A modified version (H type 2) uses k = 2. The modified H emphasizes later phase transition points during the thermal unfolding simulation; these have been found to be related to the thermostability of proteins. As long as the largest rigid cluster dominates the system, H is zero because there is only one realization of the system possible. For the same reason, H is zero if all atoms can move independently. In between, H is nonzero because of multiple possible realizations of the system associated with a heterogeneous cluster size distribution.

[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J. Comput. Chem. 2012, DOI: 10.1002/jcc.23122], click here for details.
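The behavior of H can be explored with a minimal Python sketch. It assumes the Shannon-type form H = - Σ_s w_s ln w_s with w_s = s^k n_s / Σ_s s^k n_s and n_s the number of s-clusters divided by N; the function name and the input format (a list of cluster sizes) are hypothetical.

```python
import math
from collections import Counter


def cluster_configuration_entropy(cluster_sizes, n_atoms, k=1):
    """Cluster configuration entropy for one network state (k=1: type 1, k=2: type 2)."""
    counts = Counter(cluster_sizes)  # size s -> number of clusters of that size
    n_s = {s: c / n_atoms for s, c in counts.items()}
    norm = sum(s ** k * n for s, n in n_s.items())
    h = 0.0
    for s, n in n_s.items():
        w_s = s ** k * n / norm  # probability of an atom being in an s-cluster
        h -= w_s * math.log(w_s)
    return h
```

With a single cluster size present (one dominating cluster, or only independent atoms), w_s equals one and the function returns zero, in line with the limiting cases described above.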

3 Local indices

Local flexibility indices monitor the degree of flexibility and rigidity within the constraint network at a microscopic (residue) level. In general, the local index for a residue indicates the network state s during the thermal unfolding simulation at which this residue switches from rigid to flexible. For an ensemble-based CNA run, local index values are averaged over the entire ensemble. The following local indices are calculated by the CNA web server:

3.1 Percolation index, p_i

p_i is a local analog to the rigidity order parameter P_∞ in that it monitors the percolation behavior of a biomolecule on a microscopic level. As such, it allows identifying the hierarchical organization of the giant percolating cluster during the thermal unfolding simulation. The index value p_i is derived for each covalent bond i between two atoms A_i,{1,2} as the E_cut,hb value during the thermal unfolding simulation at which the bond segregates from the giant percolating cluster. For a C_α atom-based representation of a residue, the lower of the two p_i values of the two backbone bonds is taken. p_i = 0 then indicates that an atom has never been part of the giant percolating cluster, that is, the atom has always been in a flexible region of the biomolecule. In contrast, the lower p_i, the longer a residue is part of the giant percolating cluster during the thermal unfolding simulation. The lowest p_i thus highlights the most stable subcomponent in the network.

3.2 Rigidity index r_i

As a generalization of the percolation index p_i, the rigidity index r_i is defined for each covalent bond i between two atoms A_i,{1,2} as the E_cut,hb value during the thermal unfolding simulation at which the bond changes from rigid to flexible. Phrased differently, this index monitors when a bond segregates from any rigid cluster of the set of rigid clusters.

For a C_α atom-based representation of a residue, the average of the two r_i values of the two backbone bonds is taken. Accordingly, r_i = 0 indicates that an atom has always been in a flexible region of the biomolecule. In contrast, the lower r_i, the longer a residue is part of a rigid cluster during the thermal unfolding simulation and the more rigid the residue is.
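The definition of r_i can be sketched in Python as follows. The function names and the input format (one rigid/flexible flag per network state, ordered from the first to the last state of the simulation) are hypothetical. Returning the last cutoff for a bond that never becomes flexible is an assumption of this sketch.

```python
def rigidity_index(e_cut_values, bond_is_rigid):
    """E_cut,hb at which a bond first changes from rigid to flexible."""
    if not bond_is_rigid[0]:
        return 0.0  # bond was flexible from the start
    for e_cut, rigid in zip(e_cut_values, bond_is_rigid):
        if not rigid:
            return e_cut
    # Assumption: a bond that stays rigid is assigned the last cutoff.
    return e_cut_values[-1]


def residue_rigidity_index(r_bond_a, r_bond_b):
    """C_alpha-based value: average over the two backbone bonds."""
    return (r_bond_a + r_bond_b) / 2.0
```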

3.3 Stability maps

A stability map rc_ij is a two-dimensional itemization of the rigidity index r_i and is derived by identifying "rigid contacts" between two residues R_{i,j}, which are represented by their C_α atoms. A rigid contact exists if two residues belong to the same rigid cluster. During the thermal unfolding simulation, stability maps are then constructed by identifying, for each residue pair, the E_cut,hb value at which the rigid contact between the two residues is lost.

That way, a contact's stability relates to the microscopic stability in the network and, taken together, the microscopic stabilities of all residue-residue contacts result in a stability map. Thus, stability maps denote the distribution of flexibility and rigidity within the system; they identify regions that are flexibly or rigidly correlated across the structure.

[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J. Comput. Chem. 2012, DOI: 10.1002/jcc.23122], click here for details.
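The construction of a stability map can be illustrated with a small Python sketch. It assumes that, for every network state, each residue carries the label of the rigid cluster it belongs to (None if it belongs to no rigid cluster). All names are hypothetical. Assigning 0.0 to pairs that are never in rigid contact and the last cutoff to contacts that are never lost are assumptions of this sketch, and the diagonal is left at 0.0.

```python
def contact_lost_at(e_cut_values, labels_i, labels_j):
    """E_cut,hb at which the rigid contact between two residues is first lost."""
    seen_contact = False
    for e_cut, a, b in zip(e_cut_values, labels_i, labels_j):
        in_contact = a is not None and a == b  # same rigid cluster
        if in_contact:
            seen_contact = True
        elif seen_contact:
            return e_cut
    return e_cut_values[-1] if seen_contact else 0.0


def stability_map(e_cut_values, labels_per_state):
    """Symmetric matrix rc[i][j]; labels_per_state[state][residue] is a cluster label."""
    n_res = len(labels_per_state[0])
    rc = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        labels_i = [state[i] for state in labels_per_state]
        for j in range(i + 1, n_res):
            labels_j = [state[j] for state in labels_per_state]
            rc[i][j] = rc[j][i] = contact_lost_at(e_cut_values, labels_i, labels_j)
    return rc
```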

4 Phase transition points

Phase transition points are identified based on the global indices during the thermal unfolding simulation; these points correspond to the E_cut,hb value at which a global index changes rapidly, indicating a shift of the network from being largely rigid to being largely flexible. For a protein, such a shift can be related to the folded-unfolded transition. For this, E_cut,hb is converted to a temperature scale using a linear relation: T = -20 K / (kcal mol^-1) × E_cut,hb + 300 K. Phase transition points are identified on four of the global indices: P_∞ (type 1 and 2) and H (type 1 and 2). However, there may be cases where such a transition cannot be identified; these include a lack of a sharp transition or the presence of multiple comparable transitions. For an ensemble-based CNA run, a summary (mean and standard error of the mean) of these transition points is presented.
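The linear relation between E_cut,hb and temperature is easy to apply by hand; the following Python sketch (with hypothetical function names) shows the conversion in both directions.

```python
def e_cut_to_temperature(e_cut_hb):
    """Convert E_cut,hb (kcal/mol) to a temperature in K: T = -20 * E_cut,hb + 300."""
    return -20.0 * e_cut_hb + 300.0


def temperature_to_e_cut(temperature):
    """Inverse of the linear relation above."""
    return (300.0 - temperature) / 20.0
```

For example, a phase transition at E_cut,hb = -1.0 kcal/mol corresponds to 320 K on this scale.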

5 Unfolding nuclei

Unfolding nuclei or weak spots are residues that belong to the giant rigid cluster until the phase transition point and segregate from the giant rigid cluster at this point during the thermal unfolding simulation. From the point of view of protein stability, unfolding of the giant rigid cluster begins from these residues and, hence, these residues are termed weak spots. Mutating these residues is therefore more likely to make a protein more thermostable than mutating other residues. In the case of an ensemble-based CNA run, the frequency of being a weak spot across all individual structures of the ensemble is presented for every residue.
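In terms of sets, the definition can be sketched in Python as follows. The function names and inputs are hypothetical: giant_before and giant_after stand for the residues in the giant rigid cluster directly before and at the phase transition point, and how these sets are obtained is outside the scope of the sketch.

```python
from collections import Counter


def weak_spots(giant_before, giant_after):
    """Residues that leave the giant rigid cluster at the phase transition."""
    return set(giant_before) - set(giant_after)


def weak_spot_frequency(weak_spot_sets):
    """Per-residue frequency of being a weak spot across an ensemble."""
    counts = Counter(res for spots in weak_spot_sets for res in spots)
    n_structures = len(weak_spot_sets)
    return {res: count / n_structures for res, count in counts.items()}
```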

6 Homo-multimeric proteins or proteins with internal symmetry

CNA considers each residue of a protein individually, that is, it does not consider related residues from different chains of homo-multimeric proteins (or symmetry-related residues) as identical and, hence, does not average index values over such related residues. Accordingly, the local index values for related residues can differ because of differences in the local environment of these residues (=> rigidity index, stability maps) or because of the very definition of the index, which considers the whole homo-multimer/symmetric protein as one unit and monitors the thermal unfolding simulation of this whole unit (=> percolation index). In the latter case, it may be that residues of one of the chains are identified as weak spots (because unfolding starts there) whereas those of the other chain(s) show no weak spot character.