Description of the results obtained from a CNA run

CNA computes several global and local indices while carrying out a thermal unfolding simulation of a protein. From the global indices, phase transition points and structural weak spots (unfolding nuclei) are identified. The local indices are useful for linking flexibility to function and for understanding the impact of ligand binding on protein flexibility. A brief description of the thermal unfolding simulation is presented here, followed by descriptions of the indices, phase transition points, and weak spots.

1 Thermal unfolding simulation

CNA models biomolecules as body-and-bar networks. The atoms of the biomolecule are modeled as bodies, while covalent and noncovalent bonds are modeled as bars. A covalent bond is modeled as five bars, allowing for dihedral rotation about it. Peptide and double bonds are modeled as six bars, disallowing any rotation about the bond. Because the mechanical rigidity of a biomolecule is largely determined by noncovalent interactions, hydrogen bonds, salt bridges, and hydrophobic interactions must also be included as constraints in the network. These stronger noncovalent interactions are modeled as five bars (hydrogen bonds and salt bridges) and two bars (hydrophobic interactions). Weaker interactions such as van der Waals interactions are not modeled as constraints.
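To make the bar counting concrete, the following Python sketch tallies the bars of a small body-and-bar network using the per-constraint bar numbers given above and derives the naive bound F >= 6N - 6 - B on the number of floppy modes. The dictionary keys, the function names, and the assumption of a single connected network with 6N - 6 internal degrees of freedom are illustrative only. A real rigidity analysis also detects redundant bars, which this simple count does not.

```python
# Hypothetical sketch: naive (Maxwell-type) constraint count for a
# body-and-bar network. Bar counts per constraint type follow the text;
# the 6N - 6 internal degrees of freedom assume ONE connected network of
# N rigid bodies. Redundant bars are ignored, so the result is only a
# lower bound on the number of floppy modes F.

BARS_PER_CONSTRAINT = {
    "covalent": 5,     # rotatable single bond: dihedral rotation allowed
    "peptide": 6,      # peptide bond: no rotation
    "double": 6,       # double bond: no rotation
    "hbond": 5,        # hydrogen bond or salt bridge
    "hydrophobic": 2,  # hydrophobic tether
}

DOF_PER_BODY = 6  # 3 translations + 3 rotations per rigid body


def count_bars(constraints: dict[str, int]) -> int:
    """Total number of bars for a {constraint type: count} mapping."""
    return sum(BARS_PER_CONSTRAINT[kind] * n for kind, n in constraints.items())


def floppy_mode_lower_bound(n_bodies: int, constraints: dict[str, int]) -> int:
    """Lower bound on F: max(0, 6N - 6 - B).

    Each bar removes at most one degree of freedom, so F can only be
    larger than this bound if some bars are redundant.
    """
    internal_dof = DOF_PER_BODY * n_bodies - 6
    return max(0, internal_dof - count_bars(constraints))


# Usage: four atoms linked in a chain by three rotatable covalent bonds.
print(floppy_mode_lower_bound(4, {"covalent": 3}))  # 18 - 15 = 3
```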

CNA carries out the thermal unfolding simulation by gradually removing noncovalent constraints from the initial network representation. That is, for a given network state s = f(T), all hydrogen bonds (including salt bridges) with an energy E_HB > E_cut,hb are removed from the network, and E_cut,hb is lowered step by step from one network state to the next. This follows the idea that stronger hydrogen bonds break at higher temperatures than weaker ones. The number of hydrophobic contacts is either kept constant during the thermal unfolding simulation or increased, to treat hydrophobic interactions in a temperature-dependent manner. Finally, a rigid cluster decomposition is performed on each constraint network state.
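The constraint-removal schedule can be sketched in Python as follows. The function names, the dictionary of hydrogen-bond energies keyed by atom pairs, and the cutoff range are hypothetical. The sketch only shows which hydrogen bonds survive in each network state. It leaves hydrophobic contacts unchanged (the "kept constant" variant) and omits the rigid cluster decomposition itself.

```python
# Hypothetical sketch of the constraint-removal schedule: E_cut,hb is
# lowered step by step, and in each network state only hydrogen bonds with
# E_HB <= E_cut,hb are kept (those with E_HB > E_cut,hb are removed).
# Start/stop/step values are illustrative assumptions, not CNA defaults.
# Each kept set would then go to a rigid cluster decomposition (not shown).
import math
from typing import Iterator


def cutoff_schedule(start: float, stop: float, step: float) -> list[float]:
    """Decreasing E_cut,hb values (kcal/mol) from start down to stop."""
    if step <= 0 or stop > start:
        raise ValueError("expected step > 0 and stop <= start")
    n_steps = math.floor((start - stop) / step + 1e-9)
    return [round(start - i * step, 10) for i in range(n_steps + 1)]


def network_states(
    hbond_energies: dict[tuple[int, int], float], cutoffs: list[float]
) -> Iterator[tuple[float, set[tuple[int, int]]]]:
    """Yield (E_cut,hb, hydrogen bonds still present) for each network state."""
    for e_cut in cutoffs:
        kept = {pair for pair, e_hb in hbond_energies.items() if e_hb <= e_cut}
        yield e_cut, kept


# Usage: three hydrogen bonds between hypothetical atom pairs.
energies = {(1, 2): -3.0, (3, 4): -0.5, (5, 6): -1.2}
for e_cut, kept in network_states(energies, cutoff_schedule(-0.5, -2.0, 0.5)):
    print(e_cut, sorted(kept))
```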

[Text adapted from Rathi, P.C., Pfleger, C., Fulle, S., Klein, D.L., Gohlke, H. Statics of biomacromolecules. In: "Modeling of Molecular Properties", P. Comba (ed.), pp. 281-299, Wiley-VCH]

2 Global flexibility indices

Global flexibility indices monitor the degree of flexibility and rigidity within the constraint network at a macroscopic level. During the thermal unfolding simulation, these global indices are calculated for each network state s. Global indices allow identification of phase transition points (points at which these indices change sharply during the thermal unfolding simulation) that relate to the folded-unfolded transitions of proteins. Plots of all global indices are displayed for a single network-based CNA run. For an ensemble-based run, a summary of the identified phase transition points is presented instead. The following global indices are calculated by the CNA web server:

2.1 Floppy mode density, Φ

The number of floppy modes, F, is the number of independent internal degrees of freedom associated with dihedral rotations in a network. Normalizing F by the number of (overall) internal degrees of freedom associated with the N atoms yields the floppy mode density Φ.
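A minimal Python helper for the normalization step is shown below. Because the exact denominator is not spelled out here, it is passed in as a parameter (`n_internal_dof`, a hypothetical name), and the values in the usage line are made up for illustration.

```python
# Hypothetical helper: floppy mode density Φ = F / D for a series of network
# states, where D is the number of internal degrees of freedom of the N
# atoms. D is supplied by the caller because its exact form is an assumption.
from typing import Sequence


def floppy_mode_density(floppy_modes: Sequence[int], n_internal_dof: int) -> list[float]:
    """Normalize the floppy mode count F of each network state by D."""
    if n_internal_dof <= 0:
        raise ValueError("n_internal_dof must be positive")
    return [f / n_internal_dof for f in floppy_modes]


# Usage: F rises as hydrogen bonds are removed (illustrative numbers only).
print(floppy_mode_density([30, 45, 90], n_internal_dof=300))
```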

2.2 Mean rigid cluster size, S

Originating from percolation theory, moments of the size distribution of rigid clusters (i.e., the microstructure of the network) can be used to analyze macroscopic properties of constraint networks. In this context, S denotes the mean rigid cluster size, with the largest rigid cluster always excluded from the calculation. Consequently, S is zero as long as one rigid cluster dominates the whole network, and again once all rigid clusters have vanished.
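The following Python sketch computes S for one network state from a list of rigid cluster sizes. It uses the percolation-theory ratio of the second to the first moment of the cluster size distribution, with the largest cluster removed. The `min_size` cutoff for what counts as a rigid cluster is an assumption, not a documented CNA setting.

```python
# Minimal sketch of the mean rigid cluster size
#   S = sum'(s^2 n_s) / sum'(s n_s),
# where the primed sums skip the largest rigid cluster. Normalizing cluster
# counts by N (to get n_s) cancels in the ratio, so raw sizes suffice.
# ASSUMPTION: clusters smaller than `min_size` atoms are not rigid clusters,
# so that S drops to zero once all rigid clusters have vanished.
from typing import Iterable


def mean_rigid_cluster_size(cluster_sizes: Iterable[int], min_size: int = 2) -> float:
    sizes = sorted((s for s in cluster_sizes if s >= min_size), reverse=True)
    remaining = sizes[1:]  # exclude the largest rigid cluster
    if not remaining:
        return 0.0  # one dominating cluster, or no rigid clusters left
    return sum(s * s for s in remaining) / sum(remaining)


# Usage: one large cluster, three smaller ones, two isolated atoms.
print(mean_rigid_cluster_size([100, 10, 10, 5, 1, 1]))  # 225 / 25 = 9.0
```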

2.3 Rigidity order parameter, P_∞ type 1 and type 2

P_∞ is another global index originating from percolation theory and is derived from the microstructure of a constraint network. P_∞ denotes the fraction of the network belonging to the giant percolating cluster (type 1) or to the current largest rigid cluster (type 2). The giant percolating cluster is defined as the largest rigid cluster present at the beginning of the thermal unfolding simulation, that is, at low temperature with all constraints in place. As long as the network is in the rigid phase, it is dominated by one rigid cluster and, hence, P_∞ is close to one. In the floppy phase, with a vanishing largest rigid cluster, P_∞ is zero.
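Both variants of P_∞ can be sketched for a single network state as below. Clusters are represented as Python sets of atom indices, which is a hypothetical data layout. How the giant percolating cluster is followed after it starts to break up is also an assumption here: the sketch takes the largest current cluster contained in the initial giant cluster.

```python
# Minimal sketch of P_inf for one network state. Type 2 uses the current
# largest rigid cluster. For type 1 we ASSUME the giant percolating cluster
# in later states is the largest current cluster lying entirely inside the
# initial giant cluster; this is well defined because removing constraints
# can only split rigid clusters, never merge them.


def p_inf_type2(clusters: list[set[int]], n_atoms: int) -> float:
    return max((len(c) for c in clusters), default=0) / n_atoms


def p_inf_type1(clusters: list[set[int]], initial_giant: set[int], n_atoms: int) -> float:
    inside = [len(c) for c in clusters if c <= initial_giant]
    return max(inside, default=0) / n_atoms


# Usage: 10 atoms; the initial giant cluster holds atoms 0-6.
initial = [set(range(7)), {7, 8, 9}]
later = [{0, 1}, {7, 8, 9}]
giant0 = max(initial, key=len)
print(p_inf_type1(initial, giant0, 10), p_inf_type2(initial, 10))  # 0.7 0.7
print(p_inf_type1(later, giant0, 10), p_inf_type2(later, 10))      # 0.2 0.3
```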

2.4 Cluster configuration entropy, H type 1 and type 2

H was introduced by Andraud et al. (Physica A 1994, 207, 208) as a morphological descriptor for heterogeneous materials. It is a particularly useful index for characterizing macroscopic properties of a constraint network in terms of its microstructure. H has been adapted from Shannon's information theory and, thus, is a measure of the degree of disorder in the realization of a given state.

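It is defined as a function of the probability w_s that an atom is part of a cluster of size s (s-cluster):

H = -\sum_s w_s \ln w_s, \qquad w_s = \frac{s^k n_s}{\sum_{s'} s'^k n_{s'}}

with n_s being the number of s-clusters normalized by the total number of atoms N, and k an exponent that distinguishes the two types of H.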

For H type 1, k = 1, which corresponds to the original definition by Andraud et al. A modified version (H type 2) uses k = 2. The modified H emphasizes later phase transition points during the thermal unfolding simulation; these have been found to be related to the thermostability of proteins. As long as the largest rigid cluster dominates the system, H is zero because only one realization of the system is possible. For the same reason, H is zero if all atoms can move independently. In between, H is nonzero because a heterogeneous cluster size distribution allows multiple realizations of the system.
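The entropy formula can be evaluated with a short Python function. It assumes the full list of cluster sizes of one network state, with isolated atoms entered as clusters of size 1. The first two usage lines check the limiting cases described above, in which H is zero.

```python
# Minimal sketch of the cluster configuration entropy
#   H = -sum_s w_s ln w_s,  w_s = s^k n_s / sum_s' s'^k n_s'
# with k = 1 (type 1) or k = 2 (type 2). ASSUMPTION: `cluster_sizes` lists
# every cluster of the network state, including isolated atoms as size-1
# clusters, so that the sizes add up to the number of atoms N.
import math
from collections import Counter


def cluster_configuration_entropy(cluster_sizes: list[int], k: int = 1) -> float:
    if not cluster_sizes:
        raise ValueError("need at least one cluster")
    n_atoms = sum(cluster_sizes)
    # n_s: number of s-clusters divided by N
    n_s = {s: count / n_atoms for s, count in Counter(cluster_sizes).items()}
    weighted = {s: s**k * value for s, value in n_s.items()}
    total = sum(weighted.values())
    h = 0.0
    for value in weighted.values():
        w_s = value / total
        h -= w_s * math.log(w_s)
    return h


print(cluster_configuration_entropy([40]))            # one rigid cluster: 0.0
print(cluster_configuration_entropy([1] * 40))        # all atoms independent: 0.0
print(cluster_configuration_entropy([2, 1, 1], k=1))  # ln 2
```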

[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J. Comput. Chem. 2012, DOI: 10.1002/jcc.23122]

3 Local indices

Local flexibility indices monitor the degree of flexibility and rigidity within the constraint network at a microscopic (residue) level. In general, the local index of a residue indicates the network state s during the thermal unfolding simulation at which this residue changes from rigid to flexible. For an ensemble-based CNA run, local index values are averaged over the entire ensemble. The following local indices are calculated by the CNA web server:

3.1 Percolation index, p_i

p_i is a local analog of the rigidity order parameter P_∞ in that it monitors the percolation behavior of a biomolecule at a microscopic level. As such, it allows identification of the hierarchical organization of the giant percolating cluster during the thermal unfolding simulation. The index value p_i of each covalent bond i between two atoms A_i,{1,2} is the E_cut,hb value during the thermal unfolding simulation at which the bond segregates from the giant percolating cluster. For a C_α atom-based representation of a residue, the lower of the p_i values of its two backbone bonds is taken. p_i = 0 indicates that an atom has never been part of the giant percolating cluster, that is, the atom has always been in a flexible region of the biomolecule. In contrast, the lower (more negative) p_i, the longer a residue remains part of the giant percolating cluster during the thermal unfolding simulation. The lowest p_i values thus highlight the most stable subcomponent of the network.
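The bookkeeping behind p_i can be sketched in Python as follows, assuming that the set of bonds belonging to the giant percolating cluster is already known for each network state. The bond labels, the function names, and the convention for bonds that never segregate are hypothetical.

```python
# Minimal sketch of the percolation index. `states` lists
# (E_cut,hb, bonds in the giant percolating cluster) in simulation order.
# ASSUMPTIONS: p_i is the E_cut,hb of the first state in which the bond is
# missing from the giant cluster, and a bond that never segregates gets
# the last E_cut,hb of the simulation.
from collections.abc import Hashable


def bond_percolation_index(states: list[tuple[float, set[Hashable]]], bond: Hashable) -> float:
    was_member = False
    for e_cut, giant_bonds in states:
        if bond in giant_bonds:
            was_member = True
        elif was_member:
            return e_cut  # bond segregates from the giant cluster here
    if not was_member:
        return 0.0  # never part of the giant percolating cluster
    return states[-1][0]


def residue_percolation_index(p_n_ca: float, p_ca_c: float) -> float:
    """C_alpha-based value: the lower of the two backbone-bond values."""
    return min(p_n_ca, p_ca_c)


states = [(-0.5, {"N-CA", "CA-C"}), (-1.0, {"CA-C"}), (-1.5, set())]
p_n_ca = bond_percolation_index(states, "N-CA")   # -1.0
p_ca_c = bond_percolation_index(states, "CA-C")   # -1.5
print(residue_percolation_index(p_n_ca, p_ca_c))  # -1.5
```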

3.2 Rigidity index r_i

As a generalization of the percolation index p_i, the rigidity index r_i is defined for each covalent bond i between two atoms A_i,{1,2} as the E_cut,hb value during the thermal unfolding simulation at which the bond changes from rigid to flexible. In other words, this index monitors when a bond segregates from the set of all rigid clusters, that is, when it no longer belongs to any rigid cluster.

For a C_α atom-based representation of a residue, the average of the r_i values of its two backbone bonds is taken. Accordingly, r_i = 0 indicates that an atom has always been in a flexible region of the biomolecule. In contrast, the lower (more negative) r_i, the longer a residue remains part of a rigid cluster during the thermal unfolding simulation and the more rigid the residue is.
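Analogously, r_i can be sketched from the rigid clusters of each network state. The test for a rigid bond (both atoms in the same rigid cluster), the atom ids, and the function names are assumptions made for illustration.

```python
# Minimal sketch of the rigidity index. Each network state is
# (E_cut,hb, rigid clusters as sets of atom ids). ASSUMPTION: a bond is
# rigid if both of its atoms lie in the same rigid cluster. r_i is the
# E_cut,hb of the first state in which the bond is flexible; 0.0 if the
# bond was never rigid, the last E_cut,hb if it stays rigid throughout.


def is_rigid(bond: tuple[int, int], clusters: list[set[int]]) -> bool:
    a, b = bond
    return any(a in c and b in c for c in clusters)


def bond_rigidity_index(states: list[tuple[float, list[set[int]]]], bond: tuple[int, int]) -> float:
    was_rigid = False
    for e_cut, clusters in states:
        if is_rigid(bond, clusters):
            was_rigid = True
        elif was_rigid:
            return e_cut  # bond turns from rigid to flexible here
    return states[-1][0] if was_rigid else 0.0


def residue_rigidity_index(r_n_ca: float, r_ca_c: float) -> float:
    """C_alpha-based value: the average of the two backbone-bond values."""
    return (r_n_ca + r_ca_c) / 2.0


# Usage: atoms 1 (N), 2 (C_alpha), 3 (C) of one residue, plus atom 4.
states = [(-0.5, [{1, 2, 3, 4}]), (-1.0, [{2, 3, 4}]), (-1.5, [])]
r_n_ca = bond_rigidity_index(states, (1, 2))   # -1.0
r_ca_c = bond_rigidity_index(states, (2, 3))   # -1.5
print(residue_rigidity_index(r_n_ca, r_ca_c))  # -1.25
```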

3.3 Stability maps

A stability map rc_ij is a two-dimensional extension of the rigidity index r_i and is derived by identifying "rigid contacts" between pairs of residues R_{i,j}, each represented by its C_α atom. A rigid contact exists if two residues belong to the same rigid cluster. The stability map is then constructed by identifying, for each residue pair, the E_cut,hb value during the thermal unfolding simulation at which the rigid contact between the two residues is lost.

That way, a contact's stability relates to the microscopic stability in the network and, taken together, the microscopic stabilities of all residue-residue contacts result in a stability map. Thus, stability maps denote the distribution of flexibility and rigidity within the system; they identify regions that are flexibly or rigidly correlated across the structure.
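A stability map can be sketched in plain Python from the per-state rigid clusters of C_α atoms. The data layout and the conventions for the diagonal, for pairs never in rigid contact, and for contacts that never break are assumptions. The nested loops scale quadratically with the number of residues and are meant for illustration only.

```python
# Minimal sketch of a stability map. Each network state is
# (E_cut,hb, rigid clusters as sets of residue indices), i.e., residues are
# represented by their C_alpha atoms. rc[i][j] is the E_cut,hb of the first
# state in which residues i and j no longer share a rigid cluster.
# ASSUMED conventions: 0.0 for pairs never in rigid contact, the last
# E_cut,hb for contacts that survive the whole run, and a zero diagonal.


def contact_lost_at(states: list[tuple[float, list[set[int]]]], i: int, j: int) -> float:
    in_contact = False
    for e_cut, clusters in states:
        if any(i in c and j in c for c in clusters):
            in_contact = True
        elif in_contact:
            return e_cut  # rigid contact between i and j is lost here
    return states[-1][0] if in_contact else 0.0


def stability_map(states: list[tuple[float, list[set[int]]]], n_residues: int) -> list[list[float]]:
    rc = [[0.0] * n_residues for _ in range(n_residues)]
    for i in range(n_residues):
        for j in range(i + 1, n_residues):
            rc[i][j] = rc[j][i] = contact_lost_at(states, i, j)
    return rc


states = [(-0.5, [{0, 1, 2}]), (-1.0, [{0, 1}]), (-1.5, [])]
for row in stability_map(states, 3):
    print(row)
```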

[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J. Comput. Chem. 2012, DOI: 10.1002/jcc.23122]

4 Phase transition points

Phase transition points are identified from the global indices computed during the thermal unfolding simulation; they correspond to the E_cut,hb values at which a global index changes rapidly, indicating a shift of the network from being largely rigid to being largely flexible. For a protein, such a shift can be related to the folded-unfolded transition. To relate E_cut,hb to temperature, it is converted using the linear relation T = -20 K/(kcal mol^-1) × E_cut,hb + 300 K. Phase transition points are identified on four of the global indices: P_∞ (type 1 and 2) and H (type 1 and 2). However, a transition cannot always be identified, for example when there is no sharp transition or when multiple comparable transitions are present. For an ensemble-based CNA run, a summary (mean and standard error of the mean) of these transition points is presented.
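The temperature conversion, together with a deliberately simple way to locate a transition, can be sketched in Python. The steepest-drop rule is a stand-in heuristic for decreasing indices such as P_∞, not the procedure used by CNA, and the index values in the usage lines are made up.

```python
# Hypothetical sketch: convert E_cut,hb (kcal/mol) to temperature with the
# linear relation given in the text, and locate a candidate transition as
# the steepest drop of a decreasing index (e.g., P_inf) between consecutive
# network states. The steepest-drop rule is a stand-in heuristic only.


def ecut_to_temperature(e_cut: float) -> float:
    """T = -20 K/(kcal mol^-1) * E_cut,hb + 300 K."""
    return -20.0 * e_cut + 300.0


def steepest_drop(e_cuts: list[float], index_values: list[float]) -> float:
    """E_cut,hb of the state right after the largest decrease of the index."""
    if len(e_cuts) != len(index_values) or len(e_cuts) < 2:
        raise ValueError("need two or more matching (E_cut,hb, index) pairs")
    drops = [index_values[i] - index_values[i + 1] for i in range(len(index_values) - 1)]
    i_max = max(range(len(drops)), key=drops.__getitem__)
    return e_cuts[i_max + 1]


e_cuts = [-0.5, -1.0, -1.5, -2.0]
p_inf = [0.90, 0.88, 0.30, 0.10]
e_t = steepest_drop(e_cuts, p_inf)
print(e_t, ecut_to_temperature(e_t))  # -1.5 330.0
```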

5 Unfolding nuclei

Unfolding nuclei, or weak spots, are residues that belong to the giant rigid cluster up to the phase transition point and segregate from it at this point during the thermal unfolding simulation. From the point of view of protein stability, unfolding of the giant rigid cluster begins at these residues; hence, they are termed weak spots. Mutations at these residues are thus more likely to make a protein more thermostable than mutations elsewhere. For an ensemble-based CNA run, the frequency with which each residue is a weak spot across all individual structures of the ensemble is presented.
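Given the residues of the giant rigid cluster just before and at the phase transition point, weak spots and their ensemble frequency reduce to simple set operations. The sketch below shows this in Python with hypothetical residue numbers and function names.

```python
# Minimal sketch: weak spots are residues in the giant rigid cluster in the
# state just before the phase transition but no longer in it at the
# transition. For an ensemble, the fraction of structures in which each
# residue is a weak spot is reported. Residue ids are hypothetical ints.


def weak_spots(giant_before: set[int], giant_at: set[int]) -> set[int]:
    return giant_before - giant_at


def weak_spot_frequency(per_structure: list[set[int]], residues: list[int]) -> dict[int, float]:
    if not per_structure:
        raise ValueError("ensemble is empty")
    n = len(per_structure)
    return {r: sum(r in spots for spots in per_structure) / n for r in residues}


print(sorted(weak_spots({10, 11, 12, 13}, {10, 11})))  # [12, 13]
ensemble = [{12, 13}, {13}, {13, 14}]
print(weak_spot_frequency(ensemble, [12, 13, 14, 15]))
```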

6 Homo-multimeric proteins or proteins with internal symmetry

CNA considers each residue of a protein individually; that is, it does not treat related residues from different chains of homo-multimeric proteins (or symmetry-related residues) as identical and, hence, does not average index values over such related residues. Accordingly, the local index values of related residues can differ, either because of differences in the local environment of these residues (=> rigidity index, stability maps) or because of the very definition of the index, which considers the whole homo-multimer/symmetric protein as one unit and monitors the thermal unfolding simulation of this whole unit (=> percolation index). In the latter case, residues of one chain may be identified as weak spots (because unfolding starts there), whereas the corresponding residues of the other chain(s) show no weak-spot character.