CNA computes several global and local indices while carrying out a thermal unfolding simulation of a protein. From the global indices, phase transition points and structural weak spots (unfolding nuclei) are identified. The local indices help to link flexibility with function and to understand the impact of ligand binding on protein flexibility. A brief description of the thermal unfolding simulation is presented here, followed by descriptions of the indices, phase transition points, and weak spots.
1 Thermal unfolding simulation
CNA models biomolecules as body-and-bar networks. The atoms of the
biomolecules are modeled as bodies, while covalent and noncovalent bonds are
modeled as bars. A covalent bond is modeled as five bars, allowing for the
dihedral rotation about it. Peptide and double bonds are modeled with six
bars, disallowing any bond rotation. Considering that the mechanical
rigidity of a biomolecule is largely determined by non-covalent
interactions, there is also a need to include hydrogen bonds, salt bridges,
and hydrophobic interactions as constraints in the network. Stronger
interactions such as hydrogen bonds (and salt bridges) and hydrophobic
interactions are modeled as five bars and two bars, respectively. Weaker
interactions such as van der Waals interactions are not modeled as
constraints.
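The bar assignments described above can be collected in a small lookup table. The following Python sketch is only an illustration and is not part of CNA: the names BARS_PER_CONSTRAINT, total_bars, and floppy_mode_lower_bound are hypothetical, and the estimate 6N - 6 - (number of bars) is plain Maxwell-style counting that assumes all bars are independent, which a real rigidity analysis does not assume.

```python
from typing import Iterable

# Bars per constraint type, as stated in the text above.
# The dictionary keys are illustrative labels, not CNA identifiers.
BARS_PER_CONSTRAINT = {
    "covalent_single": 5,  # five bars leave the dihedral rotation free
    "peptide": 6,          # six bars lock the bond rotation
    "double": 6,
    "hydrogen_bond": 5,
    "salt_bridge": 5,
    "hydrophobic": 2,
}


def total_bars(constraints: Iterable[str]) -> int:
    """Sum the bars contributed by a collection of constraints."""
    return sum(BARS_PER_CONSTRAINT[c] for c in constraints)


def floppy_mode_lower_bound(n_atoms: int, constraints: Iterable[str]) -> int:
    """Naive Maxwell count: 6 DOF per body, minus 6 global motions,
    minus one DOF per bar (assumes every bar is independent)."""
    return max(0, 6 * n_atoms - 6 - total_bars(constraints))


# Three atoms connected in a chain by two rotatable single bonds.
chain = ["covalent_single", "covalent_single"]
print(total_bars(chain))
print(floppy_mode_lower_bound(3, chain))
```

For the three-atom chain, the count leaves two internal degrees of freedom, one dihedral rotation per single bond.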
CNA carries out the thermal unfolding simulations by gradually removing
noncovalent constraints from the initial network representation. That is,
for a given network state s = f(T), hydrogen bonds (including salt bridges)
with an energy EHB > Ecut,hb are removed from the network. This follows the
idea that stronger hydrogen bonds will break at higher temperatures than
weaker ones. The number of hydrophobic contacts is either kept constant
during the thermal unfolding or increased to treat hydrophobic interactions
in a temperature-dependent manner. Finally, a rigid cluster decomposition is
performed on each constraint network state.
[Text adapted from Rathi, P.C., Pfleger, C., Fulle, S., Klein, D.L., Gohlke,
H. Statics of biomacromolecules. in: "Modeling of Molecular Properties", P.
Comba (ed.), pp. 281-299, Wiley-VCH].
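The removal of hydrogen bonds along the thermal unfolding simulation can be sketched as a simple filter. This is a minimal illustration under assumptions, not CNA's implementation: the function name network_states, the bond labels, and the energy values are hypothetical, and hydrophobic contacts are left out for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (label, energy in kcal/mol) for each hydrogen bond
# or salt bridge; more negative energies mean stronger bonds.
HBond = Tuple[str, float]


def network_states(hbonds: List[HBond],
                   e_cuts: List[float]) -> Dict[float, List[str]]:
    """For each cutoff, list the hydrogen bonds that stay in the network."""
    states = {}
    for e_cut in e_cuts:
        # Bonds with E_HB > E_cut,hb are removed, as described in the text.
        states[e_cut] = [name for name, e_hb in hbonds if e_hb <= e_cut]
    return states


hbonds = [("N1-O5", -0.8), ("N7-O12", -2.5), ("N20-O31", -4.1)]
states = network_states(hbonds, e_cuts=[-0.5, -1.0, -3.0, -5.0])
for e_cut, kept in states.items():
    print(e_cut, kept)
```

Lowering the cutoff removes the weakest bond first and the strongest bond last, which mirrors the idea that stronger hydrogen bonds break at higher temperatures.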
2 Global flexibility indices
Global flexibility indices monitor the degree of flexibility and rigidity
within the constraint network at a macroscopic level. During the thermal
unfolding simulation, these global indices are calculated for each network
state s. Global indices allow identification of phase transition points
(points when these indices change sharply during the thermal unfolding
simulation) that relate to the folded-unfolded transitions of proteins.
Plots for all global indices are displayed for a single network-based CNA
run. In the case of an ensemble-based run, a summary of the identified phase
transition points is presented instead. The following global indices are
calculated by the CNA web server:
2.1 Floppy mode density, Φ
Floppy modes F refer to the number of internal independent degrees of
freedom that are associated with dihedral rotations in a network.
Normalization of F by the number of (overall) internal degrees of freedom
associated with the N atoms results in a floppy mode density Φ.
2.2 Mean rigid cluster size, S
Originating from percolation theory, moments of the size distribution of rigid clusters (i.e., the microstructure of the network) can be used to analyze macroscopic properties of constraint networks. In this context, S denotes the mean rigid cluster size with the size of the largest rigid cluster always being excluded from the calculation. This leads to S being zero as long as one rigid cluster dominates the whole network or if all rigid clusters have vanished.
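As an illustration, S can be computed from a list of rigid cluster sizes. The sketch below assumes the standard percolation-theory definition (second moment of the cluster size distribution divided by the first moment), which is not spelled out in the text above; the function name mean_rigid_cluster_size and the example sizes are hypothetical.

```python
from collections import Counter
from typing import List


def mean_rigid_cluster_size(cluster_sizes: List[int]) -> float:
    """Mean rigid cluster size with the largest cluster excluded.

    Assumption: S = sum(s^2 * n_s) / sum(s * n_s), where n_s is the
    number of clusters of size s (normalizing n_s by N cancels out).
    """
    remaining = sorted(cluster_sizes)[:-1]  # drop one largest cluster
    if not remaining:
        return 0.0  # one dominating cluster, or no clusters at all
    counts = Counter(remaining)
    numerator = sum(s * s * n for s, n in counts.items())
    denominator = sum(s * n for s, n in counts.items())
    return numerator / denominator


print(mean_rigid_cluster_size([50, 10, 10, 5]))
print(mean_rigid_cluster_size([100]))
print(mean_rigid_cluster_size([]))
```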
2.3 Rigidity order parameter, P∞ type 1 and type 2
P∞ is another global index originating from percolation theory and is derived from the microstructure of a constraint network. P∞ denotes the fraction of the network belonging to the giant percolating cluster (type 1) or the actual largest rigid cluster (type 2). The giant percolating cluster is defined as the largest rigid cluster present at low temperature values with all constraints in place during the thermal unfolding simulation. As long as the network is in the rigid phase, it is dominated by one rigid cluster and, hence, P∞ is close to one. In the floppy phase, with a vanishing largest rigid cluster, P∞ is zero.
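For the type 2 variant, the definition reduces to a single ratio. The following sketch is a hypothetical helper (p_inf_type2 is not a CNA function) that assumes cluster sizes are counted in atoms; type 1 would additionally require tracking the giant percolating cluster from the initial network state and is not shown.

```python
from typing import List


def p_inf_type2(cluster_sizes: List[int], n_atoms: int) -> float:
    """Fraction of the network in the actual largest rigid cluster."""
    if not cluster_sizes:
        return 0.0  # floppy phase: no rigid cluster left
    return max(cluster_sizes) / n_atoms


print(p_inf_type2([90, 5], n_atoms=100))
print(p_inf_type2([], n_atoms=100))
```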
2.4 Cluster configuration entropy, H type 1 and type 2
H was introduced by Andraud et al. (Physica A 1994, 207, 208) as a
morphological descriptor for heterogeneous materials. It is a particularly
useful index for characterizing macroscopic properties of a constraint
network in terms of its microstructure. H has been adapted from Shannon's
information theory and, thus, is a measure of the degree of disorder in the
realization of a given state.
It is defined as a function of the probability w_s that an atom is part of
a cluster of size s (s-cluster):
$$H = -\sum_{s} w_s \ln w_s$$
where
$$w_s = \frac{s^k n_s}{\sum_{s} s^k n_s}$$
with n_s being the number of clusters of size s normalized by the total
number of atoms (N).
For H type 1, k = 1 which corresponds to the original definition by Andraud
et al. A modified version (H type 2) uses k = 2. Using the modified H, later
phase transition points during the thermal unfolding simulation are
potentiated; these have been found to be related to the thermostability of
proteins. As long as the largest rigid cluster dominates the system, H is
zero because there is only one realization of the system possible. For the
same reason, H is zero if all atoms can move independently. In between, H is
nonzero because of multiple possible realizations of the system associated
with a heterogeneous cluster size distribution.
[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J.
Comput. Chem. 2012, DOI: 10.1002/jcc.23122].
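The behavior of the cluster configuration entropy H can be checked with a short calculation. This sketch assumes the Shannon-type form H = -sum(w_s ln w_s) with w_s = s^k n_s / sum(s^k n_s), where k = 1 gives type 1 and k = 2 gives type 2; the function name cluster_configuration_entropy and the example cluster sizes are hypothetical, and the code is not taken from CNA.

```python
import math
from collections import Counter
from typing import List


def cluster_configuration_entropy(cluster_sizes: List[int], k: int = 1) -> float:
    """Shannon-type entropy of the cluster size distribution.

    k = 1 corresponds to H type 1, k = 2 to H type 2 (assumed form).
    Normalizing n_s by the number of atoms cancels in w_s, so raw
    cluster counts are used here.
    """
    counts = Counter(cluster_sizes)  # n_s: number of clusters of size s
    weights = [(s ** k) * n for s, n in counts.items()]
    total = sum(weights)
    if total == 0:
        return 0.0
    entropy = 0.0
    for weight in weights:
        w_s = weight / total
        entropy -= w_s * math.log(w_s)
    return entropy


h1 = cluster_configuration_entropy([4, 2, 1, 1], k=1)
h2 = cluster_configuration_entropy([4, 2, 1, 1], k=2)
print(h1, h2)
print(cluster_configuration_entropy([100]))      # one dominating cluster
print(cluster_configuration_entropy([1] * 100))  # all atoms independent
```

Both limiting cases give zero, while the heterogeneous distribution gives a nonzero value; with k = 2 the large cluster carries more weight, so the entropy is lower for this example.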
3 Local indices
Local flexibility indices monitor the degree of flexibility and rigidity within the constraint network at a microscopic (residue) level. In general, the local index for a residue indicates the network state s during the thermal unfolding simulation at which this residue changes from rigid to flexible. For an ensemble-based CNA run, local index values are averaged over the entire ensemble. The following local indices are calculated by the CNA web server:
3.1 Percolation index, pi
pi is a local analog to the rigidity order parameter P∞ in that it monitors the percolation behavior of a biomolecule on a microscopic level. As such, it allows identifying the hierarchical organization of the giant percolating cluster during the thermal unfolding simulation. The index value pi is derived for each covalent bond i between two atoms Ai,{1,2} as the Ecut,hb value during the thermal unfolding simulation at which the bond segregates from the giant percolating cluster. For a Cα atom-based representation of a residue, the lower of the two pi values of the two backbone bonds is taken. pi = 0 then indicates that an atom has never been part of the giant percolating cluster, that is, the atom has always been in a flexible region of the biomolecule. In contrast, the lower pi, the longer a residue remains part of the giant percolating cluster during the thermal unfolding simulation. The lowest pi thus highlights the most stable subcomponent in the network.
3.2 Rigidity index ri
As a generalization of the percolation index pi, the rigidity index ri is
defined for each covalent bond i between two atoms Ai,{1,2} as the Ecut,hb
value during the thermal unfolding simulation at which the bond changes from
rigid to flexible. Phrased differently, this index monitors when a bond
segregates from any rigid cluster of the set of rigid clusters.
For a Cα atom-based representation of a residue, the average of the two ri
values of the two backbone bonds is taken. Accordingly, ri = 0 indicates
that an atom has always been in a flexible region of the biomolecule. In
contrast, the lower ri, the longer a residue remains part of a rigid cluster
during the thermal unfolding simulation and the more rigid the residue is.
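The definition of the rigidity index can be illustrated with a toy trajectory. The sketch below assumes that the network states are ordered by decreasing Ecut,hb starting at 0 and that a bond which never becomes flexible receives the last cutoff scanned; both conventions, the function names, and the data are hypothetical and not taken from CNA.

```python
from typing import List


def bond_rigidity_index(e_cuts: List[float], is_rigid: List[bool]) -> float:
    """E_cut,hb at which a bond first changes from rigid to flexible.

    e_cuts are given in simulation order (decreasing). A bond that is
    flexible from the start gets the first cutoff (0 in this example).
    """
    for e_cut, rigid in zip(e_cuts, is_rigid):
        if not rigid:
            return e_cut
    return e_cuts[-1]  # assumption: never flexible within the scan


def residue_rigidity_index(r_first: float, r_second: float) -> float:
    """Average over the two backbone bonds of a residue."""
    return (r_first + r_second) / 2.0


e_cuts = [0.0, -1.0, -2.0, -3.0]
r_a = bond_rigidity_index(e_cuts, [True, True, False, False])
r_b = bond_rigidity_index(e_cuts, [True, False, False, False])
print(r_a, r_b, residue_rigidity_index(r_a, r_b))
print(bond_rigidity_index(e_cuts, [False] * 4))
```

A percolation index could be sketched the same way by tracking membership in the giant percolating cluster instead of any rigid cluster, and by taking the lower of the two bond values for a residue rather than the average.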
3.3 Stability maps
A stability map rcij is a two-dimensional itemization of the rigidity index
ri and is derived by identifying "rigid contacts" between two residues
R{i,j}, which are represented by their Cα atoms. A rigid contact exists if
two residues belong to the same rigid cluster. During the thermal unfolding
simulation, stability maps are then constructed in that, for each residue
pair, Ecut,hb is identified at which a rigid contact between two residues is
lost.
That way, a contact's stability relates to the microscopic stability in the
network and, taken together, the microscopic stabilities of all
residue-residue contacts result in a stability map. Thus, stability maps
denote the distribution of flexibility and rigidity within the system; they
identify regions that are flexibly or rigidly correlated across the
structure.
[Text adapted from Pfleger, C., Radestock, S., Schmidt, E., Gohlke, H. J.
Comput. Chem. 2012, DOI: 10.1002/jcc.23122].
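The construction of a stability map can be sketched for a toy system of three residues. This is an illustration under assumptions rather than CNA's implementation: the input layout (one rigid-cluster label per residue and network state, with None for flexible residues), the function name stability_map, and the convention of assigning the last cutoff to contacts that are never lost are all hypothetical.

```python
from typing import List, Optional


def stability_map(e_cuts: List[float],
                  cluster_ids: List[List[Optional[int]]]) -> List[List[float]]:
    """rc[i][j] = E_cut,hb at which the rigid contact i-j is lost.

    cluster_ids[state][residue] is the rigid-cluster label of the
    residue's C-alpha atom in that state, or None if it is flexible.
    """
    n = len(cluster_ids[0])
    # Assumption: contacts that are never lost get the last cutoff.
    rc = [[e_cuts[-1]] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            for e_cut, labels in zip(e_cuts, cluster_ids):
                in_contact = labels[i] is not None and labels[i] == labels[j]
                if not in_contact:
                    rc[i][j] = rc[j][i] = e_cut
                    break
    return rc


e_cuts = [0.0, -1.0, -2.0]
cluster_ids = [
    [1, 1, 1],        # all three residues in one rigid cluster
    [1, 1, None],     # residue 2 has become flexible
    [2, None, None],  # only residue 0 remains rigid
]
rc = stability_map(e_cuts, cluster_ids)
for row in rc:
    print(row)
```

The diagonal entries reduce to a per-residue value of the same kind as the rigidity index, which is consistent with the description of the map as a two-dimensional itemization of ri.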
4 Phase transition points
Phase transition points are identified based on the global indices during the thermal unfolding simulation; these points correspond to the Ecut,hb at which a global index changes rapidly, indicating a shift of the network from being largely rigid to being largely flexible. For a protein, such a shift can be related to the folded-unfolded transition. For this, Ecut,hb is converted to a temperature scale using a linear relation: T = -20 K/(kcal mol^-1) × Ecut,hb + 300 K. Phase transition points are identified from four of the global indices: P∞ (type 1 and 2) and H (type 1 and 2). However, there may be cases where such a transition cannot be identified, for example when there is no sharp transition or when there are multiple comparable transitions. For an ensemble-based CNA run, a summary (mean and standard error of the mean) of these transition points is presented.
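The linear conversion from Ecut,hb to temperature is easy to reproduce. In the sketch below, ecut_to_temperature implements the relation given above, whereas sharpest_transition is only a naive stand-in that picks the largest change between consecutive network states; how CNA actually locates transition points is not described here, and the function names and P∞ values are hypothetical.

```python
from typing import List


def ecut_to_temperature(e_cut_hb: float) -> float:
    """T = -20 K/(kcal/mol) * E_cut,hb + 300 K."""
    return -20.0 * e_cut_hb + 300.0


def sharpest_transition(e_cuts: List[float], index_values: List[float]) -> float:
    """Naive heuristic: the cutoff reached by the largest single-step change.

    This is an assumption for illustration, not CNA's procedure.
    """
    steps = [abs(index_values[i + 1] - index_values[i])
             for i in range(len(index_values) - 1)]
    largest = max(range(len(steps)), key=steps.__getitem__)
    return e_cuts[largest + 1]


e_cuts = [0.0, -0.5, -1.0, -1.5, -2.0]
p_inf = [0.95, 0.93, 0.90, 0.30, 0.05]  # made-up values
e_transition = sharpest_transition(e_cuts, p_inf)
print(e_transition, ecut_to_temperature(e_transition))
```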
5 Unfolding nuclei
Unfolding nuclei, or weak spots, are residues that belong to the giant rigid cluster until the phase transition point and segregate from it at this point during the thermal unfolding simulation. From the point of view of protein stability, unfolding of the giant rigid cluster begins at these residues; hence, they are termed weak spots. Mutating these residues is therefore more likely to make a protein more thermostable than mutating other residues. In the case of an ensemble-based CNA run, the frequency with which each residue is a weak spot across the individual structures of the ensemble is presented.
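The weak-spot definition amounts to a set difference between the giant rigid cluster just before and at the phase transition point. The sketch below is a minimal illustration with hypothetical function names and residue numbers; it also shows one way the per-residue frequency for an ensemble could be tallied.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def weak_spots(giant_before: Set[int], giant_at_transition: Set[int]) -> List[int]:
    """Residues in the giant rigid cluster before the transition that
    are no longer part of it at the transition point."""
    return sorted(set(giant_before) - set(giant_at_transition))


def weak_spot_frequency(per_structure: List[Iterable[int]]) -> Dict[int, float]:
    """Fraction of ensemble structures in which each residue is a weak spot."""
    counts = Counter(res for spots in per_structure for res in set(spots))
    n_structures = len(per_structure)
    return {res: count / n_structures for res, count in counts.items()}


spots = weak_spots({10, 11, 12, 13, 14}, {10, 11})
print(spots)

freq = weak_spot_frequency([[12, 13], [13], [13, 40]])
print(freq)
```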
6 Homo-multimeric proteins or proteins with internal symmetry
CNA considers each residue of a protein individually; that is, it does not treat related residues from different chains of homo-multimeric proteins (or symmetry-related residues) as identical and, hence, does not average index values over such related residues. Accordingly, the local index values for related residues can differ, either because of differences in the local environment of these residues (rigidity index, stability maps) or because of the very definition of the index, which considers the whole homo-multimer or symmetric protein as one unit and monitors the thermal unfolding simulation of this whole unit (percolation index). In the latter case, residues of one chain may be identified as weak spots (because unfolding starts there), whereas those of the other chain(s) show no weak-spot character.