Procathepsin K…
Life in the Pits
By
Kevin Frederick
Biochemistry
Lycoming College



Cathepsin K is a recently discovered protease that belongs to the papain family of cysteine proteases.  This novel cathepsin has also been referred to as cathepsin OC2, O, X, or O2 (Barthlow et al., 1996).  It has been suggested that the protein is involved in the resorption of bone matrix, an essential step in the dynamic process of bone remodeling.  Resorption is carried out by osteoclasts, which adhere to the surface of bone.  This attachment creates an extracellular compartment, the resorption pit, which is maintained at low pH.  The osteoclast then secretes proteolytic enzymes, such as cathepsin K, into this pit (D’Alessio et al., 1999).  Cathepsin K functions as an efficient collagenase that cleaves both type I and type II collagens in their helical domains (Bromme et al., 1999).  This collagenolytic activity inside the helical region is unique among mammalian proteinases (Borel et al., 1998).  The presence of cathepsin K in chondroclasts and RA-synoviocytes also suggests that it is involved in the normal turnover of cartilage (Bromme et al., 1999).  Consistent with these functions, mutations in the cathepsin K gene have been shown to cause pycnodysostosis, a rare autosomal recessive skeletal dysplasia characterized by short stature, osteosclerosis, bone fragility, and abnormal bone and tooth development (Chapman et al., 1997).  These features are a direct result of mutations that cause a loss of collagenolytic activity in cathepsin K (Bromme et al., 1999).

Like other members of the papain superfamily, cathepsin K is synthesized as an inactive proenzyme, which contains a 99-residue proregion (Cygler et al., 1999).  The proenzyme, procathepsin K, contains a total of 314 residues and has a molecular weight of 35,297 Daltons (ribbon diagram.pdf).  It contains 37 strongly basic residues (K, R), 34 strongly acidic residues (D, E), 86 hydrophobic residues (A, I, L, F, W, V), and 89 polar residues (N, C, Q, S, T, Y).  The inactive proenzyme is converted to its mature, active form by proteolytic cleavage of the 99-amino-acid propeptide from the amino terminus (D’Alessio et al., 1999).  In vitro, this processing is autocatalytic at 4 °C and pH 4; it does not require another protease, although it can be catalyzed by mature cathepsin K.  In vivo, activation is likely to occur in the low-pH environment of the resorption pit.  Once procathepsin K is secreted into the pit, the lower pH induces a conformational change that unmasks the active site and makes the propeptide more vulnerable to endoproteolysis.  Cleavage at the preferred Pro-X-X sites then fragments the propeptide.  The fragments are further degraded by endoproteolysis at less preferred sites, leaving fully active mature cathepsin K.  Activation in the resorption pit could be further accelerated by the catalytic action of newly formed mature cathepsin K (Amegadzie et al., 1997).  A second model of procathepsin K activation proposes a pH-sensing mechanism, involving Asp65 and Lys20, that triggers autocatalysis.  At low pH, the salt bridge between these two residues is disrupted by protonation, and the disturbance is then transmitted to the entire propeptide domain.  The result is an increase in the mobility of the propeptide.
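
The pH-sensing idea can be made roughly quantitative with the Henderson-Hasselbalch relation.  The sketch below estimates what fraction of an aspartate side chain is protonated at neutral pH and at pH 4.  The pKa of 3.9 is a textbook value for an aspartate side chain and is an assumption here; the actual pKa of Asp65 inside the folded propeptide is not given in the sources cited and could differ considerably.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of an acidic group in its protonated (uncharged) form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed model pKa for an aspartate side chain (NOT a measured value for Asp65).
ASP_PKA_ASSUMED = 3.9

for ph in (7.0, 4.0):
    frac = fraction_protonated(ph, ASP_PKA_ASSUMED)
    print(f"pH {ph:.1f}: fraction of Asp side chains protonated = {frac:.4f}")
```

Under that assumption, the protonated fraction rises from well under 1% at pH 7 to a little under half at pH 4, which is the kind of shift that would weaken a salt bridge to a lysine side chain.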

In this model, cleavage then takes place at residues Glu4 (the beginning of the N-terminal domain), Ala59 (the beta-sheet region), Ser83, and Glu95 (the end of the C-terminal segment of the propeptide).  Initial cleavage would occur at Ser83, allowing the propeptide to dissociate from the active site.  Once the site is free and the globular domain shifts, cleavage at Ala59 would follow.  Two fragments, residues 4-59 and 60-83, would then dissociate from the mature cathepsin K (D’Alessio et al., 1999).
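
The fragment boundaries quoted above can be reproduced with a few lines of bookkeeping.  This is only a sketch: the function name is hypothetical, and the convention that each cut falls on the C-terminal side of the named residue is an assumption inferred from the reported ranges (4-59 and 60-83), not something the cited work is quoted as stating.

```python
def fragments_from_cuts(start: int, cut_after: list[int]) -> list[tuple[int, int]]:
    """Return (first, last) residue ranges produced by cutting after each site."""
    ranges = []
    first = start
    for site in sorted(cut_after):
        ranges.append((first, site))
        first = site + 1  # the next fragment begins one residue past the cut
    return ranges


# Released fragments begin at Glu4; cuts are assumed to fall after Ala59 and Ser83.
released = fragments_from_cuts(start=4, cut_after=[59, 83])
for first, last in released:
    print(f"residues {first}-{last}: {last - first + 1} residues")
```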

The propeptide of procathepsin K consists of residues 1-99 (yellow), whereas the mature cathepsin K domain consists of residues 100-314 (blue).  The mature domain within the proenzyme is virtually identical to the structure of mature cathepsin K alone: the overall conformation of the mature portion does not change.  However, a few side-chain positions differ as a result of interactions with the propeptide in the active site (D’Alessio et al., 1999).
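
As a quick consistency check on the residue numbering and on the composition figures quoted earlier for the proenzyme, the sketch below adds up the numbers.  All values are taken from the text; the variable and function names are illustrative only.

```python
TOTAL_RESIDUES = 314
PROPEPTIDE = (1, 99)
MATURE = (100, 314)


def span_length(span: tuple[int, int]) -> int:
    """Number of residues in an inclusive (first, last) range."""
    first, last = span
    return last - first + 1


# Residue-class tallies quoted for procathepsin K.
class_counts = {
    "basic (K, R)": 37,
    "acidic (D, E)": 34,
    "hydrophobic (A, I, L, F, W, V)": 86,
    "polar (N, C, Q, S, T, Y)": 89,
}

classified = sum(class_counts.values())
unclassified = TOTAL_RESIDUES - classified  # residues outside the four listed classes

print("propeptide:", span_length(PROPEPTIDE), "mature:", span_length(MATURE))
print("classified:", classified, "unclassified:", unclassified)
```

The two domains account for all 314 residues.  The four listed classes cover 16 of the 20 standard amino acids, so the residues left over are presumably glycine, histidine, methionine, and proline.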

The propeptide can be further divided into three segments: the globular domain (residues 5-73) (blue), the active cleft-binding segment (residues 74-81) (yellow), and the C-terminal segment (residues 82-99) (red).  The globular domain consists of three alpha helices and one beta-strand (D’Alessio et al., 1999).  The first and second helices, connected by a six-residue loop, pack tightly together to form a hydrophobic mini-core.  Salt bridges and hydrogen bonds help to stabilize this fold (Cygler et al., 1999).  The C-terminal segment, on the other hand, is highly flexible, which is consistent with its role in the activation of the protein (D’Alessio et al., 1999).
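
The segment boundaries above lend themselves to a small lookup, which makes it easy to see where a residue named elsewhere in this discussion sits.  This is a minimal sketch; the names are hypothetical, and residues 1-4 are reported as "unassigned" only because the text does not place them in any of the three segments.

```python
PROPEPTIDE_SEGMENTS = [
    ("globular domain", 5, 73),
    ("active cleft-binding segment", 74, 81),
    ("C-terminal segment", 82, 99),
]


def segment_of(residue: int) -> str:
    """Name the region of procathepsin K that contains a residue number."""
    for name, first, last in PROPEPTIDE_SEGMENTS:
        if first <= residue <= last:
            return name
    if 100 <= residue <= 314:
        return "mature domain"
    return "unassigned"


for label, number in [("Lys20", 20), ("Asp65", 65), ("Lys79", 79),
                      ("Ser83", 83), ("Glu95", 95), ("Phe243", 243)]:
    print(f"{label}: {segment_of(number)}")
```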

The globular domain of the propeptide is anchored to the mature protein by hydrophobic interactions and hydrogen bonding.  Specifically, the hydrophobic faces of helices 2 and 3 and the beta-strand of the propeptide pack tightly against an apolar surface of the mature cathepsin K domain.  This apolar surface, residues 236-251, is referred to as the propeptide binding loop (PBL) (green-blue).

The propeptide straddles the PBL, which allows the beta-strand of the propeptide (yellow) to hydrogen bond with a beta-strand of the mature protein domain (blue) and form an anti-parallel beta-sheet.  Several main-chain to side-chain hydrogen bonds are also formed.  The hydrophobic surface of the PBL is formed mainly by Phe243, which makes several van der Waals contacts that are key to the interaction between the PBL and the propeptide.  Aromatic stacking adds further interactions, creating an aromatic network that extends from the propeptide into the mature protein domain (D’Alessio et al., 1999).

Interactions between the active cleft-binding segment and the mature cathepsin K domain involve residues 74-81, which lie in the active site.  The orientation of the propeptide in the cleft is opposite to that of a natural substrate, and this reversed orientation results in the relative inactivity of procathepsin K as an enzyme.  The third helix positions the propeptide so that it enters the active site at the S’-subsites and continues through the S-subsites.  The S’-subsites (green) are occupied by Thr76 (S1’) and Met75 (S2’), whereas the S-subsites (red) are occupied by Gly77 (S1), Leu78 (S2), and Lys79 (S3).  Numerous hydrogen bonds and van der Waals interactions hold the segment in the active site (D’Alessio et al., 1999).
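
The reversed orientation can be checked directly from the subsite assignments listed above.  In the standard Schechter-Berger convention, a substrate read from its N-terminus to its C-terminus passes through the S-subsites first and the S’-subsites last.  The sketch below compares that order with the order met along the propeptide chain; the variable names are illustrative only.

```python
# Subsite occupied by each propeptide residue, listed in chain order (N to C).
propeptide_occupancy = {
    "Met75": "S2'",
    "Thr76": "S1'",
    "Gly77": "S1",
    "Leu78": "S2",
    "Lys79": "S3",
}

# Subsites met by a normal substrate read N to C (Schechter-Berger convention).
substrate_order = ["S3", "S2", "S1", "S1'", "S2'"]

# Dicts preserve insertion order, so this is the order along the propeptide.
propeptide_order = list(propeptide_occupancy.values())

is_reversed = propeptide_order == substrate_order[::-1]
print("propeptide order:", " -> ".join(propeptide_order))
print("substrate order: ", " -> ".join(substrate_order))
print("propeptide runs backwards through the cleft:", is_reversed)
```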





The C-terminal segment of the propeptide has very high B-factors.  Because this segment is so disordered, specific interactions with the mature protein domain cannot be determined.  The segment does, however, contain the two cleavage sites Ser83 and Glu95.  Consistent with the proposed activation mechanism, both cleavage sites lie one or two residues from a proline residue (D’Alessio et al., 1999).

Electrostatic interactions are also very important in procathepsin K (electrostatic interactions.pdf).  One important salt bridge, Lys79 to Asp160, is located in the active site and links the propeptide to the mature protein domain.  There are also five well-defined salt bridges within the propeptide globular domain and several networks of salt bridges throughout the propeptide.  Disruption of these salt bridges exposes the hydrophobic core of the propeptide, which is crucial for cleavage of the propeptide from the mature protein (D’Alessio et al., 1999).

Several secondary structure predictions can be made from protein sequence analysis.  The Garnier-Robson and Chou-Fasman predictions were similar to each other and accurate for potential alpha helices.  For example, both methods predicted the actual helix at residues 7-17.  The only major discrepancies occurred at the end of the sequence, where no helices actually exist.  For the beta-sheet predictions, the two methods agreed less closely than for the helix predictions, and the Chou-Fasman predictions were generally more accurate than the Garnier-Robson predictions.  The analysis also predicted several amphipathic helices.  One of these, residues 24-33 of the globular domain, is shown in the helical wheel projection.  One side of the helix carries several charged (K, R, D, E) and polar (N, S) residues, whereas the other side carries several hydrophobic residues (I, L, V).  This difference in how the two sides can interact with water gives the helix its amphipathic character.
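
A helical wheel projection places each successive residue 100 degrees further around a circle, which follows from the roughly 3.6 residues per turn of an alpha helix.  The sketch below shows the idea on a made-up ten-residue sequence.  The sequence is hypothetical and chosen only to illustrate an amphipathic pattern; it is not the sequence of residues 24-33 of procathepsin K.

```python
import math

HYDROPHOBIC = set("AILFWV")  # hydrophobic class as listed in the text


def helical_wheel(sequence: str, degrees_per_residue: float = 100.0):
    """Return (residue, angle in degrees) pairs for a helical wheel projection."""
    return [(aa, (i * degrees_per_residue) % 360.0) for i, aa in enumerate(sequence)]


def split_faces(sequence: str):
    """Split residues by the half of the wheel they fall on (cosine of the angle)."""
    near, far = [], []
    for aa, angle in helical_wheel(sequence):
        if math.cos(math.radians(angle)) >= 0:
            near.append(aa)
        else:
            far.append(aa)
    return near, far


toy_helix = "LKEILRDVIN"  # hypothetical sequence, for illustration only
near, far = split_faces(toy_helix)
print("one face:  ", "".join(near))
print("other face:", "".join(far))
```

For this toy sequence, every residue on one face is hydrophobic and every residue on the other is charged or polar, which is the pattern a helical wheel is meant to reveal.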
 
Alignment of the procathepsin K sequences from human, long-tailed macaque, mouse, rabbit, and rat shows that they are very similar.  Procathepsin K therefore appears to have been highly conserved throughout evolution.  In particular, as expected, the residues comprising the active site (75-79) and the PBL (236-251) are very well conserved.  Furthermore, the phylogenetic tree shows that humans and long-tailed macaques are closely related, as are mice and rats.  These relationships make sense, since humans and long-tailed macaques are both primates and mice and rats are both rodents.
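
One simple way to put a number on "very similar" is percent identity between two rows of an alignment.  The sketch below uses one common definition (identical positions divided by aligned, ungapped positions); other definitions exist and give slightly different values.  The three short sequences are hypothetical placeholders, not real procathepsin K sequences, and the function name is illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length rows.

    Columns in which either row has a gap character ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned fragments, for illustration only.
toy_alignment = {
    "species_A": "AVDWRKS",
    "species_B": "AVDWRKS",
    "species_C": "AVDW-RS",
}

for name in ("species_B", "species_C"):
    pid = percent_identity(toy_alignment["species_A"], toy_alignment[name])
    print(f"species_A vs {name}: {pid:.1f}% identity")
```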

The cathepsin K gene has been localized to chromosome 1q21 (Chapman et al., 1997).  The gene spans approximately 12.1 kb of genomic DNA and is composed of eight exons ranging in size from 48 to 219 bp and seven introns ranging in size from 85 to 4326 bp (Debouck et al., 1997).  A map of the intron-exon organization can be found at the Gene Organization link.  The gene also contains a single transcriptional start site 49 bp upstream from the initiator Met codon and a promoter that has two AP1 sites, is not particularly GC-rich, and lacks SP1 sites (Chapman et al., 1997).  Other research, however, has shown that the cathepsin K gene contains two SP1 sites.  Moreover, the 5’-flanking region of the gene lacks a canonical TATA or CAAT box or an initiator element, although a nonconsensus AT-rich motif has been implicated as an initiator for transcription (Debouck et al., 1997).  Sequence analysis of the mRNA indicates that translation occurs between nucleotides 130 and 1118.  Nucleotides 130-174 encode the signal peptide, whereas nucleotides 175 through 1118 encode the protein itself.
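
The mRNA coordinates quoted above can be converted into codon counts with simple arithmetic.  The sketch below does this using only the positions given in the text; the helper name is illustrative.

```python
def span_nt(first: int, last: int) -> int:
    """Number of nucleotides in an inclusive coordinate range."""
    return last - first + 1


signal_nt = span_nt(130, 174)    # region assigned to the signal peptide
protein_nt = span_nt(175, 1118)  # region assigned to the protein itself

signal_codons, signal_rem = divmod(signal_nt, 3)
protein_codons, protein_rem = divmod(protein_nt, 3)

print(f"signal peptide: {signal_nt} nt = {signal_codons} codons, remainder {signal_rem}")
print(f"protein:        {protein_nt} nt = {protein_codons} codons, remainder {protein_rem}")
```

The signal peptide region corresponds to 15 codons, and the protein region to 314 complete codons, which matches the 314 residues quoted for the proenzyme.  Two nucleotides are left over at the end of the protein region; one possible reading is that the quoted end position runs into the stop codon, but the sources cited here do not settle that point.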

Several other cathepsins have been reported, including cathepsins S, B, and L, and they show both similarities to and differences from cathepsin K.  All of these cathepsins are secreted as proenzymes.  Cathepsins S, L, and K all belong to the same subfamily, whereas cathepsin B belongs to a different subfamily of the cysteine protease family (D’Alessio et al., 1999).  A great deal of homology is also seen between cathepsin K and the others.  Cathepsin K and cathepsin S are especially similar (71%), but cathepsin L is also very similar (68%) (Amegadzie et al., 1996).  The intron-exon organization of cathepsin K and cathepsin S is completely conserved within the coding region of the gene.  Together with the location of the cathepsin K and cathepsin S genes at the same locus, these similarities are highly suggestive of gene duplication from an ancestral gene.  Likewise, the intron-exon organization of cathepsin L suggests that it may have arisen from the same ancestral gene, although its gene has been mapped to chromosome 9q21-q22 (Debouck et al., 1997).  Cathepsin K has also been found to have greater substrate specificity and collagenolytic activity than the other cathepsins (Chapman et al., 1997).  Lastly, cathepsin K has been shown to be expressed at much higher levels in osteoclasts than the other cathepsins (Barthlow et al., 1996).

Recently, several novel classes of cathepsin K inhibitors have been designed using X-ray co-crystal structures of peptide aldehydes bound to papain (Dodds et al., 2000).  For instance, the S-nitroso derivatives of glutathione and N-acetylpenicillamine and the non-thiol NO donors NOR-1 and NOR-3 all inhibit the activity of purified cathepsin K (Campagnolo et al., 1999).  Furthermore, 1,3-bis[[N(a)-[(phenylmethoxy)carbonyl]-L-leucyl]amino]-2-propanone has been shown to bind to cathepsin K and act as a symmetric inhibitor.  This compound inhibits strongly because it spans both the S- and S’-subsites (D’Alessio et al., 1999).  Given these inhibitors and the role of cathepsin K in bone resorption, cathepsin K is a possible drug target for the treatment of diseases such as osteoporosis (Campagnolo et al., 1999).

 Works Cited