Day-2 (Basic Training on Structure-Based Drug Design)

Necessary things you should know before doing Protein Modeling

Raju Dash

Research Assistant

Molecular Modeling and Drug Design Laboratory

Overview of Proteins

  • Proteins are biopolymers made of one or more chains of amino acids
  • There are 20 standard amino acids used to build proteins
  • Each of these amino acids shares a common structure, a central alpha carbon bonded to:
    • a carboxylic acid group (COOH)
    • an amino group (NH2)
    • a hydrogen atom
    • a side group (R)
The side groups are what make each amino acid unique. Now let’s review the following examples.
  • Carboxyl side group: glutamic acid, aspartic acid
  • Amino side group: lysine
  • Guanidino side group: arginine
Different side groups give the amino acids different biochemical properties, which in turn affect the structure and function of the proteins they form. Now let’s review the 20 standard amino acids.


Important Note:
  • You may have noticed on the previous chart that each amino acid has a specific name, a 3-letter code, and a 1-letter code which we can use to refer to it.
  • The 1-letter code is most useful when looking at the protein sequence.
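Because structure files use 3-letter residue names while sequence files use 1-letter codes, converting between the two is a routine first step. The short Python sketch below hard-codes the standard mapping for the 20 amino acids; the `X` placeholder for unrecognized names is a common convention, and the function names are illustrative.

```python
# Standard 3-letter to 1-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
# Reverse lookup: 1-letter code -> 3-letter code.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues, unknown="X"):
    """Convert a list of 3-letter codes (any case) to a 1-letter sequence string.
    Unrecognized names become `unknown`."""
    return "".join(THREE_TO_ONE.get(r.upper(), unknown) for r in residues)


def to_three_letter(sequence):
    """Convert a 1-letter sequence to a list of 3-letter codes such as 'Met'.
    Raises KeyError for letters outside the 20 standard codes."""
    return [ONE_TO_THREE[c].title() for c in sequence.upper()]


print(to_one_letter(["Met", "Lys", "Trp", "Tyr"]))  # MKWY
print(to_three_letter("GP"))  # ['Gly', 'Pro']
```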
The different properties of the amino acid side groups give each amino acid its own ability to form bonds and other interactions. These properties shape any protein made from a linear chain of amino acids, because each side group can interact with other side groups in the chain in different ways.
Now let’s review the Amino Acid Binding Properties Table below.

Biological Activity

The biological activity of an amino acid depends on the nature of the side (R) group:
  • Size
  • Hydrophobicity
  • Charge
  • Bonding characteristics
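Charge and hydrophobicity can be estimated directly from a 1-letter sequence. The sketch below is a rough illustration under stated assumptions: it counts Lys/Arg as +1 and Asp/Glu as -1 at about pH 7 (ignoring His and the chain ends), and it uses one common set of hydrophobic residues. Published scales differ, so treat the numbers as approximate.

```python
# Assumption: side-chain charge at about pH 7, counting Lys (K) and Arg (R)
# as +1 and Asp (D) and Glu (E) as -1; His and the chain termini are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Assumption: one common choice of hydrophobic side chains; other tables differ.
HYDROPHOBIC = set("AVILMFWC")


def approximate_net_charge(sequence):
    """Sum of the assumed side-chain charges over a 1-letter sequence."""
    return sum(CHARGE.get(aa, 0) for aa in sequence.upper())


def hydrophobic_fraction(sequence):
    """Fraction of residues that belong to the assumed hydrophobic set."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


print(approximate_net_charge("MKRKDE"))  # 1
print(hydrophobic_fraction("AVKD"))      # 0.5
```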

Protein Structure

  • The amino acid chain that results from translation is called the primary structure of the protein; it is simply a linear polymer of amino acids connected by peptide bonds.
  • Because each peptide bond links the carboxyl group of one amino acid to the amino group of the next, the chain has a free amino end (the N terminus) and a free carboxyl end (the C terminus).
  • Each peptide bond is planar and essentially cannot rotate, but the bonds on either side of each alpha carbon can, so the entire polypeptide can bend and take on additional structure.
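The rotatable bonds around each alpha carbon are described by two dihedral angles: φ (phi), defined by the atoms C(i-1), N, Cα, C, and ψ (psi), defined by N, Cα, C, N(i+1). A minimal pure-Python sketch for computing any dihedral angle from four atom coordinates is shown below; the helper names are illustrative, and the sign follows the usual convention (clockwise rotation viewed along the central bond is positive).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) for four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from a real protein): a quarter twist about the y axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))  # approximately -90.0
```

With real coordinates, phi would be `dihedral(C_prev, N, CA, C)` and psi would be `dihedral(N, CA, C, N_next)`.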


Secondary Structure

  • The secondary structure of a protein is any regular, repeating pattern of folding within the molecule.
  • Secondary structure is stabilized by hydrogen bonds between the backbone amide (N-H) and carbonyl (C=O) groups of the peptide bonds.


Alpha Helix

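An alpha helix is a right-handed spiral of the polypeptide chain with 3.6 amino acids per turn and a pitch of 5.4 angstroms per turn (about 1.5 angstroms of rise per residue). It is stabilized by hydrogen bonds between the C=O group of each residue and the N-H group of the residue four positions later. Alpha helices occur throughout proteins and are common in DNA-binding regions, for example in helix-turn-helix motifs. Let’s review the ribbon diagram of an alpha helix shown below.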
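The helix numbers in this paragraph can be turned into a quick estimate of helix size. The sketch below assumes an ideal, straight alpha helix with exactly 3.6 residues and 5.4 angstroms per turn (so about 1.5 angstroms of rise per residue) and lists the i to i+4 backbone hydrogen-bond pairs that stabilize it; real helices are often bent or irregular, and the function names are illustrative.

```python
# Ideal alpha-helix parameters quoted in the text.
RESIDUES_PER_TURN = 3.6
PITCH_ANGSTROM = 5.4
RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN  # about 1.5 angstroms


def ideal_helix_geometry(n_residues):
    """Approximate number of turns and axial length (angstroms) of an ideal helix."""
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * RISE_PER_RESIDUE
    return turns, length


def helix_hbond_pairs(start, end):
    """Backbone H-bond pairs (i, i+4) for a helix spanning residues start..end
    (inclusive): the C=O of residue i bonds to the N-H of residue i+4."""
    return [(i, i + 4) for i in range(start, end - 3)]


turns, length = ideal_helix_geometry(36)
print(f"{turns:.1f} turns, {length:.1f} angstroms")
print(helix_hbond_pairs(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```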


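Beta Strand

A beta strand is an extended stretch of polypeptide backbone. Several strands lying side by side, held together by backbone hydrogen bonds, form a blanket-like beta sheet. Depending on whether neighboring strands run in the same or opposite direction (N terminus to C terminus), the sheet is parallel or anti-parallel.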
Let’s review the representation of an anti-parallel beta sheet.
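Whether two neighboring strands are parallel or anti-parallel depends only on their N-to-C directions. The following is a crude geometric sketch, assuming you already know each strand's first and last C-alpha coordinates; programs such as DSSP assign sheets from backbone hydrogen-bond patterns instead, so this is for illustration only.

```python
import math


def strand_direction(start_ca, end_ca):
    """Unit vector from a strand's first (N-terminal) to last (C-terminal) C-alpha."""
    d = [e - s for s, e in zip(start_ca, end_ca)]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def pairing_type(strand_a, strand_b):
    """Classify two neighboring strands, each given as (start_ca, end_ca) in N-to-C order."""
    u = strand_direction(*strand_a)
    v = strand_direction(*strand_b)
    cos_angle = sum(x * y for x, y in zip(u, v))  # > 0 means same direction
    return "parallel" if cos_angle > 0 else "anti-parallel"


# Toy coordinates in angstroms (not from a real structure).
a = ((0, 0, 0), (10, 0, 0))
b = ((10, 4.8, 0), (0, 4.8, 0))
print(pairing_type(a, b))  # anti-parallel
```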



Hairpin Turn

  • Hairpin turns allow a protein chain to loop back on itself, reversing its direction by about 180 degrees
    • These are the connectors between strands of an anti-parallel beta sheet, for example.
  • Turns typically contain 4 amino acids:
    • Rarer turns can contain 2 or 3 amino acids
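    • Often contain Gly (the smallest and most flexible amino acid) or Pro, whose rigid ring restricts the backbone to a conformation that suits the bend
  • Like other secondary structures, the turn is stabilized by hydrogen bonding; in a typical 4-residue turn, the C=O of the first residue hydrogen-bonds to the N-H of the fourth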
Let’s review the following depiction of a hairpin turn and a 3-amino acid hairpin turn.
Important Note:
In the 3-amino-acid hairpin turn shown, the N terminus is at the upper left: a tyrosine is bonded to a proline, which forms the “turn” region, and the proline is bonded to a tryptophan.
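A widely used geometric rule of thumb flags a possible beta turn when the C-alpha atoms of residues i and i+3 are closer than about 7 angstroms. The sketch below applies only that distance test to a list of C-alpha coordinates; the 7 angstrom cutoff is the assumed default, and because it does not exclude helical residues (which also pass the test), its hits are candidates rather than assignments.

```python
import math


def turn_candidates(ca_coords, cutoff=7.0):
    """Return 0-based indices i where C-alpha(i) and C-alpha(i+3) are closer
    than `cutoff` angstroms (a candidate four-residue turn starting at i)."""
    hits = []
    for i in range(len(ca_coords) - 3):
        if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff:
            hits.append(i)
    return hits


# Toy coordinates: four residues bent back on themselves (3.8 A spacing).
bent = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
print(turn_candidates(bent))  # [0]
```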


The term “coil” refers to any backbone conformation not covered by the secondary structures above.
  • Although less regular, coils are not “random” or disorganized; they are still important to the structure and function of the protein.
  • You may see them called “random coils” only because they lack the regular structure of helices, sheets, and turns.


Tertiary Structure

Tertiary structure is the overall three-dimensional arrangement of a protein’s secondary structure elements in space.
  • E.g., coils, sheets, turns, and helices fold into a compact globular (roughly spherical) or an elongated filamentous/fibrous form
  • Tertiary structure is maintained mainly by interactions between side chains: hydrophobic interactions, hydrogen bonds, ionic bonds (salt bridges), and disulfide bonds
  • Just as the primary structure folds into secondary structures, the secondary structures (together with the segments that connect them) fold into the tertiary structure
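One simple way to see tertiary structure in coordinates is to list residue pairs that are far apart in sequence but close in space. The sketch below does this with C-alpha atoms; the 8 angstrom distance cutoff and the minimum sequence separation of 6 residues are assumed, commonly used defaults rather than fixed rules.

```python
import math


def long_range_contacts(ca_coords, cutoff=8.0, min_separation=6):
    """Pairs (i, j), 0-based, whose C-alpha atoms are within `cutoff` angstroms
    but at least `min_separation` residues apart in sequence."""
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


# Toy U-shaped chain (3.8 A spacing) that folds back on itself.
u_shape = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 3.8, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
print(long_range_contacts(u_shape, min_separation=3))
```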

Quaternary Structure

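  • Many proteins function as subunits of a larger complex.
  • Quaternary structure is the arrangement of two or more polypeptide chains (subunits) into a functional complex. Complexes of identical subunits are homoquaternary (homo-oligomers), and complexes of different subunits are heteroquaternary (hetero-oligomers); hemoglobin, with two alpha and two beta chains, is a classic heteroquaternary example.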
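For a multi-chain structure, a quick summary of quaternary structure is its subunit composition. The sketch below assumes that chains with identical sequences are copies of the same subunit and labels the composition in an `A2B2` style; the function names and the toy sequences in the usage example are hypothetical.

```python
from collections import Counter


def stoichiometry(chain_sequences):
    """Summarize subunit composition, e.g. 'A2B2', from a {chain_id: sequence}
    mapping. Chains with identical sequences count as the same subunit."""
    counts = sorted(Counter(chain_sequences.values()).values(), reverse=True)
    return "".join(f"{chr(ord('A') + k)}{n}" for k, n in enumerate(counts))


def assembly_type(chain_sequences):
    """Classify a complex as monomer, homo-oligomer, or hetero-oligomer."""
    if len(chain_sequences) == 1:
        return "monomer"
    distinct = len(set(chain_sequences.values()))
    return "homo-oligomer" if distinct == 1 else "hetero-oligomer"


# Toy sequences arranged like hemoglobin's two-plus-two chain layout.
chains = {"A": "MVLS", "B": "MVHL", "C": "MVLS", "D": "MVHL"}
print(stoichiometry(chains), assembly_type(chains))  # A2B2 hetero-oligomer
```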

Protein 3D Structure Modeling 

Suppose you need the 3D structure of a target protein that has not been solved experimentally by X-ray crystallography or NMR, and you have only its amino acid sequence. If the experimentally determined 3D structure of a similar protein from another source is available (ideally with 50% or higher sequence identity), you can use it to build a 3D model of your target. This technique is called “homology modeling”. It is, at best, moderately accurate for the positions of the alpha carbons (the protein backbone), and only in regions where the sequence identity is high.
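Sequence identity between target and template is usually computed from a pairwise alignment. The sketch below assumes the two sequences are already aligned (gaps written as `-`) and divides identical positions by the positions where neither sequence has a gap; other tools divide by the shorter sequence or the full alignment length, so reported values can differ. The sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumption: identity = identical positions / positions where neither
    sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


target = "MKT-LVAG"    # hypothetical aligned target
template = "MKTALIAG"  # hypothetical aligned template
pid = percent_identity(target, template)
print(f"{pid:.1f}% identity")  # 85.7% identity
print("meets the 50% guideline" if pid >= 50 else "below the 50% guideline")
```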
A homology modeling routine needs two items of input:
  1. The sequence of the protein with unknown 3D structure, the “target sequence”.
  2. A 3D template: the structure of a related protein, typically chosen because it has the highest sequence identity with the target sequence. The template structure must have been determined by a reliable experimental method such as X-ray crystallography or NMR, and is typically a published atomic coordinate (“PDB”) file from the Protein Data Bank.
Figure: A typical scheme of homology modeling
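Template coordinates usually come from the ATOM records of a PDB file, which use fixed columns. The sketch below pulls out the C-alpha atoms using that column layout (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, coordinates 31-54); keeping only the first alternate location and ignoring HETATM records are simplifying assumptions. Libraries such as Biopython's `Bio.PDB` handle the many edge cases this skips.

```python
def read_ca_atoms(pdb_text):
    """Extract C-alpha atoms from the ATOM records of PDB-format text.

    Returns a list of (chain_id, residue_number, residue_name, (x, y, z)).
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, headers, TER, END, etc.
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the blank or first alternate location
        chain = line[21]
        res_num = int(line[22:26])
        res_name = line[17:20].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, res_num, res_name, xyz))
    return atoms


# Usage (the file name is a placeholder):
# with open("template.pdb") as fh:
#     ca_atoms = read_ca_atoms(fh.read())
```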

Important Note:


FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (“>”) symbol in the first column. It is recommended that all lines of text be shorter than 80 characters.
An example sequence in FASTA format is:
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
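Following the format description above, a FASTA file can be read by starting a new record at each `>` line and joining the sequence lines that follow. The sketch below does that and also writes a record back out with sequence lines wrapped at 60 characters, which keeps them under the recommended 80; the record names are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples."""
    records = []
    description, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def format_fasta(description, sequence, width=60):
    """Write one record, wrapping sequence lines at `width` characters."""
    lines = [">" + description]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


example = """>toy_1 hypothetical example record
MKTAYIAKQR
QISFVKSHFS
>toy_2 another hypothetical record
GGSG"""
for desc, seq in parse_fasta(example):
    print(desc, len(seq))
```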


Sequence Alignment

Sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
  • Pairwise Sequence Alignment is used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences (protein or nucleic acid).
  • By contrast, Multiple Sequence Alignment (MSA) is the alignment of three or more biological sequences of similar length. From the output of MSA applications, homology can be inferred and the evolutionary relationship between the sequences studied.
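Pairwise global alignment is classically computed with the Needleman-Wunsch dynamic-programming algorithm. The minimal sketch below uses simple scores (match +1, mismatch -1, gap -1) as assumptions; protein aligners normally use a substitution matrix such as BLOSUM62 with affine gap penalties, so this illustrates the idea rather than a production setting.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b.
    Returns (score, aligned_a, aligned_b) with '-' marking gaps."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("HEAG", "HAG")
print(score)   # 2
print(top)     # HEAG
print(bottom)  # H-AG
```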


Basic Information (Tutorial Example)


Overview of Protein Kinase (PK)

  • Protein kinases (PKs) form one of the largest superfamilies in eukaryotes (~2% of vertebrate genes encode kinases).
  • They are related by a conserved catalytic domain of approximately 250-300 amino acids.
  • PKs are ATP-dependent enzymes that catalyze the transfer of the γ-phosphoryl group of ATP to the hydroxyl group of serine, threonine, or tyrosine residues in their substrates.
  • Because phosphorylation is one of the most important post-translational modifications in the cell, it is not surprising that PKs play pivotal roles in regulating a wide variety of critical cellular processes, including the cell cycle, cell growth, cell death, metabolism, and differentiation.
Note: The ATP binding pocket of the protein kinase is a major consideration in Protein Kinase Inhibitor Design and Development.
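Since the phosphate acceptor is the hydroxyl group of Ser, Thr, or Tyr, a first pass over a substrate sequence is simply to list those residues. The sketch below does only that (the function name is illustrative); it does not predict which sites a particular kinase actually phosphorylates, which depends on the surrounding sequence and structure.

```python
def candidate_phosphosites(sequence, residues="STY"):
    """List (residue, 1-based position) for every residue listed in `residues`.
    This only marks hydroxyl-bearing residues; it is not a site predictor."""
    allowed = set(residues.upper())
    return [(aa, pos) for pos, aa in enumerate(sequence.upper(), start=1) if aa in allowed]


print(candidate_phosphosites("MSTYAK"))                # [('S', 2), ('T', 3), ('Y', 4)]
print(candidate_phosphosites("MSTYAK", residues="ST")) # [('S', 2), ('T', 3)], e.g. Ser/Thr kinases such as CDKs
```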


Cyclin dependent kinases (CDK)

  • Group of several different kinases involved in regulation of the cell cycle.
  • They phosphorylate other proteins on their serine or threonine residues, but CDKs must first bind to a cyclin protein in order to be active. Different combinations of specific CDKs and cyclins mark different parts of the cell cycle.
  • Additionally, the phosphorylation state of CDKs is also critical to their activity, as they are subject to regulation by other kinases (such as CDK-activating kinase) and phosphatases (such as Cdc25).
  • Once active, CDKs phosphorylate other proteins to change their activity, triggering events needed for the next stage of the cell cycle.
  • CDKs also have roles in transcription, metabolism, and other cellular events.
Because of their key role in controlling cell division, mutations affecting CDKs or their regulators are often found in cancer cells. These mutations can lead to uncontrolled growth, with cells passing through the whole cell cycle rapidly and repeatedly. Therefore, CDK inhibitors have been developed as treatments for some types of cancer.

Chimera Software: Download Application