De Novo Amino Acid Sequencing & Protein Sequencing

BioPharmaSpec provides a protein sequencing service that allows you to unambiguously confirm the amino acid sequence of a protein. Primary amino acid sequencing is the determination of the position of each amino acid in your biopharmaceutical. This is different to peptide mapping, where the sequence is confirmed based only on the masses of the peptides created by digesting the product.

Peptide mapping or Protein Sequencing?

During peptide mapping, the protein is digested into peptides, which are then analyzed by mass spectrometry. With an appropriate digestion strategy, all peptides can be assigned by mass, and overlapping peptides can be used to confirm the sequence against a theoretical amino acid sequence. Because the assignment rests on peptide masses matched to an expected sequence, the exact position of each amino acid is not confirmed during peptide mapping.
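The limitation described above can be illustrated with a minimal Python sketch. It assumes a simplified trypsin rule (cleavage after K or R, but not before P) and standard monoisotopic residue masses. The function names and example sequences are hypothetical and are not taken from any BioPharmaSpec workflow. It shows that a peptide's mass depends only on its composition, so mass alone cannot establish residue order.

```python
import re

# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (simplified rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    for pep in tryptic_digest("MKGASLKRPVTER"):  # hypothetical sequence
        print(pep, round(peptide_mass(pep), 4))
    # Same composition, different order: the masses are indistinguishable.
    print(round(peptide_mass("GASLK"), 4), round(peptide_mass("SAGLK"), 4))
```

In this sketch a scrambled peptide and the expected peptide give the same mass, which is why fragment-level data is needed when the order of residues must be proven rather than assumed.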

In cases where there is no theoretical sequence, or where it is a regulatory requirement to confirm the sequence unambiguously (e.g. during biosimilar development), protein sequencing or de novo primary amino acid sequencing should be performed.

Applications of Protein Sequencing

One of the main applications of protein sequencing (or primary, de novo, amino acid sequencing) is to fulfil an important biosimilar regulatory requirement. Primary amino acid sequencing is an essential analysis during the development of a biosimilar, because you must ensure that the primary protein sequence of the biosimilar is identical to that of the innovator product. This is the only way to prove that the biosimilar is the same molecule as the target (or innovator/reference) molecule. It is not sufficient to draw conclusions about the amino acid sequence from peptide mapping or from the DNA sequence, because transcription or translation errors can cause the expressed protein to differ from the encoded sequence.

Once the protein sequences of the biosimilar and innovator are proven to be identical, other comparability studies should be performed to show that the molecules are similar in their functionality and their secondary, tertiary and higher order structure.

Protein Sequencing Techniques

The flow diagram below illustrates the steps taken during a protein sequencing study at BioPharmaSpec:

Looking at this flow diagram, we can see that primary amino acid sequencing relies on two main analytical techniques:

  1. Mass spectrometry (on-line LC-MS with MSe and MS/MS), and
  2. Gas phase sequencing (also known as Edman chemistry or N-terminal sequencing).

Let’s now consider how each of these techniques is applied during protein sequencing.

The Application of Mass Spectrometry (MS) During Protein Sequencing

There are two main reasons for using MS for sequence determination:

  1. MS accurately measures the masses of components passing through the instrument's ion optics.
  2. Certain types of mass spectrometers (e.g. quadrupole orthogonal time-of-flight [Q-TOF] instruments) have the capacity to generate fragment ions from peptides, either through isolation of specific components (so-called MS/MS analysis) or through the use of energy switching within the instrument (MSe).

The figure below shows an example of the data generated in the high- and low-energy channels of a Q-TOF mass spectrometer for a particular peptide eluting from an LC separation of a protein digest mixture. The low-energy mass spectrum clearly shows the presence of a signal that is consistent in mass with a particular peptide from the digest. The associated high-energy mass spectrum shows the fragment ions that were derived from this component and from which sequence information can be determined.

MS is an excellent tool for amino acid sequence analysis and provides a significant amount of information. Its strength lies in its ability to define amino acids within the sequence via the mass differences between fragment ions. This raises a question: how can we distinguish between amino acids that have the same mass?

The Application of N-Terminal Sequencing During Protein Sequencing

While most amino acids differ in mass, leucine and isoleucine are isomers and therefore isobaric, meaning they have the same mass. As a result, a simple peptide fragmentation spectrum cannot distinguish these two amino acids from each other. To provide a full and comprehensive peptide sequence with no ambiguities, they must be differentiated. This can be achieved using gas phase sequencing (also known as Edman or N-terminal sequencing).
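Reading a sequence from the mass differences between fragment ions, and the leucine/isoleucine blind spot, can be sketched in a few lines of Python. This is an illustration only. It assumes an idealised, complete series of singly charged b-ions with no noise, and the function names, the example peptide and the 0.005 Da tolerance are hypothetical choices rather than instrument settings from the source.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # mass of the charge-carrying proton


def b_ions(peptide):
    """Theoretical singly charged b-ion m/z values, b1 through bn."""
    ladder, running = [], PROTON
    for aa in peptide:
        running += MONO[aa]
        ladder.append(running)
    return ladder


def read_ladder(mz_values, tol=0.005):
    """Assign a residue to each gap between consecutive ladder peaks.

    Residues that cannot be told apart by mass are reported together,
    e.g. "I/L"; gaps that match nothing are reported as "?".
    """
    calls = []
    for low, high in zip(mz_values, mz_values[1:]):
        delta = high - low
        hits = sorted(aa for aa, m in MONO.items() if abs(m - delta) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


if __name__ == "__main__":
    # Prepending the bare proton lets the first residue be read as well.
    ladder = [PROTON] + b_ions("GALIK")  # hypothetical peptide
    print(read_ladder(ladder))
```

Every gap in the ladder resolves to a single residue except those corresponding to leucine or isoleucine, which is the ambiguity that a second, orthogonal technique has to resolve.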

Gas phase sequencing is performed by the sequential chemical removal, derivatization, and analysis of the N-terminal amino acid from a peptide or protein. The released amino acid derivative is then identified based on its relative chromatographic retention time against a panel of identically derivatized amino acids. Since leucine and isoleucine have different structures and therefore different retention times, they can be readily distinguished from one another (see figure below).
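The identification step described above, matching a released derivative against a panel of standards by retention time, can be sketched as follows. The retention times are placeholder numbers invented for illustration. They are not measured data, and real values depend on the column, gradient and instrument. The `identify` function and its matching window are likewise hypothetical. The "PTH" labels refer to the phenylthiohydantoin derivatives produced by Edman chemistry.

```python
# Hypothetical retention times in minutes for a small standard panel.
# Placeholder values for illustration only, not measured data.
STANDARDS = {
    "PTH-Phe": 17.9,
    "PTH-Ile": 18.4,
    "PTH-Lys": 18.9,
    "PTH-Leu": 19.3,
}


def identify(retention_time, standards=STANDARDS, window=0.2):
    """Return the closest standard within the window, or None if no match."""
    name, reference = min(
        standards.items(), key=lambda item: abs(item[1] - retention_time)
    )
    return name if abs(reference - retention_time) <= window else None


if __name__ == "__main__":
    # Two observed peaks from successive cycles (hypothetical values).
    for observed in (18.45, 19.25):
        print(observed, identify(observed))
```

Because the two isomers elute at different times, each cycle yields a single call even though the corresponding mass difference in a fragmentation spectrum would be identical.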

When performed together as part of a harmonized workflow, the above techniques are used to provide complete amino acid sequence information to the level required by the regulatory authorities.

BioPharmaSpec provides a comprehensive protein sequencing service for all biopharmaceuticals. Please contact us to find out more.