Sequence variants arise when an incorrect amino acid residue is incorporated into the backbone of a protein.
Misincorporations like this, often known as “read errors”, occur during translation and result from a mismatch between the mRNA and tRNA caused by incorrect base pairing. Misincorporation may affect only a small fraction of the resulting protein molecules, if any, but several positions within the protein chain may be susceptible, so low levels of several different sequence variants may be produced. The ICH Q6B guideline states that the amino acid sequence of the protein must be confirmed using amino acid sequencing techniques, in order to demonstrate that the protein is a true product of the DNA. Recent guidance also suggests that it is advisable to perform amino acid sequencing at an early stage of product development, so that the developer can be sure that the selected clone produces the correct primary sequence.
Mass spectrometry and N-terminal sequencing (Edman degradation) are the two main methods used to fully sequence a protein. N-terminal sequencing can supply sequence information that is lacking from the mass spectrometric data, for example where poor fragment ion production gives weak data in a particular region of the protein. Furthermore, mass spectrometry is not suitable for the unambiguous identification of amino acids that have the same mass (e.g. leucine and isoleucine); N-terminal sequencing can be performed on collected peptides containing these amino acids to identify which is present at a particular position. However, the use of Edman chemistry alone (i.e. without mass spectrometry-based sequencing) is not recommended, since this would be a very time-consuming and laborious process. Combining the two methods is a powerful way of sequencing even very large proteins, such as monoclonal antibodies. It also provides orthogonality to the data, which increases the strength of the structural characterization package.
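The leucine/isoleucine ambiguity mentioned above follows from the two residues being structural isomers with the same elemental composition. The short sketch below illustrates this by computing monoisotopic residue masses from elemental formulae. The function names are illustrative assumptions, and lysine and glutamine are included only as a contrasting pair that is close in mass but not identical.

```python
# Monoisotopic atomic masses in Da (standard values, rounded).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of each residue (free amino acid minus one H2O).
RESIDUE_FORMULA = {
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Ile": {"C": 6, "H": 11, "N": 1, "O": 1},  # isomer of Leu: same formula
    "Lys": {"C": 6, "H": 12, "N": 2, "O": 1},
    "Gln": {"C": 5, "H": 8, "N": 2, "O": 2},
}


def residue_mass(name):
    """Monoisotopic mass of a residue, summed from its elemental formula."""
    return sum(ATOM_MASS[el] * n for el, n in RESIDUE_FORMULA[name].items())


def mass_difference(a, b):
    """Absolute monoisotopic mass difference between two residues, in Da."""
    return abs(residue_mass(a) - residue_mass(b))
```

Under these values, `mass_difference("Leu", "Ile")` is exactly zero, so no improvement in mass accuracy can separate the two, whereas `mass_difference("Lys", "Gln")` is about 0.036 Da and can in principle be resolved with sufficiently accurate mass measurement.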
As outlined above, screening for sequence variants should be performed in the early phases of product development and characterization. In this way, any sequence variants will be discovered while it is still feasible to reselect clones or cell lines, if appropriate, and the financial and timeline impacts will be lessened. It should be noted that, in some cases, sequence variants (or a certain subset of them within predefined limits) may be accepted as part of the make-up of the product and shown to have no clinically meaningful effects.
Mass spectrometry is an ideal tool for the assessment of sequence variants because of its high sensitivity and its ability to provide accurate mass information. Also, depending on instrument type, sequence-specific fragment ions can be used to confirm the identity of the native peptide and of any sequence variants, if present at sufficient levels. Peptide mapping data can be screened for the presence of expected or predicted variants, based on the predicted mass of the variant peptide. In many cases, variant peptides will have a different elution position compared to the native peptide, which helps with identification.
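As a minimal sketch of how the predicted mass of a variant peptide can be obtained, the example below sums standard monoisotopic residue masses, adds one water for the peptide termini, and reports the mass shift caused by a single substitution. The function names and the peptide "SAMPLER" are made up for illustration and do not come from any particular product or software package.

```python
# Standard monoisotopic residue masses in Da, rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.007276  # mass of a proton, used for m/z calculation


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[res] for res in seq) + WATER


def substitute(seq, position, new_residue):
    """Return seq with the residue at a 1-based position replaced."""
    i = position - 1
    return seq[:i] + new_residue + seq[i + 1:]


def variant_mass_shift(seq, position, new_residue):
    """Mass difference (variant minus native) for one substitution."""
    variant = substitute(seq, position, new_residue)
    return peptide_mass(variant) - peptide_mass(seq)


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral mass and charge."""
    return (mass + charge * PROTON) / charge
```

For the hypothetical peptide, `variant_mass_shift("SAMPLER", 1, "N")` gives a serine to asparagine shift of about +27.011 Da, which can be searched for in the peptide map. Replacing the leucine with isoleucine (`variant_mass_shift("SAMPLER", 5, "I")`) gives a shift of zero, so a substitution of that kind cannot be detected from mass alone.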
Screening of data for variants can be performed in a targeted manner through specific mass searches if the variants are known, or are at least considered plausible in the protein sequence. Where there is no prior knowledge of sequence variants, the data can be assessed in an untargeted manner through the use of error-tolerant data searches (described in more detail below), but such analysis can produce a large volume of complex data requiring careful interpretation to avoid false positives. An on-line LC/ES-MS peptide mapping approach is therefore a key technique in the assessment of sequence variants.
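A targeted mass search of the kind described above can be sketched as a comparison of observed peptide masses against a list of predicted native and variant masses within a mass tolerance. The function names, the 10 ppm default tolerance, and the example masses in the usage note are illustrative assumptions only; real peptide mapping software also takes charge state, isotope pattern, retention time, and fragment ion data into account.

```python
def ppm_error(observed, theoretical):
    """Signed mass error of an observed mass, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def targeted_screen(observed_masses, candidates, tol_ppm=10.0):
    """Match observed masses to candidate peptides within a ppm tolerance.

    observed_masses: iterable of neutral monoisotopic masses (Da)
    candidates: dict mapping a label to its predicted mass (Da)
    Returns a list of (observed mass, label, ppm error) tuples.
    """
    hits = []
    for obs in observed_masses:
        for label, theo in candidates.items():
            err = ppm_error(obs, theo)
            if abs(err) <= tol_ppm:
                hits.append((obs, label, err))
    return hits
```

As a usage example with invented numbers, calling `targeted_screen([802.4012, 829.4105, 850.0], {"native": 802.40071, "S1N": 829.41161})` matches the first two observed masses to the native peptide and to the hypothetical S1N variant, and leaves the third unassigned. A hit based on mass alone is only a candidate; confirmation by sequence-specific fragment ions is still needed to avoid false positives.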