A Winning Combination for Confident Decision-making in Biopharmaceutical Development!
Modern biopharmaceutical analysis generates a huge amount of data, especially as the techniques we use to investigate structure evolve and gain resolution and sensitivity. This growing volume of data contains information that must be assimilated and interpreted to maximize its value and thus enhance product knowledge. As we delve ever deeper into datasets to glean as much information as possible, we rely increasingly on software and computational processing for data mining. Of course, the converse is also true: as computational power increases, we can use that power to develop more sophisticated analytical technologies. The two go hand in hand.
Automatic data processing is a very powerful and now ubiquitous tool, allowing rapid and streamlined data analysis. Software can readily sift through data, identifying and tabulating key pieces of structural information far faster and more efficiently than any scientist. Indeed, with larger datasets, purely manual interpretation could easily overlook information that is simply lost in the sheer volume of data!
Large Datasets and the Power of Data Processing
Naturally, software data processing speeds up structure determination and process development, since the information needed to make key decisions is generated more rapidly. Certain techniques, such as mass spectrometry and proteomics-type analyses, generate very large datasets, and extensive use of in silico processing facilitates their interpretation. Indeed, for distilling and processing large datasets, computational power is a prerequisite for assessing the data within meaningful timescales. Thus, peptide maps can be generated, post-translational modifications identified, glycan structures determined and proteins in complex mixtures identified rapidly (e.g. MS-based host cell protein analysis). Other forms of analysis, such as aggregation studies using sedimentation velocity analytical ultracentrifugation (SV-AUC), require complex algorithms to take raw data and process it into a meaningful output. These algorithms are continually evolving, allowing ever deeper investigation of the data.
Scientific Expertise Adds Context
Is fully automated processing sufficient? We have two separate but related needs for computer processing: firstly, the assessment of large datasets, to cope with the sheer volume of data obtained, and secondly, the translation of data from recorded information into meaningful output. This raises the question: “Should we just turn all our data processing over to computers?” In my opinion, no. The idea that scientific expertise becomes less important, or can even be bypassed, in the era of computer-driven data processing is far from reality. On the contrary, I would argue that expert human involvement is more important than ever: processed data must have scientific meaning and value, and maintaining that requires scientific oversight.
The processed output from the large datasets mentioned above should always be further assessed by an experienced operator, both for meaning and to ensure that the key parameters required for appropriate processing have been met. Inappropriate parameters or incorrect in silico handling of some aspect (e.g. incorrect peak assignment in mass spectral data) will lead to errors in the output, which must be identified and removed from the final processed data. For example, weak signals may be mistakenly attributed to a particular peptide when they are actually fragment ions from a related or mis-cleaved peptide. Likewise, appropriate baseline settings ensure that noise is not mistaken for peptides or post-translationally modified species that are not actually present.
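To make the point about baseline settings and peak assignment concrete, here is a deliberately simplified Python sketch of an assignment step: peaks below a signal-to-noise threshold are discarded as noise, and the remainder are matched to theoretical m/z values within a ppm tolerance. It is a hypothetical illustration, not BioPharmaSpec's or any vendor's processing method; the `Peak` type, the `assign_peaks` function, the default thresholds (S/N 3, 10 ppm), the noise estimate and the peptide names and masses are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Peak:
    mz: float          # observed m/z
    intensity: float   # peak intensity (arbitrary units)

def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def assign_peaks(peaks: list[Peak], theoretical: dict[str, float],
                 noise_level: float, min_snr: float = 3.0,
                 tol_ppm: float = 10.0) -> dict[int, list[str]]:
    """Map each above-threshold peak index to every theoretical species
    within tol_ppm. Peaks below min_snr are treated as noise and skipped.
    An empty list means a real signal with no match; more than one entry
    means the assignment is ambiguous. Thresholds are assumptions.
    """
    assignments: dict[int, list[str]] = {}
    for i, peak in enumerate(peaks):
        # Baseline/threshold step: do not let noise be "identified" as a peptide
        if peak.intensity / noise_level < min_snr:
            continue
        assignments[i] = [name for name, mz in theoretical.items()
                          if abs(ppm_error(peak.mz, mz)) <= tol_ppm]
    return assignments

# Hypothetical example data (illustrative values only)
peaks = [Peak(1000.503, 5000.0), Peak(1200.600, 40.0), Peak(850.000, 3000.0)]
theoretical = {"pep1": 1000.500, "pep2": 1200.600}
result = assign_peaks(peaks, theoretical, noise_level=20.0)
print(result)  # {0: ['pep1'], 2: []}
```

Note that the weak peak at m/z 1200.600 matches `pep2` exactly yet is discarded as noise, while the strong peak at m/z 850.000 is left unassigned rather than forced onto the nearest candidate. Both outcomes depend on parameter choices, which is why an experienced reviewer should check them rather than accept them blindly.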
When reviewing data at BioPharmaSpec, I consider the following:
- Is the software trying to force itself to find something it expects to see that is not actually there?
- Are the filters and set parameters appropriate for the job at hand?
- For deep data mining, does the output give true value and increase product knowledge or is it simply muddying the waters?
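The first two questions can be partly supported in software by flagging assignments for human review instead of accepting them silently; the third remains a matter of scientific judgement. The Python sketch below is a hypothetical example: the `review_flags` function, its flag names and the S/N cut-offs (3 and 10) are assumptions chosen for illustration, not validated settings, and such flags complement rather than replace expert review.

```python
def review_flags(candidates: list[str], snr: float,
                 min_snr: float = 3.0, marginal_snr: float = 10.0) -> list[str]:
    """Return reasons why a peak assignment should be checked by a scientist.

    candidates: names of species matched within tolerance (hypothetical input)
    snr: signal-to-noise ratio of the peak
    Thresholds are illustrative assumptions only.
    """
    flags = []
    if snr < min_snr:
        # Possibly noise: anything assigned here may be "found" only because it was expected
        flags.append("below_noise_threshold")
    elif snr < marginal_snr and candidates:
        # Weak but assigned: could be a fragment ion of a related or mis-cleaved peptide
        flags.append("weak_signal")
    if len(candidates) > 1:
        # Several candidates within tolerance: the software cannot decide alone
        flags.append("ambiguous")
    if not candidates and snr >= min_snr:
        # Genuine signal with no assignment: possibly an unexpected species
        flags.append("unassigned")
    return flags

print(review_flags(["pep1"], snr=250.0))        # []
print(review_flags(["pepA", "pepB"], snr=6.0))  # ['weak_signal', 'ambiguous']
```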
It is also important to recognize that data must be given scientific context for its full weight to be felt. A series of numbers or values derived purely from in silico processing means little unless it is placed within a sound scientific framework.
Software and the Scientist
Whilst software packages are becoming ever more sophisticated and can hugely increase the speed at which data is processed, they are only ever as good as the code behind them. That code cannot anticipate all the vagaries and variables encountered across different experiments and experimental circumstances, such as the sensitivity achieved or how well individual components are resolved from one another (e.g. in mass data from mass spectrometric peptide mapping analyses).
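As an illustration of the resolution point, consider asparagine deamidation (+0.984016 Da) and the ¹³C isotope spacing (1.003355 Da): the monoisotopic peak of a deamidated peptide sits only about 0.0193 Da from the first isotope peak of the unmodified peptide. The Python sketch below estimates the resolving power R = m/Δm needed to separate the two at a given m/z and charge state. It is a rough, FWHM-style estimate under stated assumptions; the function name and the example m/z of 1000 are hypothetical, and baseline separation in practice requires more than this figure.

```python
# Monoisotopic mass differences in daltons
DEAMIDATION_SHIFT = 0.984016   # Asn -> Asp deamidation
C13_SPACING = 1.003355         # 13C - 12C isotope spacing

def required_resolving_power(mz: float, delta_mass: float, charge: int) -> float:
    """Estimate R = m/dm needed to separate two species whose neutral
    masses differ by delta_mass (Da), observed at the given m/z and charge.

    Rough FWHM-based estimate only (assumption): baseline separation
    needs a higher resolving power than this.
    """
    delta_mz = abs(delta_mass) / charge  # mass difference expressed in m/z units
    return mz / delta_mz

delta = C13_SPACING - DEAMIDATION_SHIFT   # about 0.0193 Da
for z in (1, 2, 3):
    r = required_resolving_power(1000.0, delta, z)
    print(f"z={z}: R ~ {r:,.0f}")
```

Because the required R grows in proportion to charge at a fixed m/z, an assignment that is trustworthy for one charge state on a given instrument may not be for another. Processing parameters do not always capture limitations of this kind, whereas an experienced reviewer should.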
Certainly, for larger and more complex analytical datasets there is no substitute for final assessment by an experienced, trained scientist, who can draw on that experience when reviewing the processed data and can put it in context, for example by aligning it with results from orthogonal studies that support the conclusions drawn from the dataset. Over-reliance on in silico data interpretation without adequate control could cause drift away from the correct and appropriate manufacturing pathway, or lead to incorrect conclusions about the nature or composition of the drug product.
As with so many things in life, it is about striking the right balance. Computational power and the ability to mine data have increased massively in recent years and are likely to continue to do so. This has significantly increased the amount of data that can be generated and handled, and with it product knowledge and, by extension, product quality. Knowing the strengths and limitations of computer data processing allows the most effective use of processing power, with scientific oversight and interpretation applied to manage and control the process. This approach yields effective, sound data interpretation, ensures that every dataset is given product and scientific context, and removes inaccurate or plainly wrong computational assignments. It is the stance we take at BioPharmaSpec.