WHAT IS PROTEOMICS?
Almost all diseases manifest as changes in the expression, abundance, or signaling status of proteins. Precise analysis of the proteome (the entirety of a biological system’s proteins) is therefore a crucial step in understanding, diagnosing, and treating disease. Mass spectrometry-based proteomics is a powerful technique for analyzing thousands of proteins simultaneously, fueling biomarker research and drug discovery.
PROTEOMIC DATA ANALYSIS
The analysis of proteomic data relies heavily on the automated matching of acquired tandem mass spectra of peptides (fragments of proteins) to protein sequence databases. This process rests on simple assumptions, and its key concepts have remained largely unchanged since their introduction in 1993. We believe that we currently see only the tip of the iceberg. To date, classical data analysis workflows can identify only half of the data acquired from a sample, which costs productivity, precious sample material, and opportunities.
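The matching step described above can be illustrated with a toy example. The sketch below assumes unmodified peptides, singly charged b- and y-type fragment ions, standard monoisotopic residue masses, and a plain shared-peak count as the score. The function names and the 0.02 m/z tolerance are illustrative choices, and real search engines use considerably more elaborate scoring.

```python
# Toy peptide-spectrum matching: count theoretical fragments found in a spectrum.
# Assumptions (not from the source text): unmodified peptides, singly charged
# b/y ions, monoisotopic masses, and a fixed absolute m/z tolerance.

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_mzs(peptide):
    """Return m/z values of all singly charged b and y ions of a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    fragments = []
    for i in range(1, len(masses)):
        fragments.append(sum(masses[:i]) + PROTON)           # b ion (N-terminal part)
        fragments.append(sum(masses[i:]) + WATER + PROTON)   # y ion (C-terminal part)
    return fragments


def shared_peak_count(observed_mzs, peptide, tolerance=0.02):
    """Count theoretical fragments that have an observed peak within tolerance."""
    return sum(
        any(abs(obs - theo) <= tolerance for obs in observed_mzs)
        for theo in fragment_mzs(peptide)
    )


def best_match(observed_mzs, candidates, tolerance=0.02):
    """Return the candidate peptide with the highest shared-peak count."""
    return max(
        candidates,
        key=lambda pep: shared_peak_count(observed_mzs, pep, tolerance),
    )


if __name__ == "__main__":
    # Simulate an idealized spectrum from a known peptide, then search it.
    spectrum = fragment_mzs("PEPTIDE")
    print(best_match(spectrum, ["GASPVTK", "PEPTIDE"]))
```

A score of this kind uses only the presence of peaks at expected positions and ignores their intensities, which is one example of the simplifying assumptions mentioned above.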
THE POWER OF DEEP LEARNING
Recent developments in machine learning are revolutionizing all branches of research. Artificial neural networks learn to perform tasks without predefined rule sets, based solely on annotated training data. We have learned to harness this power to predict peptide properties such as liquid chromatography retention time or fragmentation behavior inside the mass spectrometer.
PREDICTING PEPTIDE PROPERTIES
The MSAID founders developed a generic deep learning framework called INFERYS, which learns to predict any peptide property from training data. INFERYS achieves prediction accuracy well above that of other current approaches. The algorithm was trained on millions of mass spectra and can be adapted to all common mass spectrometers with minimal additional training. The model is universally applicable to proteins from any organism, creating huge opportunities in areas such as immunopeptidomics, proteogenomics, and metaproteomics. The novel, intelligent search algorithm CHIMERYS is fueled by the accurate predictions provided by INFERYS and enables a deeper, more comprehensive data analysis.
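To make use of a predicted property, its agreement with the measurement has to be quantified. As an illustration only, the sketch below compares a predicted and an observed fragment intensity vector using a normalized spectral angle, a cosine-based similarity that appears in the proteomics literature. Whether INFERYS or CHIMERYS use this exact measure is not stated here; the function name and the example intensities are assumptions.

```python
import math


def normalized_spectral_angle(predicted, observed):
    """Similarity between two intensity vectors: 1.0 = same direction, 0.0 = orthogonal.

    Both vectors must list intensities for the same fragment ions in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")

    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0 or norm_o == 0:
        return 0.0  # an empty spectrum carries no evidence

    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    # Hypothetical relative intensities for the same five fragment ions.
    predicted = [0.10, 0.80, 0.30, 0.00, 0.55]
    observed = [0.12, 0.75, 0.35, 0.02, 0.50]
    print(normalized_spectral_angle(predicted, observed))
```

Because the measure depends only on the direction of the vectors, it is insensitive to the absolute scale of the recorded intensities.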
Proteins are molecular machines that facilitate many of the processes necessary to sustain life. They perform a large variety of functions, from structure to metabolism and regulation, in every living organism. Proteomics as a field focuses on the identification and quantification of proteins on a large scale. Research questions may focus on protein abundance, the variety of proteoforms arising from post-translational modifications (PTMs), and stable or transient protein-protein interactions. Proteomics is also moving into the clinical setting, studying proteins that are regulated differently across conditions and diseases.
FIELDS OF INVESTIGATION
Proteomics is commonly used to investigate proteins on a large scale:
- when and where proteins are expressed, and in what quantity
- how proteins are modified by post-translational modifications (PTMs) such as phosphorylation
- rates of protein production and degradation, and steady-state abundance
- how proteins interact with other proteins and protein complexes
Proteomics aims to be a breakthrough technology that will allow doctors to better diagnose and treat diseases. Research interest centers on finding biological markers that signal disease, identifying drug targets, and gaining a detailed understanding of biology at the molecular level.
PROTEOMIC TECHNOLOGIES & WORKFLOW
Proteins can be investigated using several technologies, which can be roughly categorized into antibody-based techniques, array-based techniques, and mass spectrometry-based technologies. Mass spectrometry-based proteomics has become the workhorse for the large-scale investigation of proteins, largely due to its throughput and its independence from biological reagents.
- Bottom-up proteomics: An approach that relies on digesting proteins into peptides using proteolytic enzymes (e.g. trypsin). It has the advantage of being applicable to very complex samples; no prior knowledge of the sample (besides its origin) is needed.
- Liquid chromatography (LC): A technique for separating peptides prior to mass spectrometry. In bottom-up proteomics experiments, it is usually interfaced directly with the mass spectrometer, so that peptides are ionized and injected into the mass spectrometer immediately after separation.
- Mass spectrometer (MS): An instrument that determines the mass-to-charge ratio (m/z) of peptides by recording mass spectra (lists of mass-to-charge values and corresponding intensities). Usually, the mass of the intact peptide is determined first; the peptide is then fragmented into shorter pieces to determine its amino acid sequence.
- Bioinformatics: Software and databases are enabling technologies for proteomics because they automatically assign acquired mass spectra to peptide sequences. This makes it possible to search hundreds of thousands of spectra in a short period of time, in contrast to manually assigning spectra to peptides and thereby to proteins.
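The digestion and mass measurement steps of this workflow can be sketched in a few lines. The example below assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, unmodified residues, and standard monoisotopic masses. The function names and the example sequence are hypothetical.

```python
# In-silico tryptic digestion and precursor m/z calculation.
# Assumptions (not from the source text): trypsin cleaves C-terminal to K or R
# unless followed by P, there are no missed cleavages, and residues are unmodified.

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def tryptic_digest(protein):
    """Split a protein sequence into fully cleaved tryptic peptides."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mz(peptide, charge):
    """Return the monoisotopic m/z of an intact peptide at a given charge state."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    for pep in tryptic_digest("AAKRPGGRSS"):
        print(pep, round(peptide_mz(pep, 2), 4))
```

Each peptide produced this way corresponds to a candidate whose intact m/z can be compared against the value measured before fragmentation.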
MSAID replaces the classical spectrum-to-database matching algorithms with powerful AI-based solutions, paving the way for a deeper and more reliable interrogation of proteomics data. Powered by vast amounts of training data, we develop deep learning models for bottom-up proteomics and integrate them into innovative software solutions. We make our software and services easily and readily available, thereby boosting the use of machine learning in the field of proteomics.