Before I started studying at Ghent University, I was already very interested in molecular biology and the mechanisms involved in genetics. I therefore enrolled in Bioscience Engineering: Cell and Gene Biotechnology. During my master's years, the computational side of this field increasingly drew my attention. This led to a master's thesis at BIOBIX, titled ‘Ribosome profiling, a useful tool in the search for micropeptides’. In this research, I helped construct bioinformatics tools for a micropeptide prediction pipeline that identifies potentially coding small open reading frames from ribosome profiling data. The resulting small translation product candidates were then matched against mass spectrometry data.


After obtaining my master's degree, I continued at BIOBIX as a PhD student. I obtained a scholarship from the Special Research Fund (BOF) of Ghent University with a project entitled ‘In depth insight into the translatome using state-of-the-art proteomics and ribosome profiling’. I did my research under the supervision of Prof. Wim Van Criekinge, Prof. Petra Van Damme and Gerben Menschaert. In 2020, I obtained my PhD by defending my thesis, ‘The hunt for new proteoforms using ribosome profiling’.


My doctoral research focused on ribosome profiling (RIBOseq), a recently described next-generation sequencing technique in which only the ribosome-protected mRNA fragments are sequenced. This gives a global view of translation at sub-codon resolution, highlighting the exact positions of the translating ribosomes on the mRNA strand. Even more recently, additional flavours of the RIBOseq protocol were devised, making it possible to capture a more quantitative ribosome profile or an overall translation initiation profile. However, the analysis of all these new kinds of data requires improvements and further standardisation.


Sound quality control is one such essential improvement, needed to draw solid conclusions from ribosome profiling data. Nevertheless, dedicated quality control tools for mapped RIBOseq data were not yet available. I tried to fill this gap by developing my own software. MappingQC (‘mQC’) is a tool that easily generates figures giving an overview of the mapping quality of ribosome profiling data. More specifically, it gives an overview of the P-site offset calculation, the gene distribution and the metagenic classification. Furthermore, MappingQC performs a thorough analysis of the triplet periodicity and the associated triplet phase (typical for ribosome profiling) in the canonical transcripts of your data. In particular, it takes into account the link between the phase distribution and the RPF length, the relative sequence position and the triplet identity.
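
To make the idea of a phase distribution per RPF length concrete, here is a minimal Python sketch. It is not the mQC implementation: the function name, the input layout (5' read positions given relative to the first nucleotide of the start codon) and the example offsets are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def phase_distribution(reads, offsets):
    """Return the fraction of reads in each triplet phase, per RPF length.

    reads   -- iterable of (five_prime_pos, length) tuples; positions are
               0-based and relative to the first nucleotide of the start
               codon (hypothetical input layout)
    offsets -- dict mapping RPF length to its P-site offset (hypothetical
               values, normally estimated from the data)
    """
    counts = defaultdict(Counter)
    for five_prime_pos, length in reads:
        if length not in offsets:
            continue  # no offset known for this RPF length, skip the read
        p_site = five_prime_pos + offsets[length]
        # Python's % always returns 0, 1 or 2 here, also for negative positions
        counts[length][p_site % 3] += 1

    distribution = {}
    for length, phase_counts in counts.items():
        total = sum(phase_counts.values())
        distribution[length] = [phase_counts[phase] / total for phase in range(3)]
    return distribution


# Illustrative, made-up reads: (5' position, RPF length)
example_reads = [(-12, 28), (-9, 28), (-11, 28), (-13, 29), (-10, 29)]
example_offsets = {28: 12, 29: 13}
print(phase_distribution(example_reads, example_offsets))
```

In well-behaved ribosome profiling data, most reads of a given length are expected to fall into a single phase; a flat distribution across the three phases for a certain RPF length suggests that this length carries little periodicity signal or that its offset is off.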


Another application in which I use ribosome profiling information is the PROTEOFORMER pipeline. It enables the automated processing of RIBOseq data. Genome-wide ribosome occupancies are used to delineate data-specific translation product candidates, which in turn can improve mass spectrometry-based identification. Since its first publication, various upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data from human HCT116 and Jurkat cells. The different proteoform calling strategies are used alongside one another and finally combined with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5’ and 3’ extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called non-coding regions. Furthermore, a proof of concept is reported for improving spectrum matching by including Prosit (Kuster lab, TUM, Munich, Germany), a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows the spectrum matches of newly identified proteoforms to be validated with elevated stringency. Next to Prosit, MS2PIP (Compomics lab, UGent-VIB) provides a random forest strategy to predict additional features of theoretical spectra.
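
The step of combining candidates from several calling strategies with reference sequences can be pictured with the following Python sketch. It is only an illustration of the idea and not PROTEOFORMER code: the function names, the source labels, the identifiers and the FASTA header layout are all hypothetical.

```python
def combine_search_space(sources):
    """Merge protein sequences from several sources into one non-redundant set.

    sources -- dict mapping a source name (e.g. a calling strategy or a
               reference database) to a dict of {identifier: sequence};
               this layout is an assumption made for the example
    Returns a dict keyed by sequence, remembering every identifier and
    source that reported that sequence.
    """
    combined = {}
    for source_name, entries in sources.items():
        for identifier, sequence in entries.items():
            record = combined.setdefault(sequence, {"ids": [], "sources": set()})
            record["ids"].append(identifier)
            record["sources"].add(source_name)
    return combined


def to_fasta(combined):
    """Write the combined search space as FASTA text (hypothetical header format)."""
    lines = []
    for sequence, record in combined.items():
        header = ">" + "|".join(record["ids"])
        header += " sources=" + ",".join(sorted(record["sources"]))
        lines.extend([header, sequence])
    return "\n".join(lines)


# Illustrative, made-up entries
example_sources = {
    "reference": {"REF1": "MKTAYIAK"},
    "strategy_a": {"A1": "MKTAYIAK", "A2": "MAAKTAYIAK"},
    "strategy_b": {"B1": "MAAKTAYIAK"},
}
print(to_fasta(combine_search_space(example_sources)))
```

Keeping track of which sources support each sequence makes it possible, after the database search, to tell annotated proteoforms apart from candidates that were found only through ribosome profiling.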


After my doctoral research, it was clear that this last research track was not yet finished. As a post-doctoral researcher, I am therefore currently exploring further how these next-generation spectrum features can be used to improve the matching between search spaces and experimental mass spectra in our proteogenomics case study.

Steven Verbruggen

post-doctoral fellow

ribosome profiling, mapping quality control, proteogenomics, proteoforms, mass spectrometry, spectrum intensity features