Thank you for visiting my website !
If you did not find the information you are seeking, please Contact me directly.

Dr Benjamin Linard

With the advent of high-throughput DNA sequencing technologies, the research of efficient and proficient bioinformatic algorithms and methodologies became a critical element in many medical, evolutionary and environmental studies.

After studying biochemistry and molecular biology, I specialized in bioinformatics and computer sciences during my Master and PhD Theses. Since then, I continuously developed my research interests and skills towards the fields of environmental and evolutionary sciences.

My past projects spanned over a large area of environmental/medical applications with at their core the development of new bioinformatic algorithms, software, databases and analysis workflows.

Below is a short list of keywords summarizing the main subjects I studied during the past years:

Side A : Computer sciences

ALGORITHMS
classification, phylogenetics, sequence alignment ...
SOFTWARE
Python, Java, C++, web frameworks (Django), SQL, DBMS platforms, ...
DEVOPS
cluster computing, web services, reproducibility, continuous integration, ...

Side B : Environmental genomics

EVOLUTIONARY GENOMICS :
genome assembly, genome annotation, orthology inference, biological networks, ...
ECOLOGY :
trophic interactions, microbial communities, symbiosis, ...
BIODIVERSITY :
taxonomy, systematics, aquatic/soil biodiversity, ...

Short CV

Current affiliation

logo

Past affiliations

Education

2012 : Phd Bioinformatics, University of Strasbourg
2007 : Master Bioinformatics & Structural Biology, University of Strasbourg
2006 : License (bachelor) in Biology & Computer Science
2005 : Technological diploma (DUT) Biochemistry and food industry

Short Bio

Since 2019 : research officer in Spygen R&D department
(Hosted in LIRMM, Montpellier, France)

Development of a bioinformatic platform support for the Vigilife Consortium
- optimization and software development for eDNA analysis
- web frameworks for eDNA analyses submission
- digitisation of the R&D processes
- establishment of public / private partnerships & co-developments.

2018 : Research assistant. Univ Montpellier, LIRMM / ISEM / AGAP, France.

LabEx Project Orthologap: development of a new algorithm to classify sequences into families of orthologous genes.
- phylo-k-mer based supervised classification.
- orthology and paralogy.
- genomic sequence annotations.

2016 : Research assistant, CNRS, LIRMM, Montpellier, France.

European Consortium Virogenesis. Development of an alignment-free phylogenetic placement method.
- development of the phylo-k-mers idea, algorithms for their computation.
- phylogenetic placement for taxonomic identification.
- applications in virome analysis.
- viral genome recombination detection.

2013 : Research assistant, Natural History Museum of London, United Kingdom.

Project Biodiversity Initiative : "Breaking the taxonomic barrier"
- Development of the methods of mitochondrial metagenomics and metagenomic skimming.
- Applications in systematics, phylogenetics and biodiversity discovery.
- biodiversity of Coleoptera and soil arthropods.
- metagenomics for gut content and trophic interactions.

2009 : PhD Student, University of Strasbourg, IGBMC, Strasbourg, France.

Projects Evolhupro and OrthoInspector. Development of algorithms and software for comparative genomics, genome annotation and multi-scale omics data integration and visualisation.
- bipartite graph-based clustering for orthology and paralogy inference.
- multi-scaled omics data integration (genome, proteome, biological networks).
- omics data representation and machine learning.

Research

Main topics

Axis 1 : « phylo-k-mers » for the classification of biological sequences

I study the analytical breakthrough offered by the "phylo-k-mers", an abbreviation for "phylogenetically informed substrings of length k".
Phylo-k-mer should not be confounded with classical k-mers extracted from real sequences (those correspond to the state of the sequences at the terminals). From a set of known reference sequences and a tree that represent their relationships, phylo-k-mers can be computed on the basis of models of evolution and algorithms of ancestral sequence reconstruction. Then, for a particular tree branch, it is possible to generate the set of k-mer that diverged from it and their associated probabilities.
There is (#states^k) possible k-mer for each reference alignment position and tree branch. Extended to all alignment positions and tree branches, computing phylo-k-mers rapidly becomes a challenging task, and leads to many open questions : how to choose the divergence points ? which algorithm should I use to explore this huge sequence space ? is the approach scalable to large tree and alignments ? how does it behave with sequencing error and alignment mistakes that would not fit the model of evolution ?
Still, as a first step it is possible to learn a large sequence space, the set of phylo-k-mers, from a relatively small reference dataset. Then, in a second step, classifiers can be developed to exploit phylo-k-mers in many applications, from taxonomic or functional classification of biological sequences, to the detection of evolutionary phenomena, for instance genome recombination.

phylokmers With this approach I developed RAPPAS, the first alignment-free algorithm of phylogenetic placement, (Linard et al, 2019) a task of sequence classification exploited in the critical process of taxonomic identification of metagenomic and metabarcoding sequences.

I was also involved in the development of SHERPAS (Scholz zt al, 2020), a tool designed for viral genome recombination detection. In both projects the classification phase demonstrated an acceleration of several orders of magnitude when compared to other available tools. This opened new possibilities for routine sequence-based taxonomic identification and diagnostics.

In collaboration with colleagues from classical phylogenetics, and exploring the theoretical relationships between the phylo-k-mer approach and machine learning techniques, I recently focused on applications in genome functional annotation.
Phylogenetic Placement occurred as a thriving research field over the past years, and in this context I got interested in means of benchmarking this classification process. This question lead me to develop PEWO, a platform for benchmarking placement tools. With this work, I try to alleviate the critical questions that arise between developers and end-users of phylogenetic placements : which placement algorithm and implementation best fits a particular input dataset ? Is their accuracy / computational resources trade-off adapted to a particular experimental design.

Axis 2 : orthology inference for genome annotation, comparative genomics and phylogenomics

Today genome sequencing is a tool accessible to many research laboratories and new genomes are sequenced at an increasing rate. The evolutionary relationships of Orthology links homologous genes issued from a speciation event. Because ortholog genes tend to be more similar in their biological function than paralogs, inferring accurate orthology predictions is a fundamental step in genome annotation, gene function analysis and phylogenomics.
This inference is however a difficult task. Incomplete genomes, domain loss and gains, gene loss, horizontal transfers, and fast-paced divergence between genes and gene families are some phenomenon explaining why new inference approach are continuously developed by the research community.

Graph-based orthology inference is generally considered as an interesting alternative to the more computationally-demanding phylogeny-based approaches (eg: species and gene tree inference + reconciliation). During my Phd, I developed Orthoinspector, an algorithm based on bi-partite graphs and design to infer orthologs relationships between the proteomes of two genomes. By extension, compiling every pairwise genome comparison for a set of genomes allow deducing orthologs between hundreds to thousands of complete genomes (Linard et al, 2011). Eight years and thousands of new complete genomes later, the Orthoinspector database still remains one of the largest online resource for pre-computed orthologs (Never et al, 2018).
The challenges of orthology lead me to join the Quest for Ortholog consortium. This workgroup aims to provide standards for the orthology community and maintains a benchmarking platform that evaluates the quality of new predictions.

More recently, I started to explore a link between the phylo-k-mer approach (axis 1) and one of the usage application of pre-computed orthology relationships: high-throughput and accurate classification of coding sequences into known gene families.

Axis 3 : Pipelines relevant to evolutionary & environmental applications

Beyond applied algorithms and software implementation, my research focuses on the methodological challenges encountered by the rapid development of environmental genomics. The success of an approach based on environmental DNA (eDNA), often means tight integration between every methodological steps, from sampling, DNA extraction, DNA sequencing, to bioinformatic analysis and data integration. Through international projects and consortiums, I specialized into the analysis of biological sequence sampled in different types of environments (soil, water, gut contents...). Not only am I interested in improving the generation of the DNA data itself, but I also aim for data integration with other features (ex: environmental or geographical parameters) and novel applications of data mining.

From a field point of view, I studied an alternative approach of eDNA sequencing, labeled "metagenome skimming" (Linard et al, 2016), as well as its more selective derivative, the approach of "mitochondrial mitogenomics" (Crampton-platt et al, 2016). These approaches are an alternative to metabarcoding when long taxonomic markers (> several kb), are seeked for poorly known ecosystems. As longer sequences generally means more phylogenetic signal, these approaches are an interesting basis for species-rich phylogenetic reconstruction, a critical step not only in systematics, but also for biodiversity studies where the completeness of reference database is essential.

These project made me participate in a large panel of thematics, including NGS protocols optimisation, arthropod systematics, biodiversity indices and the extraction of trophic interactions as well as microbial symbiotic interactions from gut contents. More recently, I focused on large-scale bioinformatic analysis of metabarcoding data and issues related to analysis reproducibility and scalability.

All together, tight interactions with researchers in Ecology, Evolution, Biodiversity and Conservation give me a vision encompassing every step of a research project based on eDNA. I aim to convert this expertise into the development of expert bioinformatics tools and pipelines that are adjusted to those applications.

Publications

See the DEDICATED PAGE
for a complete list of my scientific publications.

Other options :

Researchgate

Google Scholar

ORCID

Software & databases

Repositories

Phylo42 :
Phylo-k-mer dev.
centralization

github

Software

	Orthology inference	webpage	v1.0 - v3.0 Linard et al, 2012 Linard et al, 2015 Nevers et al, 2019
Bioanalysis pipelines : - Mitochondrial Metagenomics - Metagenome Skimming	QC, Assembly, Annotations		Linard et al, 2015 Linard et al, 2016 Andujar et al, 2019
	RAPPAS : Alignment-free phylogenetic placement	github	Linard et al, 2018
.	SHERPAS : Alignment-free recombination detection	github	Scholz et al, 2020
	PEWO : Phylogenetic placement benchmarking	github	Linard et al, 2020
Coming soon	CLAPAS : Phylo-k-mer based classification of coding sequences	.	:)

Databases

Pre-computed orthologs for thousands of complete genomes	website	v1.0 - v3.0 Linard et al, 2012 Linard et al, 2015 Nevers et al, 2019
KD4V : (Knowledge Discovery for Missense Variant)	webpage	Dao et al, 2012
Visual exploration of gene family evolutionary histories	Currently down	Linard et al, 2012

Other contributions

	Regular contributor	github	.
Taxonomy Java Tools	Java library	github	Linard et al, 2018
	Dataset contributions to the QFO consortium	website	Altenhoff et al,2016 Forslund et al,2018 Linard et al, 2021
	Metabarcoding analysis plateform	spygen.com vigilife.org

Tutoring

Professional Tutoring

CNRS formations (since 2017)

- "Molecular Phylogeny" & "Advanced Phylogeny" -
New modules introducing to the algorithms and applications of phylogenetic placement of environemental data.

- "NGS Data Analysis" -
New module focusing on OTU clustering and taxonomic classification.

Bioinformatics workshops

- 2013-2016 : NHM, London -
New 2 days workshops for the Phd Program introducing to sequence analysis and metagenomics.

- 2010-2012 : IGMBC, France -
Yearly intervention in the internal bioinformatics workshop.

Students supervision

Co-supervision of 1 Phd thesis.
- phylo-k-mer computation and phylogenetic placement. (director Eric Rivals)
Regular supervision of Licence (Bachelor) and Master students. Currently 10 student (2021). Some projects lead to co-authorship. (ex: Orthoinspector v2)

University lectures

So far, I had the chance to give lecture and practical courses for 3 different universities (Strasbourg ; Imperial College London ; Montpellier), for more than 380 hours of effective teachings.

Depending on the subjects, those were given for Computer Science, Ecology/Environment, or Health/Agronomy or Earth Sciences departments. For simplification, I regroup these interventions under the following thematics:

Computer sciences

Algorithms in bioinformatics (data representation, trees, graph, clustering, phylogeny...)
Introduction to programming : Python, Shell, Java ...
Introduction to Unix environments & shell scripting
Models for object programming (UML, software design)
Advanced Java Programming (GUI, RDBMS, networking, threading ...)
Data sciences : multivariate data analysis & visualisation, python, R, jupyter ...
Introduction to Devops : Gitflow, continuous integration, containers (Docker)

Evolution, Ecology & Biodiversity

Bases of population dynamics: competition, selection, models, practical courses in web lab.
Practical courses for the analysis of NGS / metagenomic data
Introduction to sequence databases and sequence analysis (data extraction, alignment, phylogeny)
Introduction to eDNA approaches and applications

Genomics

Bioinformatic methods for genome annotation (de novo & reference-based)
Methodological approaches for comparative genomics
Introduction to Transcriptomics / Proteomics
Functional annotation : orthology, gene ontology, biological networks

Public outreach

In a world were scientific facts are regularly challenged, I am deeply convinced that scientific outreach and vulgarisation are critical subjects.
When time permits, I am participating to activities related to science vulgarisation in schools and in public events. Below is a non-exhaustive list of events in which I was or continue to be involved either to create new activities or give public talks.

Openlab :
A program lead by the University of Strasbourg, in which Phd Students go to regional High School and give a practical course introducing DNA molecular analysis and genetics as a Police enquiry scenario.

Fêtes de la science :
Regular participation to this French national day dedicated to science outreach.
I mostly propose activities introducing what are bioinformatics. In particular, I created some mini-games introducing the discrepancies between species traits (ecology, morphology...) and DNA similarity (evolutionary relationships).
In the same context and in partnership with the Académie de Montpellier, I intervene into priority area schools to introduce kids to the field of science, what is the university and what is the job of a researcher. My key point being that these are not inaccessible jobs.
Université Tiers Temps Montpelier :
An open university service of the university of Montpellier teaching a large array of scientific, literary, law subjects to adults. I gave general talks related to bioinformatics and environmental DNA.
Science Uncovered :
An event organized by the Natural History Museum of London in which scientists have a direct contact with the public. With my former colleagues, we were leading recreationnal activities of DNA extraction and talks related to environmental DNA and biodiversity exploration.

Contact

For professional enquiries :

University lectures (Montpellier)		mail
Spygen	website
LIRMM	website	mail

For other inquiries : gmail