BechaML - Datasets Evaluation

Discover BechaML, our tool to guarantee the quality of benchmarks for Machine Learning.

BechaML Application

Our project

This project is conducted as part of our 4th-year project module within INSA Rennes' informatics department.

The context

BechaML focuses on Machine Learning for Natural Language Processing (NLP) and aims to solve a straightforward problem: in several scientific papers, the public benchmarks used to evaluate and compare Machine Learning systems were biased or of poor quality.

A fresh solution

BechaML aims to benchmark benchmarks. Each benchmark is submitted to several Machine Learning algorithms, and metrics are then calculated and compared. If the algorithms are relatively recent, we can then tell which benchmarks are still relevant.

Objectives of our project

One solution to this issue was introduced in the research paper "Benchmarking benchmarks - introducing new automatic indicators for benchmarking Spoken Language Understanding corpora" by BECHET and RAYMOND, 2019. It is implemented in BechaML. The principle is as follows: multiple algorithms are tested on the same dataset. From these executions, we can then reach one of the following conclusions for each line of the dataset:

The classifiers all agree and predict the expected class: the test is considered trivial, that is, too easy.

They all agree, but none predicts the expected class: this might be due to an annotation mistake. The affected lines therefore need a closer look.

They don't agree on their predictions, and some are correct while others aren't: a cluster of examples emerges as a real challenge for the classifiers. It is in this case that the dataset is really worth keeping for comparison.

They don't agree, and none predicts the right class: either we have identified a challenge for every one of our algorithms, or there is a problem in the dataset.
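
The four cases above can be sketched in a few lines of Python. This is only an illustration of the decision rule described in the text, not the code used inside BechaML: the names Category and categorize are hypothetical, and the sketch assumes each classifier returns exactly one label per line.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical names for the four conclusions listed above."""
    TRIVIAL = "all agree, expected class predicted"
    SUSPECT_ANNOTATION = "all agree, expected class not predicted"
    CHALLENGING = "disagreement, some predictions correct"
    UNSOLVED = "disagreement, no prediction correct"


def categorize(predictions, expected):
    """Classify one dataset line from the labels predicted by several classifiers."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    all_agree = len(set(predictions)) == 1   # every classifier gave the same label
    any_correct = expected in predictions    # at least one classifier is right
    if all_agree:
        return Category.TRIVIAL if any_correct else Category.SUSPECT_ANNOTATION
    return Category.CHALLENGING if any_correct else Category.UNSOLVED
```

For instance, categorize(["greeting", "greeting", "request"], "request") falls into the third case, since the classifiers disagree and one of them is correct.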

Features of our application

BechaML is a web application.

The software delivers a wide range of services. After an experiment has been executed, we have access to different metrics (confusion matrix, agreement matrix, etc.). We also get a visualisation of the predictions of each algorithm, label by label, for each dataset. This way, we can analyze and understand the cause of each prediction.

BechaML can be executed locally or on a cluster. The cluster mode is one of the main assets of the application. It is needed to process the different phases of an experiment, or several experiments, at once, which reduces the processing time.
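
To give an idea of what an agreement matrix can contain, here is a minimal Python sketch. It assumes that agreement between two classifiers is measured as the fraction of dataset lines on which they predict the same label; the exact definition used by BechaML may differ, and the function name agreement_matrix is hypothetical.

```python
def agreement_matrix(predictions):
    """Pairwise agreement between classifiers.

    predictions maps a classifier name to its list of predicted labels,
    one label per dataset line, all lists having the same length.
    Returns a nested dict: matrix[a][b] is the fraction of lines on which
    classifiers a and b predict the same label (assumed definition).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("all classifiers need the same non-zero number of predictions")
    n = lengths.pop()
    matrix = {}
    for a, labels_a in predictions.items():
        matrix[a] = {}
        for b, labels_b in predictions.items():
            same = sum(x == y for x, y in zip(labels_a, labels_b))
            matrix[a][b] = same / n
    return matrix
```

The diagonal is always 1.0, and a low off-diagonal value points to a pair of algorithms that behave differently on the dataset.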

  • 01- Import the datasets

  • 02- Import the algorithms used to test these datasets

  • 03- Start the experiments

  • 04- Parallelize the processing of these experiments

  • 05- Fetch the results
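
The steps above can be pictured with a small local sketch in Python. It is a stand-in only: BechaML runs its experiments in containers on a cluster, whereas this example fans the work out over a thread pool from the standard library to show the same structure (one task per dataset and algorithm pair, results fetched at the end). The functions run_experiment and run_all, and the shape of their arguments, are assumptions made for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_experiment(dataset, algorithm):
    """Hypothetical experiment: apply one algorithm to every line of one dataset."""
    name, lines = dataset
    return {
        "dataset": name,
        "algorithm": algorithm.__name__,
        "predictions": [algorithm(line) for line in lines],
    }


def run_all(datasets, algorithms, max_workers=4):
    """Run every (dataset, algorithm) pair concurrently and fetch the results.

    Results are returned in submission order: datasets first, then algorithms.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_experiment, dataset, algorithm)
            for dataset in datasets
            for algorithm in algorithms
        ]
        return [future.result() for future in futures]
```

A thread pool does not speed up CPU-bound Python code; it is used here only because it keeps the sketch self-contained. On a cluster, each submitted task would correspond to a container.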

BechaML architecture


The Back-End was developed in Python. Its role is to launch the experiments and collect the data needed to generate experiment reports.

Each experiment is divided into modules, each executed in its own container.


The Front-End is built with the Angular framework. The graphical interface relies on Bootstrap, a CSS framework, which gives our application a clean and consistent look.

The Front-End and the Back-End communicate through a REST API, which is served by Django on the Back-End side.


The experiments are launched in separate containers on Google Cloud. Kubernetes makes the orchestration of these containers easier.

The user can then manage the different resources of the cluster to optimise the processing times.

Our Team

Dakini Mallam Garba

Volodia Parol-Garino

Hugo Thomas

Adrien Paillé

Julien Letoile

Romain Hubert

Romain HU
