ShredID

Usage

So how exactly can you know the origin of a document?

The process consists of two main phases.

Learning phase

First, assign data (scanned images of shredded documents) to two shredder profiles. These two profiles are the classes the tool learns to tell apart.

Prediction phase

Next, provide the unknown data to be classified. You can then choose among the several classification methods our tool provides. The selected methods each contribute to a final prediction of whether the provided data belongs to one shredder or the other.
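How the selected methods are combined is not detailed here. A minimal sketch, assuming a simple majority vote over the labels returned by each method, could look like this. The function name, the method names and the shredder labels are illustrative and not taken from the tool itself.

```python
from collections import Counter

def combine_predictions(votes):
    """Return the shredder label predicted by most of the selected methods.

    `votes` maps a method name to the label it predicted. Ties are resolved
    in favour of the label that was seen first (an assumption of this sketch).
    """
    if not votes:
        raise ValueError("at least one classification method must be selected")
    counts = Counter(votes.values())
    label, _count = counts.most_common(1)[0]
    return label

# Hypothetical outputs of three selected methods for one unknown document
votes = {
    "width": "shredder_A",
    "fourier": "shredder_B",
    "knn": "shredder_A",
}
final = combine_predictions(votes)
```

A weighted vote, where each method's weight reflects its accuracy during the learning phase, would be a natural refinement of the same idea.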

Preprocessing

In both the learning and the prediction phase, the tool starts by preprocessing the scanned images. We first clean them up with morphological operations. We then apply color detection in HSV space to separate the paper from the green background. Next, a contour detection algorithm extracts each paper strip, which is straightened in case it was slightly distorted. Finally, we split each strip into two half-strips and generate the profile of each as a 1-dimensional signal.
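To make the last two ideas concrete, here is a simplified sketch of background detection in HSV space followed by profile extraction. It uses only the Python standard library on a tiny hand-made image, whereas the tool itself relies on OpenCV. The hue and saturation thresholds, the function names, and the choice of "edge position per row" as the profile are all assumptions made for illustration. Morphological clean-up, contour detection and straightening are left out.

```python
import colorsys

def is_green(rgb, hue_range=(0.22, 0.45), min_saturation=0.3):
    """True if an (r, g, b) pixel with 0-255 channels looks like green background.

    The thresholds are illustrative assumptions, not the tool's settings.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= hue <= hue_range[1] and saturation >= min_saturation

def edge_profiles(image):
    """Return (left, right) edge positions of the strip, one value per row.

    Each list is a 1D signal describing one side of the strip.
    Rows that contain no paper pixel are skipped.
    """
    left, right = [], []
    for row in image:
        paper_columns = [x for x, pixel in enumerate(row) if not is_green(pixel)]
        if paper_columns:
            left.append(paper_columns[0])
            right.append(paper_columns[-1])
    return left, right

GREEN, WHITE = (30, 180, 60), (245, 245, 240)
image = [
    [GREEN, WHITE, WHITE, WHITE, GREEN],
    [GREEN, GREEN, WHITE, WHITE, GREEN],
    [GREEN, WHITE, WHITE, WHITE, WHITE],
]
left_profile, right_profile = edge_profiles(image)
# left_profile  -> [1, 2, 1]
# right_profile -> [3, 3, 4]
```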

This 1D signal is then used to generate the data we feed to the classifiers, such as recurrence plots and Fourier transforms.

Close-up of a shredder's teeth marks on a document

Classification

To predict the correct shredder, we use several classification methods. Some are well-known machine learning techniques that have proven effective, whereas others are more experimental and tailor-made for this application.

Width comparison

Sometimes, the simpler the better. We use the width of the paper strips to compare shredders' characteristics. Simple as it seems, it can sometimes be all we need to find the correct shredder.
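A minimal sketch of this idea, assuming each known shredder is summarised by a single reference width and the unknown strip is assigned to the nearest one. The width values, units and names below are made up for illustration.

```python
from statistics import mean

def closest_shredder_by_width(strip_widths, reference_widths):
    """Return the shredder whose reference width is nearest to the mean strip width."""
    observed = mean(strip_widths)
    return min(reference_widths, key=lambda name: abs(reference_widths[name] - observed))

# Illustrative reference widths in millimetres (not measured values)
reference_widths = {"shredder_A": 3.9, "shredder_B": 5.8}
# Width of one unknown strip, measured at several rows
unknown_strip = [5.7, 5.9, 5.8, 6.0]
prediction = closest_shredder_by_width(unknown_strip, reference_widths)
```

This only discriminates well when the two shredders cut strips of clearly different widths, which is why it is one method among several.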

Fourier transform analysis

Heavily used in signal processing, this technique allows us to extract fundamental frequencies from the signal created by the teeth of the shredder's wheels. We then compare those frequencies and their respective amplitudes to a single reference spectrum for each known shredder.
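The sketch below illustrates the principle on synthetic data: compute the amplitude spectrum of a strip profile, locate its strongest frequency, and pick the known shredder whose reference spectrum is closest. It uses a naive discrete Fourier transform from the standard library so that it runs anywhere; in practice a fast implementation such as NumPy's FFT would be the usual choice. The Euclidean distance between spectra, the sine-wave test signals and all names are assumptions of this example.

```python
import cmath
import math

def amplitude_spectrum(signal):
    """Amplitudes of DFT bins 0..N//2 of a real signal, with the mean removed."""
    n = len(signal)
    average = sum(signal) / n
    centered = [value - average for value in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        spectrum.append(abs(coefficient) / n)
    return spectrum

def closest_spectrum(signal, references):
    """Name of the reference spectrum with the smallest Euclidean distance."""
    spectrum = amplitude_spectrum(signal)
    return min(references, key=lambda name: math.dist(spectrum, references[name]))

def teeth_signal(cycles, amplitude, length=64):
    """Synthetic edge profile: a sine wave standing in for regular teeth marks."""
    return [10 + amplitude * math.sin(2 * math.pi * cycles * t / length)
            for t in range(length)]

# One reference spectrum per known shredder (synthetic, for illustration only)
references = {
    "shredder_A": amplitude_spectrum(teeth_signal(cycles=8, amplitude=2.0)),
    "shredder_B": amplitude_spectrum(teeth_signal(cycles=5, amplitude=2.0)),
}
unknown = teeth_signal(cycles=8, amplitude=1.8)
spectrum = amplitude_spectrum(unknown)
# Strongest non-constant frequency bin of the unknown profile
peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
prediction = closest_spectrum(unknown, references)
```

Here the unknown profile has the same teeth frequency as shredder_A with a slightly smaller amplitude, so its spectrum lies much closer to that reference than to shredder_B's.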

Random forest

Some characteristics of the signal, such as the maximum and minimum derivatives, the main harmonics from the Fourier transform and the mean width, are extracted and fed to a random forest model, a bagging classifier built from decision trees.
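As an illustration of the feature extraction step, the sketch below builds a small feature vector from a profile. It assumes the derivative is approximated by the difference between consecutive samples, and it leaves out the Fourier harmonics to stay short. The function name and the sample values are hypothetical. Vectors like this one could then be passed to a random forest implementation such as scikit-learn's RandomForestClassifier.

```python
from statistics import mean

def strip_features(profile, widths):
    """Build a small feature vector: [max derivative, min derivative, mean width].

    The derivative is approximated by first differences between consecutive
    samples (an assumption of this sketch).
    """
    derivatives = [after - before for before, after in zip(profile, profile[1:])]
    return [max(derivatives), min(derivatives), mean(widths)]

profile = [10, 12, 11, 15, 9]      # illustrative edge positions, one per row
widths = [5.8, 6.0, 5.9, 5.9]      # illustrative strip widths
features = strip_features(profile, widths)
```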

Recurrence plot CNN

Deep learning techniques often outperform other algorithms on image classification tasks. We apply convolutional neural networks to a recurrence plot (a 2D representation of a 1D signal) to predict the right shredder.
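For readers unfamiliar with recurrence plots, here is a minimal sketch of the simplest variant: cell (i, j) is 1 when samples i and j of the signal are within a threshold of each other. Whether the tool thresholds the distances or keeps them as grey levels is not stated here, so the binary form, the threshold value and the function name are assumptions. The resulting matrix is the kind of image a CNN can take as input.

```python
def recurrence_plot(signal, threshold):
    """Binary recurrence matrix: R[i][j] = 1 when |x_i - x_j| <= threshold."""
    return [
        [1 if abs(a - b) <= threshold else 0 for b in signal]
        for a in signal
    ]

# A short signal alternating between two levels
signal = [0.0, 1.0, 0.1, 1.1]
plot = recurrence_plot(signal, threshold=0.2)
# plot -> [[1, 0, 1, 0],
#          [0, 1, 0, 1],
#          [1, 0, 1, 0],
#          [0, 1, 0, 1]]
```

The checkerboard pattern reflects the period-2 structure of the signal; periodic teeth marks would similarly show up as regular diagonal structures.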

KNN

The characteristics that we pass to the random forest are also used by our last classifier. The k-nearest neighbours (KNN) algorithm treats each strip of paper as a point in a high-dimensional space and measures how far it lies from known data points.
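A minimal sketch of the technique, assuming Euclidean distance and a majority vote among the k closest training points. The feature vectors, labels and the value of k are made up for illustration, and a library implementation such as scikit-learn's KNeighborsClassifier would normally be preferred.

```python
import math
from collections import Counter

def knn_predict(sample, training_set, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    by_distance = sorted(training_set, key=lambda item: math.dist(sample, item[0]))
    nearest_labels = [label for _features, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical feature vectors: [max derivative, min derivative, mean width]
training_set = [
    ([4.0, -6.0, 3.9], "shredder_A"),
    ([4.2, -5.8, 4.0], "shredder_A"),
    ([3.9, -6.1, 3.8], "shredder_A"),
    ([2.0, -2.5, 5.8], "shredder_B"),
    ([2.1, -2.4, 5.9], "shredder_B"),
    ([1.9, -2.6, 6.0], "shredder_B"),
]
prediction = knn_predict([2.0, -2.3, 5.9], training_set, k=3)
```

Because distances mix features with different scales, normalising each feature before computing distances is usually worthwhile.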

Team

Gildas Avoine

Supervisor

Florian Arnoud

Data Science

Justin Bouvet

Data Science

Alexis Jensen

Media & Interactions

Cristian-David Martinez-Collazos

Kim-Phan Nguyen

Data Science

Lucien Poirier

Security

Manuel Poisson

Security

Tools and frameworks

Some of the projects we use

Python

Heavily used scripting language

OpenCV

Computer vision framework

scikit-learn

Machine Learning framework

TensorFlow

Deep Learning framework

Keras

Deep Learning framework

Streamlit

User interface tool