
SHREDID

Until now, paper shredders have been widely used to "reliably" destroy sensitive information. With our newly developed tool, each shredder's identity is no longer hidden.

Scenario

Let us take an example. You managed to find and reconstruct a destroyed document that may prove illegal activities, such as tax evasion, by a certain company X. Company X denies it and blames another company, Y, for planting the evidence. You have access to the shredders of both companies. With ShredID, you may now find out which shredder the document came from, and therefore which company is guilty.

Usage

So how exactly can you know the origin of a document?

The process consists of two main phases.

Learning phase

You must first assign data (scanned images of shredded documents) to two shredder profiles. These two profiles are the classes the tool will later choose between.

Prediction phase

Next, you provide the unknown data to be classified. You may then choose among the several classification methods that our tool provides. The selected methods each contribute to a final prediction of whether the provided data belongs to one shredder or the other.
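How the selected methods are combined is not detailed here. As a minimal sketch, assuming a simple majority vote, the final prediction could look like the following. The function name and the method names in the example are hypothetical.

```python
from collections import Counter


def combine_predictions(votes):
    """Return the shredder label predicted by most of the selected methods.

    `votes` maps a (hypothetical) method name to the label it predicted.
    Majority voting is an assumption made for this sketch.
    A tie, or an empty set of votes, yields None instead of an arbitrary pick.
    """
    counts = Counter(votes.values()).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


if __name__ == "__main__":
    selected = {"width": "X", "fourier": "Y", "random_forest": "X"}
    print(combine_predictions(selected))
```

Returning None on a tie keeps the ambiguity visible rather than hiding it behind an arbitrary choice.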

Preprocessing


Whether you are in the learning or the prediction phase, the tool starts by preprocessing the scanned images. We first clean them up with morphological operations. Color detection in HSV space then separates the strips from the green background. Next, a contour detection algorithm extracts each paper strip, which is straightened in case it was slightly distorted. Finally, we split each strip into two half-strips and generate the profile of each as a 1-dimensional signal.
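To illustrate the last step only, here is a small NumPy sketch that turns a binary mask of one straightened strip into two 1D profiles. It assumes the strip is vertical, that the two half-strips are its left and right halves, and that each profile is the distance from the strip's centre line to the corresponding edge. These are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np


def half_strip_profiles(mask):
    """Return (left_profile, right_profile) for a vertical strip mask.

    mask: 2D boolean array, True where a pixel belongs to the paper strip.
    Assumes every row contains at least one strip pixel.
    """
    mask = np.asarray(mask, dtype=bool)
    n_cols = mask.shape[1]
    cols = np.arange(n_cols)
    # Column index of the first and last strip pixel in each row.
    left_edge = np.where(mask, cols, n_cols).min(axis=1)
    right_edge = np.where(mask, cols, -1).max(axis=1)
    # Centre line estimated from the average position of both edges.
    centre = (left_edge.mean() + right_edge.mean()) / 2.0
    left_profile = centre - left_edge
    right_profile = right_edge - centre
    return left_profile, right_profile


if __name__ == "__main__":
    strip = np.array([
        [0, 1, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 0],
    ], dtype=bool)
    left, right = half_strip_profiles(strip)
    print(left, right)
```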

This 1D signal is then used to generate the data we feed to the classifiers, such as recurrence plots and Fourier transforms.
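A recurrence plot marks every pair of samples whose values are close to each other. The sketch below uses the simplest variant (no time-delay embedding, a fixed threshold `eps`), which is an assumption and may differ from what the tool actually computes.

```python
import numpy as np


def recurrence_plot(signal, eps):
    """Return a binary recurrence matrix for a 1D signal.

    Entry (i, j) is 1 when |signal[i] - signal[j]| <= eps, else 0.
    `eps` is a threshold chosen by the caller.
    """
    x = np.asarray(signal, dtype=float)
    # Pairwise absolute differences through broadcasting.
    distances = np.abs(x[:, None] - x[None, :])
    return (distances <= eps).astype(np.uint8)


if __name__ == "__main__":
    print(recurrence_plot([0.0, 1.0, 0.0, 1.0], eps=0.5))
```

The result is a square, symmetric image whose side equals the signal length, which is what makes it usable as input to an image classifier.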

Close-up of a shredder teeth mark on a document

Classification

To predict the correct shredder, we use several classification methods. Some are well-known machine learning techniques that have proven effective, whereas others are more experimental and tailor-made for this application.

Width comparison

Sometimes, the simpler the better. We use the paper strip width to compare the shredders' characteristics. As simple as it seems, it can sometimes be all we need to find the correct shredder.
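As a rough sketch of the idea, assuming each shredder profile is summarised by a single mean strip width (the names and values below are made up for illustration):

```python
from statistics import mean


def closest_shredder_by_width(strip_widths, reference_widths):
    """Return the profile whose reference width is nearest to the sample mean.

    strip_widths: measured widths of the unknown strips.
    reference_widths: dict mapping a profile name to its mean strip width,
    in the same unit as strip_widths.
    """
    sample = mean(strip_widths)
    return min(reference_widths, key=lambda name: abs(reference_widths[name] - sample))


if __name__ == "__main__":
    references = {"X": 4.0, "Y": 6.0}  # hypothetical widths
    print(closest_shredder_by_width([3.9, 4.1, 4.0], references))
```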

Fourier transform analysis

Heavily used in signal processing, this technique allows us to extract fundamental frequencies from the signal created by the teeth of the shredder's wheels. We then compare those frequencies and their respective amplitudes to a single reference spectrum for each known shredder.
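The following NumPy sketch shows one way to extract the strongest frequencies of a profile and to pick the closest reference spectrum. The Euclidean distance between amplitude spectra and the requirement that all signals have the same length are assumptions made for the example, as are the function names.

```python
import numpy as np


def amplitude_spectrum(signal):
    """Amplitude spectrum of a real 1D signal, with the mean removed."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    return np.abs(np.fft.rfft(x)) / x.size


def dominant_frequencies(signal, sample_spacing=1.0, n_peaks=2):
    """Return the n_peaks strongest frequencies and their amplitudes."""
    spectrum = amplitude_spectrum(signal)
    freqs = np.fft.rfftfreq(len(signal), d=sample_spacing)
    order = np.argsort(spectrum)[::-1][:n_peaks]
    return freqs[order], spectrum[order]


def nearest_reference(signal, references):
    """Return the name of the reference spectrum closest to the signal's.

    references: dict mapping a profile name to an amplitude spectrum computed
    from a signal of the same length (an assumption of this sketch).
    """
    spectrum = amplitude_spectrum(signal)
    return min(references, key=lambda name: np.linalg.norm(spectrum - references[name]))


if __name__ == "__main__":
    t = np.arange(200)
    teeth_x = np.sin(2 * np.pi * 0.05 * t)  # synthetic stand-in signals
    teeth_y = np.sin(2 * np.pi * 0.20 * t)
    refs = {"X": amplitude_spectrum(teeth_x), "Y": amplitude_spectrum(teeth_y)}
    print(dominant_frequencies(teeth_x + 0.3 * teeth_y))
    print(nearest_reference(teeth_x + 0.1 * teeth_y, refs))
```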

Random forest

Some characteristics of the signal, such as the maximum and minimum derivatives, the main harmonics of the Fourier transform and the mean width, are extracted and fed to a random forest model, a bagging classifier based on decision trees.
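A feature vector of this kind could be assembled as sketched below. The exact features, their order and the use of FFT bin indices to represent harmonics are assumptions for illustration. Such a vector could then be passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`.

```python
import numpy as np


def strip_features(profile, widths, n_harmonics=2):
    """Build a feature vector from a strip profile and its measured widths.

    Layout: [max derivative, min derivative, mean width, harmonic bins...].
    Harmonics are given as indices of the strongest FFT bins.
    """
    profile = np.asarray(profile, dtype=float)
    derivative = np.diff(profile)
    spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
    harmonics = np.argsort(spectrum)[::-1][:n_harmonics]
    head = [derivative.max(), derivative.min(), float(np.mean(widths))]
    return np.concatenate((head, harmonics.astype(float)))


if __name__ == "__main__":
    profile = [0, 1, 3, 2, 0, 1, 3, 2]  # synthetic profile with period 4
    print(strip_features(profile, widths=[4.0, 4.0, 5.0]))
```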

Recurrence plot CNN

Deep learning techniques often outperform other approaches on image classification tasks. We use convolutional neural networks on a recurrence plot (a 2D representation of a 1D signal) to predict the right shredder.

KNN

The characteristics that we pass to the random forest are also used by our last classifier. The K nearest neighbours algorithm allows us to represent a strip of paper in a high dimensional space, and measure how far away it is from known data points.
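A minimal k-nearest-neighbours sketch in NumPy is shown below, assuming Euclidean distance and a plain majority vote among the k closest training points. The feature values in the example are invented.

```python
from collections import Counter

import numpy as np


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` from its k nearest training points."""
    train = np.asarray(train_features, dtype=float)
    point = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(train - point, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    labels = ["X", "X", "X", "Y", "Y", "Y"]
    print(knn_predict(features, labels, [0.2, 0.1]))
```

In practice the features would usually be scaled to comparable ranges first, since a distance-based method is otherwise dominated by the feature with the largest magnitude.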

Team


Gildas Avoine

Supervisor

Florian Arnoud

Data Science

Justin Bouvet

Data Science

Alexis Jensen

Media & Interactions

Cristian-David Martinez-Collazos


Kim-Phan Nguyen

Data Science

Lucien Poirier

Security

Manuel Poisson

Security

Tools and frameworks

Some of the projects we use

Python

Heavily used scripting language

OpenCV

Computer vision framework

scikit-learn

Machine Learning framework

TensorFlow

Deep Learning framework

Keras

Deep Learning framework

Streamlit

User interface tool