Let us take an example. Suppose you manage to find and reconstruct a destroyed document that may prove illegal activity, such as tax evasion, by a certain company X. Company X denies it and blames another company, Y, for planting the evidence. You have access to the shredders of both companies. With ShredID, you can now find out which of the two shredders destroyed the document, and thus which company is guilty.
First, you assign data (scanned images of shredded documents) to two shredder profiles. These profiles are the classes that the tool will learn to tell apart.
Next, you provide the unknown data to be classified. You can then choose among the several classification methods that our tool provides. The selected methods each contribute to a final prediction of whether the provided data belongs to one shredder or the other.
Whether you are in the learning or the prediction phase, the first step of our tool's work is to preprocess the scanned images. We first clean them up with morphological operations. Then we apply color detection in HSV space to separate the strips from the green background. Next, we extract the individual paper strips using a contour detection algorithm and straighten them in case they are slightly distorted. Finally, we split each strip into two half-strips and generate the profile of each as a 1-dimensional signal.
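To make the last step more concrete, here is a minimal sketch of how two half-strip profiles could be read from a strip that has already been segmented and straightened. It is an illustration under assumptions, not our actual implementation: the function name `half_strip_profiles`, the boolean mask input, the vertical strip orientation, and the choice of the mean midpoint as the splitting axis are all hypothetical.

```python
import numpy as np

def half_strip_profiles(mask):
    """Return (left, right) 1D edge profiles of a vertical strip mask.

    mask: 2D boolean array, True where the pixel belongs to the paper strip.
    Rows that contain no strip pixels are skipped.
    """
    mask = np.asarray(mask, dtype=bool)
    strip = mask[mask.any(axis=1)]
    # First and last strip column on each remaining row.
    left_edge = strip.argmax(axis=1)
    right_edge = strip.shape[1] - 1 - strip[:, ::-1].argmax(axis=1)
    # Assumption: the strip is split along its mean midpoint.
    center = (left_edge + right_edge).mean() / 2.0
    # Each profile is the distance from the central axis to one edge.
    return center - left_edge, right_edge - center
```

Each returned array has one value per image row, so the jagged edge left by the shredder's teeth becomes an ordinary 1D signal.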
This 1D signal is then used to generate the data we feed to the classifiers, such as recurrence plots and Fourier transforms.
Sometimes, the simpler the better. We use the paper strip width to compare shredders' characteristics. As simple as it seems, this can sometimes be all we need to find the correct shredder.
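A width comparison of this kind can be sketched in a few lines. The version below is only one possible reading: it assumes that each shredder is summarised by the mean width of its known strips and that the closest mean wins. The function name `predict_by_width` and the input layout are hypothetical.

```python
import numpy as np

def predict_by_width(known_widths, unknown_widths):
    """Pick the shredder whose mean strip width is closest to the unknown sample.

    known_widths: dict mapping a shredder name to a list of strip widths.
    unknown_widths: list of strip widths measured on the unknown document.
    All widths must use the same unit (pixels or millimetres, for instance).
    """
    target = np.mean(unknown_widths)
    return min(known_widths,
               key=lambda name: abs(np.mean(known_widths[name]) - target))
```

For example, `predict_by_width({"X": [3.9, 4.0, 4.1], "Y": [5.9, 6.0, 6.1]}, [4.2, 4.0])` selects `"X"`.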
Heavily used in signal processing, the Fourier transform allows us to extract the fundamental frequencies of the signal created by the teeth of the shredder's wheels. We then compare those frequencies and their respective amplitudes to a single reference spectrum for each known shredder.
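The sketch below shows one way such a comparison could look, using NumPy's real FFT. It rests on assumptions: the helper names are hypothetical, all signals are assumed to have the same length, and the Euclidean distance between amplitude spectra stands in for whatever comparison rule the tool really applies.

```python
import numpy as np

def amplitude_spectrum(signal):
    """Amplitude spectrum of a 1D edge profile, with the mean (DC) removed."""
    signal = np.asarray(signal, dtype=float)
    return np.abs(np.fft.rfft(signal - signal.mean())) / len(signal)

def dominant_frequency(signal, sample_spacing=1.0):
    """Frequency, in cycles per unit of sample_spacing, with the largest amplitude."""
    freqs = np.fft.rfftfreq(len(signal), d=sample_spacing)
    return freqs[np.argmax(amplitude_spectrum(signal))]

def closest_reference(signal, references):
    """Name of the reference spectrum nearest to the signal's spectrum.

    references: dict mapping a shredder name to a spectrum with the same
    length as amplitude_spectrum(signal). Euclidean distance is an assumption.
    """
    spectrum = amplitude_spectrum(signal)
    return min(references,
               key=lambda name: np.linalg.norm(references[name] - spectrum))
```

Removing the mean before the transform keeps the constant offset of the profile from dominating the spectrum, so the peak that remains reflects the periodic pattern of the teeth.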
Some characteristics of the signal, such as the maximum and minimum derivatives, the main harmonics from the Fourier transform, and the mean width, are extracted and fed to a random forest model, a bagging classifier based on decision trees.
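As an illustration, the characteristics listed above could be packed into a feature vector as follows. This is a sketch under assumptions: the function name `strip_features`, the number of harmonics kept, and the layout of the vector are hypothetical. The resulting vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.

```python
import numpy as np

def strip_features(profile, widths, n_harmonics=3):
    """Build a feature vector from one edge profile and its width measurements.

    Layout (an assumption): [max derivative, min derivative, mean width],
    followed by (bin index, amplitude) for the n_harmonics strongest FFT bins.
    """
    profile = np.asarray(profile, dtype=float)
    derivative = np.diff(profile)  # discrete first derivative
    amplitudes = np.abs(np.fft.rfft(profile - profile.mean())) / len(profile)
    strongest = np.argsort(amplitudes)[::-1][:n_harmonics]
    harmonics = np.column_stack([strongest, amplitudes[strongest]]).ravel()
    basic = [derivative.max(), derivative.min(), np.mean(widths)]
    return np.concatenate([basic, harmonics])
```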
Deep learning techniques often outperform other algorithms on image classification tasks. We use convolutional neural networks on a recurrence plot (a 2D representation of a 1D signal) to predict the right shredder.
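To show what such a 2D representation looks like, here is a minimal sketch of a recurrence plot in its simplest thresholded form, without time-delay embedding. The function name and the `threshold` parameter are assumptions made for the example, and the tool may build its plots differently.

```python
import numpy as np

def recurrence_plot(signal, threshold):
    """Binary recurrence plot: entry (i, j) is 1 when samples i and j are close."""
    signal = np.asarray(signal, dtype=float)
    # Pairwise absolute differences between all samples, via broadcasting.
    distances = np.abs(signal[:, None] - signal[None, :])
    return (distances <= threshold).astype(np.uint8)
```

A signal of length N gives an N by N image that is symmetric and has ones on its diagonal. Periodic signals produce regular diagonal patterns, which is the kind of structure a convolutional network can pick up.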
The characteristics that we pass to the random forest are also used by our last classifier, the K nearest neighbours algorithm. Each strip of paper is represented as a point in a high-dimensional feature space, and the algorithm measures how far that point is from known data points.
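The idea can be sketched with plain NumPy. The example assumes Euclidean distance and a simple majority vote among the `k` closest known strips. The function name `knn_predict` and the default `k=3` are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    train_features = np.asarray(train_features, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every known data point.
    distances = np.linalg.norm(train_features - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because the vote depends on raw distances, features measured on very different scales are usually standardised first so that no single feature dominates.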