In the SANUM project (2017-2018) we were interested in the use of NIR sensors and applications in the agri-food sector for quality control in small businesses.
The project was funded by the European Union with the support of the Autonomous Region of Sardinia and Sardegna Ricerche (POR FESR Sardinia 2014-2020). In this blog we will tell you what we have done and show you some results.
We have all heard about it: nutrition-related health problems such as food allergies, obesity, diabetes and cardiovascular disease have grown to epidemic proportions, and they are taking a heavy toll on our society and our health systems. For this reason, the European Commission launched a challenge to ICT companies, the Horizon Prize, for the development of an affordable, non-invasive mobile solution that would allow users to measure and analyze food directly. Up for grabs: €1 million! Three companies shared this award.
We asked ourselves: why not try these new technologies in Nurideas R&D and integrate them into the traceability and quality-control tools we were developing for micro agribusiness companies?
The main objective of the SANUM project was to study new functions relating to food traceability and personalized nutrition, to later integrate them into the ecosystem of products and services developed by Nurideas, and thus to increase the level of innovation in our company pipeline.
The project included a scientific validation process for the sensors divided into four phases:
- collection of an adequate number of food samples from dairy farms;
- analysis of the samples at the NMR and GC-MS spectrometry laboratories of the University of Cagliari;
- in parallel, the analysis of the same samples through two molecular sensors currently available on the market;
- validation of the results obtained.
We tested only 2 of the 3 sensors that won the European Horizon Prize Food Scanner award in 2016. The aim of the project was twofold: first, to analyze the sensitivity of the sensors and validate how they function; second, to understand whether micro and small agri-food businesses could use them.
The NIR sensors
These two sensors use NIR (Near-InfraRed spectroscopy) technology but operate in different wavelength ranges (SCiO: 750-1040 nm; Tellspec: 900-1700 nm).
Both sensors communicate with the user’s smartphone via Bluetooth and a dedicated app.
From a technical point of view, each sensor consists of a light source that illuminates the food sample and an optical sensor (spectrometer) that collects the light reflected by the sample and returns a spectrum as the result. The spectra are sent to the cloud server of the company that develops the sensor. There they are analyzed, and the results are sent back to the app on the user’s smartphone.
For the SANUM project we involved 3 dairy companies and a multifunctional agritourism located in different parts of Sardinia. Between December 2017 and July 2018, we analyzed various types of milk (goat, sheep and mixed sheep-goat) and corresponding cheeses.
In all, we analyzed 367 samples corresponding to 16 different products and obtained a total of 2319 spectra. We can think of a spectrum as a fingerprint of the product.
The results obtained with NIR sensors were compared with data obtained through more sophisticated instruments (NMR and GC-MS) of the University of Cagliari laboratories: the Department of Biomedical Sciences with the Metabolomics Clinical of Prof. Luigi Atzori and the Department of Life and Environment Sciences (DISVA) of Prof. Luigi Caboni.
Validation of results
Subsequently, to validate the sensors, we carried out a comparative analysis. In detail, we compared the data obtained from NMR/GC-MS with the data obtained from the sensors.
To illustrate, let us look at an example of comparative analysis. The image shows the comparison of the results obtained on the three types of sampled cheese: fresh (cheese processing), semi-seasoned (about 5-6 months) and seasoned (more than 12 months).
Just by looking at it, we can see that the NIR spectra show marked differences between fresh cheeses and those with different degrees of seasoning.
The NMR spectra also show various differences between these three types of cheese:
- lactose is uniquely present in fresh cheese;
- arginine, lysine, glutamate, glutamine and histamine are much more concentrated in seasoned cheese and absent in fresh cheese.
The analysis we performed on these samples is data mining, using the tools offered by the “TheLab” application from ConsumerPhysics (the company that develops the SCiO sensor). TheLab lets you run PCA (Principal Component Analysis) on the spectra and view the results graphically.
PCA is a technique used to emphasize variation and bring out strong patterns in spectra. Here we use it to make the data easier to explore and visualize: PCA reduces each spectrum from a vector of 330 values (one per wavelength) to a much shorter vector (typically 3-6 values) without losing too much of the information in the original spectrum.
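To make the reduction concrete, here is a minimal sketch of PCA in Python with NumPy. It is not the TheLab implementation, which we did not see from the inside. The spectra are synthetic, and the function name `pca_scores`, the helper `fake_spectrum` and the peak positions are assumptions made only for illustration.

```python
import numpy as np

def pca_scores(spectra, n_components=3):
    """Project spectra (rows = scans, columns = wavelengths) onto the
    first principal components and return the shortened vectors."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each wavelength
    # SVD of the centered data: the rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T              # one short vector per scan
    explained = (S ** 2) / np.sum(S ** 2)   # share of variance per component
    return scores, components, explained[:n_components]

# Synthetic data (NOT project data): 330 values per spectrum, as in the text
rng = np.random.default_rng(0)
wavelengths = np.linspace(750, 1040, 330)

def fake_spectrum(center):
    """Hypothetical spectrum: one absorption-like peak plus a little noise."""
    peak = np.exp(-((wavelengths - center) / 40.0) ** 2)
    return peak + rng.normal(0, 0.01, wavelengths.size)

# Two made-up products, 20 scans each
spectra = np.array([fake_spectrum(850) for _ in range(20)]
                   + [fake_spectrum(950) for _ in range(20)])

scores, components, explained = pca_scores(spectra, n_components=3)
print(scores.shape)   # each scan is now described by 3 values instead of 330
print(explained)      # fraction of the total variance kept by each component
```

Plotting the first two columns of `scores` against each other gives the kind of cloud plot discussed in this post: scans of the same product fall close together.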
In a first step, we analyzed 4 cheeses that are easy to tell apart by eye. Let’s see if the sensors also spot the differences.
In detail, we searched for “clusters”, or clouds of data points that share the same properties. We did this type of analysis for all samples, which allowed us to filter out outliers. We then cleaned up the data for the subsequent creation of the models.
What emerges clearly from this analysis is the distinction between provola and ricotta. By contrast, the clouds (clusters) for stracchino and mozzarella are closer together and less distinct.
This first step helps to visualize the quality of the data and to find anomalous spectra, the so-called “outliers”. Once we have cleaned the data set, we can create models that we will use later on to recognize new samples of each cheese.
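In our case the outliers were spotted visually on the PCA plots. As a rough sketch of how the same idea could be automated, the snippet below flags scans that lie unusually far from the median spectrum of their own product. The rule (median distance plus `k` robust standard deviations), the function name `flag_outliers` and the data are all assumptions for illustration, not the procedure used by TheLab.

```python
import numpy as np

def flag_outliers(spectra, labels, k=5.0):
    """Return a boolean array: True where a scan is far from the median
    spectrum of its own class (hypothetical rule, see the text above)."""
    X = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    is_outlier = np.zeros(len(X), dtype=bool)
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        center = np.median(X[idx], axis=0)            # typical spectrum of the class
        dist = np.linalg.norm(X[idx] - center, axis=1)
        med = np.median(dist)
        mad = np.median(np.abs(dist - med))           # robust spread of the distances
        threshold = med + k * 1.4826 * max(mad, 1e-12)
        is_outlier[idx] = dist > threshold
    return is_outlier

# Synthetic data (NOT project data): 30 scans of one made-up product
rng = np.random.default_rng(1)
base = np.linspace(0.2, 0.8, 50)
scans = base + rng.normal(0, 0.01, (30, 50))
scans[7] += 0.3                                       # simulate one bad scan
labels = ["ricotta"] * 30

flags = flag_outliers(scans, labels)
clean_scans = scans[~flags]                           # data set used to build models
print(np.flatnonzero(flags))
```

A median-based rule is used here because a single bad scan shifts a mean much more than it shifts a median.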
We create a data-mining model by applying an algorithm to the data; the resulting model is a set of data, statistics and patterns. We then apply this model to new data to generate predictions and inferences about relationships.
In general, we create two types of models:
Classification model: classification separates samples into categories, based on the spectral fingerprint of the components in each category. This type of model allows you, for example, to classify samples by milk type, by feed type or by type of cheese. Users can then scan a sample and identify the family it belongs to.
Estimation model: the estimation model works on a collection of samples that share a numerical attribute, such as sugar content, fat content and so on. Based on previously measured samples that cover the full range of the target attribute, users can scan their sample and find out the value of the attribute of interest.
The number of samples and the data collected only allowed us to create some classification models to test the validity of the approach. To use these models “in production” we need a larger number of samples.
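We did not build estimation models in this project, so the following is only a sketch of what one could look like: a principal component regression, that is, PCA followed by a least-squares fit on the reduced vectors. The technique is our choice for the example and is not necessarily what TheLab uses. The function names, the "fat content" values and the spectra are invented.

```python
import numpy as np

def fit_pcr(spectra, targets, n_components=2):
    """Fit a principal component regression: PCA, then least squares."""
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(targets, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    components = Vt[:n_components]
    scores = (X - x_mean) @ components.T
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "components": components, "coef": coef}

def predict_pcr(model, spectra):
    """Estimate the numerical attribute for new spectra."""
    X = np.asarray(spectra, dtype=float)
    scores = (X - model["x_mean"]) @ model["components"].T
    return scores @ model["coef"] + model["y_mean"]

# Synthetic calibration set (NOT project data): a band that grows with "fat"
rng = np.random.default_rng(2)
wavelengths = np.linspace(750, 1040, 330)
baseline = 0.5 + 0.0005 * (wavelengths - 750)
band = np.exp(-((wavelengths - 930) / 30.0) ** 2)

def fake_scan(fat):
    """Hypothetical scan of a sample with the given fat content (%)."""
    return baseline + 0.01 * fat * band + rng.normal(0, 0.002, wavelengths.size)

fat_values = np.linspace(5, 35, 25)        # reference values covering the range
calibration = np.array([fake_scan(f) for f in fat_values])

model = fit_pcr(calibration, fat_values)
estimate = predict_pcr(model, [fake_scan(20.0)])[0]
print(round(float(estimate), 2))
```

The calibration samples span the whole range of the attribute (here 5 to 35), which is the requirement described above: a model cannot be trusted for values it has never seen.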
Below is an example of our approach to a classification model.
1 – First analysis
We carried out the first analysis by merging the cheese collections of two of the companies involved. They produce and process sheep’s milk obtained from two different sheep breeds (Sardinian black sheep and white sheep, respectively). We selected four types of cheese: mozzarella, provola, ricotta and stracchino, for which we had collected a significant number of samples and related NIR spectra.
As we saw in the previous PCA analysis, there are 4 clusters, 1 for each type of cheese. However, the mozzarella and stracchino clusters overlap in places. We will find these overlaps again in the model.
2 – Model generation
Using the model generation tool in “TheLab” we created a classification model for these four products.
In the figure we see the expected performance of the classification model (the so-called “confusion matrix”).
As expected from the PCA analysis, the resulting model shows some confusion, because we are mainly analyzing samples of fresh cheeses, which do not differ greatly from one another (see mozzarella and stracchino). This confusion appears as the values in the boxes off the diagonal (in pink). A model differentiates the products well when the green boxes on the diagonal have values >90% and no other box has a value above 5-10%.
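For readers who want to see how such a matrix is built, here is a small sketch that computes a confusion matrix in row percentages and applies the rule of thumb above. The function names and the thresholds passed as defaults are our own choices, and the labels are invented for the example: they are not the results of our model.

```python
import numpy as np

def confusion_matrix_percent(true_labels, predicted_labels, classes):
    """Rows = true product, columns = predicted product, values in %."""
    index = {name: i for i, name in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for truth, guess in zip(true_labels, predicted_labels):
        counts[index[truth], index[guess]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # A class with no samples keeps a row of zeros instead of dividing by 0
    return 100.0 * counts / np.where(row_totals == 0, 1, row_totals)

def well_separated(matrix, diag_min=90.0, off_max=10.0):
    """Rule of thumb: diagonal above 90 %, everything else at most 10 %."""
    diag = np.diag(matrix)
    off_diagonal = matrix - np.diag(diag)
    return bool((diag > diag_min).all() and (off_diagonal <= off_max).all())

classes = ["mozzarella", "provola", "ricotta", "stracchino"]

# Invented labels (NOT project results): 10 scans per product
true_labels = (["mozzarella"] * 10 + ["provola"] * 10
               + ["ricotta"] * 10 + ["stracchino"] * 10)
predicted_labels = (["mozzarella"] * 7 + ["stracchino"] * 3   # 3 mozzarella confused
                    + ["provola"] * 10
                    + ["ricotta"] * 10
                    + ["stracchino"] * 8 + ["mozzarella"] * 2)  # 2 stracchino confused

matrix = confusion_matrix_percent(true_labels, predicted_labels, classes)
print(matrix)
print(well_separated(matrix))
```

In this made-up case provola and ricotta sit entirely on the diagonal, while mozzarella and stracchino leak into each other's columns, so the check fails.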
3 – Model test
To test the model and verify its validity, we used sheep-type samples that were not used to create the model. The results match the pattern: the ricotta is well recognized. By contrast, the samples of “lavorazione formaggio” (cheese processing) are recognized as mozzarella or stracchino.
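The principle of testing on samples the model has never seen can be sketched as follows. The classifier here is a simple nearest-centroid rule, chosen because it is short; it is an assumption for the example, since we do not know which algorithm TheLab applies. The product names "A" and "B" and the spectra are synthetic.

```python
import numpy as np

def fit_centroids(spectra, labels):
    """Store the mean spectrum of each class."""
    X = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    return {str(name): X[labels == name].mean(axis=0)
            for name in np.unique(labels)}

def predict(centroids, spectra):
    """Assign each scan to the class with the closest mean spectrum."""
    X = np.asarray(spectra, dtype=float)
    names = list(centroids)
    C = np.stack([centroids[name] for name in names])
    distances = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return [names[i] for i in distances.argmin(axis=1)]

# Synthetic data (NOT project data): two made-up products
rng = np.random.default_rng(3)
wavelengths = np.linspace(750, 1040, 330)

def fake_scan(center):
    peak = np.exp(-((wavelengths - center) / 40.0) ** 2)
    return peak + rng.normal(0, 0.01, wavelengths.size)

train_scans = np.array([fake_scan(850) for _ in range(10)]
                       + [fake_scan(950) for _ in range(10)])
train_labels = ["A"] * 10 + ["B"] * 10

# Held-out scans: generated separately and never shown to fit_centroids
test_scans = np.array([fake_scan(850) for _ in range(5)]
                      + [fake_scan(950) for _ in range(5)])
test_labels = ["A"] * 5 + ["B"] * 5

centroids = fit_centroids(train_scans, train_labels)
guesses = predict(centroids, test_scans)
accuracy = sum(g == t for g, t in zip(guesses, test_labels)) / len(test_labels)
print(accuracy)
```

Keeping the test scans out of the training step is what makes the accuracy meaningful: a model scored on its own training data always looks better than it is.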
4 – Conclusions
To develop a more precise model, we will need to analyze more samples of each type of cheese and, above all, involve a larger number of companies. We should also extend the sampling over one or more years of production, because we must take into account the differences due to seasonality, weather conditions and so on. Remember that these products are artisanal and therefore vary much more than an industrial product.
The SANUM project (2017-2018) was funded by the European Union with the support of the Autonomous Region of Sardinia and Sardegna Ricerche (POR FESR Sardinia 2014-2020). This project has allowed us to study NIR sensors and applications in the agri-food sector. Furthermore, it opens the door to new developments for Nurideas.