26 March 2024 | Hao Han, Ruyi Sha, Jing Dai, Zhenzhen Wang, Jianwei Mao and Min Cai
This study aims to develop a rapid and reliable method for tracing the origin of garlic from five regions in China (Yunnan, Shandong, Henan, Anhui, and Jiangsu Provinces) using a combination of ultraviolet and mid-infrared spectroscopy. The chemical and nutritional composition of garlic varies significantly with its production location, which affects its flavor and functional properties. To exploit these differences for traceability, the study collected 225 purple-skinned garlic samples and analyzed them using mid-infrared and ultraviolet spectroscopy.
Three preprocessing methods were applied to reduce background noise: Multiplicative Scatter Correction (MSC), Savitzky–Golay smoothing (SG smoothing), and Standard Normal Variate (SNV). A genetic algorithm (GA) was used for feature extraction, and four machine learning algorithms were employed to classify the samples: XGBoost, Support Vector Classification (SVC), Random Forest (RF), and Artificial Neural Network (ANN). The SNV-GA-SVC, SNV-GA-RF, SNV-GA-ANN, and SNV-GA-XGBoost models achieved 100% accuracy on both the training and test sets when fused ultraviolet and mid-infrared data were used. The study concludes that combining these spectroscopic techniques with chemometrics provides a robust foundation for identifying the origin of garlic and other agricultural products, enhancing product traceability and consumer trust.
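To make the preprocessing and fusion steps concrete, here is a minimal sketch of SNV followed by simple concatenation of the two spectral blocks. It is an illustration under stated assumptions, not the authors' implementation. The summary does not say how the ultraviolet and mid-infrared data were fused or whether SNV was applied per block, so the concatenation, the per-block scaling, the use of the sample standard deviation, the helper names `snv` and `fuse`, and the absorbance values are all hypothetical. GA feature selection and the classifiers are omitted.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre one spectrum on its own mean
    and scale it by its own standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - m) / s for x in spectrum]


def fuse(uv_spectrum, mir_spectrum):
    """Assumed fusion scheme: apply SNV to each block separately,
    then concatenate the two blocks into one feature vector."""
    return snv(uv_spectrum) + snv(mir_spectrum)


# Made-up absorbance values for a single sample, for illustration only.
uv = [0.10, 0.35, 0.80, 0.40]
mir = [1.2, 1.5, 0.9, 1.1, 1.3]

fused = fuse(uv, mir)
print(len(fused))  # 9 features: 4 ultraviolet + 5 mid-infrared
```

Because SNV rescales each spectrum individually, it removes per-sample offset and multiplicative intensity differences, which is one reason it is a common choice before concatenating blocks measured on different scales.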