A Comprehensive Review of Artificial Intelligence Techniques for Black Pepper and Spice Identification Using Computer Vision and Deep Learning

Deep Kamle
Dr. Satyendra Sharma
Dr. Hemang Shrivastava

Abstract

The identification and classification of spices, particularly black pepper, represent a significant challenge in food quality assurance, supply chain management, and adulteration detection. Conventional analytical methods such as chromatography and spectroscopy are precise. However, they are time-consuming and expensive, and they require expert knowledge, which limits their applicability in field and industrial settings. Recent advances in artificial intelligence (AI), computer vision, and deep learning have opened transformative avenues for automated, non-destructive, and real-time spice identification systems. This comprehensive review critically examines the evolution and current state of AI-based techniques for black pepper and spice identification. It covers traditional machine learning methods, convolutional neural network (CNN) architectures, and advanced deep learning models, including ResNet, EfficientNet, Vision Transformers (ViT), and hybrid architectures. The review systematically covers image preprocessing pipelines, feature extraction strategies, publicly available benchmark datasets, and the evaluation methodologies used in the field. A comparative analysis of recent studies highlights the superiority of transformer-based models over classical CNNs in multi-class spice recognition tasks. Lightweight architectures such as MobileNet and SqueezeNet, in turn, demonstrate viability for edge deployment in resource-constrained environments. The paper identifies critical research gaps, including the scarcity of large-scale annotated spice datasets, limited model interpretability, and the need for standardized evaluation benchmarks. The review concludes by outlining future research directions, including federated learning, explainable AI (XAI), and multi-model fusion frameworks, to advance the field toward robust, deployable, and industry-ready spice identification solutions.
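
As a hedged illustration of the kind of evaluation methodology the abstract refers to, the following minimal sketch computes overall accuracy and macro-averaged precision, recall, and F1 for a multi-class spice classifier. It is not taken from the reviewed paper. The function name `classification_metrics` and the class labels (including `papaya_seed`, a commonly reported black pepper adulterant) are hypothetical choices made for illustration. Macro averaging is used because it weights every class equally, which is often preferred when some spice classes have far fewer images than others.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper for illustration: each class gets equal weight
    in the macro averages, regardless of how many samples it has.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    # Counter returns 0 for classes that never appear, so no KeyError below.
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Precision: of the images predicted as class c, the share that truly are c.
        prec = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        # Recall: of the images that truly are class c, the share predicted as c.
        rec = true_pos[c] / true_count[c] if true_count[c] else 0.0
        # F1: harmonic mean of precision and recall (0 when both are 0).
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(true_pos.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical ground truth and predictions for six images.
    y_true = ["black_pepper", "black_pepper", "cumin", "cumin", "clove", "papaya_seed"]
    y_pred = ["black_pepper", "papaya_seed", "cumin", "cumin", "clove", "papaya_seed"]
    print(classification_metrics(y_true, y_pred))
```

In this toy example, one black pepper image is misclassified as papaya seed. That single error lowers black pepper recall and papaya seed precision, so the macro F1 (about 0.83) sits below the per-class scores of the correctly handled classes. In practice, a library routine such as scikit-learn's `precision_recall_fscore_support` with `average="macro"` would typically be used instead of a hand-written helper.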

Article Details

Section: Research Articles

Author Biographies

Dr. Satyendra Sharma, SAGE University, Indore

Dr. Satyendra Sharma is an Associate Professor at SAGE University, Indore.

Dr. Hemang Shrivastava, SAGE University, Indore

Dr. Hemang Shrivastava is a Professor at SAGE University, Indore.

How to Cite

Kamle, D., Sharma, S., & Shrivastava, H. (2026). A Comprehensive Review of Artificial Intelligence Techniques for Black Pepper and Spice Identification Using Computer Vision and Deep Learning. Interdisciplinary Journal of AI, Machine Learning & Data Science, 1(3), e002. https://doi.org/10.66261/805c6c91
