A Taxonomy of DBSCAN Variants Addressing Parameter Dependence and High-Dimensionality

Anunaya Manoj
Dr. Masood Husain Siddiqui
https://orcid.org/0000-0002-4049-9307
Dr. Deepak Kumar Singh

Abstract

Density-based clustering algorithms are an active area of research in unsupervised machine learning because they identify clusters from local data density. Among them, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) has attracted considerable attention for its robustness to noise and outliers. However, DBSCAN’s sensitivity to its input parameters (the neighborhood radius ε and the minimum point count MinPts), its inability to detect clusters of varying density, and its poor performance in high-dimensional spaces limit its applicability. To address these challenges, this review groups DBSCAN variants into four categories: (1) parameter-independent, (2) parameter-reduced, (3) high-dimensional, and (4) hybrid algorithms. For each category, we describe the parameter estimation and clustering strategies in detail. Our analysis highlights graph-based methods, tree-based and grid structures, and other techniques, including reverse nearest neighbors (RNN), locality-sensitive hashing (LSH), DLT, and the triangle inequality, that help automate parameter estimation and accelerate neighborhood searches. Moreover, hybrid strategies that incorporate feature selection, feature extraction, graph-based modeling, game theory, and deep learning show promise in addressing both challenges simultaneously. By consolidating insights across these research directions, this review offers a comprehensive perspective on the evolution of DBSCAN and gives researchers and practitioners a clearer understanding of its limitations, existing solutions, and avenues for future work.
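
As a minimal illustration of how the triangle inequality can accelerate a neighborhood search, the Python sketch below sorts points by their distance to a reference point and uses that ordering to discard candidates before any exact distance is computed. It is a sketch under stated assumptions, not the procedure of any specific algorithm covered by the review: the function names (build_index, eps_neighbors, brute_neighbors), the choice of the origin as reference point, and the random test data are all hypothetical.

```python
import bisect
import math
import random


def build_index(points, ref):
    """Sort point indices by Euclidean distance to a reference point."""
    dists = [math.dist(p, ref) for p in points]
    order = sorted(range(len(points)), key=dists.__getitem__)
    sorted_d = [dists[i] for i in order]
    return order, sorted_d, dists


def eps_neighbors(points, index, i, eps):
    """Return indices of points within eps of points[i].

    By the triangle inequality, |d(q, ref) - d(p, ref)| <= d(p, q), so any
    q whose reference distance differs from that of p by more than eps
    cannot be an eps-neighbor and is skipped without a distance computation.
    """
    order, sorted_d, dists = index
    slack = 1e-12  # small tolerance against floating-point rounding
    lo = bisect.bisect_left(sorted_d, dists[i] - eps - slack)
    hi = bisect.bisect_right(sorted_d, dists[i] + eps + slack)
    # Only the surviving candidates are checked exactly.
    return sorted(j for j in order[lo:hi]
                  if math.dist(points[i], points[j]) <= eps)


def brute_neighbors(points, i, eps):
    """Baseline: compare points[i] against every point."""
    return [j for j in range(len(points))
            if math.dist(points[i], points[j]) <= eps]


# Hypothetical data: 200 random points in the unit square.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
idx = build_index(pts, (0.0, 0.0))  # assumed reference point: the origin
```

Calling eps_neighbors(pts, idx, 0, 0.1) returns the same neighbor set as brute_neighbors(pts, 0, 0.1) while examining only the points whose reference distance lies within 0.1 of that of point 0. How much work the filter saves depends on the data and on the choice of reference point.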

Data Availability Statement

No data was used.

Section

Research Articles

Author Biographies

Anunaya Manoj, University of Lucknow

Research Scholar, Department of Statistics

Dr. Masood Husain Siddiqui, University of Lucknow

Professor, Department of Statistics

Dr. Deepak Kumar Singh, Indian Institute of Information Technology Lucknow

Assistant Professor, Department of Information and Technology

How to Cite

Manoj, A., Husain Siddiqui, M., & Kumar Singh, D. (2026). A Taxonomy of DBSCAN Variants Addressing Parameter Dependence and High-Dimensionality. Interdisciplinary Journal of AI, Machine Learning & Data Science, 1(2), e013. https://doi.org/10.66261/s5y5tk64
