Gradient Descent Batch Clustering for Image Classification
DOI:
https://doi.org/10.5566/ias.2905
Keywords:
batch clustering, gradient descent, image classification, principal component analysis, stochastic process
Abstract
Batch clustering for classification requires initial parameters and exhibits a drifting phenomenon caused by the stochastic process. The initial parameters are critical for the clustering to converge to a local optimum, and reducing the drift of the original batch clustering can speed up convergence from those initial parameters. This paper proposes an unsupervised clustering method that addresses both issues. First, the initial parameters are estimated in a preliminary step by applying principal component analysis (PCA) hierarchically; the nonlinear parameters are estimated from a mathematical connection between PCA and cluster membership. Given the initial parameters, the drifting issue is addressed by combining gradient descent with batch clustering on an auxiliary objective to refine them. The efficiency of the clustering process is proved from the relationship between two quadratic functions, followed by a justification. In addition, the effectiveness of the proposed method is validated with the statistical F measure in a classification application. The validation results show that the proposed gradient descent batch clustering is significantly more efficient than the original algorithms under the mean squared error (MSE) criterion, at the cost of some accuracy.
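The two stages described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the estimation formulas or the auxiliary objective, so the sketch assumes a simple stand-in for each. The hypothetical `pca_split_init` splits the cluster with the largest squared error along its first principal axis until `k` clusters exist, and the hypothetical `gd_batch_clustering` takes gradient steps on the plain MSE objective over mini-batches. All function names, the learning rate, and the batch size are assumptions made for illustration.

```python
import numpy as np


def pca_split_init(X, k):
    """Assumed initializer: hierarchical splitting along the first principal axis.

    Requires at least k distinct points in X.
    """
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        # Pick the existing cluster with the largest sum of squared errors.
        sse = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(new_label)]
        idx = np.flatnonzero(labels == int(np.argmax(sse)))
        centered = X[idx] - X[idx].mean(axis=0)
        # The first right singular vector is the first principal direction.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[0]
        # Points on the positive side of the axis form the new cluster.
        labels[idx[proj > 0]] = new_label
    return np.stack([X[labels == c].mean(axis=0) for c in range(k)])


def mse(X, centers):
    """Mean squared distance from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def gd_batch_clustering(X, centers, lr=0.5, epochs=20, batch_size=64, seed=0):
    """Assumed refinement: mini-batch gradient descent on the MSE objective."""
    rng = np.random.default_rng(seed)
    centers = centers.copy()
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = X[order[start:start + batch_size]]
            d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            for c in range(len(centers)):
                members = batch[assign == c]
                if len(members):
                    # Gradient of 0.5 * mean ||x - center||^2 over the members
                    # is (center - mean of members).
                    centers[c] -= lr * (centers[c] - members.mean(axis=0))
    return centers
```

With `lr=1.0` each step reduces to the usual batch update that moves a center onto the mean of its assigned mini-batch points; smaller values damp the step-to-step drift of the centers, which is the kind of trade-off between speed and accuracy the abstract refers to.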
License
Copyright (c) 2023 Jae-Sam Park
This work is licensed under a Creative Commons Attribution 4.0 International License.