Paper:

# Learning Similarity Matrix from Constraints of Relational Neighbors

## Masayuki Okabe^{*} and Seiji Yamada^{**}

^{*}Information and Media Center, Toyohashi University of Technology, 1-1 Tenpaku, Toyohashi, Aichi 441-8580, Japan

^{**}National Institute of Informatics, the Graduate University for Advanced Studies (SOKENDAI), 2-1-2 Hitotsubashi, Chiyoda, Tokyko 101-8430, Japan

This paper describes a method of learning similarity matrix from pairwise constraints assumed used under the situation such as interactive clustering, where we can expect little user feedback. With the small number of pairwise constraints used, our method attempts to use additional constraints induced by the affinity relationship between constrained data and their neighbors. The similarity matrix is learned by solving an optimization problem formalized as semidefinite programming. Additional constraints are used as complementary in the optimization problem. Results of experiments confirmed the effectiveness of our proposed method in several clustering tasks and that our method is a promising approach.

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.14, No.4, pp. 402-407, 2010.

- [1] W. Wu, C. Yu, A. Doan, and W. Meng, “An interactive clusteringbased approach to integrating source query interfaces on the deep Web,” In Proc. of the 2004 ACM SIGMOD Int. Conf. on Management of data, pp. 95-106, 2004.
- [2] D. Lewis and W. Gale, “A sequential algorithm for training text classifiers,” In Proc. of the Seventeenth ACM-SIGIR Conf., pp. 3-12, 1994.
- [3] S. Basu, I. Davidson, and K. L. Wagstaff, “Constrained Clustering,” CRC Press, 2008.
- [4] O. Chapelle, B. Scholkopf, and A. Zien, “Semi-Supervised Learning,” The MIT Press, 2006.
- [5] K. Wagstaff and S. Roger, “Constrained K-means Clustering with Background Knowledge,” In Proc. of the 18th Int. Conf. on Machine Learning, pp. 577-584, 2001.
- [6] D. Klein, S. D. Kamvar, and C. D. Manning, “From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering,” In Proc. of the 19th Int. Conf. on Machine Learning, pp. 307-314, 2002.
- [7] S. Shwartz, Y. Singer, and A. Y. Ng, “Online and Batch Learning of Pseudo-Metrics,” In Proc. of the 21st Int. Conf. on Machine Learning, pp. 94-101, 2004.
- [8] J. Davis et al., “Information-Theoretic Metric Learning,” In Proc. of the 24th Int. Conf. on Machine Learning, pp. 209-216, 2007.
- [9] W. Tang, H. Xiong, S. Zhong, and J. Wu, “Enhancing Semi-Supervised Clustering: A Feature Projection Perspective,” In Proc. of the 13th Int. Conf. on Knowledge Discovery and Data Mining, pp. 707-716, 2007.
- [10] S. C. H. Hoi, R. Jin, and M. R. Lyu, “Learning Nonparametric Kernel Matrices from Pairwise Constraints,” In Proc. of the 24th Int. Conf. on Machine Learning, pp. 361-368, 2007.
- [11] Z. Li, J. Liu, and X. Tang, “Pairwise Constraint Propagation by Semidefinite Programming for Semi-supervised Classification,” In Proc, of the 25th Int. Conf. on Machine Learning, pp. 576-583, 2008.
- [12] K. Fujisawa, M. Kojima, and K. Nakata, “Exploiting sparsity in primal-dual interior-point methods for semidefinite programming,” Mathematical Programming, Series B 79, pp. 235-253, 1997.
- [13] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” In Proc. of the seventh ACM SIGKDD Conf., pp. 269-274, 2001.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.