Prototype-based learning for large and multimodal data sets

Acronym: PLM
Supervisor(s): Barbara Hammer, Frank-Michael Schleif
Member(s): Xibin Zhu
Research Areas: B, D
 
Abstract: 

The goal of the project is to extend intuitive prototype- or exemplar-based machine learning methods to streaming and multimodal data sets and to test their suitability for data inspection and visualization. One particular focus is on kernelized or relational variants, which offer a very general interface to the data in terms of pairwise dissimilarities that can be chosen according to the specific data characteristics at hand. In this context, important questions are how prototypes can be represented intuitively to the user, how structural constituents can be combined and weighted appropriately, and how sparse models can be derived efficiently for complex structural objects.

Methods and Research Questions: 

Prototype- and exemplar-based machine learning constitutes a very intuitive way to deal with data, since the derived models represent their decisions in terms of relevant data points which can directly be inspected by experts in the field. Due to their efficient and intuitive training, excellent generalization ability, and flexibility to deal with missing values or streaming data, many successful applications in diverse areas such as robotics or bioinformatics have been conducted. One problem of these techniques when facing modern data sets, however, is their restriction to Euclidean settings; hence they cannot adequately deal with modern, complex, and inherently non-Euclidean data. In the project, relational extensions to general dissimilarity data will be considered, and it will be investigated how the beneficial aspects of prototype-based learning, such as interpretability, efficiency, and the capability of dealing with multimodal data or very large data sets, can be transferred to this setting.
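
As a point of reference, the following minimal sketch shows the nearest-prototype decision rule that underlies LVQ-type classifiers in the standard Euclidean setting; the data, prototype positions, and function names are purely illustrative.

```python
# Minimal sketch of nearest-prototype classification, the basic decision rule
# behind LVQ-type models (illustrative only; data and names are hypothetical).
import numpy as np

def nearest_prototype_predict(X, prototypes, prototype_labels):
    """Assign each point in X the label of its closest prototype (squared Euclidean)."""
    # Pairwise squared distances between data points and prototypes
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return prototype_labels[d2.argmin(axis=1)]

# Toy usage: two prototypes, one per class
X = np.array([[0.1, 0.2], [0.9, 1.1], [1.0, 0.9]])
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
prototype_labels = np.array([0, 1])
print(nearest_prototype_predict(X, prototypes, prototype_labels))  # -> [0 1 1]
```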

The project will focus on important representative tools from prototype-/exemplar-based learning, such as GLVQ and SRLVQ in the supervised domain and NG, GTM, and Affinity Propagation in the unsupervised domain. Relational extensions are based on an implicit embedding of general dissimilarity data in pseudo-Euclidean space and a corresponding implicit adaptation of prototypes, which can be computed based on the given pairwise dissimilarities only. A number of problems arise in this context, such as the following: How can powerful prototype-based techniques be transferred to the relational setting? How can implicitly represented 'relational' prototypes be presented to humans in an intuitive way? How can missing values be dealt with, such as missing dissimilarities or even missing parts of structures? Can we infer and adapt the relevance of structural parts for dissimilarity data? How can we deal with large data sets? How can linear or sub-linear methods be derived from the general mathematical framework?
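
The following sketch illustrates the core computational idea behind such relational extensions: a prototype is represented implicitly by a coefficient vector over the data, with coefficients summing to one, and its dissimilarity to every data point can be computed from the pairwise dissimilarity matrix alone. The concrete variable names and the toy matrix are assumptions made for illustration.

```python
# Sketch of the relational distance trick used by relational variants of
# NG/GTM/LVQ: a prototype is given implicitly by coefficients alpha over the
# data (summing to one), and its dissimilarity to all data points follows
# from the pairwise dissimilarity matrix D alone.
import numpy as np

def relational_distances(D, alpha):
    """d(x_i, w) = [D @ alpha]_i - 0.5 * alpha^T D alpha for all i."""
    assert np.isclose(alpha.sum(), 1.0), "coefficients must sum to one"
    Da = D @ alpha
    return Da - 0.5 * alpha @ Da

# Toy usage: 4 points on a line (squared Euclidean dissimilarities),
# prototype = uniform mixture of points 0 and 1
D = np.array([[0., 1., 4., 9.],
              [1., 0., 1., 4.],
              [4., 1., 0., 1.],
              [9., 4., 1., 0.]])
alpha = np.array([0.5, 0.5, 0.0, 0.0])
print(relational_distances(D, alpha))  # -> [0.25 0.25 2.25 6.25]
```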

Promising ways to tackle these challenges exploit intrinsic properties of prototype-based techniques, in particular the fact that prototypes can be approximated by explicit exemplars of the data sets. Such exemplars usually offer an intuitive interface to human observers as well as a powerful compression technique, suitable, for example, to overcome the stability/plasticity dilemma. Various intuitive paradigms which have partially already been investigated in the context of the classical models, such as relevance learning, will be transferred to the demanding setting of relational data.
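
One simple way to make an implicitly represented prototype interpretable, sketched below under the same assumptions as above, is to report the K data exemplars closest to it; K and the function name are illustrative choices, and this is only one of several sparse representations considered in the project.

```python
# Sketch: approximate an implicit relational prototype by its K closest
# exemplars so that it can be shown to a human observer. Uses the relational
# distance computation sketched above; K is an illustrative choice.
import numpy as np

def closest_exemplars(D, alpha, K=3):
    """Indices of the K data points closest to the prototype given by alpha."""
    Da = D @ alpha
    dists = Da - 0.5 * alpha @ Da   # relational distances to all data points
    return np.argsort(dists)[:K]
```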

Outcomes: 

In a first step, several classical unsupervised techniques, such as NG and GTM, have been transferred to relational data. These have been compared extensively to alternative classical kernel-based clustering and classification algorithms, leading to comparable performance. First techniques to arrive at efficient linear-time methods have been implemented based on patch processing and the Nyström approximation. The resulting methods have linear instead of quadratic complexity and provide very accurate approximations, depending on the characteristics of the data. Empirical tests in biomedical domains have been conducted to confirm this behavior. Currently, first experiments with supervised techniques show a similar and very promising behavior.
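
For illustration, the sketch below shows the basic form of the Nyström approximation used to obtain such linear-time variants: only a small set of landmark columns of the full dissimilarity (or kernel) matrix is computed, and the remainder is reconstructed from a low-rank factorization. The landmark selection, matrix sizes, and names are assumptions for this example.

```python
# Sketch of the Nyström approximation: keep m landmark columns of the full
# n x n dissimilarity matrix and reconstruct the rest from a low-rank
# factorization, so that downstream computations stay linear in n.
import numpy as np

def nystroem_factors(D, landmarks):
    """Return factors (C, W_pinv) such that D is approximated by C @ W_pinv @ C.T."""
    C = D[:, landmarks]              # n x m landmark columns
    W = C[landmarks, :]              # m x m submatrix of landmark rows/columns
    return C, np.linalg.pinv(W)

# Toy usage: squared Euclidean dissimilarities of 200 random points
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
landmarks = rng.choice(200, size=20, replace=False)
C, W_pinv = nystroem_factors(D, landmarks)
print(np.abs(D - C @ W_pinv @ C.T).mean())   # mean reconstruction error
```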

Publications

Learning vector quantization for (dis-)similarities

Hammer B, Hofmann D, Schleif F-M, Zhu X (2014)
Neurocomputing 131: 43–51.
Journal Article | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2615730
 

Sparse prototype representation by core sets

Schleif F-M, Zhu X, Hammer B (2013)
In: IDEAL 2013. Yin H, et al. (Eds);
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2625202
 

Secure Semi-supervised Vector Quantization for Dissimilarity Data

Zhu X, Schleif F-M, Hammer B (2013)
In: IWANN (1). Rojas I, Joya G, Cabestany J (Eds); Lecture Notes in Computer Science, 7902 Springer: 347–356.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2615717
 

Semi-Supervised Vector Quantization for proximity data

Zhu X, Schleif F-M, Hammer B (2013)
In: Proceedings of ESANN 2013. : 89–94.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2615701
 

Linear Time Relational Prototype Based Learning

Gisbrecht A, Mokbel B, Schleif F-M, Zhu X, Hammer B (2012)
Int. J. Neural Syst. 22(5).
Journal Article | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2625232
 

Fast approximated relational and kernel clustering

Schleif F-M, Zhu X, Gisbrecht A, Hammer B (2012)
In: Proceedings of ICPR 2012. IEEE: 1229–1232.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2615750
 

Soft Competitive Learning for large data sets

Schleif F-M, Zhu X, Hammer B (2012)
In: Proceedings of MCSD 2012. : 141–151.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2615756
 

Cluster based feedback provision strategies in intelligent tutoring systems

Gross S, Zhu X, Hammer B, Pinkwart N (2012)
In: Proceedings of the 11th international conference on Intelligent Tutoring Systems. Berlin, Heidelberg: Springer-Verlag: 699–700.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2536437
 

Approximation techniques for clustering dissimilarity data

Zhu X, Gisbrecht A, Schleif F-M, Hammer B (2012)
Neurocomputing 90: 72–84.
Journal Article | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2509852
 

Patch Processing for Relational Learning Vector Quantization

Zhu X, Schleif F-M, Hammer B (2012)
In: ISNN (1). : 55–63.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2534910
 

A Conformal Classifier for Dissimilarity Data

Schleif F-M, Zhu X, Hammer B (2012)
In: AIAI (2). : 234–243.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2534888
 

White Box Classification of Dissimilarity Data

Hammer B, Mokbel B, Schleif F-M, Zhu X (2012)
In: HAIS (1). : 309–321.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2534868
 

Patch Affinity Propagation

Zhu X, Hammer B (2011)
Presented at the 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium
Conference Proceeding / Paper | Published | English

Link: http://pub.uni-bielefeld.de/publication/2091665
 

Topographic Mapping of Dissimilarity Data

Hammer B, Gisbrecht A, Hasenfuss A, Mokbel B, Schleif F-M, Zhu X (2011)
In: WSOM'11.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2276485
 

Linear time heuristics for topographic mapping of dissimilarity data

Gisbrecht A, Schleif F-M, Zhu X, Hammer B (2011)
In: Intelligent Data Engineering and Automated Learning – IDEAL 2011, 12th International Conference, Norwich, UK, September 7–9, 2011, Proceedings. Lecture Notes in Computer Science, 6936 Berlin, Heidelberg: Springer: 25–33.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2276480
 

Accelerating dissimilarity clustering for biomedical data analysis

Gisbrecht A, Hammer B, Schleif F-M, Zhu X (2011)
In: IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. : 154–161.
Conference Proceeding / Paper | Published | Quality Controlled | English

Link: http://pub.uni-bielefeld.de/publication/2276522