Finding Hidden Factors Using Independent Component Analysis
Erkki Oja, Helsinki University of Technology
Independent Component Analysis (ICA) is a computational technique for revealing hidden factors that underlie sets of measurements or signals. ICA assumes a statistical model in which the observed multivariate data, typically given as a large database of samples, are linear or nonlinear mixtures of some unknown latent variables; the mixing coefficients are also unknown. The latent variables are nongaussian and mutually independent, and they are called the independent components of the observed data. ICA finds these independent components, also called sources or factors. Thus ICA can be seen as an extension of Principal Component Analysis and Factor Analysis. ICA is a much richer technique, however, capable of finding the sources when these classical methods fail completely. In many cases, the measurements are given as a set of parallel signals or time series. Typical examples are mixtures of simultaneous sounds or human voices that have been picked up by several microphones, brain signal measurements from multiple EEG sensors, several radio signals arriving at a portable phone, or multiple parallel time series obtained from some industrial process. The term blind source separation is used to characterize this problem. The lecture will first cover the basic idea of demixing in the case of a linear mixing model and then take a look at recent nonlinear demixing approaches. Although ICA was originally developed for digital signal processing applications, it has recently been found that it may be a powerful tool for analyzing text document data as well, if the documents are presented in a suitable numerical form. A case study on analyzing dynamically evolving text is covered in the talk.
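As a concrete illustration of the linear mixing model described in the abstract, the minimal sketch below mixes two synthetic nongaussian sources with a mixing matrix that the algorithm never sees, then recovers them blindly. It uses scikit-learn's FastICA as a stand-in for the ICA algorithms covered in the lecture; the signals and mixing coefficients are made up for illustration only.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two nongaussian, mutually independent sources: a sinusoid and a square wave.
s1 = np.sin(3 * t)
s2 = np.sign(np.sin(7 * t))
S = np.c_[s1, s2]

# "Unknown" mixing matrix A; the sensors observe only X = S A^T plus a little noise.
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
X = S @ A.T + 0.02 * rng.normal(size=S.shape)

# Recover the independent components blindly from X alone.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)   # estimated sources (up to order and scale)
A_hat = ica.mixing_            # estimated mixing matrix
print(S_hat.shape, A_hat.shape)
```

The recovered components come back in arbitrary order and with arbitrary scale, which is the usual indeterminacy of ICA; up to that ambiguity they match the original sources, even though only the mixtures were observed.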
Dan Roth, University of Illinois at Urbana-Champaign
Research in machine learning concentrates on the study of learning single concepts from examples. In this framework the learner attempts to learn a single hidden function from a collection of examples, assumed to be drawn independently from some unknown probability distribution. However, in many cases -- as in most natural language and visual processing situations -- decisions depend on the outcomes of several different but mutually dependent classifiers. The classifiers' outcomes need to respect some constraints that could arise from the sequential nature of the data or other domain-specific conditions, thus requiring a level of inference on top of the predictions. We will describe research and present challenges related to Inference with Classifiers -- a paradigm in which we address the problem of using the outcomes of several different classifiers in making coherent inferences -- those that respect constraints on the outcomes of the classifiers. Examples will be given from the natural language domain.
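The toy sketch below (not code from the talk) illustrates the general idea under simple assumptions: per-token classifier scores are made-up numbers, and a Viterbi-style dynamic program picks the jointly best label sequence subject to a sequential constraint borrowed from BIO-style phrase chunking, namely that an "I" tag may not follow an "O" tag or start the sequence.

```python
import numpy as np

labels = ["B", "I", "O"]
# scores[t][k]: classifier confidence for label k at token t (hypothetical values).
scores = np.array([[0.6, 0.1, 0.3],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.4, 0.5],
                   [0.3, 0.6, 0.1]])

def allowed(prev, cur):
    # Constraint from the sequential structure: "I" cannot follow "O".
    return not (labels[cur] == "I" and labels[prev] == "O")

T, K = scores.shape
best = np.full((T, K), -np.inf)
back = np.zeros((T, K), dtype=int)
best[0] = scores[0]
best[0, labels.index("I")] = -np.inf   # a sequence cannot begin with "I"

for t in range(1, T):
    for k in range(K):
        for j in range(K):
            if allowed(j, k) and best[t - 1, j] + scores[t, k] > best[t, k]:
                best[t, k] = best[t - 1, j] + scores[t, k]
                back[t, k] = j

# Trace back the best coherent labeling.
seq = [int(np.argmax(best[-1]))]
for t in range(T - 1, 0, -1):
    seq.append(back[t, seq[-1]])
print([labels[k] for k in reversed(seq)])
```

The point of the sketch is only that the classifiers are applied independently, while coherence is restored by a separate inference step over their outputs.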
Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics
In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs -- kernels -- for a number of different learning tasks. Kernel machines now provide a modular and simple-to-use framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm, and they have been shown to perform very well in problems ranging from computer vision to text categorization and applications in computational biology.
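As a small illustration of this modularity (again not code from the lecture), the sketch below trains scikit-learn's SVC on a toy two-circles problem and swaps only the kernel: the base algorithm stays the same, while the choice of kernel determines whether the nonlinear structure is captured.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy problem that is not linearly separable in the input space.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
# The linear kernel struggles here, while the RBF kernel separates the two circles well.
```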
Learning with Mixture Models: Concepts and Applications
Padhraic Smyth, University of California, Irvine
Probabilistic mixture models have been used in statistics for well over a century as flexible data models. More recently these techniques have been adopted by the machine learning and data mining communities in a variety of application settings. We begin this talk with a review of the basic concepts of finite mixture models: what they can represent, how they can be learned from data, and so on. We will then discuss how the traditional mixture model (defined in a fixed-dimensional vector space) can be usefully generalized to model non-vector data, such as sets of sequences and sets of curves. A number of real-world applications will be used to illustrate how these techniques can be applied to large-scale data exploration and prediction problems, including clustering of visitors to a Web site based on their sequences of page requests, modeling of sparse high-dimensional "market basket" data for retail forecasting, and clustering of storm trajectories in atmospheric science.
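A minimal sketch of the traditional vector-space case, using scikit-learn's GaussianMixture (fitted by the EM algorithm) as a stand-in for the mixture models discussed in the talk; the two-component data set is synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data drawn from two clusters.
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(300, 2)),
               rng.normal(loc=[3, 3], scale=1.0, size=(300, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)                      # parameters estimated by EM
print(gmm.weights_)             # mixing proportions
print(gmm.means_)               # component means
labels = gmm.predict(X)         # hard cluster assignments
resp = gmm.predict_proba(X)     # soft (posterior) component memberships
```

The soft memberships returned by predict_proba are what make mixture models a natural basis for the clustering applications mentioned above; generalizing to sequences or curves replaces the Gaussian components with component models suited to that data type.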