582638 Unsupervised machine learning
Lecture course in English, 4-6 cu (ECTS), 2008-2009
Teachers
Lectures: Aapo Hyvärinen
Exercises and computer projects: Urs Köster and Michael Gutmann
Schedule
The course runs in the 4th period, from 11 March 2009 to 24 April 2009.
Lectures are on Wednesdays and Fridays, 14:15-15:45, in lecture room C222.
Exercise sessions are on Tuesdays, 14:15-15:45, in room BK106.
Note also the Easter break: no lectures or exercise sessions on 10, 14, or 15 April.
Registration
Please register using the ILMO system.
Please give feedback on the course.
Target audience
Master's students in statistics (incl. EuroBayes), computer science (specialization in algorithms & machine learning, intelligent systems, or bioinformatics), or applied mathematics (specialization e.g. in stochastics).
Description
Unsupervised learning is one of the main streams of machine learning,
and closely related to exploratory data analysis and data mining. This
course describes some of the main methods in unsupervised learning.
In recent years, machine learning has come to rely heavily on
statistical theory, which is why this course lies on the
borderline between statistics and computer science. Emphasis is put
both on the statistical formulation of the methods
and on their computational implementation. The goal is not
only to introduce the methods on a theoretical level but also to show
how they can be implemented in scientific computing environments
such as Matlab or R. Computer projects are an important part of the
course.
How to obtain the credits
There are two ways of getting credits for this course:
- Taking the exam, which consists of solving mathematical problems. The exam is on Mon 4th May in B123 & CK112, 16:00-19:00.
- Doing computer projects, which consist of programming in either Matlab or R (you can choose which one to use).
If you do one of these, you get 4 cu. If you do both of them, you get 6 cu. You are strongly encouraged to do both of them.
You also have the option of handing in the mathematical exercises given in the exercise sessions. This will give you extra points worth at most 20% of the total points.
Prerequisites
- Statistics majors: Bachelor's degree recommended.
- Mathematics majors: Bachelor's degree recommended. It should include basic courses in analysis (including vector analysis), linear algebra I&II, introduction to probability, introduction to statistical inference. (Preferably also some more statistics courses.)
- Computer science majors: Bachelor's degree recommended. It should include the mathematics courses listed above for mathematics majors, as well as both of the courses "Introduction to machine learning" and "Probabilistic models", or their previously lectured counterpart "Computational data analysis I".
Contents:
- Introduction
  - supervised vs. unsupervised learning
  - applications of unsupervised learning
  - probabilistic formulation: generative models or latent variable models
  - overview of the topics below
- Numerical optimization
  - gradient method, Newton's method, stochastic gradient, alternating variables
- Principal component analysis and factor analysis
  - formulation as minimization of reconstruction error or maximization of component variance
  - computation using the covariance matrix and its eigenvalue decomposition
  - factor analysis and interpretation of PCA as estimation of a Gaussian generative model
  - factor rotations
- Independent component analysis
  - problem of blind source separation, why non-Gaussianity is needed for identifiability
  - correlation vs. independence
  - ICA as maximization of non-Gaussianity, measurement of non-Gaussianity by cumulants
  - likelihood of the model and maximum likelihood estimation
  - information-theoretic approach, connections between different approaches
  - implementation by gradient methods and FastICA
- Clustering
  - k-means algorithm
  - formulation as mixture of Gaussians
  - maximization of likelihood: alternating variables method, EM algorithm
- Nonlinear dimension reduction
  - non-metric multi-dimensional scaling and related methods, e.g. kernel PCA, Isomap
  - Kohonen's self-organizing map
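
To give a flavour of the methods listed under Clustering, here is a minimal sketch of the k-means algorithm written as an alternating variables method: the cluster assignments and the cluster centers are optimized in turn, each with the other held fixed. It is written in plain Python rather than the Matlab or R used in the computer projects. The function name `kmeans`, the toy data, and the initial centers are assumptions made for illustration and are not part of the course material.

```python
def kmeans(points, centers, n_iter=100):
    """Basic k-means on a list of equal-length tuples (illustrative sketch).

    Alternates two steps until the assignments stop changing:
    1. assign each point to its nearest center (squared Euclidean distance),
    2. move each center to the mean of the points assigned to it.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(n_iter):
        # Step 1: centers fixed, choose the best assignment for each point.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centers would not move either
        labels = new_labels
        # Step 2: assignments fixed, choose the best center for each cluster.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Hypothetical toy data: two well-separated groups in the plane.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = kmeans(data, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Neither step can increase the sum of squared distances from the points to their assigned centers, which is why the iteration settles down. The result depends on the initial centers, so in practice several initializations are usually tried.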
Course material
There is no book for the course. Handouts, typically chapters of books, will be
provided, and together with some lecture slides, these will contain
the material of the lectures. This material, together with the exercises, will be made available here (material will be added after each lecture). You will need a login name and password, which are given during the lectures (or you can email the assistants).
Aapo Hyvärinen, Feb-April 2009. Last update 26 Apr 2009.