Please note that while lecture notes will be provided, the lecture itself will sometimes digress to web-based material and Java applets for demonstrations. These will usually be listed on this page, but not on the PDF slides.
Lecture 1: Introduction: Graphs and Probability
Review of some famous graphical models including circuits, fault trees, and Ising lattices. Review of graphs, including definitions for directed and undirected graphs, relations such as parents and ancestors, and connectivity notions. Review of probability as frequency or belief, in both discrete and continuous contexts. Review of modelling as a key part of intelligent systems design. Examples of using graphs to model dependence, causality and relevance. [Lecture 1 in PDF, in Postscript, 2-up, gzipped]
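To make the graph relations concrete, here is a small illustrative sketch (our own example, not part of the course materials; the names are hypothetical) of a directed graph stored as parent sets, with ancestors computed by following parent links:

    from typing import Dict, Set

    def ancestors(parents: Dict[str, Set[str]], node: str) -> Set[str]:
        """All ancestors of `node`: its parents, their parents, and so on."""
        found: Set[str] = set()
        stack = list(parents.get(node, set()))
        while stack:
            p = stack.pop()
            if p not in found:
                found.add(p)
                stack.extend(parents.get(p, set()))
        return found

    # Example: A -> B, A -> C, B -> C.
    g = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
    print(ancestors(g, "C"))  # {'A', 'B'}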
Preparatory reading: Please review the material on graphs from Graph Theory and the Ising Model, and the course notes Three Concepts: Probability (PDF).
Lecture 2: Independence Modelling
Functional independence in optimization as a method for task simplification. Graph partitioning as a means of problem decomposition. Probabilistic independence as functional independence in probability space. Undirected graphs, relevance, and independence defined as set separation. Directed graphs, dependence, and independence defined on the underlying undirected graph. Soundness and completeness between graphical independence and functional independence forms. A catalogue of graphical forms for time-series, spatial modelling, diagnosis and fault detection, and their independence properties. [Lecture 2 in PDF, in Postscript, 2-up, gzipped]
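For a flavour of independence as set separation, the sketch below (our own illustration, not lecture code) tests whether a set Z separates X from Y in an undirected graph by searching for any path that avoids Z:

    def separates(adj, xs, ys, zs):
        """True if every path from xs to ys in graph `adj` passes through zs."""
        blocked = set(zs)
        stack = [x for x in xs if x not in blocked]
        seen = set(stack)
        while stack:
            n = stack.pop()
            if n in ys:
                return False  # reached ys without passing through zs
            for m in adj[n]:
                if m not in seen and m not in blocked:
                    seen.add(m)
                    stack.append(m)
        return True

    # Chain X - Z - Y: conditioning on Z separates X from Y.
    adj = {"X": {"Z"}, "Z": {"X", "Y"}, "Y": {"Z"}}
    print(separates(adj, {"X"}, {"Y"}, {"Z"}))  # True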
Preparatory reading: Please review Shachter's manuscript "Examples without Numbers" (handed out in Lecture 1) and Scheines' tutorial on d-separation. You don't have to understand d-separation, because it's difficult and we'll use a Lauritzen-style definition instead, but do have a look at it. Part I of Bishop's tutorial (see below) is good material for this lecture.
Lecture 3: Exact Inference
Given a graphical model on discrete variables of low dimension, and a probability formula, evaluate the probability, or find the maximum likelihood assignment for the free variables. We will consider this in a general sense. We will develop an exact algorithm for the computation as nested summation on a clustered graph, using variable elimination as the basic step. This turns out to create a clique tree, the general independence structure, so we will also consider how to find good clique trees and what their complexity is. We will not cover the implementation details used to speed the algorithm up in practice. [Notes in PDF (updated since lecture!!), in Postscript, 2-up, gzipped]
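As a toy illustration of the nested-summation idea (a hand-rolled example, not the algorithm as implemented in the notes), here is a single elimination step on binary variables, summing A out of P(A)P(B|A) to obtain P(B):

    from itertools import product

    # P(A) and P(B|A) over binary variables, stored as tables.
    pA = {("a0",): 0.6, ("a1",): 0.4}
    pBgA = {("a0", "b0"): 0.9, ("a0", "b1"): 0.1,
            ("a1", "b0"): 0.2, ("a1", "b1"): 0.8}

    # Eliminate A: P(b) = sum_a P(a) P(b|a).
    pB = {}
    for a, b in product(["a0", "a1"], ["b0", "b1"]):
        pB[b] = pB.get(b, 0.0) + pA[(a,)] * pBgA[(a, b)]
    print(pB)  # {'b0': 0.62, 'b1': 0.38}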
Preparatory reading: Please review S. M. Aji and R. J. McEliece, 2000. The Generalized Distributive Law, IEEE Trans. Inform. Theory, vol. 46, no. 2 (March 2000), pp. 325–343. Read Sections I, II and VI only.
Lecture 4: Introduction to Modelling and Learning
We begin by viewing B-Course applied to the "Visit to Asia" network. Learning will look at simple cases: learning proportions, and learning a regression curve. These examples are covered in detail because they form the basis of many algorithms. The proportions example will be done using the excellent Beta Coin Experiment applet from the Prob./Stats Object Library. These examples introduce the notion of sufficient statistics and the trade-off between complexity and predictive accuracy, and they illustrate the problems with priors. The discussion of modelling looks at the interaction between design decisions and the quality of the developed/proposed solution. [Notes in PDF (updated since lecture!!), in Postscript, 2-up, gzipped]
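For reference, the proportions example boils down to a one-line posterior update. The sketch below (with made-up numbers) shows the Beta-Bernoulli conjugate update, where the head and tail counts are the sufficient statistics:

    alpha, beta = 1.0, 1.0   # Beta(1,1), i.e. a uniform prior
    h, t = 7, 3              # observed counts: the sufficient statistics
    post_a, post_b = alpha + h, beta + t   # posterior is Beta(post_a, post_b)
    post_mean = post_a / (post_a + post_b)
    print(post_mean)         # 0.666..., the predictive probability of heads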
Preparatory reading: Please review Henry Tirri's notes on proportions. Heckerman's tutorial is a reference for this section.
Lecture 5: Learning with Graphs
Learning of graphs requires one to learn both the graphical structure and the probability parameters at each node. Probabilistic reasoning can handle both, given suitable assumptions. Learning on graphs will look at methods in the B-Course system. We will look at the maximum likelihood approach and some priors used for the Bayesian approach. The search space will be briefly discussed. The key techniques used here will be considered: parameter independence, the exponential family, the notion of "evidence" and its use, and multiple models. [Notes in PDF (updated since lecture!!), in Postscript, 2-up, gzipped]
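As a hint of what "evidence" means here, the sketch below (our own illustration, assuming a single binary variable with a Beta(1,1) parameter prior) computes the marginal likelihood of the data with the parameter integrated out; Bayesian structure scores are built from node-wise terms of this kind:

    from math import lgamma, exp

    def log_beta(a, b):
        """Log of the Beta function B(a, b)."""
        return lgamma(a) + lgamma(b) - lgamma(a + b)

    def log_evidence(h, t, alpha=1.0, beta=1.0):
        """Log marginal likelihood of h heads, t tails under a Beta prior:
        log B(alpha + h, beta + t) - log B(alpha, beta)."""
        return log_beta(alpha + h, beta + t) - log_beta(alpha, beta)

    print(exp(log_evidence(7, 3)))  # ~0.00076 (= 1/1320)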
Preparatory reading: Please look at the B-Course Library. You could optionally review the initial part of Friedman and Goldszmidt's tutorial, up to around the first 50 slides (the first 20 are review). Have a practice on B-Course with some example data generated using Teemu's generator.
Lecture 6: Selected Applications
Applications will be presented by Tomi Silander on learning with B-Course, on new PCA models in information retrieval by Wray Buntine, and on location sensing by Petri Myllymäki.
- Petri's slides: "Graphical Models on Manhattan: A Probabilistic Approach to Mobile Device Positioning"
Preparatory reading: None for this week.
There is no need to purchase any books. We will supply copies of any necessary material prior to lectures. Material available online will sometimes be in Adobe's PDF or Postscript format, so you should be able to view these formats. Moreover, some of the demonstrations require a Java-capable browser. Apart from these long-established standard formats, we make an effort to avoid online material that requires special browsers (for example, material converted from PowerPoint). Note that when we say something is a reference book, we mean you should not use it to learn the material, but only to check details or find more advanced material.