University of Helsinki Department of Computer Science
 


582637 Project in probabilistic models (2 cr), Todennäköisyysmallien harjoitustyö (2 op), Spring 2010

The language used in the course will be English.

Sessions

18.03.-29.04. Thu 16-18 in C220

Course instructor: University lecturer Hannes Wettig (hannes.wettig at cs.helsinki.fi), Room A324

Introduction

This is a new course belonging to the new Algorithms and Machine Learning sub-programme in the Master's programme of the department, and together with 582636 Probabilistic models (4 cr), it forms one of the three optional courses of the sub-programme.

For students in the old Intelligent Systems specialisation area: this course replaces, together with the course 582636 Probabilistic Models (4 cr), the course Three Concepts: Probability (6 cr).

Prerequisites

Students are expected to complete the course 582636 Probabilistic models (4 cr) before taking this course. In exceptional cases, please contact the course instructor.

Course Description

This course involves project work in probabilistic modeling. This year, your task is to compete in missing data completion: the winner of the competition is the student who by the end of the course has been able to approximate the missing entries best with respect to a given objective function. More precisely, at the beginning you are given a matrix where some of the entries are missing. The competition consists of four rounds. After each round, you are expected to submit your guesses for the missing entries, and we will tell you (but not your competitors) which of your guesses were close to the original values that had been erased. In each round, a number of the best guesses among all contestants will be revealed to the contestants who submitted them, so making good guesses already in the first round pays off, as you will get more useful information for the following rounds.

The data remains the same throughout the competition, and the winner is the participant who after four rounds has best filled in the missing entries. After the four rounds, you are expected to give a presentation at the final meeting. You are also required to keep a study diary in which you report all the methods you tried and your thoughts on the subject. Please include everything you tried, whether it worked or not.

The grade of the course depends on the following factors:

  • Your success in the competition (measured with respect to other competitors)
  • The technical quality of the approaches you try during the competition. As this is a project on probabilistic models, we appreciate only solutions based on probabilistic modeling. You can explore alternative approaches as well if you wish, but be warned that in the unlikely event that you win the competition with a non-probabilistic approach, and in your report you describe no attempts with probabilistic models, you will not get maximum points.
  • The innovativeness and range of your work. We value good imagination and hard work, so if you have good ideas, and take the time to explore many of them, this will earn you more points even if the results for some reason are not as good as you expected.
  • The quality of the final report. In your report, try to bring out the two aspects above and describe also your failures. Extra points if you can analyze the reasons for your potential failures.
  • The quality of your presentation at the final seminar. You are expected to prepare slides for the presentation. Extra points for additional material, such as animations.

Data and scoring functions

The data set consists of wireless local area network (WLAN) signal strength measurements. It has 8030 lines, each representing a scan at a certain time and location, and 35 columns, each representing a frequency (channel). The entries d(j,i) are thus signal strengths at time and place j for channel i. They are given in decibels, with the value -100 meaning "no signal detected". The value 0 means "missing"; these are the values you need to replace with something meaningful. There are 44,779 such missing entries.
  • As strong signals carry more information than weak signals, each (missing) entry is assigned a weight w(j,i)=1/d*(j,i)^2, where d*(j,i) is the original, erased entry.
  • You are to optimize the weighted sum of squares of the errors d(j,i)-d*(j,i), where d(j,i) is your guess.
  • In each round, all your correct guesses of the special value -100 will be revealed to you. All other guesses of the whole group will be ranked by their quality, i.e. their weighted square error w(j,i)*|d(j,i)-d*(j,i)|. The best guesses will be revealed to, and only to, their submitter. We shall reveal around the top five percent.
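
As a rough illustration of the objective described above, the following Python sketch computes the weighted sum of squared errors with w(j,i)=1/d*(j,i)^2. It also shows one consequence of this weighting: if a model assigns probabilities to a few candidate values for an erased entry, the guess minimizing the expected weighted squared error is not the plain mean but E[1/d*]/E[1/d*^2]. The function names, the use of NumPy, and the toy numbers are assumptions made for this example and are not part of the course material. The sketch also assumes that erased values are never 0, since 0 is reserved for "missing".

```python
import numpy as np


def weighted_squared_error(guess, truth):
    """Sum over entries of w * (d - d*)^2 with w = 1 / d*^2.

    `guess` holds the submitted values d(j,i) and `truth` the erased
    values d*(j,i), both restricted to the missing positions.
    Assumes no entry of `truth` is 0 (0 only marks "missing").
    """
    guess = np.asarray(guess, dtype=float)
    truth = np.asarray(truth, dtype=float)
    weights = 1.0 / truth**2
    return float(np.sum(weights * (guess - truth) ** 2))


def best_point_guess(values, probs):
    """Guess g minimizing sum_k probs[k] * (g - values[k])^2 / values[k]^2.

    Setting the derivative with respect to g to zero gives
    g = sum_k(probs[k] / values[k]) / sum_k(probs[k] / values[k]^2).
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs / values) / np.sum(probs / values**2))


# Toy example with three erased entries (made-up numbers).
truth = np.array([-100.0, -50.0, -80.0])
guess = np.array([-100.0, -60.0, -80.0])
score = weighted_squared_error(guess, truth)  # approximately 0.04

# If -100 and -50 are judged equally likely, the best guess is about -60,
# which is closer to the stronger signal than the plain mean of -75.
g = best_point_guess([-100.0, -50.0], [0.5, 0.5])
```

The second function only illustrates how the weighting pulls the optimal guess towards stronger signals; how to obtain the predictive probabilities is the actual modeling task.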

    Data File Exchange

    Please mail the file in the original format, with only the zeros replaced, to Mika Urtela, mika dot urtela at helsinki dot fi. You may also send a link to the file if you would rather put it in your public_html. In this case, please make sure the read permissions are set correctly!

    In return, you will get a link to your next round's data. The address is the password.
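
Since only the zeros may be replaced, it can be useful to check a submission before mailing it. The following Python sketch fills the missing positions of a matrix with guesses and verifies that the known entries are unchanged and that no zeros remain. The helper names and the tiny example matrix are hypothetical, and the sketch works on in-memory arrays only, because the exact layout of the data file is not described here.

```python
import numpy as np

MISSING = 0  # the data set marks missing entries with the value 0


def fill_missing(data, guesses):
    """Return a copy of `data` where only the zeros are replaced by `guesses`."""
    data = np.asarray(data)
    filled = data.astype(float)  # astype returns a new array
    mask = data == MISSING
    filled[mask] = np.asarray(guesses, dtype=float)[mask]
    return filled


def check_submission(original, submission):
    """True if the shapes match, known entries are untouched, and no zeros remain."""
    original = np.asarray(original)
    submission = np.asarray(submission)
    if original.shape != submission.shape:
        return False
    mask = original == MISSING
    unchanged = np.array_equal(original[~mask], submission[~mask])
    no_zeros_left = not np.any(submission == MISSING)
    return unchanged and no_zeros_left


# Tiny made-up matrix: two scans, three channels, two missing entries.
original = np.array([[-100, 0, -62],
                     [0, -71, -100]])
guesses = np.full(original.shape, -90.0)  # placeholder guess for every cell
submission = fill_missing(original, guesses)
ok = check_submission(original, submission)
```

Writing the result back in the original file format is left out on purpose, as it depends on the format of the file you received.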

    Course schedule

    18.3. First meeting, discussion of the task

    25.3. Second meeting, details on submission

    1.4. Easter holiday

    8.4. Third meeting. First submission deadline Tue 6.4.!

    15.4. Fourth meeting. Second submission deadline Wed 14.4.!

    22.4. Fifth meeting. Third submission deadline Wed 21.4.!

    29.4. Final meeting, presentations. Final submission deadline Wed 28.4.!

    Course material

    See the material of the course Probabilistic models.


    Petri Myllymäki/Hannes Wettig