=====================================================================
segkh: Sequence segmentation with recurrent sources

Copyright (C) 2004 Aristides Gionis

This code is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY. For more details see the license that
accompanies the code.

The code is an implementation of the algorithms described in the
paper

[1] A. Gionis, H. Mannila, Finding recurrent sources in sequences,
    7th International Conference on Research in Computational
    Molecular Biology (RECOMB) 2003
=====================================================================

General
=======

The code is implemented in C++ for Linux/UNIX platforms. It consists
of the following files:

  segkh.cc    The main C++ file
  Makefile    The Makefile
  LICENSE     The file containing the software license
  README      This file

To create the executable program, simply run "make" or "gmake". The
program is called "segkh". Running the program with no parameters
shows a help screen with a short explanation of how the program
should be called.

Input
=====

The program reads its input from the standard input. For example, if
the data is stored in a file called "datafile", it can be passed to
the program using, e.g., the commands

  $ cat datafile | segkh c 20 5
  $ segkh r 30 7 < datafile

The input to the program is a multi-dimensional time series. The
format of the input is assumed to be as follows:

-- The first line is an integer d specifying the number of dimensions
   of the time series.
-- Then n lines follow, one line for each point in the time series.
   Each line consists of d numbers specifying the d dimensions for
   the corresponding point.

(Note that there is no need to specify the number of points n.)

Parameters
==========

The program needs three parameters from the command line:

1. The first parameter is a letter specifying the algorithm to be
   used. The options are:

     r   for the Segment2Level algorithm
     c   for the ClusterSegment algorithm
     e   for the EM algorithm starting from the solution found by
         the ClusterSegment algorithm

   For details on the algorithms see the original paper [1].

2. The second parameter is the number of segments k to be used for
   the segmentation.

3. The third parameter is the number of levels h to be used for the
   segmentation.

Example of usage:

  segkh c 10 4 < timeseries.1
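
Preparing an input file
=======================

The input format described in the Input section (one line with the
number of dimensions, then one line per point) can be produced with a
few lines of C++. The following is only a minimal sketch and is not
part of the segkh distribution: the names write_series and
series_to_string are hypothetical, and separating the numbers on a
line with single spaces is an assumption, since this README does not
state which separator the program expects.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of segkh): writes a time series in the
// input format described in this README.
// Assumption: the numbers on a line are separated by single spaces.
void write_series(std::ostream& out,
                  const std::vector<std::vector<double>>& points) {
    if (points.empty()) {
        throw std::invalid_argument("time series has no points");
    }
    const std::size_t d = points.front().size();

    // First line: the number of dimensions d.
    out << d << '\n';

    // Then one line per point, each holding exactly d numbers.
    // The number of points n is never written.
    for (const auto& point : points) {
        if (point.size() != d) {
            throw std::invalid_argument("points differ in dimension");
        }
        for (std::size_t j = 0; j < d; ++j) {
            if (j > 0) out << ' ';
            out << point[j];
        }
        out << '\n';
    }
}

// Convenience wrapper that returns the formatted text as a string.
std::string series_to_string(
        const std::vector<std::vector<double>>& points) {
    std::ostringstream buffer;
    write_series(buffer, points);
    return buffer.str();
}
```

Calling write_series with std::cout from a small main function and
redirecting the output to a file gives a file that can then be passed
to the program on its standard input, in the same way as "datafile"
in the Input section.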