Software by Veli Mäkinen
Compressed Data Structures
Most newer implementations can be found on the SuDS and GSA group pages.
Below are my older compressed data structure implementations. Some of them are now plugged into the Pizza & Chili Corpus, which contains library implementations of several compressed text indexes and testbed data collections.
- Implementation of the Compact Suffix Array index structure (CPM 2000 & ALENEX 2001): Download csa.zip and see README inside the package for instructions. The package contains C++ source code for constructing and querying compact suffix arrays.
- Implementation of the Compressed Compact Suffix Array index structure (CPM 2004): Download ccsa.zip and see README inside the package for instructions. The package contains C++ source code for constructing and querying compressed compact suffix arrays.
- Implementation of a Huffman-FM index structure (SPIRE 2004): Download hufffm.zip and see README inside the package for instructions. The package contains C++ source code for constructing and querying a version of the FM-index in which Huffman compression is first applied to the text.
- Implementation of an RLFM index structure (CPM 2005): Download rlfm.zip and see README inside the package for instructions. The package contains C++ source code for constructing and querying versions of the FM-index structure. Updated 5.11.2004 with new functionality and several speedups. The FM-index was introduced by Ferragina and Manzini, FOCS 2000. The above implementation uses exactly the same search mechanism as they proposed, but the internal structures are quite different. You might also be interested in their implementation of the FM-index.
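The search mechanism shared by the FM-index variants above is backward search over the Burrows-Wheeler transform. The following is only a minimal sketch of that idea, not code from any of the packages: `ToyFMIndex`, `rank` and `count` are hypothetical names, the tables are stored uncompressed, the suffix array is built by naive sorting, and rank is answered by a linear scan. The actual implementations replace exactly these parts with compressed structures.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Toy, uncompressed FM-index used only to illustrate backward search.
// All names here are hypothetical and do not come from the packages above.
struct ToyFMIndex {
    std::string bwt;                   // BWT of text followed by the sentinel '\0'
    std::array<std::size_t, 257> C{};  // C[c] = number of symbols in bwt smaller than c

    explicit ToyFMIndex(const std::string& text) {
        // '\0' is reserved as the unique, smallest sentinel symbol.
        assert(text.find('\0') == std::string::npos);
        std::string s = text;
        s.push_back('\0');

        // Naive suffix array construction by sorting suffix start positions.
        std::vector<std::size_t> sa(s.size());
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        std::sort(sa.begin(), sa.end(), [&s](std::size_t a, std::size_t b) {
            return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
        });

        // bwt[i] is the symbol preceding the i-th smallest suffix (cyclically).
        bwt.resize(s.size());
        for (std::size_t i = 0; i < s.size(); ++i) {
            bwt[i] = s[(sa[i] + s.size() - 1) % s.size()];
        }

        // Count each symbol, then prefix-sum so that C[c] counts smaller symbols.
        for (unsigned char ch : s) ++C[ch + 1];
        for (std::size_t c = 1; c < C.size(); ++c) C[c] += C[c - 1];
    }

    // Number of occurrences of c in bwt[0, i). Linear scan for simplicity;
    // a real index answers this from a compressed rank structure.
    std::size_t rank(unsigned char c, std::size_t i) const {
        return static_cast<std::size_t>(std::count(
            bwt.begin(), bwt.begin() + static_cast<std::ptrdiff_t>(i),
            static_cast<char>(c)));
    }

    // Backward search: process the pattern from its last symbol to its first,
    // keeping the half-open range [sp, ep) of suffixes prefixed by the part
    // of the pattern read so far. The final range size is the occurrence count.
    std::size_t count(const std::string& pattern) const {
        std::size_t sp = 0, ep = bwt.size();
        for (auto it = pattern.rbegin(); it != pattern.rend() && sp < ep; ++it) {
            const unsigned char c = static_cast<unsigned char>(*it);
            sp = C[c] + rank(c, sp);
            ep = C[c] + rank(c, ep);
        }
        return sp < ep ? ep - sp : 0;
    }
};
```

Under these assumptions, constructing `ToyFMIndex idx("mississippi")` and calling `idx.count("ssi")` reports the number of occurrences of the pattern in the text. Each pattern symbol costs two rank queries, which is why the choice of rank structure largely determines the speed and size of a practical index.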
Music Information Retrieval
- Implementation of transposition-invariant string matching algorithms corresponding to the STACS 2003 & JALG 2005 articles.
- See the CBRAHMS music retrieval engine for implementations of the P1, P2, and P3 algorithms of the ISMIR 2003 article.
- Implementation of the pattern splitting algorithm of the CPM 2003 article.
Bioinformatics
- 2D Electrophoresis Gel Matching software corresponding to the CPM 2002 article is implemented using Borland C++ Builder for Windows. It features automatic matching of gel images without needing user-defined landmarks, which the commercial software packages require. A form of spot detection is also provided. However, the software lacks the other handy features of the commercial packages, so it is best regarded as a prototype. If you are interested in trying it out, or willing to continue its development, ask me for the source.
- Peak Alignment prototype software corresponding to the Biomolecular Engineering article (2007) is also implemented using Borland C++ Builder. It contains the basic algorithm implementations and some visualizations. Ask me for the source.
- Mass Spectra Calibration algorithm corresponding to the ACM/IEEE TCBB article (2007) is used in the mass spectra routines developed in Jena. Ask them for the software.
- Implementation of the algorithm for the missing patterns problem corresponding to the WABI 2004 article.
- Implementation of the compressed suffix tree corresponding to the Bioinformatics & WEA 2007 articles can be found at the project page.
- NEW: Implementation of a simple adapter filtering tool for short read alignment can be found here.
- NEW: Implementation of the normalized N50 assembly metric corresponding to the 2012 BMC Bioinformatics paper can be found here.