HaploRec: Efficient and accurate large-scale reconstruction of haplotypes

HaploRec [1,2] is a statistical haplotype reconstruction algorithm designed for large-scale disease association studies. It is especially suitable for data sets with a large number of subjects and a large number of markers, which may be sparsely located.

HaploRec is implemented in the Java programming language, and thus works on any platform for which a Java virtual machine is available (practically all common operating systems). It requires a Java virtual machine, version 1.5 or higher.

References

[1] Lauri Eronen, Floris Geerts, Hannu Toivonen. HaploRec: Efficient and accurate large-scale reconstruction of haplotypes. BMC Bioinformatics 7:542, 2006.

[2] Lauri Eronen, Floris Geerts, Hannu Toivonen. A Markov Chain Approach to Reconstruction of Long Haplotypes. Proceedings of the 9th Pacific Symposium on Biocomputing (PSB'04), 104-115, January 2004. World Scientific.

Obtaining HaploRec

HaploRec is freely available for academic, non-commercial use. Commercial use requires a license.

Click here to obtain an academic license and download HaploRec (version 2.3).
Click here for information on commercial licensing.
Download HaploRec documentation

Version history

2.3 (February 2008) Current version. Introduces an improved Markov model, where the conditional probabilities are smoothed over several different context lengths. This is now the default model. More efficient splitting and combining of files in the windowing version intended for chromosome-wide data sets. Minor improvements in memory management.
2.2 (May 2007) Conceptually identical to version 2.1. The extension program that enables handling of chromosome-wide data sets is now implemented in Java instead of Perl, improving platform independence.
2.1 (September 2006) The version described in [1]. Introduces an improved segmentation model that gives more accurate results than the segmentation model used in version 2.0.
2.0 (June 2005) This version introduced an EM-based algorithm and a segmentation-based haplotype probability model, and featured several computational improvements, allowing the use of larger data sets.
1.0 (January 2004) Original implementation of the algorithm introduced in [2]. Used a variable-order Markov model and a simple "optimistic matching" strategy for fragment frequency estimation (instead of the EM algorithm used by the later versions).
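
The smoothing over several context lengths mentioned for version 2.3 can be illustrated with a minimal sketch. This is not HaploRec's code or API: the class and method names are hypothetical, haplotypes are assumed to be equal-length strings of allele characters, and the equal weighting of context lengths is an assumption made for illustration, since this page does not specify how the weights are chosen.

```java
import java.util.List;

/** Hypothetical sketch of smoothing over context lengths; not HaploRec's actual code. */
final class SmoothedMarkovSketch {

    /**
     * Relative-frequency estimate of P(allele at pos | preceding k alleles) for the
     * target haplotype, counted over a list of reference haplotypes.
     * Returns NaN when the length-k context never occurs in the reference list.
     */
    static double conditional(List<String> haplotypes, String target, int pos, int k) {
        String context = target.substring(pos - k, pos);
        int contextCount = 0;
        int matchCount = 0;
        for (String h : haplotypes) {
            // Does this haplotype carry the same k alleles just before pos?
            if (h.regionMatches(pos - k, context, 0, k)) {
                contextCount++;
                if (h.charAt(pos) == target.charAt(pos)) {
                    matchCount++;
                }
            }
        }
        return contextCount == 0 ? Double.NaN : (double) matchCount / contextCount;
    }

    /**
     * Averages the estimates for context lengths 0..maxK (capped at pos),
     * skipping contexts that were never observed. Equal weights are an
     * assumption made for this sketch only.
     */
    static double smoothed(List<String> haplotypes, String target, int pos, int maxK) {
        int longest = Math.min(maxK, pos);
        double sum = 0.0;
        int used = 0;
        for (int k = 0; k <= longest; k++) {
            double p = conditional(haplotypes, target, pos, k);
            if (!Double.isNaN(p)) {
                sum += p;
                used++;
            }
        }
        return used == 0 ? Double.NaN : sum / used;
    }
}
```

For the reference haplotypes 0011, 0011, 0100 and 1101 and the target 0011, the estimates at the last marker are 0.75, 1.0 and 1.0 for context lengths 0, 1 and 2, so the equally weighted average is about 0.917. Short contexts are estimated from more observations, long contexts carry more information about nearby markers, and combining them trades off the two.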

Data sets

Download an example data set. This is one of the data sets used in the experiments of [1]: simulated data with 1000 subjects, 30 markers, and an average marker spacing of 33 kb.
Download data sets used in the PSB04 article [2]. (a short description of the data sets).
