Back to HPM and TreeDT
Haplotype Pattern Mining (HPM) algorithm for gene localization
Input
marker map M = (m1, ... ,mk)
phenotype vector Y = (Y1, ..., Yn)
haplotype matrix H of size n * k
association threshold x for chi-squared test
maximum pattern length l
maximum number of gaps g
maximum gap size s
Output
prediction for a DS locus: the marker with the highest marker frequency
in those haplotype patterns over M that are strongly associated
with the disease with respect to H, Y, and x, and
that adhere to l, g, and s
Method
// Initialize the set of strongly associated haplotype patterns:
PP := {};
// Number of case and control chromosomes:
piA := number of disease-associated chromosomes;
piC := number of control chromosomes;
pi := piA + piC;
// A lower bound for pattern frequency:
lb := piA * pi * x / (piC
* pi + piA * x);
// Variable for iterating over different patterns:
P = (p1, ... , pk) := ('*',
... , '*');
for i := 1 to k {
// alleles(mi) is the set of alleles
of the i:th marker
foreach a in alleles(mi)
{
pi :=
a;
// Test pattern P
and all its extensions:
depthFirst(P, i,
i, 0, 0);
// Reset pi:
pi :=
'*';
}
}
for i := 1 to k { compute marker frequency
f(mi) in the set PP of patterns; }
output the marker which has the highest marker frequency;
// Test haplotype pattern P and all patterns that can be generated
by extending P from the right:
procedure depthFirst(P, start, i, nr_of_gaps,
gap_length) {
// Add strongly associated patterns to PP:
if chi-squared(P, M,
H, Y) >= x and pi
!= '*' then insert P to the set PP;
// Return if extended patterns would be too long:
if i = k or
i+1-start > l then return;
// Return if extended patterns can not be strongly
disease-associated:
if frequency of P in disease-associated
chromosomes is less than lb then return;
// Create and test legal extensions of current pattern
P (3 cases):
// 1. Give marker i+1 all possible values:
foreach a in alleles(mi+1)
{
pi+1
:= a;
depthFirst(P, start,
i+1, nr_of_gaps, 0);
}
// 2. Introduce a new gap starting at marker i+1:
if pi != '*'
and nr_of_gaps < g and s >=
1 then {
pi+1
:= '*';
depthFirst(P, start,
i+1, nr_of_gaps+1, 1);
}
// 3. Extend the current gap over marker i+1:
if pi = '*'
and gap_length < s then {
pi+1
:= '*';
depthFirst(P, start,
i+1, nr_of_gaps, gap_length+1);
}
// Before returning, reset pi+1:
pi+1 := '*';
return;
}
Back to HPM and TreeDT