Hierarchical Max-Margin Markov Algorithm implementation
(c) 2004-2007 Juho Rousu

This package contains the MATLAB implementation of the Hierarchical
Max-Margin Markov Algorithm, as described in the manuscript accepted to
the Journal of Machine Learning Research:

  'Rousu, Saunders, Szedmak and Shawe-Taylor: Kernel-Based Learning of
  Hierarchical Multilabel Classification Models'

The implementation is due to Juho Rousu (firstname dot lastname at
cs.helsinki.fi). The implementation is a research prototype and is
provided AS IS; no warranties or guarantees of any kind are given.

Data format
===========

Kx_tr  m by m x-kernel matrix of the training points
Kx_ts  m by m' x-kernel matrix of the test points, where m' is the number
       of test points
X_tr   m by n matrix of training point features, where n is the number of
       features associated with each point
X_ts   m' by n matrix of test point features
Y_tr   m by k matrix containing the multilabels of the training points.
       Coding in the matrix: 1 = microlabel off, 2 = microlabel on
Y_ts   m' by k matrix containing the multilabels of the test points, with
       the same coding
E      |E| by 4 matrix containing the edge set of the hierarchy:
       (E(i,1),E(i,2)) contains the i'th edge as microlabel indices,
       E(i,3) is the running index of the edge, and E(i,4) is the depth
       of the edge in the hierarchy

Note #1: due to the way the inference algorithm has been implemented, the
global root of the hierarchy must have exactly one child. The easiest way
to ensure this is to add a dummy root node to the hierarchy and an edge
from it to the current root. (This is a bit ugly, but was simpler to
implement this way.)

Note #2: Kx_tr and Kx_ts (or X_tr and X_ts), Y_tr, Y_ts and E need to be
declared global before starting the algorithm; see the sketch below.
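As an illustration, here is a minimal sketch of the data setup for a toy
hierarchy of four microlabels (a dummy root added per Note #1, the real
root, and its two children). The variable names follow the Data format
section above; L_tr, L_ts, X_tr and X_ts are hypothetical inputs (0/1
label matrices and feature matrices), the linear kernel is just one
possible choice, and the (parent, child) ordering of the first two
columns of E is an assumption to be checked against your data:

  global Kx_tr Kx_ts Y_tr Y_ts E  % Note #2: the algorithms read these globals

  % Edge set: columns are (parent, child, running edge index, edge depth).
  E = [1 2 1 1;   % dummy root (1) -> real root (2), edge 1, depth 1
       2 3 2 2;   % real root (2) -> left child (3), edge 2, depth 2
       2 4 3 2];  % real root (2) -> right child (4), edge 3, depth 2

  % Multilabels use the coding 1 = microlabel off, 2 = microlabel on,
  % so hypothetical 0/1 label matrices L_tr and L_ts are converted with:
  Y_tr = double(L_tr) + 1;  % m  by k, here k = 4
  Y_ts = double(L_ts) + 1;  % m' by k

  % One possible x-kernel: a plain linear kernel over feature matrices.
  Kx_tr = X_tr * X_tr';     % m  by m   (training by training)
  Kx_ts = X_tr * X_ts';     % m  by m'  (training by test, per the format above)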
params  Input parameter structure

  .x_datasource         Where x-data is to be found: 0 = kernel Kx_tr in
                        memory, 1 = feature matrix X_tr, 2 = load from disk
  .max_ip_iter          Maximum number of interior point iterations for
                        LIPSOL
  .profiling            1 = produce profiling information during learning,
                        0 = do not
  .profile_tm_interval  Time interval between profile computations in
                        seconds (default 600 sec)
  .profile_log          File to which the profile info is written
  .verbosity            Verbosity level
  .tolerance            Tolerance for numerical inaccuracy
  .epsilon              Stopping criterion, typically the maximum allowed
                        relative duality gap
  .chunksize            How many examples to process at a time
  .enforce_tree         Only constrain margins of labelings that represent
                        trees (a relaxed dual, so this approximates the
                        original problem)
  .tree_inference       Whether to predict labelings consistent with the
                        tree or arbitrary multilabels
  .slack                Slack parameter C
  .maxiter              Maximum number of conditional gradient iterations
  .ws_workmem_limit     How much working memory to use
  .hloss                Use hierarchical loss (a mistake in a child is
                        penalized only if the parent was correct)
  .scaling              Scaling of the loss: 0 = no scaling, 1 = scaling by
                        the number of siblings, 2 = scaling by the size of
                        the subtree below

A sketch of a filled-in parameter structure is given below.
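The field names below come from the list above; every value is an
arbitrary placeholder chosen for illustration only, not a recommended
setting, and should be tuned for the task at hand:

  params = struct();
  params.x_datasource        = 0;     % x-kernel Kx_tr already in memory
  params.max_ip_iter         = 50;    % placeholder value
  params.profiling           = 1;     % produce profiling information
  params.profile_tm_interval = 600;   % seconds (the documented default)
  params.profile_log         = 'hm3_profile.log';  % hypothetical file name
  params.verbosity           = 1;
  params.tolerance           = 1e-6;  % placeholder value
  params.epsilon             = 0.01;  % e.g. 1% relative duality gap
  params.chunksize           = 10;    % placeholder value
  params.enforce_tree        = 1;     % constrain tree labelings only
  params.tree_inference      = 1;     % predict tree-consistent labelings
  params.slack               = 1;     % slack parameter C, placeholder value
  params.maxiter             = 100;   % placeholder value
  params.ws_workmem_limit    = 1e8;   % placeholder; units not documented
  params.hloss               = 1;     % use hierarchical loss
  params.scaling             = 2;     % scale loss by subtree size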
Contents
========

Main programs
-------------

learn_M3_maxtree_cgd         The main algorithm for learning HM^3, in
                             chunked form, where a set of examples is
                             updated simultaneously
learn_M3_maxtree_cgd_decomp  The main algorithm for learning HM^3, in
                             decomposed form, where a single example is
                             updated at a time [lower memory footprint]
learn_M3_LP_cgd_decomp       The same, but using an LP instead of dynamic
                             programming for the conditional gradient (the
                             algorithm in the ICML'05 paper)

Update directions and inference
-------------------------------

max_tree_direction       Computes the update direction for the marginalized
                         dual (the conditional gradient) using dynamic
                         programming inference on the tree
max_tree_direction_x     Same as above but for a single example (used by
                         the maxtree_cgd_decomp version)
max_LP_direction_x       Same as above but via solving an LP (used by the
                         LP_cgd_decomp version)
max_gradient_labeling    The dynamic programming search for maximum
                         gradient labelings, used both by max_tree_direction
                         and test_M3 (with different gradients)
max_gradient_labeling_x  Same as above but for a single example (used by
                         the maxtree_cgd_decomp version)
compute_w_phi_e          Computes the gradient vector required for obtaining
                         the max-likelihood labeling from
                         max_gradient_labeling_ls

Matrix vector products
----------------------

compute_Kmu    Efficiently computes the product K \mu between the
               marginalized joint kernel and the marginal dual vector
compute_Kmu_x  Same as above but for a single example i: [K \mu]_i
               (used by the _decomp version)
compute_Kmu_1  Efficiently computes the product K v, where v is a vertex of
               the feasible set given by the conditional gradient. Used for
               speeding up saddle point computations [in the _decomp version
               the same code is inlined, hence no separate routine
               'compute_Kmu_1x' or the like]

Profiling
---------

compute_duality_gap_vector  Computes the duality gap and its approximate
                            division among the training examples
test_M3                     Tests the accuracy of the model on the current
                            test set
compute_M3_error            Tests the accuracy of the model during
                            optimization

Learning problem setup
----------------------

get_x_kernel                 A stub for obtaining the x-kernel
load_x_kernel                Loads the kernel from disk (not tested)
compute_loss_vector,         Obtain the vector of losses used in the
compute_loss_scaling         marginalized dual objective
form_consistency_constraints.m,  Set up the constraint matrices (used by
form_box_constraints_x.m         learn_M3_LP_cgd_decomp)
compute_tree_size            Computes the size of a subtree
find_edge_neighbours         Finds the parent and the children of a node
precomputeIndicatorMatrices  Computes indicator matrices representing the
                             4x4 tiles that make up the joint kernel matrix;
                             used for speed-up purposes
compute_domain_sizes         Computes the domain sizes of node-labelings
compute_edge_domain_sizes    Precomputes the domain sizes of edge-labelings
                             Note: in some places the code is optimized not
                             to refer to these tables but assumes binary
                             node domains and 4-value edge domains; this
                             should of course be corrected
M3_set_globals               Sets most of the global variables used by the
                             algorithms. Does not set the kernels, data or
                             the edges: Kx_tr and Kx_ts (or X_tr and X_ts),
                             Y_tr, Y_ts and E need to be set global before
                             starting the algorithm
binary_var_count             Computes the number of edge-dual variables
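To tie the pieces together, here is a hedged sketch of a typical session.
The function names are those listed above, but none of the argument lists
are documented in this README; treating M3_set_globals and the learners as
taking the parameter structure is an assumption to be checked against the
individual .m files:

  global Kx_tr Kx_ts Y_tr Y_ts E    % required globals (see Note #2 above)

  % ... load or compute Kx_tr, Kx_ts, Y_tr, Y_ts and E here ...

  M3_set_globals(params);           % assumed call pattern: sets the
                                    % remaining global variables

  learn_M3_maxtree_cgd(params);     % assumed call pattern: chunked learner;
                                    % learn_M3_maxtree_cgd_decomp is the
                                    % lower-memory decomposed variant

  test_M3;                          % assumed call pattern: evaluates the
                                    % model on the current test set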