Recent trends in stringology - study group
News
- Some new papers added 14th September to complete the list of possible subjects for the subsequent seminar to be held in the autumn semester.
- The name in parantheses in the end of each paper listed below refera to the person who introduced the paper. Papers not associated with any name were not covered.
- The last meeting of the study group was held Thursday 10th June.
Working policy
- Two persons give introductory talks on two different topics on each Monday.
- 1-2 papers are selected based on the introductions to be read by the whole group between Monday-Thursday.
- On Thursday the group discusses the papers in more detail.
Schedule
- Starting from thursday 13.5 all meetings in room B227b (end of library)
- monday 3.5 14-16: organization
- monday 10.5 14-16: 2 introductory talks by Kjell and Veli
- thursday 13.5 10-13: in detail discussion
- monday 17.5 14-16: 2 introductory talks by Juha and Ari
- wednesday 19.5 10-13: in detail discussion
- monday 24.5 14-16: introductory talks by Stefan
- thursday 27.5 10-13: in detail discussion
- monday 31.5 14-16: 2 introductory talks by Aristides and Juho
- thursday 3.6 10-13: in detail discussion
- monday 7.6 14-16: introductory talk by Kimmo
- thursday 10.6 10-13: in detail discussion and introductory talk by Tomi
Participants
- Esko Ukkonen
- Juha Kärkkäinen
- Stefan Burkhardt
- Shunsuke Inenaga
- Veli Mäkinen (organizer)
- Margus Lukk
- Aristides Gionis
- Taneli Mielikäinen
- Kjell Lemström
- Tomi Pasanen
- Ari Rantanen
- Kimmo Palin
- Panayiotis Tsaparas
- Yoan Pinzon (visiting from Kings College, London)
- Juho Rousu (visiting 31.5 from Royal Holloway, London)
Topics
- Pattern discovery
- Inverse problems in stringology
- Compressed text indexes & Burrows-Wheeler transformation
- Metric space indexing
- Music information retrieval
- Detection of virus DNA in Human Genome
- 3D matching under rotation invariance with applications
- Gapped q-grams
- Unexpected words discovery (based on Apostolico's work)
- Support-vector machines, string kernels
- New trends in information retrieval
- External memory string structures
- Other?
Papers
Pattern Discovery
- G. S. Brodal, R. B. Lyngsø, C. N. S. Pedersen, J. Stoye: Finding Maximal Pairs with Bounded Gap.
In CPM 1999.
[SpringerLink] (Veli) - Pelfrene, Abdeddaiam, Alexandre: Extracting Approximate Patterns.
In CPM 2003.
[SpringerLink] (Kjell) - Crochemore et al.: Longest Repeats with a Block of Dont Cares.
In LATIN 2004.
[SpringerLink.] (Veli) -
M. Takeda, S. Inenaga, H. Bannai, A. Shinohara, and S. Arikawa:
Discovering Most Classificatory Patterns for Very Expressive Pattern
Classes.
In Proc. DS'03, LNCS 2843, pp. 486-493.
[SpringerLink] - S. Inenaga, H. Bannai, A. Shinohara, M. Takeda, S. Arikawa:
Discovering Best Variable-Length-Don't-Care Patterns.
In Proc. DS'02, LNCS 2534, pp. 86-97.
[SpringerLink] - J. K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang:
Distinguishing string selection problems.
Information and Computation 185(1):41-55, 2003.
[.pdf] - N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot:
A Basis of Tiling Motifs for Generating Repeated Patterns and Its
Complexity for Higher Quorum.
In Proc. MFCS'03, LNCS 2747, pp. 622-631.
[SpringerLink] (Kjell) - P. Nicodeme, Salvy, Flajolet: Motif Statistics.
Technical report 2004.
[.ps ] -
Pavel A. Pevzner and Sing-Hoi Sze.
Combinatorial Approaches to Finding Subtle Signals in DNA
Sequences.
In Proc. ISMB'00, AAAI 2000, pp. 269-278.
[.pdf] (Juha) -
Jeremy Buhler and Martin Tompa.
Finding Motifs Using Random Projections.
Journal of Computational Biology 9 (2), 2002, pp. 225-242.
[.ps (preliminary version?)] [ACM Portal (shorter conference version)] (Juha) -
Uri Keich and Pavel A. Pevzner.
Finding motifs in the twilight zone.
Bioinformatics 18 (10), 2002, pp. 1374-1381.
[Bioinformatics] (Ari) -
Uri Keich and Pavel A. Pevzner.
Subtle motifs: defining the limits of motif finding algorithms.
Bioinformatics 18 (10), 2002, pp. 1382-1390.
[Bioinformatics] (Ari) -
Yuh-Jyh Hu.
Finding subtle motifs with variable gaps in unaligned DNA
sequences.
Computer Methods and Programs in Biomedicine. Volume 70, Issue 1 , January 2003, Pages 11-20.
[ScienceDirect] - Motif Discovery Assestment
Inverse problems
- H. Bannai, S. Inenaga, A. Shinohara, and M. Takeda:
Inferring Strings from Graphs and Arrays.
In Proc. MFCS'03, LNCS 2747, pp. 208-217.
[SpringerLink] - J.-P. Duval, T. Lecroq and A. Lefebvre:
Border Array on Bounded Alphabet.
In Proc. PSC'02,
[.ps]
Compressed text indexes & Burrows-Wheeler
- Giovanni Manzini: An Analysis of the Burrows-Wheeler Transform.
JACM 2001 .
[.ps] - R. Giancarlo and M. Sciortino: Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms.
In Proc. CPM 2003.
[SpringerLink] - P. Ferragina and G. Manzini: On Compressing and Indexing Data.
Technical report on FOCS 2000 & SODA 2001 papers.
[link] - Wing-Kai Hon, Kunihiko Sadakane, Wing-Kin Sung.
Breaking a Time-and-Space Barrier in Constructing Full-Text
Indices.
In Proc. FOCS'03, IEEE 2003, pp. 251-260.
[IEEE Xplore] [.ps] -
Wing-Kai Hon, Tak-Wah Lam, Kunihiko Sadakane and Wing-Kin Sung.
Constructing Compressed Suffix Arrays with Large Alphabets
In Proc. ISAAC'03, LNCS 2906, Springer 2003, pp. 240-249.
[SpringerLink] -
Tak-Wah Lam, Kunihiko Sadakane, Wing-Kin Sung and Siu-Ming Yiu.
A Space and Time Efficient Algorithm for Constructing Compressed
Suffix Arrays
In Proc. COCOON'02, LNCS 2387, Springer 2002, pp. 401-410.
[SpringerLink] -
Roberto Grossi, Ankur Gupta and Jeffrey Scott Vitter.
High-order entropy-compressed text indexes.
In Proc. SODA'03, ACM-SIAM 2003, pp. 841-850.
[ACM Portal] -
Kunihiko Sadakane.
New text indexing functionalities of the compressed suffix
arrays.
Journal of Algorithms 48 (2), 2003, pp. 294-313.
[Science Direct] -
Kunihiko Sadakane.
Compressed Suffix Trees with Full Functionality.
Manuscript, 2003.
[.ps] -
Wing-Kai Hon, Tak-Wah Lam, Kunihiko Sadakane, Wing-Kin Sung and
Siu-Ming Yiu.
Compressed index for dynamic text.
In DCC'04, 2004, pp. 102-111.
[IEEE Xplore] -
Kendall Willets.
Full-Text Searching & the Burrows-Wheeler Transform.
Dr. Dobb's Journal, December 2003.
[Willets' compressed index page] -
Jeong Seop Sim, Dong Kyue Kim, Heejin Park and Kunsoo Park.
Linear-time search in suffix arrays
InProc. AWOCA'03, pp. 139-146.
[.ppt] -
Veli Mäkinen and Gonzalo Navarro.
New search algorithms and time/space tradeoffs for succinct suffix arrays
Technical report C-2004-20, Univ. Helsinki, Dept. CS, April 2004.
[.ps] - Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo
Navarro.
An Alphabet-Friendly FM-Index.
To appear in Proc. SPIRE'04, October 5-8, Padova, Italy, 2004.
[.ps]
Analysis of longest common subsequences
- The following paper solved an old conjecture on the expected length of longest common subsequences. See the references in that paper for the history of this problem.
- Marcos Kiwi, Martin Loebl, and Jiri Matousek.Expected Length of
the Longest Common Subsequence for Large Alphabets.
In Proc. LATIN, LNCS 2976, pp. 302-311, 2004.
[SpringerLink]
Metric space indexes
-
Edgar Chávez and Gonzalo Navarro.
A Metric Index for Approximate String Matching.
In Proc. LATIN'02, LNCS 2286, Springer 2002, pp. 181-195.
[SpringerLink] [.ps] (Aristides)
External memory string structures
-
Juha Kärkkäinen and S. Srinivasa Rao.
Full-Text Indexes in External Memory.
Chapter 7 in U. Meyer, P. Sanders, J. Sibeyn (eds.), Algorithms for Memory Hierarchies (Advanced Lectures). LNCS 2625, Springer 2003, pp. 149-170.
[SpringerLink (chapter)] [SpringerLink (volume)] [.pdf (local)] -
Paolo Ferragina and Roberto Grossi.
The string B-tree: a new data structure for string search in
external memory and its applications
JACM 46 (2), 1999, pp. 236-280.
[ACM Portal] (Kimmo)
String Kernels
- Vishwanathan and A. Smola. Fast Kernels for String and Tree Matching.
Advances in Neural Information Processing Systems 15, 2003, pp. 569-576. (Juho) - C. Leslie and R. Kuang. Fast Kernels for Inexact String Matching.
In Proc. 16th Conference on Computational Learning Theory and 7th Kernel Workshop, COLT/Kernel'2003. Lecture Notes in Computer Science 2777 (2003), pp. 114 - 128.
[SpringerLink] (Juho) - Juho Rousu, John Shawe-Taylor: Efficient Computation of Gap-Weighted String Kernels on Large Alphabets.
In PASCAL workshop, January 2004.
[.ps] (Juho) - Lodhi, Saunders, Shawe-Taylor, Cristianini, Watkins: Text Classification using String Kernels.
Journal of Machine Learning Research 2(2002):419-444. (Juho)
Information Retrieval
- Holger Bast: Dimension Reduction: A Powerful Principle for Automatically
Finding Concepts in Unstructured Data
Survey manuscript, 2004.
[.pdf] - Thomas Hoffman: Unsupervised Learning by Probabilistic Latent
Semantic Analysis.
Machine Learning, 42, 177–196, 2001.
[.pdf ]
Music Information Retrieval
- See C-BRAHMS.
Miscellaneous
-
Adam Kalai.
Efficient Pattern-Matching with Don't Cares.
In Proc. SODA'02, ACM-SIAM 2002, pp. 655-656.
[ACM Portal] (Stefan) - Richard Cole, Lee-Ad Gottlieb, Moshe Lewenstein: Dictionary matching and indexing with errors and don't cares.
To appear in STOC 2004.
[.ps] (Stefan)
Veli Mäkinen