LEA

Source code and tools

Our index structures and algorithms for approximate pattern matching were implemented in the software library SeqAn. The source code is available on GitHub.

We designed and implemented several tools for generating and analyzing test data for approximate pattern matching:

strip_headers: This tool preprocesses text files from Project Gutenberg by stripping non-text parts such as the headers and footers. Downloads: documentation, source, Win32, Mac.
file_statistics: Efficiently computes several statistics of a given text file, including the text length, alphabet size, number of distinct q-grams and empirical entropy. Downloads: documentation, source, Win32, Mac.
generate_patterns: This tool extracts substrings from a text and generates a set of search patterns for approximate pattern matching. Downloads: documentation, source, Win32, Mac.
tt-analyze: Efficiently calculates statistical properties of texts and estimates the parameters of the probability models implemented by tt-generate. Downloads: documentation, source, Win32, Mac.
tt-generate: This tool generates random texts using different models (such as a Markov chain, a discrete autoregressive process, a uniform distribution, or the Fibonacci word). Downloads: documentation, Win32, Mac.
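The kind of preprocessing strip_headers performs can be illustrated with a minimal Python sketch. This is not the tool's code: the function name strip_gutenberg is hypothetical, and it assumes the common Project Gutenberg convention of marker lines that begin with "*** START OF" and "*** END OF" and mention "PROJECT GUTENBERG". The real tool may handle further cases (older marker variants, license blocks).

```python
def strip_gutenberg(text: str) -> str:
    """Return the part of `text` between the START and END marker lines.

    Hypothetical helper; assumes marker lines such as
    '*** START OF THIS PROJECT GUTENBERG EBOOK ... ***'.
    If a marker is missing, the corresponding part of the text is kept.
    """
    lines = text.splitlines(keepends=True)
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        upper = line.upper().lstrip()
        if "PROJECT GUTENBERG" not in upper:
            continue
        if upper.startswith("*** START OF"):
            start = i + 1          # body begins after the START marker
        elif upper.startswith("*** END OF"):
            end = i                # body ends before the END marker
            break
    return "".join(lines[start:end])
```

For a file without any markers the text is returned unchanged, which keeps the sketch safe to apply to arbitrary input.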
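The statistics reported by file_statistics can be sketched in a few lines of Python. This is an illustration, not the tool's implementation: the function name text_statistics and the parameter q are hypothetical, and we assume that "empirical entropy" refers to the zeroth-order empirical entropy H0 = -sum over characters c of (n_c / n) * log2(n_c / n), where n_c is the number of occurrences of c and n the text length (the tool may also report higher orders).

```python
import math
from collections import Counter


def text_statistics(text: str, q: int = 3) -> dict:
    """Compute length, alphabet size, distinct q-grams and H0 of `text`.

    Hypothetical helper; 'entropy_h0' is the zeroth-order empirical
    entropy in bits per character (an assumption about what the tool reports).
    """
    n = len(text)
    counts = Counter(text)  # occurrences n_c of each character c
    # H0 = -sum_c (n_c / n) * log2(n_c / n); defined as 0 for the empty text
    h0 = -sum((m / n) * math.log2(m / n) for m in counts.values()) if n else 0.0
    # All substrings of length q; the set keeps only distinct ones
    qgrams = {text[i:i + q] for i in range(n - q + 1)}
    return {
        "length": n,
        "alphabet_size": len(counts),
        "distinct_qgrams": len(qgrams),
        "entropy_h0": h0,
    }
```

For "abracadabra" with q = 3 this yields length 11, alphabet size 5, 7 distinct 3-grams and H0 of about 2.04 bits per character. A real tool working on large files would stream the input and use a hash of the q-gram rather than storing every substring.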
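One plausible way to generate search patterns in the spirit of generate_patterns is to pick random substrings and then apply a few random edit operations, so that each pattern occurs approximately in the text. The sketch below is hypothetical: the function name and its parameters (pattern length m, count, errors, seed) are assumptions, not the tool's actual options.

```python
import random


def generate_patterns(text, m, count, errors=0, seed=0):
    """Return `count` patterns: random length-m substrings of `text`, each
    modified by `errors` random edits (substitution, insertion, deletion),
    so every pattern is within edit distance `errors` of its source substring.
    Hypothetical helper for illustration only."""
    if not 1 <= m <= len(text):
        raise ValueError("need 1 <= m <= len(text)")
    rng = random.Random(seed)            # fixed seed -> reproducible test data
    alphabet = sorted(set(text))
    patterns = []
    for _ in range(count):
        start = rng.randrange(len(text) - m + 1)   # uniform start position
        p = list(text[start:start + m])
        for _ in range(errors):
            # never delete the last remaining character
            ops = ("sub", "ins", "del") if len(p) > 1 else ("sub", "ins")
            op = rng.choice(ops)
            if op == "ins":
                pos = rng.randrange(len(p) + 1)    # may also append at the end
                p.insert(pos, rng.choice(alphabet))
                continue
            pos = rng.randrange(len(p))
            if op == "sub":
                # prefer a different character so the substitution changes p
                others = [c for c in alphabet if c != p[pos]] or alphabet
                p[pos] = rng.choice(others)
            else:
                del p[pos]
        patterns.append("".join(p))
    return patterns
```

With errors=0 every pattern is an exact substring; with k errors a pattern's length lies between m - k and m + k.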
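The interplay of tt-analyze (estimating model parameters) and tt-generate (sampling random texts) can be illustrated for the Markov chain model and the Fibonacci word. In this Python sketch an order-k model is estimated by counting which character follows each length-k context and normalizing the counts; texts are then sampled from these conditional probabilities. The Fibonacci word uses S0 = a, S1 = ab, Sn = Sn-1 Sn-2. All function names are hypothetical, the restart rule for unseen contexts is our own choice, and the discrete autoregressive model is not covered.

```python
import random
from collections import Counter, defaultdict


def estimate_markov(text, k=1):
    """Estimate order-k transition probabilities P(next char | context)."""
    counts = defaultdict(Counter)
    for i in range(k, len(text)):
        counts[text[i - k:i]][text[i]] += 1
    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        model[ctx] = {c: m / total for c, m in nxt.items()}
    return model


def generate_markov(model, k, length, seed=0):
    """Sample a random text of the given length from an order-k model."""
    if not model:
        raise ValueError("empty model")
    rng = random.Random(seed)
    contexts = sorted(model)
    ctx = rng.choice(contexts)           # start from an observed context
    out = list(ctx)
    while len(out) < length:
        probs = model.get(ctx)
        if probs is None:                # context never had a successor:
            ctx = rng.choice(contexts)   # restart from an observed context
            out.extend(ctx)
            continue
        symbols = sorted(probs)
        out.append(rng.choices(symbols, weights=[probs[s] for s in symbols])[0])
        ctx = "".join(out[-k:]) if k > 0 else ""
    return "".join(out[:length])


def fibonacci_word(length):
    """Prefix of the infinite Fibonacci word: S0 = a, S1 = ab, Sn = Sn-1 Sn-2."""
    prev, cur = "a", "ab"
    while len(cur) < length:
        prev, cur = cur, cur + prev
    return cur[:length]
```

For example, estimate_markov("abracadabra", 1)["a"] gives {"b": 0.5, "c": 0.25, "d": 0.25}, since "a" is followed twice by "b" and once each by "c" and "d". With k = 0 the model degenerates to independent characters drawn from the empirical character distribution.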

(Deprecated: The source code of the dissertation project of Johannes Krugel is available for download as a zip file and must be included as a sandbox project in the folder sandbox/tum.)