Source code and tools
Our index structures and algorithms for approximate pattern matching were implemented in the software library
. The source code is available on GitHub.
We designed and implemented some tools for generating and analyzing test data for approximate pattern matching:
| strip_headers | Preprocesses text files from Project Gutenberg by stripping boilerplate such as the headers and footers. |
| file_statistics | Efficiently computes text statistics for a given file, including the text length, alphabet size, number of distinct q-grams, and empirical entropy. |
| generate_patterns | Extracts substrings from a text and generates a set of search patterns for approximate pattern matching. |
| tt-analyze | Efficiently calculates statistical properties of texts and estimates parameters for the probability models implemented by tt-generate. |
| tt-generate | Generates random texts using different models (such as a Markov chain, discrete autoregressive process, uniform distribution, or Fibonacci word). |
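To make the quantities reported by file_statistics concrete, the following Python sketch computes them for a short in-memory string. It is an illustration only and not the tool's implementation: the function name `text_statistics`, its parameter `q`, and the returned dictionary keys are hypothetical, and "empirical entropy" is assumed here to mean the zeroth-order empirical entropy in bits per character.

```python
import math
from collections import Counter


def text_statistics(text: str, q: int = 2) -> dict:
    """Basic statistics of `text` (hypothetical helper, not the tool's API).

    Assumes q >= 1 and interprets "empirical entropy" as zeroth-order.
    """
    n = len(text)
    counts = Counter(text)  # number of occurrences of each character

    # Zeroth-order empirical entropy in bits per character:
    # H0 = sum over characters c of (n_c / n) * log2(n / n_c)
    entropy = sum((k / n) * math.log2(n / k) for k in counts.values()) if n else 0.0

    # Distinct q-grams: every substring of length q, counted once.
    qgrams = {text[i:i + q] for i in range(n - q + 1)}

    return {
        "length": n,
        "alphabet_size": len(counts),
        "distinct_qgrams": len(qgrams),
        "entropy_h0": entropy,
    }


if __name__ == "__main__":
    print(text_statistics("abracadabra", q=2))
```

For the text "abracadabra" with q = 2, this reports a length of 11, an alphabet size of 5, and 7 distinct q-grams. Collecting q-grams in a set keeps the sketch short but uses memory proportional to the number of distinct q-grams, so it is not meant for large files.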
(Deprecated: The source code of the dissertation project of Johannes Krugel is available for
download as a zip file and has to be included as a sandbox project in the folder sandbox/tum.)