SPP 1307

Source code and tools

Our index structures and algorithms for approximate pattern matching were implemented in the software library . The source code is available on GitHub.

We designed and implemented some tools for generating and analyzing test data for approximate pattern matching:

`strip_headers`	This tool preprocesses text files from Project Gutenberg, stripping, for example, the headers and footers.	documentation	source	Win32	Mac
`file_statistics`	Efficient computation of some text statistics for a given file, including the text length, alphabet size, number of distinct q-grams and empirical entropy.	documentation	source	Win32	Mac
`generate_patterns`	This tool extracts substrings from a text and generates a set of search patterns for approximate pattern matching.	documentation	source	Win32	Mac
`tt-analyze`	Efficiently calculates some statistical properties of texts and estimates parameters for the probability models implemented by `tt-generate`.	documentation	source	Win32	Mac
`tt-generate`	This tool generates random texts using different models (such as a Markov chain, a discrete autoregressive process, a uniform distribution, or the Fibonacci word).	documentation	source	Win32	Mac
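To illustrate the kind of statistics `file_statistics` reports, here is a minimal Python sketch. It is not the tool's actual implementation: the helper names `text_statistics` and `fibonacci_word` are hypothetical, and "empirical entropy" is assumed here to mean the zeroth-order empirical entropy H0 = -Σ (n_c/n) log2(n_c/n), where n_c is the number of occurrences of symbol c. As a test text it uses a Fibonacci word, one of the models listed for `tt-generate`; the definition f(0) = "a", f(1) = "ab", f(m) = f(m-1) f(m-2) is an assumption, and the real tool may use a different alphabet or starting values.

```python
import math
from collections import Counter


def text_statistics(text: str, q: int) -> dict:
    """Basic text statistics (hypothetical helper, not the real file_statistics)."""
    n = len(text)
    counts = Counter(text)  # occurrences n_c of each symbol c
    # Zeroth-order empirical entropy in bits per symbol (assumed definition):
    # H0 = -sum over symbols c of (n_c / n) * log2(n_c / n)
    h0 = -sum((k / n) * math.log2(k / n) for k in counts.values()) if n else 0.0
    # Distinct q-grams: collect every length-q substring in a set.
    qgrams = {text[i:i + q] for i in range(n - q + 1)}
    return {
        "length": n,
        "alphabet_size": len(counts),
        "distinct_qgrams": len(qgrams),
        "entropy_h0": h0,
    }


def fibonacci_word(m: int) -> str:
    """Fibonacci word with f(0) = 'a', f(1) = 'ab', f(m) = f(m-1) + f(m-2) (assumed)."""
    prev, cur = "a", "ab"
    if m == 0:
        return prev
    for _ in range(m - 1):
        prev, cur = cur, cur + prev  # concatenate the two previous words
    return cur


if __name__ == "__main__":
    stats = text_statistics(fibonacci_word(5), q=2)
    print(stats)
```

For `fibonacci_word(5)` = "abaababaabaab" this reports length 13, alphabet size 2 and 3 distinct 2-grams ("ab", "ba", "aa"; "bb" never occurs). Collecting q-grams in a set is simple but costs O(nq) time and memory; a tool designed for large files would more likely use rolling hashes or a suffix-based index.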

(Deprecated: The source code of the dissertation project of Johannes Krugel is available for download as a zip file and must be included as a sandbox project in the folder sandbox/tum.)