generate_patterns.cpp File Reference
Detailed Description
Extracts substrings from a text and generates a set of search patterns for approximate pattern matching.
The text is read from INFILE and the patterns are printed to stdout, separated by line breaks. UTF-8 encoded input is also supported.
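In UTF-8 mode a single character may occupy several bytes, so a tool of this kind presumably has to operate on code points rather than on bytes. The following is a minimal sketch of that idea, not the code from generate_patterns.cpp: `splitUtf8` is a hypothetical helper, and it assumes the input is valid UTF-8 (no validation is performed).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of generate_patterns.cpp):
// splits a UTF-8 encoded string into its code points, each stored
// as its own byte string. Assumes the input is valid UTF-8.
std::vector<std::string> splitUtf8(const std::string& text) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;                        // 0xxxxxxx: one byte
        if ((lead & 0xE0) == 0xC0) len = 2;         // 110xxxxx: two bytes
        else if ((lead & 0xF0) == 0xE0) len = 3;    // 1110xxxx: three bytes
        else if ((lead & 0xF8) == 0xF0) len = 4;    // 11110xxx: four bytes
        assert(i + len <= text.size());             // truncated input is not handled
        chars.push_back(text.substr(i, len));
        i += len;
    }
    return chars;
}
```

With such a split, a pattern of length PATTERN_LENGTH can be taken as a run of consecutive entries of the returned vector, so that no multi-byte character is cut in half.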
The program extracts PATTERN_COUNT substrings from the file INFILE, each of length PATTERN_LENGTH. It then introduces MAX_DISTANCE modifications into each pattern. If DISTANCE_MEASURE is edit, these modifications can be deletions, insertions, and substitutions of single characters. If DISTANCE_MEASURE is hamming, only substitutions are performed.
Line breaks are ignored, so that patterns can also span two or more lines (this is especially useful for FASTA files).
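The modification step described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in generate_patterns.cpp: the names `DistanceMeasure` and `modifyPattern` are hypothetical, and drawing replacement characters from a caller-supplied `alphabet` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical names, not taken from generate_patterns.cpp.
enum class DistanceMeasure { Edit, Hamming };

// Applies maxDistance random single-character operations to pattern.
// Assumption: new characters are drawn uniformly from alphabet.
// A substitution may pick the character already present, so the result
// is within distance maxDistance of the input, not necessarily at it.
std::string modifyPattern(std::string pattern, DistanceMeasure measure,
                          std::size_t maxDistance, const std::string& alphabet,
                          std::mt19937& rng) {
    assert(!alphabet.empty());
    std::uniform_int_distribution<std::size_t> pickChar(0, alphabet.size() - 1);
    std::uniform_int_distribution<int> pickOp(0, 2);
    for (std::size_t i = 0; i < maxDistance; ++i) {
        if (pattern.empty()) {
            if (measure == DistanceMeasure::Hamming) break;  // nothing to substitute
            pattern.push_back(alphabet[pickChar(rng)]);      // only insertion is possible
            continue;
        }
        // Hamming distance permits substitutions only (operation 0).
        const int op = (measure == DistanceMeasure::Hamming) ? 0 : pickOp(rng);
        std::uniform_int_distribution<std::size_t> pickPos(0, pattern.size() - 1);
        const std::size_t pos = pickPos(rng);
        switch (op) {
            case 0:  pattern[pos] = alphabet[pickChar(rng)]; break;           // substitution
            case 1:  pattern.insert(pos, 1, alphabet[pickChar(rng)]); break;  // insertion
            default: pattern.erase(pos, 1); break;                            // deletion
        }
    }
    return pattern;
}
```

Under the hamming measure the pattern length never changes, whereas under the edit measure each operation may lengthen or shorten the pattern by one character.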
Usage:
generate_patterns PATTERN_COUNT PATTERN_LENGTH DISTANCE_MEASURE MAX_DISTANCE INFILE [ENCODING=single-byte [FILETYPE=plain]]
- Parameters:
- PATTERN_COUNT: Maximum number of patterns to be generated.
- PATTERN_LENGTH: Length of the patterns to be generated.
- DISTANCE_MEASURE: Distance measure. Has to be one of the following: (edit | hamming).
- MAX_DISTANCE: Maximum number of operations to perform on the substrings.
- INFILE: Name of the input text file from which the patterns are extracted.
- ENCODING: The encoding of the input file. Has to be one of the following: (single-byte | UTF-8).
- FILETYPE: Whether the input file should be treated as a regular text file or as a FASTA file. The only difference is that for a FASTA file all lines starting with a > will be ignored, and patterns are not allowed to span two different FASTA sequences. Has to be one of the following: (plain | fasta).
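The difference between the plain and fasta file types can be sketched as follows. This is a sketch under assumptions rather than the tool's actual code: `loadSequences` and `extractPattern` are hypothetical helpers, and keeping each FASTA sequence in its own string is just one way to ensure that no pattern spans two sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of generate_patterns.cpp).
// plain: returns one text with all line breaks removed.
// fasta: skips lines starting with '>' and starts a new sequence at each
//        of them, so that patterns cannot span two sequences.
std::vector<std::string> loadSequences(std::istream& in, bool fasta) {
    std::vector<std::string> sequences(1);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (fasta && !line.empty() && line.front() == '>') {
            if (!sequences.back().empty()) sequences.emplace_back();
            continue;  // the header line itself is ignored
        }
        sequences.back() += line;  // line breaks are dropped
    }
    if (sequences.back().empty()) sequences.pop_back();
    return sequences;
}

// Hypothetical helper: extracts one pattern from a single sequence.
std::string extractPattern(const std::string& sequence,
                           std::size_t start, std::size_t length) {
    assert(start + length <= sequence.size());  // pattern must fit in the sequence
    return sequence.substr(start, length);
}
```

Because line breaks are dropped while reading, a pattern extracted from one of the returned strings may cover text that was spread over several lines in the input file.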
Examples:
- Generate 5 patterns from this source code file allowing no modifications:
$ ./generate_patterns 5 10 edit 0 generate_patterns.cpp
oximate Pa
tream>#inc
SoFar = 0;
MeasureStr
exical_cas
- Generate 5 patterns from this source code file using edit distance and allowing one modification:
$ ./generate_patterns 5 10 edit 1 generate_patterns.cpp
oximatre Pa
tr_am>#inc
SoFar , 0;
MeasureSr
exicalcas
- Returns:
- 0 on success, a non-zero value on error
Download:
- The newest version of this tool can be downloaded from http://wwwmayr.in.tum.de/spp1307/downloads.html