Efficient Learning of One-Variable Pattern Languages from Positive Data
Thomas Erlebach, Peter Rossmanith, Hans Stadtherr,
Angelika Steger, and Thomas Zeugmann
DOI Technical Report DOI-TR-128
Department of Informatics
Kyushu University
Fukuoka, Japan, December 1996
Abstract.
A pattern is a finite string of constant and variable
symbols. The language generated by a pattern is the set of
all strings of constant symbols which can be obtained from
the pattern by substituting non-empty strings for variables.
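To illustrate this definition, here is a minimal sketch of our own (not taken from the report): a membership test for a one-variable pattern language. The variable symbol 'x' and the function name in_language are our choices; with a single variable, the length of the substituted string is forced, so no search is needed.

    # Minimal sketch (not from the report): membership test for a
    # one-variable pattern language. The pattern p is a string over the
    # constant alphabet plus the variable symbol 'x'; a word w is in L(p)
    # iff substituting some non-empty string s for every occurrence of
    # 'x' in p yields w.
    def in_language(p: str, w: str) -> bool:
        k = p.count('x')                 # occurrences of the variable
        consts = len(p) - k              # number of constant symbols in p
        if k == 0:
            return p == w                # constant pattern: exact match
        # Every 'x' is replaced by the same string s, so
        # len(w) = consts + k * len(s); this determines len(s).
        if (len(w) - consts) % k != 0:
            return False
        m = (len(w) - consts) // k
        if m < 1:                        # substitutions must be non-empty
            return False
        s, i = None, 0
        for c in p:
            if c == 'x':
                piece = w[i:i + m]
                if s is None:
                    s = piece            # read s at the first occurrence
                elif piece != s:
                    return False         # later occurrences must agree
                i += m
            else:
                if i >= len(w) or w[i] != c:
                    return False
                i += 1
        return i == len(w)

    # Example: in_language("0x1x", "011111") is True (substitute x -> "11"),
    # while in_language("0x1x", "0011") is False.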
Descriptive patterns are a key concept for inductive
inference of pattern languages. A pattern p is descriptive
for a given sample if the sample is contained in the language
L(p)
generated by p and no other pattern having this property generates
a proper subset of the language L(p). The best previously
known algorithm for computing descriptive one-variable patterns
requires time O(n^4 log n), where n is the size of the sample.
We present a simpler and more efficient algorithm
solving the same problem in time O(n^2 log n).
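To make the notions of consistency and descriptiveness concrete, the following deliberately naive sketch of our own (not the report's algorithm, and far slower than the bounds above) enumerates every one-variable pattern whose language contains all strings of a finite sample; a descriptive pattern is then a pattern in this set such that no other pattern in the set generates a proper sublanguage. It reuses in_language from the previous sketch, and the sample in the usage comment is hypothetical.

    from itertools import product

    def consistent_patterns(sample, alphabet):
        # Sketch (illustration only): list every one-variable pattern p
        # with sample contained in L(p), using in_language from above.
        # Since variables are replaced by non-empty strings, |p| <= |w|
        # for every w in L(p), so candidate patterns are at most as long
        # as the shortest sample string.
        max_len = min(len(w) for w in sample)
        symbols = list(alphabet) + ['x']   # 'x' is the single variable symbol
        found = []
        for length in range(1, max_len + 1):
            for tup in product(symbols, repeat=length):
                p = ''.join(tup)
                if all(in_language(p, w) for w in sample):
                    found.append(p)
        return found

    # Hypothetical example: consistent_patterns(["0110", "010"], "01")
    # contains "x", "0x0", and "01x", among others.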
In addition, we give a parallel version of this algorithm that
requires time O(log n) and
O(n^3 / log n) processors on an EREW-PRAM.
Previously, no parallel algorithm was known for this problem.
Using a modified version of the algorithm computing descriptive
one-variable patterns as a subroutine, we
devise a learning algorithm for one-variable patterns whose
expected total learning time is O(l^2 log l),
provided the sample strings are drawn from the target language according
to a probability distribution with expected string length l.
The probability distribution must be such that
strings of equal length have equal probability, but can
be arbitrary otherwise.
Furthermore, we show how the algorithm for descriptive
one-variable patterns can be used for
learning one-variable patterns with a polynomial number
of superset queries.