Ukkonen helpers
This page contains the documentation for various helper functions for
manipulating Ukkonen
objects.
Contents
In libsemigroups_pybind11
:
|
Check and add a word to the suffix tree. |
|
Add all words in a list to an |
|
Returns a |
|
Check if a word is a piece (occurs in two distinct places in the words of the suffix tree). |
|
Check if a word is a subword of any word in a suffix tree. |
|
Check if a word is a suffix of any word in a suffix tree. |
Find the length of the maximal prefix of a word occurring in two different places in a word in a suffix tree. |
|
Find the length of the maximal suffix of a word occurring in two different places in a word in a suffix tree. |
|
Find the maximal prefix of a word occurring in two different places in a word in a suffix tree. |
|
Find the maximal suffix of a word occurring in two different places in a word in a suffix tree. |
|
Returns the total number of distinct subwords of the words in the suffix tree u. |
|
Find the number of pieces in a decomposition of a word (if any). |
|
|
Find the pieces in a decomposition of a word (if any). |
|
Overloaded function. |
Full API
This page contains the documentation for the ukkonen
subpackage, that
contains helper functions for the Ukkonen
class.
- ukkonen.add_word(u: Ukkonen, w: str | list[int]) None
Check and add a word to the suffix tree.
Calling this first checks that none of the letters in w is equal to any of the existing unique letters. It then invokes Ukkonen’s algorithm to add the given word to the suffix tree (if it is not already contained in the tree). If an identical word is already in the tree, then this function does nothing except increase the multiplicity of that word. If w is empty, then this function does nothing.
- Parameters:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.add_words(u: Ukkonen, words: list[str] | list[list[int]]) None
Add all words in a list to an
Ukkonen
object.
- ukkonen.dot(u: Ukkonen) Dot
Returns a
Dot
object representing a suffix tree.This function returns a
Dot
object representing the suffix tree defined by u.Internally, all words added to the suffix tree are stored as a single string delimited by unique letters. The edge labels present in this
Dot
object correspond to intervals of letters in that delimited string.- Parameters:
- Returns:
A
Dot
object representing u.- Return type:
- Raises:
LibsemigroupsError – if u does not contain any words.
LibsemigroupsError – if the number of words in u is greater than
24
.
- ukkonen.is_piece(u: Ukkonen, w: str | list[int]) bool
Check if a word is a piece (occurs in two distinct places in the words of the suffix tree).
This function returns
True
if w occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, thenFalse
is returned.- Parameters:
- Returns:
Whether w is a piece.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.is_subword(u: Ukkonen, w: str | list[int]) bool
Check if a word is a subword of any word in a suffix tree.
This function returns
True
if w is a subword of one of the words in the suffix tree represented by theUkkonen
instance u.- Parameters:
- Returns:
Whether w is a subword of any word in u.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.is_suffix(u: Ukkonen, w: str | list[int]) bool
Check if a word is a suffix of any word in a suffix tree.
This function returns
True
if w is a suffix of one of the words in the suffix tree represented by theUkkonen
instance u.- Parameters:
- Returns:
Whether w is a suffix of any word in u.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.length_maximal_piece_prefix(u: Ukkonen, w: str | list[int]) int
Find the length of the maximal prefix of a word occurring in two different places in a word in a suffix tree.
This function returns the length of the maximal length prefix of w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then
0
is returned.- Parameters:
- Returns:
The length of the maximal piece prefix.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.length_maximal_piece_suffix(u: Ukkonen, w: str | list[int]) int
Find the length of the maximal suffix of a word occurring in two different places in a word in a suffix tree.
This function returns the length of the maximal length prefix of w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then
0
is returned.- Parameters:
- Returns:
The length of the maximal piece suffix.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.maximal_piece_prefix(u: Ukkonen, w: str | list[int]) str | list[int]
Find the maximal prefix of a word occurring in two different places in a word in a suffix tree.
This function returns the maximal length prefix of the word corresponding w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then an empty word is returned.
- Parameters:
- Returns:
The maximal piece prefix.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.maximal_piece_suffix(u: Ukkonen, w: str | list[int]) str | list[int]
Find the maximal suffix of a word occurring in two different places in a word in a suffix tree.
This function returns the maximal length suffix of the word corresponding w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such suffix exists, then an empty word is returned.
- Parameters:
- Returns:
The maximal piece suffix.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.number_of_distinct_subwords(u: Ukkonen) int
Returns the total number of distinct subwords of the words in the suffix tree u.
- ukkonen.number_of_pieces(u: Ukkonen, w: str | list[int]) int | PositiveInfinity
Find the number of pieces in a decomposition of a word (if any).
This function returns the minimum number of pieces whose product equals w if such a product exists, and
0
if no such product exists.Recall that a piece is a word that occurs in two distinct positions (possibly overlapping) of the words in the suffix tree u.
- Parameters:
- Returns:
The number of pieces.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.pieces(u: Ukkonen, w: str | list[int]) list[str] | list[list[int]]
Find the pieces in a decomposition of a word (if any).
This function returns a list of pieces whose product equals w if such a product exists, and an empty list if no such product exists.
Recall that a piece is a word that occurs in two distinct positions (possibly overlapping) of the words in the suffix tree u.
- Parameters:
- Returns:
The of pieces in the decomposition of w.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.traverse(*args, **kwargs)
Overloaded function.
- ukkonen.traverse(u: Ukkonen, w: str | list[int]) tuple[Ukkonen.State, str | list[int]]
Traverse the suffix tree from the root.
This function traverses the edges in the suffix tree, starting at the root node, that are labelled by the letters in W. The suffix tree is traversed until the end of W is reached, or a letter not corresponding to an edge is encountered. A pair consisting of the state reached, and the portion of w consumed in the traversal is returned.
- Parameters:
- Returns:
A tuple containing the
State
reached, and the word consumed.- Return type:
tuple[Ukkonen.State, str | list[int]]
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also
- ukkonen.traverse(u: Ukkonen, st: Ukkonen.State, w: str | list[int]) str | list[int]
Traverse the suffix tree from the root.
This function traverses the edges in the suffix tree, starting at the
Ukkonen.State
st, that are labelled by the letters in w. The suffix tree is traversed until the end of w*is reached, or a letter not corresponding to an edge is encountered. The state *st is modified in-place to contain the last state in the tree reached in the traversal. The returned word represents the portion of w that was consumed in the traversal.- Parameters:
st (Ukkonen.State) – the
Ukkonen.State
object from which to traverse.
- Returns:
The portion of w that was consumed in the traversal.
- Return type:
- Raises:
LibsemigroupsError – if
u.throw_if_contains_unique_letter(w)
throws.- Complexity:
Linear in the length of w.
See also