The Ukkonen class
For an implementation of Ukkonen’s algorithm.
This class implements Ukkonen’s algorithm for constructing a generalised suffix
tree consisting of list[int]
. The implementation in this class is based on:
https://cp-algorithms.com/string/suffix-tree-ukkonen.html
The suffix tree is updated when the member function ukkonen.add_word
is
invoked. Every non-duplicate word added to the tree has a unique letter appended
to the end. If a duplicate word is added, then the tree is not modified, but the
multiplicity
of the word is increased.
Contents
For an implementation of Ukkonen's algorithm. |
|
|
Copy a |
Returns the distance of a node from the root. |
|
Find the index of a word in the suffix tree. |
|
|
Initialize an existing Ukkonen object. |
Check if a state corresponds to a suffix. |
|
Returns the sum of the lengths of the distinct words in the suffix tree. |
|
Returns the sum of the lengths of all of the words in the suffix tree. |
|
Returns the maximum length of word in the suffix tree. |
|
Returns the multiplicity of a word by index. |
|
Returns the nodes in the suffix tree. |
|
Returns the number of distinct non-empty words in the suffix tree. |
|
Returns the number of non-empty words in the suffix tree. |
|
Throw if the word w contains a letter equal to any of the unique letters added to the end of words in the suffix tree. |
|
Returns the unique letter added to the end of the |
|
Overloaded function. |
Full API
- class Ukkonen
-
- distance_from_root(self: Ukkonen, n: Ukkonen.Node) int
Returns the distance of a node from the root.
- Parameters:
n (Ukkonen.Node) – the node.
- Returns:
The distance from the root.
- Return type:
- Complexity:
At worst linear in the distance of the node n from the root.
- index(self: Ukkonen, w: str | list[int]) int | Undefined
Find the index of a word in the suffix tree.
If the w is one of the words that the suffix tree contains (the words added to the suffix tree via
ukkonen.add_word
), then this function returns the index of that word. If the word w is not one of the words that the suffix tree represents, thenUNDEFINED
is returned.
- init(self: Ukkonen) Ukkonen
Initialize an existing Ukkonen object.
This function puts an
Ukkonen
object back into the same state as if it had been newly default constructed.- Returns:
self.
- Return type:
See also
- is_suffix(self: Ukkonen, st: Ukkonen.State) int | Undefined
Check if a state corresponds to a suffix.
This function returns a an int if the state st corresponds to a suffix of any word in the suffix tree. The value returned is the index of the word which the state is a suffix of if such a word exists, and
UNDEFINED
otherwise.- Parameters:
st (Ukkonen.State) – the state.
- Returns:
The index of a word for which st is a suffix, or
UNDEFINED
.- Return type:
- length_of_distinct_words(self: Ukkonen) int
Returns the sum of the lengths of the distinct words in the suffix tree.
- Returns:
The length of the distinct words.
- Return type:
- Complexity:
Constant.
- length_of_words(self: Ukkonen) int
Returns the sum of the lengths of all of the words in the suffix tree.
This is the total length of all the words added to the suffix tree including duplicates, if any.
- Returns:
The length of the words.
- Return type:
- Complexity:
\(O(n)\) where \(n\) is the return value of
number_of_distinct_words
.
- max_word_length(self: Ukkonen) int
Returns the maximum length of word in the suffix tree.
- Returns:
The maximum length of a word.
- Return type:
- Complexity:
Constant.
- multiplicity(self: Ukkonen, i: int) int
Returns the multiplicity of a word by index.
This function returns the number of times that the word corresponding to the index i was added to the suffix tree.
- nodes(self: Ukkonen) list[Ukkonen.Node]
Returns the nodes in the suffix tree.
- Returns:
A list of nodes.
- Return type:
- Complexity:
Constant.
- number_of_distinct_words(self: Ukkonen) int
Returns the number of distinct non-empty words in the suffix tree.
This is the number of distinct non-empty words added via
ukkonen.add_word
.- Returns:
The number of distinct non-empty words.
- Return type:
- Complexity:
Constant.
- number_of_words(self: Ukkonen) int
Returns the number of non-empty words in the suffix tree.
This is the number of all words added via
ukkonen.add_word
including duplicates, if any.- Returns:
The number of words.
- Return type:
- Complexity:
\(O(n)\) where \(n\) is the return value of
number_of_distinct_words
.
- throw_if_contains_unique_letter(self: Ukkonen, w: str | list[int]) None
Throw if the word w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.
This function throws an exception if the word w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.
- Parameters:
- Raises:
LibsemigroupsError – if w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.
- Complexity:
Linear in the length of w.
- unique_letter(self: Ukkonen, i: int) int
Returns the unique letter added to the end of the
i
-th distinct word in the suffix tree.
- word_index(self: Ukkonen, i: int) int
Overloaded function.
- word_index(self: Ukkonen, i: int) int
Returns the index of the word corresponding to a position.
This function returns the least non-negative integer
j
such that the i-th letter in the underlying string appears in thej
-th word added to the suffix tree.
- word_index(self: Ukkonen, n: Ukkonen.Node) int
Returns the index of the word corresponding to a node.
This function returns the least non-negative integer
i
such that the node n corresponds to thei
-th word added to the suffix tree.- Parameters:
n (Ukkonen.Node) – the node.
- Returns:
The index of a word.
- Return type:
- Complexity:
Constant.