The Ukkonen class

For an implementation of Ukkonen’s algorithm.

This class implements Ukkonen’s algorithm for constructing a generalised suffix tree consisting of list[int]. The implementation in this class is based on: https://cp-algorithms.com/string/suffix-tree-ukkonen.html

The suffix tree is updated when the member function ukkonen.add_word is invoked. Every non-duplicate word added to the tree has a unique letter appended to the end. If a duplicate word is added, then the tree is not modified, but the multiplicity of the word is increased.

Contents

Ukkonen

For an implementation of Ukkonen's algorithm.

Ukkonen.copy(…)

Copy a Ukkonen object.

Ukkonen.distance_from_root(…)

Returns the distance of a node from the root.

Ukkonen.index(…)

Find the index of a word in the suffix tree.

Ukkonen.init(…)

Initialize an existing Ukkonen object.

Ukkonen.is_suffix(…)

Check if a state corresponds to a suffix.

Ukkonen.length_of_distinct_words(…)

Returns the sum of the lengths of the distinct words in the suffix tree.

Ukkonen.length_of_words(…)

Returns the sum of the lengths of all of the words in the suffix tree.

Ukkonen.max_word_length(…)

Returns the maximum length of word in the suffix tree.

Ukkonen.multiplicity(…)

Returns the multiplicity of a word by index.

Ukkonen.nodes(…)

Returns the nodes in the suffix tree.

Ukkonen.number_of_distinct_words(…)

Returns the number of distinct non-empty words in the suffix tree.

Ukkonen.number_of_words(…)

Returns the number of non-empty words in the suffix tree.

Ukkonen.throw_if_contains_unique_letter(…)

Throw if the word w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.

Ukkonen.unique_letter(…)

Returns the unique letter added to the end of the i-th distinct word in the suffix tree.

Ukkonen.word_index(…)

Overloaded function.

Full API

class Ukkonen
__init__(self: Ukkonen) None

Constructs an empty generalised suffix tree.

copy(self: Ukkonen) Ukkonen

Copy a Ukkonen object.

Returns:

A copy.

Return type:

Ukkonen

distance_from_root(self: Ukkonen, n: Ukkonen.Node) int

Returns the distance of a node from the root.

Parameters:

n (Ukkonen.Node) – the node.

Returns:

The distance from the root.

Return type:

int

Complexity:

At worst linear in the distance of the node n from the root.

index(self: Ukkonen, w: str | list[int]) int | Undefined

Find the index of a word in the suffix tree.

If the w is one of the words that the suffix tree contains (the words added to the suffix tree via ukkonen.add_word), then this function returns the index of that word. If the word w is not one of the words that the suffix tree represents, then UNDEFINED is returned.

Parameters:

w (str | list[int]) – the word to check.

Returns:

The index of w.

Return type:

int | Undefined

Raises:

LibsemigroupsError – if throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

init(self: Ukkonen) Ukkonen

Initialize an existing Ukkonen object.

This function puts an Ukkonen object back into the same state as if it had been newly default constructed.

Returns:

self.

Return type:

Ukkonen

See also

Ukkonen()

is_suffix(self: Ukkonen, st: Ukkonen.State) int | Undefined

Check if a state corresponds to a suffix.

This function returns a an int if the state st corresponds to a suffix of any word in the suffix tree. The value returned is the index of the word which the state is a suffix of if such a word exists, and UNDEFINED otherwise.

Parameters:

st (Ukkonen.State) – the state.

Returns:

The index of a word for which st is a suffix, or UNDEFINED.

Return type:

int | Undefined

length_of_distinct_words(self: Ukkonen) int

Returns the sum of the lengths of the distinct words in the suffix tree.

Returns:

The length of the distinct words.

Return type:

int

Complexity:

Constant.

length_of_words(self: Ukkonen) int

Returns the sum of the lengths of all of the words in the suffix tree.

This is the total length of all the words added to the suffix tree including duplicates, if any.

Returns:

The length of the words.

Return type:

int

Complexity:

\(O(n)\) where \(n\) is the return value of number_of_distinct_words.

max_word_length(self: Ukkonen) int

Returns the maximum length of word in the suffix tree.

Returns:

The maximum length of a word.

Return type:

int

Complexity:

Constant.

multiplicity(self: Ukkonen, i: int) int

Returns the multiplicity of a word by index.

This function returns the number of times that the word corresponding to the index i was added to the suffix tree.

Parameters:

i (int) – the node index.

Returns:

The multiplicity.

Return type:

int

Complexity:

Constant.

nodes(self: Ukkonen) list[Ukkonen.Node]

Returns the nodes in the suffix tree.

Returns:

A list of nodes.

Return type:

list[Ukkonen.Node]

Complexity:

Constant.

number_of_distinct_words(self: Ukkonen) int

Returns the number of distinct non-empty words in the suffix tree.

This is the number of distinct non-empty words added via ukkonen.add_word.

Returns:

The number of distinct non-empty words.

Return type:

int

Complexity:

Constant.

number_of_words(self: Ukkonen) int

Returns the number of non-empty words in the suffix tree.

This is the number of all words added via ukkonen.add_word including duplicates, if any.

Returns:

The number of words.

Return type:

int

Complexity:

\(O(n)\) where \(n\) is the return value of number_of_distinct_words.

throw_if_contains_unique_letter(self: Ukkonen, w: str | list[int]) None

Throw if the word w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.

This function throws an exception if the word w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.

Parameters:

w (str | list[int]) – the word to check.

Raises:

LibsemigroupsError – if w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.

Complexity:

Linear in the length of w.

unique_letter(self: Ukkonen, i: int) int

Returns the unique letter added to the end of the i-th distinct word in the suffix tree.

Parameters:

i (int) – the index of an added word.

Returns:

The unique letter.

Return type:

int

Complexity:

Constant.

word_index(self: Ukkonen, i: int) int

Overloaded function.

word_index(self: Ukkonen, i: int) int

Returns the index of the word corresponding to a position.

This function returns the least non-negative integer j such that the i-th letter in the underlying string appears in the j-th word added to the suffix tree.

Parameters:

i (int) – the position.

Returns:

The index of a word.

Return type:

int

Complexity:

Constant.

word_index(self: Ukkonen, n: Ukkonen.Node) int

Returns the index of the word corresponding to a node.

This function returns the least non-negative integer i such that the node n corresponds to the i-th word added to the suffix tree.

Parameters:

n (Ukkonen.Node) – the node.

Returns:

The index of a word.

Return type:

int

Complexity:

Constant.