Ukkonen helpers

This page contains the documentation for various helper functions for manipulating Ukkonen objects.

Contents

In libsemigroups_pybind11:

add_word(…)

Check and add a word to the suffix tree.

add_words(…)

Add all words in a list to an Ukkonen object.

dot(…)

Returns a Dot object representing a suffix tree.

is_piece(…)

Check if a word is a piece (occurs in two distinct places in the words of the suffix tree).

is_subword(…)

Check if a word is a subword of any word in a suffix tree.

is_suffix(…)

Check if a word is a suffix of any word in a suffix tree.

length_maximal_piece_prefix(…)

Find the length of the maximal prefix of a word occurring in two different places in a word in a suffix tree.

length_maximal_piece_suffix(…)

Find the length of the maximal suffix of a word occurring in two different places in a word in a suffix tree.

maximal_piece_prefix(…)

Find the maximal prefix of a word occurring in two different places in a word in a suffix tree.

maximal_piece_suffix(…)

Find the maximal suffix of a word occurring in two different places in a word in a suffix tree.

number_of_distinct_subwords(…)

Returns the total number of distinct subwords of the words in the suffix tree u.

number_of_pieces(…)

Find the number of pieces in a decomposition of a word (if any).

pieces(…)

Find the pieces in a decomposition of a word (if any).

traverse(…)

Overloaded function.

Full API

This page contains the documentation for the ukkonen subpackage, that contains helper functions for the Ukkonen class.

ukkonen.add_word(u: Ukkonen, w: str | list[int]) None

Check and add a word to the suffix tree.

Calling this first checks that none of the letters in w is equal to any of the existing unique letters. It then invokes Ukkonen’s algorithm to add the given word to the suffix tree (if it is not already contained in the tree). If an identical word is already in the tree, then this function does nothing except increase the multiplicity of that word. If w is empty, then this function does nothing.

Parameters:
Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.add_words(u: Ukkonen, words: list[str] | list[list[int]]) None

Add all words in a list to an Ukkonen object.

Parameters:
Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws for any w in words.

ukkonen.dot(u: Ukkonen) Dot

Returns a Dot object representing a suffix tree.

This function returns a Dot object representing the suffix tree defined by u.

Internally, all words added to the suffix tree are stored as a single string delimited by unique letters. The edge labels present in this Dot object correspond to intervals of letters in that delimited string.

Parameters:

u (Ukkonen) – the Ukkonen object.

Returns:

A Dot object representing u.

Return type:

Dot

Raises:
ukkonen.is_piece(u: Ukkonen, w: str | list[int]) bool

Check if a word is a piece (occurs in two distinct places in the words of the suffix tree).

This function returns True if w occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then False is returned.

Parameters:
Returns:

Whether w is a piece.

Return type:

bool

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.is_subword(u: Ukkonen, w: str | list[int]) bool

Check if a word is a subword of any word in a suffix tree.

This function returns True if w is a subword of one of the words in the suffix tree represented by the Ukkonen instance u.

Parameters:
Returns:

Whether w is a subword of any word in u.

Return type:

bool

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.is_suffix(u: Ukkonen, w: str | list[int]) bool

Check if a word is a suffix of any word in a suffix tree.

This function returns True if w is a suffix of one of the words in the suffix tree represented by the Ukkonen instance u.

Parameters:
Returns:

Whether w is a suffix of any word in u.

Return type:

bool

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.length_maximal_piece_prefix(u: Ukkonen, w: str | list[int]) int

Find the length of the maximal prefix of a word occurring in two different places in a word in a suffix tree.

This function returns the length of the maximal length prefix of w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then 0 is returned.

Parameters:
Returns:

The length of the maximal piece prefix.

Return type:

int

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.length_maximal_piece_suffix(u: Ukkonen, w: str | list[int]) int

Find the length of the maximal suffix of a word occurring in two different places in a word in a suffix tree.

This function returns the length of the maximal length prefix of w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then 0 is returned.

Parameters:
Returns:

The length of the maximal piece suffix.

Return type:

int

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.maximal_piece_prefix(u: Ukkonen, w: str | list[int]) str | list[int]

Find the maximal prefix of a word occurring in two different places in a word in a suffix tree.

This function returns the maximal length prefix of the word corresponding w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such prefix exists, then an empty word is returned.

Parameters:
Returns:

The maximal piece prefix.

Return type:

str | list[int]

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.maximal_piece_suffix(u: Ukkonen, w: str | list[int]) str | list[int]

Find the maximal suffix of a word occurring in two different places in a word in a suffix tree.

This function returns the maximal length suffix of the word corresponding w that occurs in at least \(2\) different (possibly overlapping) places in the words contained in u. If no such suffix exists, then an empty word is returned.

Parameters:
Returns:

The maximal piece suffix.

Return type:

str | list[int]

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.number_of_distinct_subwords(u: Ukkonen) int

Returns the total number of distinct subwords of the words in the suffix tree u.

Parameters:

u (Ukkonen) – the Ukkonen object.

Returns:

The total number of distinct subwords.

Return type:

int

Complexity:

Linear in Ukkonen.length_of_distinct_words.

ukkonen.number_of_pieces(u: Ukkonen, w: str | list[int]) int | PositiveInfinity

Find the number of pieces in a decomposition of a word (if any).

This function returns the minimum number of pieces whose product equals w if such a product exists, and 0 if no such product exists.

Recall that a piece is a word that occurs in two distinct positions (possibly overlapping) of the words in the suffix tree u.

Parameters:
Returns:

The number of pieces.

Return type:

int | PositiveInfinity

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.pieces(u: Ukkonen, w: str | list[int]) list[str] | list[list[int]]

Find the pieces in a decomposition of a word (if any).

This function returns a list of pieces whose product equals w if such a product exists, and an empty list if no such product exists.

Recall that a piece is a word that occurs in two distinct positions (possibly overlapping) of the words in the suffix tree u.

Parameters:
Returns:

The of pieces in the decomposition of w.

Return type:

list[str] | list[list[int]]

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.traverse(*args, **kwargs)

Overloaded function.

ukkonen.traverse(u: Ukkonen, w: str | list[int]) tuple[Ukkonen.State, str | list[int]]

Traverse the suffix tree from the root.

This function traverses the edges in the suffix tree, starting at the root node, that are labelled by the letters in W. The suffix tree is traversed until the end of W is reached, or a letter not corresponding to an edge is encountered. A pair consisting of the state reached, and the portion of w consumed in the traversal is returned.

Parameters:
Returns:

A tuple containing the State reached, and the word consumed.

Return type:

tuple[Ukkonen.State, str | list[int]]

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.

ukkonen.traverse(u: Ukkonen, st: Ukkonen.State, w: str | list[int]) str | list[int]

Traverse the suffix tree from the root.

This function traverses the edges in the suffix tree, starting at the Ukkonen.State st, that are labelled by the letters in w. The suffix tree is traversed until the end of w*is reached, or a letter not corresponding to an edge is encountered. The state *st is modified in-place to contain the last state in the tree reached in the traversal. The returned word represents the portion of w that was consumed in the traversal.

Parameters:
Returns:

The portion of w that was consumed in the traversal.

Return type:

str | list[int]

Raises:

LibsemigroupsError – if u.throw_if_contains_unique_letter(w) throws.

Complexity:

Linear in the length of w.