libsemigroups  v3.0.0
C++ library for semigroups and monoids
Loading...
Searching...
No Matches
Ukkonen

This class implements Ukkonen's algorithm for constructing a generalised suffix tree consisting of word_type. The implementation in this class is based on:

https://cp-algorithms.com/string/suffix-tree-ukkonen.html

The suffix tree is updated when the member function add_word is invoked. Every non-duplicate word added to the tree has a unique letter appended to the end. If a duplicate word is added, then the tree is not modified, but the multiplicity of the word is increased.

Classes

struct  Node
 The type of the nodes in the tree. More...
 
struct  State
 The return type of traverse. More...
 

Public Types

using const_iterator = typename word_type::const_iterator
 Alias for word_type iterators.
 
using index_type = size_t
 Alias for an index between begin and end.
 
using unique_letter_type = size_t
 Alias for any letter that is added by Ukkonen (so that unique strings end in unique letters).
 
using word_index_type = size_t
 Alias for order that words were added.
 

Public Member Functions

 Ukkonen ()
 Default constructor.
 
 Ukkonen (Ukkonen &&)
 Default move constructor.
 
 Ukkonen (Ukkonen const &)
 Default copy constructor.
 
void add_word (const_iterator first, const_iterator last)
 Check and add a word to the suffix tree.
 
void add_word_no_checks (const_iterator first, const_iterator last)
 Add a word to the suffix tree.
 
const_iterator begin () const noexcept
 Returns an iterator pointing to the first letter of the first word in the suffix tree.
 
const_iterator cbegin () const noexcept
 Returns an iterator pointing to the first letter of the first word in the suffix tree.
 
const_iterator cend () const noexcept
 Returns an iterator pointing one past the last letter of the last word in the suffix tree.
 
size_t distance_from_root (Node const &n) const
 Returns the distance of a node from the root.
 
const_iterator end () const noexcept
 Returns an iterator pointing one past the last letter of the last word in the suffix tree.
 
template<typename Iterator>
word_index_type index (Iterator first, Iterator last) const
 Find the index of a word in the suffix tree.
 
template<typename Iterator>
word_index_type index_no_checks (Iterator first, Iterator last) const
 Find the index of a word in the suffix tree.
 
Ukkoneninit ()
 Initialize an existing Ukkonen object.
 
word_index_type is_suffix (State const &st) const
 Check if a state corresponds to a suffix.
 
bool is_unique_letter (letter_type l) const noexcept
 Check if a letter is a unique letter added to the end of a word in the suffix tree.
 
size_t length_of_distinct_words () const noexcept
 Returns the sum of the lengths of the distinct words in the suffix tree.
 
size_t length_of_words () const noexcept
 Returns the sum of the lengths of all of the words in the suffix tree.
 
size_t max_word_length () const noexcept
 Returns the maximum length of word in the suffix tree.
 
size_t multiplicity (word_index_type i) const
 Returns the multiplicity of a word by index.
 
size_t multiplicity_no_checks (word_index_type i) const
 Returns the multiplicity of a word by index.
 
std::vector< Node > const & nodes () const noexcept
 Returns the nodes in the suffix tree.
 
size_t number_of_distinct_words () const noexcept
 Returns the number of distinct non-empty words in the suffix tree.
 
size_t number_of_words () const noexcept
 Returns the number of non-empty words in the suffix tree.
 
Ukkonenoperator= (Ukkonen &&)
 Default move assignment.
 
Ukkonenoperator= (Ukkonen const &)
 Default copy assignment.
 
template<typename Iterator>
void throw_if_contains_unique_letter (Iterator first, Iterator last) const
 Throw if the word [first, last) contains a letter equal to any of the unique letters added to the end of words in the suffix tree.
 
void throw_if_contains_unique_letter (word_type const &w) const
 Throw if w contains a letter equal to any of the unique letters added to the end of words in the suffix tree.
 
template<typename Iterator>
std::pair< State, Iterator > traverse (Iterator first, Iterator last) const
 Traverse the suffix tree from the root.
 
template<typename Iterator>
Iterator traverse (State &st, Iterator first, Iterator last) const
 Traverse the suffix tree from the root.
 
template<typename Iterator>
std::pair< State, Iterator > traverse_no_checks (Iterator first, Iterator last) const
 Traverse the suffix tree from the root.
 
template<typename Iterator>
Iterator traverse_no_checks (State &st, Iterator first, Iterator last) const
 Traverse the suffix tree from the root.
 
unique_letter_type unique_letter (word_index_type i) const noexcept
 Returns the unique letter added to the end of a word in the suffix tree.
 
word_index_type word_index (index_type i) const
 Returns the index of the word corresponding to a position.
 
word_index_type word_index (Node const &n) const
 Returns the index of the word corresponding to a node.
 
word_index_type word_index_no_checks (index_type i) const
 Returns the index of the word corresponding to a position.
 
word_index_type word_index_no_checks (Node const &n) const
 Returns the index of the word corresponding to a node.
 

Constructor & Destructor Documentation

◆ Ukkonen() [1/3]

Ukkonen ( )

Constructs an empty generalised suffix tree.

Exceptions
This function guarantees not to throw a LibsemigroupsException.

◆ Ukkonen() [2/3]

Ukkonen ( Ukkonen const & )

Default copy constructor.

◆ Ukkonen() [3/3]

Ukkonen ( Ukkonen && )

Default move constructor.

Member Function Documentation

◆ add_word()

void add_word ( const_iterator first,
const_iterator last )
inline

This function does the same as add_word_no_checks(const_iterator, const_iterator) after first checking that none of the letters in the word corresponding to first and last is equal to any of the existing unique letters.

Exceptions
LibsemigroupsExceptionif throw_if_contains_unique_letter(first, last) throws.

◆ add_word_no_checks()

void add_word_no_checks ( const_iterator first,
const_iterator last )

Calling this function immediately invokes Ukkonen's algorithm to add the given word to the suffix tree (if it is not already contained in the tree). If an identical word is already in the tree, then this function does nothing except increase the multiplicity of that word. If first == last, then this function does nothing.

Parameters
firstiterator pointing to the first letter of the word to add.
lastone beyond the last letter of the word to add.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Linear in the distance between first and last.
Warning
This function does no checks on its arguments whatsoever. In particular, if the word corresponding to first and last contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen.

◆ begin()

const_iterator begin ( ) const
inlinenoexcept

Returns an iterator pointing to the first letter of the first word in the suffix tree.

Returns
A value of type const_iterator.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ cbegin()

const_iterator cbegin ( ) const
inlinenoexcept

Returns an iterator pointing to the first letter of the first word in the suffix tree.

Returns
A value of type const_iterator.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ cend()

const_iterator cend ( ) const
inlinenoexcept

Returns an iterator pointing one past the last letter of the last word in the suffix tree.

Returns
A value of type const_iterator.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ distance_from_root()

size_t distance_from_root ( Node const & n) const

Returns the distance of a node from the root.

Parameters
nthe node.
Returns
A value of type size_t.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
At worst linear in the distance of the node n from the root.

◆ end()

const_iterator end ( ) const
inlinenoexcept

Returns an iterator pointing one past the last letter of the last word in the suffix tree.

Returns
A value of type const_iterator.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ index()

template<typename Iterator>
word_index_type index ( Iterator first,
Iterator last ) const
inline

See index_no_checks.

Exceptions
LibsemigroupsExceptionif throw_if_contains_unique_letter(first, last) throws.

◆ index_no_checks()

template<typename Iterator>
word_index_type index_no_checks ( Iterator first,
Iterator last ) const

If the word corresponding to first and last is one of the words that the suffix tree contains (the words added to the suffix tree via add_word or add_word_no_checks), then this function returns the index of that word. If the word corresponding to first and last is not one of the words that the suffix tree represents, then UNDEFINED is returned.

Template Parameters
Iteratorthe type of the arguments.
Parameters
firstiterator pointing to the first letter of the word to check.
lastone beyond the last letter of the word to check.
Returns
A value of type word_index_type.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Linear in the distance from first to last.
Warning
This function does no checks on its arguments whatsoever. In particular, if the word corresponding to first and last contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen.

◆ init()

Ukkonen & init ( )

This function puts an Ukkonen object back into the same state as if it had been newly default constructed.

Returns
A reference to *this.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
See also
Ukkonen()

◆ is_suffix()

word_index_type is_suffix ( State const & st) const

This function returns a word_index_type if the state st corresponds to a suffix of any word in the suffix tree. The value returned is the index of the word which the state is a suffix of.

Parameters
stthe state.
Returns
A value of type word_index_type.
Exceptions
This function guarantees not to throw a LibsemigroupsException.

◆ is_unique_letter()

bool is_unique_letter ( letter_type l) const
inlinenoexcept

Returns true if l is one of the unique letters added to the end of a word in the suffix tree.

Parameters
lthe letter_type to check.
Returns
A value of type bool.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ length_of_distinct_words()

size_t length_of_distinct_words ( ) const
inlinenoexcept

Returns the sum of the lengths of the distinct words in the suffix tree.

Returns
A value of type size_t.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ length_of_words()

size_t length_of_words ( ) const
noexcept

Returns the sum of the lengths of all of the words in the suffix tree. This is the total length of all the words added to the suffix tree including duplicates, if any.

Returns
A value of type size_t.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
\(O(n)\) where \(n\) is the return value of number_of_distinct_words.

◆ max_word_length()

size_t max_word_length ( ) const
inlinenoexcept

Returns the maximum length of word in the suffix tree.

Returns
A value of type size_t.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ multiplicity()

size_t multiplicity ( word_index_type i) const
inline

This function returns the number of times that the word corresponding to the index i was added to the suffix tree.

Parameters
ithe node.
Returns
A value of type size_t.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Constant.

◆ multiplicity_no_checks()

size_t multiplicity_no_checks ( word_index_type i) const
inline

This function returns the number of times that the word corresponding to the index i was added to the suffix tree.

Parameters
ithe node.
Returns
A value of type size_t.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Constant.

◆ nodes()

std::vector< Node > const & nodes ( ) const
inlinenoexcept

Returns the nodes in the suffix tree.

Returns
A const reference to a std::vector<Node> of Ukkonen::Node objects.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ number_of_distinct_words()

size_t number_of_distinct_words ( ) const
inlinenoexcept

Returns the number of distinct non-empty words in the suffix tree. This is the number of distinct non-empty words added via add_word or add_word_no_checks.

Returns
A value of type size_t.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ number_of_words()

size_t number_of_words ( ) const
inlinenoexcept

Returns the number of non-empty words in the suffix tree. This is the number of all words added via add_word or add_word_no_checks including duplicates, if any.

Returns
A value of type size_t.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
\(O(n)\) where \(n\) is the return value of number_of_distinct_words.

◆ operator=() [1/2]

Ukkonen & operator= ( Ukkonen && )

Default move assignment.

◆ operator=() [2/2]

Ukkonen & operator= ( Ukkonen const & )

Default copy assignment.

◆ throw_if_contains_unique_letter() [1/2]

template<typename Iterator>
void throw_if_contains_unique_letter ( Iterator first,
Iterator last ) const

This function throws an exception if the word corresponding to first and last contains a letter equal to any of the unique letters added to the end of words in the suffix tree.

Template Parameters
Iteratorthe type of the arguments.
Parameters
firstiterator pointing to the first letter of the word.
lastone beyond the last letter of the word.
Exceptions
LibsemigroupsExceptionif is_unique_letter(*it) returns true for any it in [first, last).
Complexity
Linear in the distance from first to last.

◆ throw_if_contains_unique_letter() [2/2]

void throw_if_contains_unique_letter ( word_type const & w) const
inline

◆ traverse() [1/2]

template<typename Iterator>
std::pair< State, Iterator > traverse ( Iterator first,
Iterator last ) const
inline

See traverse_no_checks(Iterator, Iterator) const.

Exceptions
LibsemigroupsExceptionif throw_if_contains_unique_letter(first, last) throws.

◆ traverse() [2/2]

template<typename Iterator>
Iterator traverse ( State & st,
Iterator first,
Iterator last ) const
inline

See traverse_no_checks(State&, Iterator, Iterator) const.

Exceptions
LibsemigroupsExceptionif throw_if_contains_unique_letter(first, last) throws.

◆ traverse_no_checks() [1/2]

template<typename Iterator>
std::pair< State, Iterator > traverse_no_checks ( Iterator first,
Iterator last ) const
inline

This function traverses the edges in the suffix tree, starting at the root node, that are labelled by the letters in the word corresponding to first and last. The suffix tree is traversed until the end of the word is reached, or a letter not corresponding to an edge is encountered. A pair consisting of the state reached, and one past the last letter consumed in the traversal is returned.

Template Parameters
Iteratorthe type of the arguments.
Parameters
firstiterator pointing to the first letter of the word.
lastone beyond the last letter of the word.
Returns
A value of type std::pair<State, Iterator>.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Linear in the distance from first to last.
Warning
This function does no checks on its arguments whatsoever. In particular, if the word corresponding to first and last contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen.

◆ traverse_no_checks() [2/2]

template<typename Iterator>
Iterator traverse_no_checks ( State & st,
Iterator first,
Iterator last ) const

This function traverses the edges in the suffix tree, starting at the state st, that are labelled by the letters in the word corresponding to first and last. The suffix tree is traversed until the end of the word is reached, or a letter not corresponding to an edge is encountered. The state st is modified in-place to contain the last state in the tree reached in the traversal. The returned iterator points one past the last letter consumed in the traversal.

Template Parameters
Iteratorthe type of the 2nd and 3rd arguments.
Parameters
stthe initial state.
firstiterator pointing to the first letter of the word.
lastone beyond the last letter of the word.
Returns
A value of type Iterator.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Linear in the distance from first to last.
Warning
This function does no checks on its arguments whatsoever. In particular, if the word corresponding to first and last contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen.

◆ unique_letter()

unique_letter_type unique_letter ( word_index_type i) const
inlinenoexcept

Returns the unique letter added to the end of the i-th distinct word added to the suffix tree.

Parameters
ithe index of an added word.
Returns
A value of type unique_letter_type.
Exceptions
This function is noexcept and is guaranteed never to throw.
Complexity
Constant.

◆ word_index() [1/2]

word_index_type word_index ( index_type i) const
inline

This function returns the least non-negative integer j such that the Ukkonen::begin() + i points to a character in the j-th word added to the suffix tree.

Parameters
ithe position.
Returns
A value of type word_index_type.
Exceptions
LibsemigroupsExceptionif i is greater than length_of_distinct_words() + number_of_distinct_words()
Complexity
Constant.

◆ word_index() [2/2]

word_index_type word_index ( Node const & n) const
inline

This function returns the least non-negative integer i such that the node n corresponds to the i-th word added to the suffix tree.

Parameters
nthe node.
Returns
A value of type word_index_type.
Exceptions
LibsemigroupsExceptionif n.parent == UNDEFINED
Complexity
Constant.

◆ word_index_no_checks() [1/2]

word_index_type word_index_no_checks ( index_type i) const
inline

This function returns the least non-negative integer j such that the Ukkonen::begin() + i points to a character in the j-th word added to the suffix tree.

Parameters
ithe position.
Returns
A value of type word_index_type.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Constant.
Warning
This function does no checks on its arguments whatsoever. In particular, if i is greater than length_of_words + number_of_distinct_words, then bad things will happen.

◆ word_index_no_checks() [2/2]

word_index_type word_index_no_checks ( Node const & n) const
inline

This function returns the least non-negative integer i such that the node n corresponds to the i-th word added to the suffix tree.

Parameters
nthe node.
Returns
A value of type word_index_type.
Exceptions
This function guarantees not to throw a LibsemigroupsException.
Complexity
Constant.
Warning
This function does no checks on its arguments whatsoever. In particular, if n.parent == UNDEFINED then bad things will happen.

The documentation for this class was generated from the following file: