![]() |
libsemigroups
v3.0.0
C++ library for semigroups and monoids
|
This class implements Ukkonen's algorithm for constructing a generalised suffix tree consisting of word_type. The implementation in this class is based on:
https://cp-algorithms.com/string/suffix-tree-ukkonen.html
The suffix tree is updated when the member function add_word is invoked. Every non-duplicate word added to the tree has a unique letter appended to the end. If a duplicate word is added, then the tree is not modified, but the multiplicity of the word is increased.
Classes | |
struct | Node |
The type of the nodes in the tree. More... | |
struct | State |
The return type of traverse. More... | |
Public Types | |
using | const_iterator = typename word_type::const_iterator |
Alias for word_type iterators. | |
using | index_type = size_t |
Alias for an index between begin and end. | |
using | unique_letter_type = size_t |
Alias for any letter that is added by Ukkonen (so that unique strings end in unique letters). | |
using | word_index_type = size_t |
Alias for order that words were added. | |
Public Member Functions | |
Ukkonen () | |
Default constructor. | |
Ukkonen (Ukkonen &&) | |
Default move constructor. | |
Ukkonen (Ukkonen const &) | |
Default copy constructor. | |
void | add_word (const_iterator first, const_iterator last) |
Check and add a word to the suffix tree. | |
void | add_word_no_checks (const_iterator first, const_iterator last) |
Add a word to the suffix tree. | |
const_iterator | begin () const noexcept |
Returns an iterator pointing to the first letter of the first word in the suffix tree. | |
const_iterator | cbegin () const noexcept |
Returns an iterator pointing to the first letter of the first word in the suffix tree. | |
const_iterator | cend () const noexcept |
Returns an iterator pointing one past the last letter of the last word in the suffix tree. | |
size_t | distance_from_root (Node const &n) const |
Returns the distance of a node from the root. | |
const_iterator | end () const noexcept |
Returns an iterator pointing one past the last letter of the last word in the suffix tree. | |
template<typename Iterator> | |
word_index_type | index (Iterator first, Iterator last) const |
Find the index of a word in the suffix tree. | |
template<typename Iterator> | |
word_index_type | index_no_checks (Iterator first, Iterator last) const |
Find the index of a word in the suffix tree. | |
Ukkonen & | init () |
Initialize an existing Ukkonen object. | |
word_index_type | is_suffix (State const &st) const |
Check if a state corresponds to a suffix. | |
bool | is_unique_letter (letter_type l) const noexcept |
Check if a letter is a unique letter added to the end of a word in the suffix tree. | |
size_t | length_of_distinct_words () const noexcept |
Returns the sum of the lengths of the distinct words in the suffix tree. | |
size_t | length_of_words () const noexcept |
Returns the sum of the lengths of all of the words in the suffix tree. | |
size_t | max_word_length () const noexcept |
Returns the maximum length of word in the suffix tree. | |
size_t | multiplicity (word_index_type i) const |
Returns the multiplicity of a word by index. | |
size_t | multiplicity_no_checks (word_index_type i) const |
Returns the multiplicity of a word by index. | |
std::vector< Node > const & | nodes () const noexcept |
Returns the nodes in the suffix tree. | |
size_t | number_of_distinct_words () const noexcept |
Returns the number of distinct non-empty words in the suffix tree. | |
size_t | number_of_words () const noexcept |
Returns the number of non-empty words in the suffix tree. | |
Ukkonen & | operator= (Ukkonen &&) |
Default move assignment. | |
Ukkonen & | operator= (Ukkonen const &) |
Default copy assignment. | |
template<typename Iterator> | |
void | throw_if_contains_unique_letter (Iterator first, Iterator last) const |
Throw if the word [first, last) contains a letter equal to any of the unique letters added to the end of words in the suffix tree. | |
void | throw_if_contains_unique_letter (word_type const &w) const |
Throw if w contains a letter equal to any of the unique letters added to the end of words in the suffix tree. | |
template<typename Iterator> | |
std::pair< State, Iterator > | traverse (Iterator first, Iterator last) const |
Traverse the suffix tree from the root. | |
template<typename Iterator> | |
Iterator | traverse (State &st, Iterator first, Iterator last) const |
Traverse the suffix tree from the root. | |
template<typename Iterator> | |
std::pair< State, Iterator > | traverse_no_checks (Iterator first, Iterator last) const |
Traverse the suffix tree from the root. | |
template<typename Iterator> | |
Iterator | traverse_no_checks (State &st, Iterator first, Iterator last) const |
Traverse the suffix tree from the root. | |
unique_letter_type | unique_letter (word_index_type i) const noexcept |
Returns the unique letter added to the end of a word in the suffix tree. | |
word_index_type | word_index (index_type i) const |
Returns the index of the word corresponding to a position. | |
word_index_type | word_index (Node const &n) const |
Returns the index of the word corresponding to a node. | |
word_index_type | word_index_no_checks (index_type i) const |
Returns the index of the word corresponding to a position. | |
word_index_type | word_index_no_checks (Node const &n) const |
Returns the index of the word corresponding to a node. | |
Ukkonen | ( | ) |
Constructs an empty generalised suffix tree.
|
inline |
This function does the same as add_word_no_checks(const_iterator,
const_iterator) after first checking that none of the letters in the word corresponding to first
and last
is equal to any of the existing unique letters.
LibsemigroupsException | if throw_if_contains_unique_letter(first, last) throws. |
void add_word_no_checks | ( | const_iterator | first, |
const_iterator | last ) |
Calling this function immediately invokes Ukkonen's algorithm to add the given word to the suffix tree (if it is not already contained in the tree). If an identical word is already in the tree, then this function does nothing except increase the multiplicity of that word. If first == last
, then this function does nothing.
first | iterator pointing to the first letter of the word to add. |
last | one beyond the last letter of the word to add. |
first
and last
.first
and last
contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen.
|
inlinenoexcept |
Returns an iterator pointing to the first letter of the first word in the suffix tree.
const_iterator
.noexcept
and is guaranteed never to throw.
|
inlinenoexcept |
Returns an iterator pointing to the first letter of the first word in the suffix tree.
const_iterator
.noexcept
and is guaranteed never to throw.
|
inlinenoexcept |
Returns an iterator pointing one past the last letter of the last word in the suffix tree.
const_iterator
.noexcept
and is guaranteed never to throw.size_t distance_from_root | ( | Node const & | n | ) | const |
Returns the distance of a node from the root.
n | the node. |
size_t
.n
from the root.
|
inlinenoexcept |
Returns an iterator pointing one past the last letter of the last word in the suffix tree.
const_iterator
.noexcept
and is guaranteed never to throw.
|
inline |
See index_no_checks.
LibsemigroupsException | if throw_if_contains_unique_letter(first, last) throws. |
word_index_type index_no_checks | ( | Iterator | first, |
Iterator | last ) const |
If the word corresponding to first
and last
is one of the words that the suffix tree contains (the words added to the suffix tree via add_word
or add_word_no_checks
), then this function returns the index of that word. If the word corresponding to first
and last
is not one of the words that the suffix tree represents, then UNDEFINED is returned.
Iterator | the type of the arguments. |
first | iterator pointing to the first letter of the word to check. |
last | one beyond the last letter of the word to check. |
word_index_type
.first
to last
.first
and last
contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen. Ukkonen & init | ( | ) |
This function puts an Ukkonen object back into the same state as if it had been newly default constructed.
*this
.word_index_type is_suffix | ( | State const & | st | ) | const |
This function returns a word_index_type
if the state st
corresponds to a suffix of any word in the suffix tree. The value returned is the index of the word which the state is a suffix of.
st | the state. |
word_index_type
.
|
inlinenoexcept |
Returns true
if l
is one of the unique letters added to the end of a word in the suffix tree.
l | the letter_type to check. |
bool
.noexcept
and is guaranteed never to throw.
|
inlinenoexcept |
Returns the sum of the lengths of the distinct words in the suffix tree.
size_t
.noexcept
and is guaranteed never to throw.
|
noexcept |
Returns the sum of the lengths of all of the words in the suffix tree. This is the total length of all the words added to the suffix tree including duplicates, if any.
size_t
.noexcept
and is guaranteed never to throw.
|
inlinenoexcept |
Returns the maximum length of word in the suffix tree.
size_t
.noexcept
and is guaranteed never to throw.
|
inline |
This function returns the number of times that the word corresponding to the index i
was added to the suffix tree.
i | the node. |
size_t
.
|
inline |
This function returns the number of times that the word corresponding to the index i
was added to the suffix tree.
i | the node. |
size_t
.
|
inlinenoexcept |
Returns the nodes in the suffix tree.
Ukkonen::Node
objects.noexcept
and is guaranteed never to throw.
|
inlinenoexcept |
Returns the number of distinct non-empty words in the suffix tree. This is the number of distinct non-empty words added via add_word or add_word_no_checks.
size_t
.noexcept
and is guaranteed never to throw.
|
inlinenoexcept |
Returns the number of non-empty words in the suffix tree. This is the number of all words added via add_word or add_word_no_checks including duplicates, if any.
size_t
.noexcept
and is guaranteed never to throw.void throw_if_contains_unique_letter | ( | Iterator | first, |
Iterator | last ) const |
This function throws an exception if the word corresponding to first
and last
contains a letter equal to any of the unique letters added to the end of words in the suffix tree.
Iterator | the type of the arguments. |
first | iterator pointing to the first letter of the word. |
last | one beyond the last letter of the word. |
LibsemigroupsException | if is_unique_letter(*it) returns true for any it in [first, last) . |
first
to last
.
|
inline |
|
inline |
See traverse_no_checks(Iterator, Iterator) const.
LibsemigroupsException | if throw_if_contains_unique_letter(first, last) throws. |
|
inline |
See traverse_no_checks(State&, Iterator, Iterator) const.
LibsemigroupsException | if throw_if_contains_unique_letter(first, last) throws. |
|
inline |
This function traverses the edges in the suffix tree, starting at the root node, that are labelled by the letters in the word corresponding to first
and last
. The suffix tree is traversed until the end of the word is reached, or a letter not corresponding to an edge is encountered. A pair consisting of the state reached, and one past the last letter consumed in the traversal is returned.
Iterator | the type of the arguments. |
first | iterator pointing to the first letter of the word. |
last | one beyond the last letter of the word. |
first
to last
.first
and last
contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen. Iterator traverse_no_checks | ( | State & | st, |
Iterator | first, | ||
Iterator | last ) const |
This function traverses the edges in the suffix tree, starting at the state st
, that are labelled by the letters in the word corresponding to first
and last
. The suffix tree is traversed until the end of the word is reached, or a letter not corresponding to an edge is encountered. The state st
is modified in-place to contain the last state in the tree reached in the traversal. The returned iterator points one past the last letter consumed in the traversal.
Iterator | the type of the 2nd and 3rd arguments. |
st | the initial state. |
first | iterator pointing to the first letter of the word. |
last | one beyond the last letter of the word. |
Iterator
.first
to last
.first
and last
contains any of the unique letters appended to the end of any existing word in the tree, then bad things will happen.
|
inlinenoexcept |
Returns the unique letter added to the end of the i-th
distinct word added to the suffix tree.
i | the index of an added word. |
unique_letter_type
.noexcept
and is guaranteed never to throw.
|
inline |
This function returns the least non-negative integer j
such that the Ukkonen::begin() + i
points to a character in the j
-th word added to the suffix tree.
i | the position. |
word_index_type
.LibsemigroupsException | if i is greater than length_of_distinct_words() + number_of_distinct_words() |
|
inline |
This function returns the least non-negative integer i
such that the node n
corresponds to the i
-th word added to the suffix tree.
n | the node. |
word_index_type
.LibsemigroupsException | if n.parent == UNDEFINED |
|
inline |
This function returns the least non-negative integer j
such that the Ukkonen::begin() + i
points to a character in the j
-th word added to the suffix tree.
i | the position. |
word_index_type
.i
is greater than length_of_words + number_of_distinct_words, then bad things will happen.
|
inline |
This function returns the least non-negative integer i
such that the node n
corresponds to the i
-th word added to the suffix tree.
n | the node. |
word_index_type
.n.parent == UNDEFINED
then bad things will happen.