Defined in aho-corasick.hpp.
This class implements a trie based data structure with suffix links to be used with the Aho-Corasick dictionary searching algorithm. An introduction to this algorithm can be found at:
https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
The implementation of AhoCorasick uses two different types of node; active and inactive. An active node is a node that is currently a node in the trie. An inactive node is a node that used to be part of the trie, but has since been removed. It may later become active again after being reinitialised, and exists as a way of minimising how frequently memory needs to be allocated and deallocated for nodes.
Several helper functions are provided in the aho_corasick namespace.
Public Types | |
| using | const_iterator = typename word_type::const_iterator |
Used for word_type iterators. | |
| using | index_type = size_t |
| Alias for the index of a node in the trie. | |
Public Member Functions | |
| AhoCorasick () | |
| Construct an empty AhoCorasick. | |
| AhoCorasick (AhoCorasick &&)=default | |
| Default move constructor. | |
| AhoCorasick (AhoCorasick const &)=default | |
| Default copy constructor. | |
| rx::iterator_range< std::set< index_type >::const_iterator > | active_nodes () const |
| Return the active nodes. | |
| template<typename Iterator> | |
| index_type | add_word (Iterator first, Iterator last) |
| Check and add a word to the trie. | |
| template<typename Iterator> | |
| index_type | add_word_no_checks (Iterator first, Iterator last) |
| Add a word to the trie. | |
| std::set< index_type >::iterator | begin_nodes () const noexcept |
| Return an iterator pointing to the first active node. | |
| std::set< index_type >::const_iterator | cbegin_nodes () const noexcept |
| Return a const iterator pointing to the first active node. | |
| std::set< index_type >::const_iterator | cend_nodes () const noexcept |
| Return a const iterator pointing one beyond the last active node. | |
| index_type | child (index_type parent, letter_type letter) const |
Return the child of parent with edge-label letter. | |
| index_type | child_no_checks (index_type parent, letter_type letter) const |
Return the child of parent with edge-label letter. | |
| std::set< index_type >::iterator | end_nodes () const noexcept |
| Return an iterator pointing one beyond the last active node. | |
| size_t | height (index_type i) const |
| Calculate the height of a node. | |
| size_t | height_no_checks (index_type i) const |
| Calculate the height of a node. | |
| AhoCorasick & | init () |
| Reinitialise an existing AhoCorasick object. | |
| Node const & | node (index_type i) const |
| Return the node given an index. | |
| Node const & | node_no_checks (index_type i) const |
| Return the node given an index. | |
| size_t | number_of_nodes () const noexcept |
| Returns the number of nodes in the trie. | |
| AhoCorasick & | operator= (AhoCorasick &&)=default |
| Default move assignment. | |
| AhoCorasick & | operator= (AhoCorasick const &)=default |
| Default copy assignment. | |
| template<typename Iterator> | |
| index_type | rm_word (Iterator first, Iterator last) |
| Check and add a word to the trie. | |
| template<typename Iterator> | |
| index_type | rm_word_no_checks (Iterator first, Iterator last) |
| Remove a word from the trie. | |
| word_type | signature (index_type i) const |
| Find the signature of a node (out-of-place). | |
| void | signature (word_type &w, index_type i) const |
| Find the signature of a node (in-place). | |
| word_type | signature_no_checks (index_type i) const |
| Find the signature of a node (out-of-place). | |
| void | signature_no_checks (word_type &w, index_type i) const |
| Find the signature of a node (in-place). | |
| index_type | suffix_link (index_type current) const |
| Calculate the index of the suffix link of a node. | |
| index_type | suffix_link_no_checks (index_type current) const |
| Calculate the index of the suffix link of a node. | |
| void | throw_if_node_index_not_active (index_type i) const |
| Check if an index corresponds to a node currently in the trie. | |
| void | throw_if_node_index_out_of_range (index_type i) const |
| Check if an index corresponds to a node. | |
| index_type | traverse (index_type current, letter_type a) const |
| Traverse the trie using suffix links where necessary. | |
| index_type | traverse_no_checks (index_type current, letter_type a) const |
| Traverse the trie using suffix links where necessary. | |
| AhoCorasick | ( | ) |
Construct an AhoCorasick containing only the root that corresponds to the empty word \(\varepsilon\).
|
default |
Default copy constructor.
|
default |
Default move constructor.
|
inlinenodiscard |
This function returns the active nodes of the trie.
rx::iterator_range<std::set<index_type>::const_iterator>.| index_type add_word | ( | Iterator | first, |
| Iterator | last ) |
This function does the same as add_word_no_checks(Iterator, Iterator) after first checking that the word corresponding to first and last does not correspond to an existing terminal node in the trie.
| LibsemigroupsException | if the word corresponding to first and last corresponds to an existing terminal node in the trie. |
| index_type add_word_no_checks | ( | Iterator | first, |
| Iterator | last ) |
Calling this function immediately adds the given word to the trie, and makes the final node on the path labelled by this word terminal (if it wasn't already). After adding a word, existing suffix links become invalid. If an identical word has already been added to the trie, then this function does nothing. If first == last, then this function does nothing.
| Iterator | the type of the 1st and 2nd parameters. |
| first | iterator pointing to the first letter of the word to add. |
| last | one beyond the last letter of the word to add. |
first and last.
|
inlinenodiscardnoexcept |
This function returns an iterator pointing to the first active node in the trie.
std::set<index_type>::iterator.noexcept and is guaranteed never to throw.
|
inlinenodiscardnoexcept |
This function returns a const iterator pointing to the first active node in the trie.
std::set<index_type>::const_iterator.noexcept and is guaranteed never to throw.
|
inlinenodiscardnoexcept |
This function returns a const iterator pointing one beyond the last active node in the trie.
std::set<index_type>::const_iterator.noexcept and is guaranteed never to throw.
|
inlinenodiscard |
After validating parent, this function performs the same as child_no_checks(parent, letter).
| LibsemigroupsException | if throw_if_node_index_not_active(parent) throws. |
|
inlinenodiscard |
This function returns the index of the child of the node with index parent along the edge labelled by letter. If no such child exists, UNDEFINED is returned.
| parent | the index of the node whose child is sought. |
| letter | the edge-label connecting the parent to the desired child. |
index_type.parent is greater than the number of nodes that have ever been created, then bad things will happen.
|
inlinenodiscardnoexcept |
This function returns an iterator pointing one beyond the last active node in the trie.
std::set<index_type>::iterator.noexcept and is guaranteed never to throw.
|
inlinenodiscard |
After validating i, this function performs the same as height_no_checks(i).
| LibsemigroupsException | if throw_if_node_index_not_active(i) throws. |
|
nodiscard |
Calculate the height of a node.
| i | the index of the node whose height is sought. |
size_t.i is greater than the number of nodes that have ever been created, then bad things will happen. | AhoCorasick & init | ( | ) |
This function puts an AhoCorasick object back into the same state as if it had been newly default constructed.
this.
|
inlinenodiscard |
After validating i, this function performs the same as node_no_checks(i).
| LibsemigroupsException | if throw_if_node_index_out_of_range(i) throws. |
|
inlinenodiscard |
This function returns the node stored in the trie given by the index i.
| i | the index of the node to return. |
Node.i is greater than the number of nodes that have ever been created, then bad things will happen.
|
inlinenodiscardnoexcept |
This function returns the number of nodes in the trie.
size_t.noexcept and is guaranteed never to throw.
|
default |
Default move assignment.
|
default |
Default copy assignment.
| index_type rm_word | ( | Iterator | first, |
| Iterator | last ) |
This function does the same as rm_word_no_checks(Iterator, Iterator) after first checking that the word corresponding to first and last is terminal node in the trie.
| LibsemigroupsException | if the word corresponding to first and last does not correspond to an existing terminal node in the trie. |
| index_type rm_word_no_checks | ( | Iterator | first, |
| Iterator | last ) |
From the trie, remove each node of the given word that is not part of the prefix of a different word.
If the given word \(W\) corresponds to a terminal node with no children, then calling this function removes the nodes \(n_i\) from the trie that correspond to the largest suffix \(W\), such that each \(n_i\) has either zero children or one. After this, existing suffix links become invalid.
If \(W\) corresponds to a terminal node \(n\) with children, then calling this function makes \(n\) not terminal.
If \(W\) does not correspond to a terminal node, then calling this function does nothing.
| Iterator | the type of the 1st and 2nd parameters. |
| first | iterator pointing to the first letter of the word to add. |
| last | one beyond the last letter of the word to add. |
first and last.
|
inline |
After validating i, this function performs the same as signature_no_checks(i).
| LibsemigroupsException | if throw_if_node_index_not_active(i) throws. |
|
inline |
After validating i, this function performs the same as signature_no_checks(w, i).
| LibsemigroupsException | if throw_if_node_index_not_active(i) throws. |
|
inline |
Return the the signature of the node with index i. Recall that the signature of a node \(n\) is the word consisting of the edge labels of the unique path from the root to \(n\).
| i | the index of the node whose signature is sought. |
word_type.i is greater than the number of nodes that have ever been created, then bad things will happen. | void signature_no_checks | ( | word_type & | w, |
| index_type | i ) const |
Changes w in-place to contain the signature of the node with index i . Recall that the signature of a node \(n\) is the word consisting of the edge labels of the unique path from the root to \(n\).
| w | the word to clear and change in-place. |
| i | the index of the node whose signature is sought. |
i is greater than the number of nodes that have ever been created, then bad things will happen.
|
inlinenodiscard |
After validating current, this function performs the same as suffix_link_no_checks(current).
| LibsemigroupsException | if throw_if_node_index_not_active(current) throws. |
|
nodiscard |
Calculate the index of a suffix link of a node. Recall that the suffix link of a node with signature \(W\) is the node with the signature equal to that of the longest proper suffix of \(W\) contained in the trie.
| current | the index of the node whose suffix link is sought. |
index_type.current is greater than the number of nodes that have ever been created, then bad things will happen. | void throw_if_node_index_not_active | ( | index_type | i | ) | const |
This function checks whether the given index i corresponds to an active node.
| i | the index to check. |
| LibsemigroupsException | if throw_if_node_index_out_of_range(i) throws, or if i is not an active node. |
| void throw_if_node_index_out_of_range | ( | index_type | i | ) | const |
This function checks if the given index i corresponds to the index of a node; either active or inactive.
| i | the index to check. |
| LibsemigroupsException | if i does not correspond to the index of a node; that is, if i is larger than the size of the container storing the indices of nodes. |
|
inlinenodiscard |
After validating current, this function performs the same as traverse_no_checks(current, a).
| LibsemigroupsException | if throw_if_node_index_not_active(current) throws. |
|
nodiscard |
This function traverses the trie using suffix links where necessary, behaving like a combination of the goto function and the fail function in [3].
If current is the index of a node with signature \(W\), and a is the letter \(a\), then traverse_no_checks(current, a) returns the index of the node with signature equal to the longest suffix of \(Wa\) contained in the trie.
| current | the index of the node to traverse from. |
| a | the letter to traverse. |
index_type.