The AhoCorasick class
For an implementation of the Aho-Corasick algorithm.
This class implements a trie based data structure with suffix links to be used with the Aho-Corasick dictionary searching algorithm. An introduction to this algorithm can be found here.
The implementation of AhoCorasick uses two different types of node;
active and inactive. An active node is a node that is currently a node
in the trie. An inactive node is a node that used to be part of the trie, but
has since been removed. It may later become active again after being
reinitialised, and exists as a way of minimising how frequently memory needs
to be allocated and deallocated for nodes. This function validates whether the
given index i corresponds to an active node.
Several helper functions are provided in the aho_corasick module, documented
here.
Contents
| For an implementation of the Aho-Corasick algorithm. | |
| Return the child of parent with edge-label letter | |
| Copy a  | |
| Calculate the height of a node. | |
| Reinitialise an existing AhoCorasick object. | |
| Check if a node is terminal (by index). | |
| Returns the number of nodes in the trie. | |
| Find the signature of a node | |
| Calculate the index of the suffix link of a node. | |
| Check if an index corresponds to a node currently in the trie. | |
| Check if an index corresponds to a node. | |
| Traverse the trie using suffix links where necessary. | 
Full API
- class AhoCorasick
- __init__(self: AhoCorasick) None
- Construct an empty AhoCorasick. - Construct an - AhoCorasickcontaining only the root that corresponds to the empty word \(\varepsilon\).
 - child(self: AhoCorasick, parent: int, letter: int) int | Undefined
- Return the child of parent with edge-label letter - This function returns the index of the child of the node with index parent along the edge labelled by letter. If no such child exists, - UNDEFINEDis returned.- Parameters:
- Returns:
- the index of the child. 
- Return type:
- Raises:
- LibsemigroupsError – if - throw_if_node_index_not_active(parent)throws.
- Complexity:
- Constant. 
 - See also 
 - copy(self: AhoCorasick) AhoCorasick
- Copy a - AhoCorasickobject.- Returns:
- A copy. 
- Return type:
 
 - height(self: AhoCorasick, i: int) int
- Calculate the height of a node. - Parameters:
- i (int) – the index of the node whose height is sought 
- Returns:
- the height. 
- Return type:
- Raises:
- LibsemigroupsError – if - throw_if_node_index_not_active(i)throws.
- Complexity:
- Linear in the return value which is, at worst, the maximum length of a word in the trie 
 - See also 
 - init(self: AhoCorasick) AhoCorasick
- Reinitialise an existing AhoCorasick object. - This function puts an - AhoCorasickobject back into the same state as if it had been newly default constructed.- Returns:
- self. 
- Return type:
- Complexity:
- Linear in the number of nodes in the trie 
 
 - is_terminal(self: AhoCorasick, i: int) bool
- Check if a node is terminal (by index). - This function checks if the node with index i is terminal or not. - Parameters:
- i (int) – the index. 
- Returns:
- Whether or not the node is terminal 
- Return type:
- Raises:
- LibsemigroupsError – if i does not correspond to the index of a node; that is, if i is larger than the size of the container storing the indices of nodes. 
- LibsemigroupsError – if i does not correspond to an active node. 
 
- Complexity:
- Constant 
 
 - number_of_nodes(self: AhoCorasick) int
- Returns the number of nodes in the trie. - This function Returns the number of nodes in the trie. - Returns:
- The number of nodes> 
- Return type:
- Complexity:
- Constant 
 
 - signature(self: AhoCorasick, i: int) list[int]
- Find the signature of a node - Return the the signature of the node with index i. Recall that the signature of a node \(n\) is the word consisting of the edge labels of the unique path from the root to \(n\). 
 - suffix_link(self: AhoCorasick, current: int) int
- Calculate the index of the suffix link of a node. - Calculate the index of a suffix link of a node. Recall that the suffix link of a node with signature \(W\) is the node with the signature equal to that of the longest proper suffix of \(W\) contained in the trie. - Parameters:
- current (int) – the index of the node whose suffix link is sought 
- Returns:
- The index of the suffix link. 
- Return type:
- Raises:
- LibsemigroupsError – if - throw_if_node_index_not_active(current)throws.
- Complexity:
- Linear in the height of the node. 
 - See also 
 - throw_if_node_index_not_active(self: AhoCorasick, i: int) None
- Check if an index corresponds to a node currently in the trie. - Parameters:
- i (int) – the index to validate 
- Raises:
- LibsemigroupsError – if - throw_if_node_index_out_of_range(i)throws, or if i is not an active node.
- Complexity:
- Constant 
 - See also 
 - throw_if_node_index_out_of_range(self: AhoCorasick, i: int) None
- Check if an index corresponds to a node. - This function checks if the given index i corresponds to the index of a node; either active or inactive. - Parameters:
- i (int) – the index to validate. 
- Raises:
- LibsemigroupsError – if i does not correspond to the index of a node; that is, if i is larger than the size of the container storing the indices of nodes. 
- Complexity:
- Constant 
 
 - traverse(self: AhoCorasick, current: int, a: int) int
- Traverse the trie using suffix links where necessary. - This function traverses the trie using suffix links where necessary, behaving like a combination of the goto function and the fail function in [AC75]. - If current is the index of a node with signature \(W\), and a is the letter \(a\), then traverse_no_checks(current, a) returns the index of the node with signature equal to the longest suffix of \(Wa\) contained in the trie. - Parameters:
- Returns:
- The index of the node traversed to 
- Return type:
- Raises:
- LibsemigroupsError – if - throw_if_node_index_not_active(current)throws.
 - See also