AhoCorasick helpers
This module contains various helper functions for the class AhoCorasick
.
These functions could be functions of AhoCorasick
but they only use
public member functions of AhoCorasick
, and so they are declared
as free functions instead.
>>> from libsemigroups_pybind11 import AhoCorasick, aho_corasick
>>> # Construct an empty AhoCorasick
>>> ac = AhoCorasick()
>>> # Add words
>>> aho_corasick.add_word(ac, [0, 1, 0, 1])
4
>>> aho_corasick.add_word(ac, [0, 1, 1, 0])
6
>>> aho_corasick.add_word(ac, [0, 1, 1, 0, 1])
7
>>> aho_corasick.add_word(ac, [0, 1, 1, 0, 0])
8
>>> # Can't add a word that already exists
>>> aho_corasick.add_word(ac, [0, 1, 1, 0, 0])
Traceback (most recent call last):
...
LibsemigroupsError: the word [0, 1, 1, 0, 0] given by the arguments [first, last) already belongs to the trie
>>> # Remove words
>>> aho_corasick.rm_word(ac, [0, 1, 0, 1])
4
>>> # Can't remove a word that is not a terminal node in the trie
>>> aho_corasick.rm_word(ac, [0, 1, 0, 1])
Traceback (most recent call last):
...
LibsemigroupsError: cannot remove the word [0, 1, 0, 1] given by the arguments [first, last), as it does not correspond to a node in the trie
>>> # Traverse
>>> aho_corasick.traverse_word(ac, 5, [0, 1])
7
>>> aho_corasick.traverse_word(ac, [0, 1, 0, 1, 1, 0])
6
Contents
|
Add a word to the trie of ac |
|
Construct a |
|
Remove a word from the trie of ac. |
Overloaded function. |
Full API
This module contains various helper functions for the class AhoCorasick
.
These functions could be functions of AhoCorasick
but they only use
public member functions of AhoCorasick
, and so they are declared
as free functions instead.
- aho_corasick.add_word(ac: AhoCorasick, w: list[int] | str) int
Add a word to the trie of ac
Calling this function immediately adds the word w to the trie of ac, and makes the final node on the path labelled by this word terminal (if it wasn’t already). After adding a word, existing suffix links become invalid. If an identical word has already been added to the trie of ac, then this function does nothing.
- Parameters:
ac (AhoCorasick) – object whose trie is to be added to
- Returns:
The index corresponding to the final node added to the trie of ac. This node will have a
signature
equal to that of w.- Return type:
- Complexity:
Linear in the length of w.
See also
- aho_corasick.dot(ac: AhoCorasick) Dot
Construct a
Dot
object representing the trie of ac.- Parameters:
ac (AhoCorasick) – the
AhoCorasick
object whose trie we are trying to visualise.- Returns:
A
Dot
object representing ac.- Return type:
- aho_corasick.rm_word(ac: AhoCorasick, w: list[int] | str) int
Remove a word from the trie of ac.
From the trie of ac, remove each node of the given word w that is not part of the prefix of a different word.
If the word w corresponds to a terminal node with no children, then calling this function removes the nodes \(n_i\) from the trie of ac that correspond to the largest suffix w, such that each \(n_i\) has either zero children or one. After this, existing suffix links become invalid.
If w corresponds to a terminal node \(n\) with children, then calling this function makes :math`n` not terminal.
If w does not correspond to a terminal node, then calling this function does nothing.
- Parameters:
ac (AhoCorasick) – the trie.
- Returns:
The index corresponding to the node with signature equal to w.
- Return type:
- Complexity:
Linear in the length of w.
See also
- aho_corasick.traverse_word(*args, **kwargs)
Overloaded function.
- aho_corasick.traverse_word(ac: AhoCorasick, start: int, w: list[int] | str) int
Traverse the trie of ac using suffix links where necessary.
This function traverses the trie of ac, starting at the node with index start, and traversing using the letters in the word w.
- aho_corasick.traverse_word(ac: AhoCorasick, w: list[int] | str) int
Traverse the trie of ac from the root using suffix links where necessary.
This function traverses the trie of ac, starting from the root, and traversing using the letters in the word w.
- Parameters:
ac (AhoCorasick) – the trie to traverse.
- Returns:
The index of the node reached by traversing.
- Return type:
Note
This value returned by this function is the same as the value returned by
traverse_word(ac, AhoCorasick.root, w)
.