HPCombi
High Performance Combinatorics in C++ using vector instructions v1.0.3
|
Namespaces | |
namespace | detail |
namespace | power_helper |
Classes | |
class | BMat16 |
Class for fast boolean matrices of dimension up to 16 x 16. More... | |
class | BMat8 |
Boolean matrices of dimension up to 8×8, stored as a single uint64; isomorph to binary relations with methods for composition. More... | |
struct | Perm16 |
Permutations of \(\{0\dots 15\}\): A permutation is a bijective mapping of a set of n elements onto itself. More... | |
struct | PermGeneric |
Vanilla (ie NOT optimized) implementation of a permutation, used to check for test correctness and as baseline to measure speedup. More... | |
struct | PPerm16 |
Partial permutation of \(\{0\dots 15\}\); see also HPCombi::Perm16; partial means it might not be defined everywhere (but where it's defined, it's injective). More... | |
struct | PTransf16 |
Partial transformation of \(\{0\dots 15\}\); see HPCombi::Transf16; partial means it might not be defined everywhere. More... | |
struct | TPUBuild |
Given a transformation from 0..15 → 0..15, build at compile-time the array representing the transformation. More... | |
struct | Transf16 |
Full transformation of \(\{0\dots 15\}\): a transformation is a mapping of a set of n elements into itself; ie as opposed to a permutation, it is not necessarily injective. More... | |
struct | Vect16 |
Vector of 16 bytes, with some optimized methods, superclass of HPCombi::Transf16. More... | |
struct | VectGeneric |
VectGeneric is to Vect16 what PermGeneric is to Perm16; see PermGeneric. More... | |
Typedefs | |
using | xpu16 = uint16_t __attribute__((vector_size(32))) |
using | xpu64 = uint64_t __attribute__((vector_size(32))) |
using | epu64 = uint64_t __attribute__((__vector_size__(16), __may_alias__)) |
using | epu8 = uint8_t __attribute__((vector_size(16))) |
epu8 stands for Extended Packed Unsigned, grouped by 8 bits; this is the low level type chosen by Intel for their API to intrinsics, ie a SIMD vector of 16 unsigned bytes (16×8 = 128bits). | |
Functions | |
xpu64 | to_line (xpu64 vect) |
Converting storage type from blocks to rows of a xpu64 representing a 16x16 matrix (used in BMat16). | |
xpu64 | to_block (xpu64 vect) |
Converting storage type from rows to blocks of a xpu64 representing a 16x16 matrix (used in BMat16). | |
template<class TPU> | |
TPUBuild< TPU >::array & | as_array (TPU &v) noexcept |
Cast a TPU to a c++ std::array . | |
template<class TPU> | |
const TPUBuild< TPU >::array & | as_array (const TPU &v) noexcept |
Cast a constant TPU to a constant c++ std::array . | |
template<class TPU> | |
VectGeneric< TPUBuild< TPU >::size > & | as_VectGeneric (TPU &v) |
Cast a HPCombi::epu8 to a c++ HPCombi::VectGeneric. | |
template<class TPU> | |
const VectGeneric< TPUBuild< TPU >::size > & | as_VectGeneric (const TPU &v) |
Cast a HPCombi::epu8 to a c++ HPCombi::VectGeneric. | |
constexpr uint8_t | operator""_u8 (unsigned long long arg) noexcept |
Unsigned 8 bits int constant. | |
bool | is_all_zero (epu8 a) noexcept |
Test whether all the entries of a HPCombi::epu8 are zero. | |
bool | is_all_one (epu8 a) noexcept |
Test whether all the entries of a HPCombi::epu8 are one. | |
bool | equal (epu8 a, epu8 b) noexcept |
Equality of HPCombi::epu8. | |
bool | not_equal (epu8 a, epu8 b) noexcept |
Non equality of HPCombi::epu8. | |
epu8 | permuted_ref (epu8 a, epu8 b) noexcept |
Apply a permutation b on the vector a: for i=0..16 {result[i] = a[b[i]}. | |
epu8 | permuted (epu8 a, epu8 b) noexcept |
Same as permuted_ref but with an optimized implementation using intrinsics. | |
epu8 | shifted_right (epu8 a) noexcept |
Left shifted of a HPCombi::epu8 inserting a 0. | |
epu8 | shifted_left (epu8 a) noexcept |
Right shifted of a HPCombi::epu8 inserting a 0. | |
epu8 | reverted (epu8 a) noexcept |
Reverting a HPCombi::epu8. | |
epu8 | min (epu8 a, epu8 b) noexcept |
Vector min between two HPCombi::epu8 0. | |
epu8 | max (epu8 a, epu8 b) noexcept |
Vector max between two HPCombi::epu8 0. | |
bool | is_sorted (epu8 a) noexcept |
Testing if a HPCombi::epu8 is sorted. | |
epu8 | sorted (epu8 a) noexcept |
Return a sorted HPCombi::epu8. | |
epu8 | sorted8 (epu8 a) noexcept |
Return a HPCombi::epu8 with both halves sorted. | |
epu8 | revsorted (epu8 a) noexcept |
Return a reverse sorted HPCombi::epu8. | |
epu8 | revsorted8 (epu8 a) noexcept |
Return a HPCombi::epu8 with both halves reverse sorted. | |
epu8 | sort_perm (epu8 &a) noexcept |
Sort this and return the sorting permutation. | |
epu8 | sort8_perm (epu8 &a) noexcept |
Sort this and return the sorting permutation. | |
void | merge (epu8 &a, epu8 &b) noexcept |
Merge two sorted epu8. | |
epu8 | permutation_of_ref (epu8 a, epu8 b) noexcept |
Same interface as permutation_of but with a different implementation. | |
epu8 | permutation_of (epu8 a, epu8 b) noexcept |
Find if a vector is a permutation of another one. | |
epu8 | random_epu8 (uint16_t bnd) |
A random HPCombi::epu8. | |
epu8 | remove_dups (epu8 a, uint8_t repl=0) noexcept |
Remove duplicates in a sorted HPCombi::epu8. | |
uint8_t | horiz_sum_ref (epu8) noexcept |
Same interface as horiz_sum but with a different implementation. | |
uint8_t | horiz_sum_gen (epu8) noexcept |
Same interface as horiz_sum but with a different implementation. | |
uint8_t | horiz_sum4 (epu8) noexcept |
Same interface as horiz_sum but with a different implementation. | |
uint8_t | horiz_sum3 (epu8) noexcept |
Same interface as horiz_sum but with a different implementation. | |
uint8_t | horiz_sum (epu8 v) noexcept |
Horizontal sum of a HPCombi::epu8. | |
epu8 | partial_sums_ref (epu8) noexcept |
Same interface as partial_sums but with a different implementation. | |
epu8 | partial_sums_gen (epu8) noexcept |
Same interface as partial_sums but with a different implementation. | |
epu8 | partial_sums_round (epu8) noexcept |
Same interface as partial_sums but with a different implementation. | |
epu8 | partial_sums (epu8 v) noexcept |
Horizontal partial sum of a HPCombi::epu8. | |
uint8_t | horiz_max_ref (epu8) noexcept |
Same interface as horiz_max but with a different implementation. | |
uint8_t | horiz_max_gen (epu8) noexcept |
Same interface as horiz_max but with a different implementation. | |
uint8_t | horiz_max4 (epu8) noexcept |
Same interface as horiz_max but with a different implementation. | |
uint8_t | horiz_max3 (epu8) noexcept |
Same interface as horiz_max but with a different implementation. | |
uint8_t | horiz_max (epu8 v) noexcept |
Horizontal sum of a HPCombi::epu8. | |
epu8 | partial_max_ref (epu8) noexcept |
Same interface as partial_max but with a different implementation. | |
epu8 | partial_max_gen (epu8) noexcept |
Same interface as partial_max but with a different implementation. | |
epu8 | partial_max_round (epu8) noexcept |
Same interface as partial_max but with a different implementation. | |
epu8 | partial_max (epu8 v) noexcept |
Horizontal partial sum of a HPCombi::epu8. | |
uint8_t | horiz_min_ref (epu8) noexcept |
Same interface as horiz_min but with a different implementation. | |
uint8_t | horiz_min_gen (epu8) noexcept |
Same interface as horiz_min but with a different implementation. | |
uint8_t | horiz_min4 (epu8) noexcept |
Same interface as horiz_min but with a different implementation. | |
uint8_t | horiz_min3 (epu8) noexcept |
Same interface as horiz_min but with a different implementation. | |
uint8_t | horiz_min (epu8 v) noexcept |
Horizontal sum of a HPCombi::epu8. | |
epu8 | partial_min_ref (epu8) noexcept |
Same interface as partial_min but with a different implementation. | |
epu8 | partial_min_gen (epu8) noexcept |
Same interface as partial_min but with a different implementation. | |
epu8 | partial_min_round (epu8) noexcept |
Same interface as partial_min but with a different implementation. | |
epu8 | partial_min (epu8 v) noexcept |
Horizontal partial sum of a HPCombi::epu8. | |
epu8 | eval16_ref (epu8 v) noexcept |
Same interface as eval16 but with a different implementation. | |
epu8 | eval16_arr (epu8 v) noexcept |
Same interface as eval16 but with a different implementation. | |
epu8 | eval16_cycle (epu8 v) noexcept |
Same interface as eval16 but with a different implementation. | |
epu8 | eval16_popcount (epu8 v) noexcept |
Same interface as eval16 but with a different implementation. | |
epu8 | eval16 (epu8 v) noexcept |
Evaluation of a HPCombi::epu8: count how many times each int of 0..15 appears in the input. | |
uint64_t | first_diff_ref (epu8 a, epu8 b, size_t bound=16) noexcept |
Same interface as first_diff but with a different implementation. | |
uint64_t | first_diff_mask (epu8 a, epu8 b, size_t bound=16) noexcept |
Same interface as first_diff but with a different implementation. | |
uint64_t | first_diff (epu8 a, epu8 b, size_t bound=16) noexcept |
The first difference between two HPCombi::epu8. | |
uint64_t | last_diff_ref (epu8 a, epu8 b, size_t bound=16) noexcept |
Same interface as last_diff but with a different implementation. | |
uint64_t | last_diff_mask (epu8 a, epu8 b, size_t bound=16) noexcept |
Same interface as last_diff but with a different implementation. | |
uint64_t | last_diff (epu8 a, epu8 b, size_t bound=16) noexcept |
The last difference between two HPCombi::epu8. | |
bool | less (epu8 a, epu8 b) noexcept |
Lexicographic comparison between two HPCombi::epu8. | |
int8_t | less_partial (epu8 a, epu8 b, int k) noexcept |
Partial lexicographic comparison between two HPCombi::epu8. | |
uint64_t | first_zero (epu8 v, int bnd) noexcept |
return the index of the first zero entry or 16 if there are none Only index smaller than bound are taken into account. | |
uint64_t | last_zero (epu8 v, int bnd) noexcept |
return the index of the last zero entry or 16 if there are none Only index smaller than bound are taken into account. | |
uint64_t | first_non_zero (epu8 v, int bnd) noexcept |
return the index of the first non zero entry or 16 if there are none Only index smaller than bound are taken into account. | |
uint64_t | last_non_zero (epu8 v, int bnd) noexcept |
return the index of the last non zero entry or 16 if there are none Only index smaller than bound are taken into account. | |
epu8 | popcount16 (epu8 v) noexcept |
a vector popcount function | |
bool | is_partial_transformation (epu8 v, const size_t k=16) noexcept |
Test for partial transformation. | |
bool | is_transformation (epu8 v, const size_t k=16) noexcept |
Test for transformation. | |
bool | is_partial_permutation (epu8 v, const size_t k=16) noexcept |
Test for partial permutations. | |
bool | is_permutation_sort (epu8 v, const size_t k=16) noexcept |
Same interface as is_permutation but with a different implementation. | |
bool | is_permutation_eval (epu8 v, const size_t k=16) noexcept |
Same interface as is_permutation but with a different implementation. | |
bool | is_permutation (epu8 v, const size_t k=16) noexcept |
uint64_t | first_mask (epu8 msk, size_t bound) |
uint64_t | last_mask (epu8 msk, size_t bound) |
template<bool Increasing = true, size_t sz> | |
epu8 | network_sort (epu8 res, std::array< epu8, sz > rounds) |
Apply a sorting network. | |
template<bool Increasing = true, size_t sz> | |
epu8 | network_sort_perm (epu8 &v, std::array< epu8, sz > rounds) |
Apply a sorting network in place and return the permutation. | |
void | merge_rev (epu8 &a, epu8 &b) noexcept |
epu8 | eval16_gen (epu8 v) noexcept |
template<typename T, typename M = power_helper::Monoid<T>> | |
const T | square (const T x) |
A generic compile time squaring function. | |
template<unsigned exp, typename T, typename M = power_helper::Monoid<T>> | |
const T | pow (const T x) |
A generic compile time exponentiation function. | |
template<size_t Size, typename Expo = uint8_t> | |
std::array< Expo, Size > | sorted_vect (std::array< Expo, Size > v) |
Variables | |
constexpr std::array< epu8, 4 > | masks |
constexpr TPUBuild< epu8 > | Epu8 {} |
Factory object acting as a class constructor for type HPCombi::epu8. | |
constexpr uint64_t | prime = 0x9e3779b97f4a7bb9 |
A prime number good for hashing. | |
constexpr std::array< epu8, 9 > | sorting_rounds |
A 16-way sorting network. | |
constexpr std::array< epu8, 6 > | sorting_rounds8 |
A duplicated 8-way sorting network. | |
constexpr std::array< epu8, 6 > | merge_rounds |
constexpr std::array< epu8, 3 > | inverting_rounds |
constexpr std::array< epu8, 4 > | summing_rounds |
Permutation Round for partial and horizontal sums. | |
constexpr std::array< epu8, 4 > | mining_rounds |
using HPCombi::epu64 = uint64_t __attribute__((__vector_size__(16), __may_alias__)) |
using HPCombi::epu8 = uint8_t __attribute__((vector_size(16))) |
epu8 stands for Extended Packed Unsigned, grouped by 8 bits; this is the low level type chosen by Intel for their API to intrinsics, ie a SIMD vector of 16 unsigned bytes (16×8 = 128bits).
Functions using this type use semantically equivalent types, eg a _m128 which is a vector containing 2 signed 64 bits integers. A flag tells the compiler to silently consider those types equivalent.
using HPCombi::xpu16 = uint16_t __attribute__((vector_size(32))) |
using HPCombi::xpu64 = uint64_t __attribute__((vector_size(32))) |
|
inlinenoexcept |
Cast a constant TPU to a constant c++ std::array
.
This is usually faster for algorithm using a lot of indexed access.
|
inlinenoexcept |
Cast a TPU to a c++ std::array
.
This is usually faster for algorithm using a lot of indexed access.
|
inline |
Cast a HPCombi::epu8 to a c++ HPCombi::VectGeneric.
This is usually faster for algorithm using a lot of indexed access.
|
inline |
Cast a HPCombi::epu8 to a c++ HPCombi::VectGeneric.
This is usually faster for algorithm using a lot of indexed access.
Equality of HPCombi::epu8.
Evaluation of a HPCombi::epu8: count how many times each int of 0..15 appears in the input.
v | : a HPCombi::epu8 |
r
such that r
[i] is the number of occurrence of i
in the input v
{ 1, 1, 2, 1, 1, 3, 1, 0, 0, 0, 0, 1, 2, 1, 1, 1}
Same interface as eval16 but with a different implementation.
Same interface as eval16 but with a different implementation.
Same interface as eval16 but with a different implementation.
Same interface as eval16 but with a different implementation.
The first difference between two HPCombi::epu8.
a,b | : two HPCombi::epu8 |
bound | : a size_t |
a
[i] and b
[i] differ, 16 if there is no differences before bound. first_diff(a, b)
returns 3
, first_diff(a, b, 3)
returns 16
, first_diff(a, b, 4)
returns 3
, first_diff(a, b, 7)
returns 3
. bound
is assumed to be smaller or equal than 16 Same interface as first_diff but with a different implementation.
Same interface as first_diff but with a different implementation.
|
inline |
|
inlinenoexcept |
return the index of the first non zero entry or 16 if there are none Only index smaller than bound are taken into account.
|
inlinenoexcept |
return the index of the first zero entry or 16 if there are none Only index smaller than bound are taken into account.
|
inlinenoexcept |
Horizontal sum of a HPCombi::epu8.
12
|
inlinenoexcept |
Same interface as horiz_max but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_max but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_max but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_max but with a different implementation.
|
inlinenoexcept |
Horizontal sum of a HPCombi::epu8.
1
|
inlinenoexcept |
Same interface as horiz_min but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_min but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_min but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_min but with a different implementation.
|
inlinenoexcept |
Horizontal sum of a HPCombi::epu8.
110
uint8_t
|
inlinenoexcept |
Same interface as horiz_sum but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_sum but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_sum but with a different implementation.
|
inlinenoexcept |
Same interface as horiz_sum but with a different implementation.
|
inlinenoexcept |
Test whether all the entries of a HPCombi::epu8 are one.
|
inlinenoexcept |
Test whether all the entries of a HPCombi::epu8 are zero.
|
inlinenoexcept |
Test for partial permutations.
*this
is a partial permutation. v | the vector to test |
k | the size of *this (default 16) |
Points where the function is undefined are mapped to 0xff
. If *this
is a partial permutation of \(0\dots n-1\) for \(n<16\), it should be completed to a partial permutation of \(0\dots 15\) by adding fixed points. That is the values \(i\geq n\) should be mapped to themself.
|
inlinenoexcept |
Test for partial transformation.
v
is a partial transformation. v | the vector to test |
k | the size of *this (default 16) |
Points where the function is undefined are mapped to 0xff
. If *this
is a transformation of \(0\dots n-1\) for \(n<16\), it should be completed to a transformation of \(0\dots 15\) by adding fixed points. That is the values \(i\geq n\) should be mapped to themself.
|
inlinenoexcept |
*this
is a permutation. v | the vector to test |
k | the size of *this (default 16) |
If *this
is a permutation of \(0\dots n-1\) for \(n<16\), it should be completed to a permutation of \(0\dots 15\) by adding fixed points. That is the values \(i\geq n\) should be mapped to themselves.
|
inlinenoexcept |
Same interface as is_permutation but with a different implementation.
|
inlinenoexcept |
Same interface as is_permutation but with a different implementation.
|
inlinenoexcept |
Testing if a HPCombi::epu8 is sorted.
|
inlinenoexcept |
Test for transformation.
*this
is a transformation. v | the vector to test |
k | the size of *this (default 16) |
If *this
is a transformation of \(0\dots n-1\) for \(n<16\), it should be completed to a transformation of \(0\dots 15\) by adding fixed points. That is the values \(i\geq n\) should be mapped to themself.
The last difference between two HPCombi::epu8.
a,b | : HPCombi::epu8 |
bound | : a size_t |
a
[i] and b
[i] differ, 16 if there is no differences before bound. last_diff(a, b)
returns 11
, last_diff(a, b, 3)
returns 16
, last_diff(a, b, 4)
returns 3
, last_diff(a, b, 7)
returns 3
. bound
is assumed to be smaller or equal than 16 Same interface as last_diff but with a different implementation.
Same interface as last_diff but with a different implementation.
|
inline |
|
inlinenoexcept |
return the index of the last non zero entry or 16 if there are none Only index smaller than bound are taken into account.
|
inlinenoexcept |
return the index of the last zero entry or 16 if there are none Only index smaller than bound are taken into account.
Lexicographic comparison between two HPCombi::epu8.
Partial lexicographic comparison between two HPCombi::epu8.
a,b | : the vectors to compare |
k | : the bound for the lexicographic comparison |
Vector max between two HPCombi::epu8 0.
Merge two sorted epu8.
a,b | two HPCombi::epu8 after executing merge, a and b are sorted and a [15] <= b [0] |
Vector min between two HPCombi::epu8 0.
|
inline |
Apply a sorting network.
|
inline |
Apply a sorting network in place and return the permutation.
Non equality of HPCombi::epu8.
|
inlineconstexprnoexcept |
Unsigned 8 bits int constant.
Horizontal partial sum of a HPCombi::epu8.
{ 5, 5, 5, 5, 5, 6,12,12,12,12,12,12,12,13,14,15}
Same interface as partial_max but with a different implementation.
Same interface as partial_max but with a different implementation.
Same interface as partial_max but with a different implementation.
Horizontal partial sum of a HPCombi::epu8.
{ 5, 5, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0}
Same interface as partial_min but with a different implementation.
Same interface as partial_min but with a different implementation.
Same interface as partial_min but with a different implementation.
Horizontal partial sum of a HPCombi::epu8.
{ 5,10,12,17,18,24,36,40,40,43,45,56,68,81,95,110}
Same interface as partial_sums but with a different implementation.
Same interface as partial_sums but with a different implementation.
Same interface as partial_sums but with a different implementation.
Find if a vector is a permutation of another one.
a,b | two HPCombi::epu8 |
res
[i] is the position in a
of b
[i] if b
[i] appears exactly once in a
, or undefined if not. Same interface as permutation_of but with a different implementation.
Same as permuted_ref but with an optimized implementation using intrinsics.
Apply a permutation b
on the vector a:
for i=0..16 {result[i] = a[b[i]}.
Permuting a HPCombi::epu8.
const T HPCombi::pow | ( | const T | x | ) |
A generic compile time exponentiation function.
exp | the power |
x | the number to exponentiate |
Raise x to the exponent exp where exp is known at compile time. We use the classical recursive binary algorithm, but the recursion is unfolded and optimized at compile time giving an assembly code which is just a sequence of multiplication.
To use for a specific type the user should pass a Monoid structure (see below) as third parameter to the template. Alternatively a default monoid structure can be defined for a given type by specializing the template struct HPCombi::power_helper::Monoid
|
inline |
A random HPCombi::epu8.
bnd | : the upper bound for the value of the entries. bnd must verify \( 0 < bnd \leq 256 \). This is not checked. |
Remove duplicates in a sorted HPCombi::epu8.
a | supposed to be sorted |
repl | the value replacing the duplicate entries (default to 0) |
a
where repeated occurrences of entries are replaced by repl
Reverting a HPCombi::epu8.
Return a reverse sorted HPCombi::epu8.
Return a HPCombi::epu8 with both halves reverse sorted.
Right shifted of a HPCombi::epu8 inserting a 0.
Left shifted of a HPCombi::epu8 inserting a 0.
Sort this
and return the sorting permutation.
Sort this
and return the sorting permutation.
Return a sorted HPCombi::epu8.
Return a HPCombi::epu8 with both halves sorted.
std::array< Expo, Size > HPCombi::sorted_vect | ( | std::array< Expo, Size > | v | ) |
const T HPCombi::square | ( | const T | x | ) |
A generic compile time squaring function.
x | the number to square |
To use for a specific type the user should pass a monoid structure as second parameter to the template. Alternatively a default monoid structure can be defined for a given type by specializing the template struct HPCombi::power_helper::Monoid
Factory object acting as a class constructor for type HPCombi::epu8.
see HPCombi::TPUBuild for usage and capability
|
constexpr |
|
constexpr |
|
constexpr |
|
constexpr |
|
constexpr |
A prime number good for hashing.
|
constexpr |
A 16-way sorting network.
Sorting network from Knuth [AoCP3] Fig. 51 p 229. used by the sorted function
[AoCP3]: "D. Knuth, The art of computer programming vol. 3"
|
constexpr |
A duplicated 8-way sorting network.
Batcher odd-Even mergesort sorting network used by the sorted function
|
constexpr |
Permutation Round for partial and horizontal sums.