Character coding for bitwise tree. More...
#include <PrefixNodeBase.h>
Inherited by PrefixGraph::Node.
Public Member Functions | |
void | addLetter (Head encodedLetter) |
Add character to the node. | |
int | countLetters () |
Count all characters in the node. | |
int | countLetters (Head encodedLetter) |
Count characters before encodedLetter in the sufix array. | |
bool | hasLetter (Head encodedLetter) |
Check if character is in the node. | |
Static Public Member Functions | |
static Head | encode (unsigned char letter, Head *codes) |
Convert a character to a coresponding code. | |
static Head | autocode (unsigned char letter, Head *codes) |
Automatically assign new code for letter . | |
static void | resetCodes (Head *codes) |
Clear codes array and add default end of word markers. | |
Static Public Attributes | |
static const Head | endBitMask = (Head(1))<<(8*sizeof(Head)-1) |
the last bit in the head is the end of word marker | |
Protected Attributes | |
Head | head |
All characters of the node are coded in the head of this type. |
Character coding for bitwise tree.
This class describes character coding for bitwise tree node. All characters of the node are coded in the head. Pointers on sufix are in the sufix array which should be desrcribed in derivated class. Position of a particullar character's sufix in this array is calculated from head by a pop count as a hash function.
Optimized for processors with POPCNT isntruction (it is in the SSE4a instruction set). Probably you need to use -mabm flag while compiling with gcc for this processors. In other cases it'll use something like popcountdi2 function which also should be very fast (maybe you should use -O2 flag to enable lookup based pop count).
Head | is a type of variable containing encoded characters. Use uint32_t for alphabets with less than 32 characters like the english alphabet or if you want to encode only numbers. This reduces memory consumption. Otherwise use uint64_t. |
void PrefixNodeBase< Head >::addLetter | ( | Head | encodedLetter | ) | [inline] |
Add character to the node.
This declare existance of a character in the node.
static Head PrefixNodeBase< Head >::autocode | ( | unsigned char | letter, | |
Head * | codes | |||
) | [inline, static] |
Automatically assign new code for letter
.
Do nothing if letter
already coded.
[in,out] | codes | array with codes, the last value array [256] is the number of already ocupied codes |
codes
. int PrefixNodeBase< Head >::countLetters | ( | ) | [inline] |
Count all characters in the node.
This gives the length of the sufix array.
static void PrefixNodeBase< Head >::resetCodes | ( | Head * | codes | ) | [inline, static] |
Clear codes array and add default end of word markers.
[out] | codes | array of 257 elements should be allocated first |
Head PrefixNodeBase< Head >::head [protected] |
All characters of the node are coded in the head of this type.
There is 35 letters in polish alphabet. Each letter is coded on one bit. One extra bit is reserved as the end of word marker. Other alphabets can be more complex or you may want to add digits or other characters. Let's use 64 bit variable for the head.
If you fit your needs to 32 bits consider changing type Head to uin32_t for reduce memory consumptions. You should change also __builtin_popcountl on __builtin_popcount in this case. all characters of the node are coded in it