PrefixNodeBase< Head > Class Template Reference

Character coding for bitwise tree. More...

#include <PrefixNodeBase.h>

Inherited by PrefixGraph::Node.

List of all members.

Public Member Functions

void addLetter (Head encodedLetter)
 Add character to the node.
int countLetters ()
 Count all characters in the node.
int countLetters (Head encodedLetter)
 Count characters before encodedLetter in the sufix array.
bool hasLetter (Head encodedLetter)
 Check if character is in the node.

Static Public Member Functions

static Head encode (unsigned char letter, Head *codes)
 Convert a character to a coresponding code.
static Head autocode (unsigned char letter, Head *codes)
 Automatically assign new code for letter.
static void resetCodes (Head *codes)
 Clear codes array and add default end of word markers.

Static Public Attributes

static const Head endBitMask = (Head(1))<<(8*sizeof(Head)-1)
 the last bit in the head is the end of word marker

Protected Attributes

Head head
 All characters of the node are coded in the head of this type.

Detailed Description

template<class Head>
class PrefixNodeBase< Head >

Character coding for bitwise tree.

This class describes character coding for bitwise tree node. All characters of the node are coded in the head. Pointers on sufix are in the sufix array which should be desrcribed in derivated class. Position of a particullar character's sufix in this array is calculated from head by a pop count as a hash function.

Optimized for processors with POPCNT isntruction (it is in the SSE4a instruction set). Probably you need to use -mabm flag while compiling with gcc for this processors. In other cases it'll use something like popcountdi2 function which also should be very fast (maybe you should use -O2 flag to enable lookup based pop count).

Todo:
sort codes by frequency then variable-length coding
Template Parameters:
Head is a type of variable containing encoded characters. Use uint32_t for alphabets with less than 32 characters like the english alphabet or if you want to encode only numbers. This reduces memory consumption. Otherwise use uint64_t.

Member Function Documentation

template<class Head >
void PrefixNodeBase< Head >::addLetter ( Head  encodedLetter  )  [inline]

Add character to the node.

This declare existance of a character in the node.

Warning:
You should also add the character to the sufix array of the node explicitly. Otherwise it can cause errors.
template<class Head >
static Head PrefixNodeBase< Head >::autocode ( unsigned char  letter,
Head *  codes 
) [inline, static]

Automatically assign new code for letter.

Do nothing if letter already coded.

Parameters:
[in,out] codes array with codes, the last value array[256] is the number of already ocupied codes
Warning:
Use PrefixNodeBase::resetCodes before the first use of codes.
template<class Head >
int PrefixNodeBase< Head >::countLetters (  )  [inline]

Count all characters in the node.

This gives the length of the sufix array.

template<class Head >
static void PrefixNodeBase< Head >::resetCodes ( Head *  codes  )  [inline, static]

Clear codes array and add default end of word markers.

Parameters:
[out] codes array of 257 elements should be allocated first

Member Data Documentation

template<class Head >
Head PrefixNodeBase< Head >::head [protected]

All characters of the node are coded in the head of this type.

There is 35 letters in polish alphabet. Each letter is coded on one bit. One extra bit is reserved as the end of word marker. Other alphabets can be more complex or you may want to add digits or other characters. Let's use 64 bit variable for the head.

If you fit your needs to 32 bits consider changing type Head to uin32_t for reduce memory consumptions. You should change also __builtin_popcountl on __builtin_popcount in this case. all characters of the node are coded in it


The documentation for this class was generated from the following file:
Generated on Mon Oct 24 19:37:38 2011 for PrefixTree by  doxygen 1.6.3