|
|
|
FNV quick index
- Portable FNV hashing of numeric values
- Extended FNV type hashing
(top)
FNV hash history
The basis of the FNV hash algorithm was taken from an idea sent
as reviewer comments to the IEEE POSIX P1003.2 committee by
Glenn Fowler and
Phong Vo.
In a subsequent ballot round:
Landon Curt Noll
improved on their algorithm.
Some people tried this hash and found that it worked rather well.
In an EMail message
to Landon, they named it the ``Fowler/Noll/Vo'' or FNV hash.
FNV hashes are designed to be fast while maintaining
a low collision rate.
The FNV speed allows one to quickly hash lots
of data while maintaining a reasonable collision rate.
The high dispersion of the FNV hashes makes them
well suited for hashing nearly identical strings such as
URLs, hostnames, filenames, text, IP addresses, etc.
The FNV hash is in wide spread use:
- calc
- Domain Name Servers
- mdbm key/value data lookup functions
- Database indexing hashes
- major web search / indexing engines
- high performance EMail servers
- Netnews history file Message-ID lookup functions
- Anti-spam filters
- NFS implementations (e.g.,
FreeBSD 4.3,
IRIX, Linux (NFS v4))
- Cohesia MASS project
server collision avoidance
- spellchecker programmed in Ada 95
- flatassembler's
open source x86 assembler -
user-defined
symbol hashtree
- PowerBASIC inline assembly routine
- text based referenced resources for video games on the PS2, Gamecube
and XBOX
- non-cryptographic file fingerprints
- FRET - a tool
to identify file data structures / helps to understand file formats
- used to in the process of computing Unique IDs in DASM (DTN Applications for Symbian Mobile-phones)
- Used by Microsoft in their hash_map implementation for VC++ 2005
- Used in an implementation of
libketama
for use in items such as
memcache.
- Used in the realpath cache in
PHP 5.x (php-5.2.3/TSRM/tsrm_virtual_cwd.c).
FNV hash algorithms and
source code
have been released into the public domain.
The authors of the FNV algorithmm look deliberate steps
to disclose the algorhtm in a public forum soon after
it was invented.
More than a year passed after this public disclosure and the
authors deliberatly took no steps to patent the FNV algorithm.
Therefore it is safe to say that the FNV authors have
no patent claims on the FNV algorithm as published.
If you use an FNV function in an application, tells us about
it by sending EMail to:

We will be happy to add your application to the list.
Comments are welcome.
(top)
The core of the FNV hash
The core of the FNV-1 hash algorithm is as follows:
hash = offset_basis
for each octet_of_data to be hashed
hash = hash * FNV_prime
hash = hash xor octet_of_data
return hash
The offset_basis and
FNV_prime can be found in the
parameters of the FNV-1 hash section below.
(top)
FNV-1a alternate algorithm
There is a minor variation of the FNV hash algorithm known as
FNV-1a:
hash = offset_basis
for each octet_of_data to be hashed
hash = hash xor octet_of_data
hash = hash * FNV_prime
return hash
The only difference between the FNV-1a hash and the FNV-1 hash
is the order of the xor and multiply.
The FNV-1a hash
uses the same FNV_prime and offset_basis
as the FNV-1 hash of the same n-bit size.
Some people use FNV-1a instead of FNV-1
because they see
slightly better dispersion for tiny (<4 octets) chunks of memory.
Either FNV-1 or FNV-1a make a fine hash.
(Try it with with
just a dash of Sage and ground Cloves :-))
(top)
Parameters of the FNV-1 hash
The FNV-1 hash parameters are as follows:
- hash is an n bit unsigned integer,
where n is the bit length of hash.
- The multiplication is performed modulo 2n
where n is the bit length of hash.
- The xor is performed on the low order
octet (8 bits) of hash.
- The FNV_prime is dependent on n, the size of the hash:
32 bit FNV_prime = 16777619
64 bit FNV_prime = 1099511628211
128 bit FNV_prime = 309485009821345068724781401
256 bit FNV_prime = 374144419156711147060143317175368453031918731002211
512 bit FNV_prime =
35835915874844867368919076489095108449946327955754392558399825615420669938882575
126094039892345713852759
1024 bit FNV_prime =
50164565101131186554345988110352789550307653454047907443030175238311120551081474
51509157692220295382716162651878526895249385292291816524375083746691371804094271
873160484737966720260389217684476157468082573
Part of the magic of FNV is the selection of the FNV_prime
for a given sized unsigned integer.
Some primes do hash better than other primes for a given integer size.
The theory behind which primes make good FNV_prime's
is beyond the scope of this web page.
- The offset_basis for FNV-1 is dependent on n, the size of the hash:
32 bit offset_basis = 2166136261
64 bit offset_basis = 14695981039346656037
128 bit offset_basis = 275519064689413815358837431229664493455
256 bit offset_basis =
100029257958052580907070968620625704837092796014241193945225284501741471925557
512 bit offset_basis =
96593031294966694980094354007163104660904187456726378961083743294344626579945829
32197716438449813051892206539805784495328239340083876191928701583869517785
1024 bit offset_basis =
14197795064947621068722070641403218320880622795441933960878474914617582723252296
73230371772215086409652120235554936562817466910857181476047101507614802975596980
40773201576924585630032153049571501574036444603635505054127112859663616102678680
82893823963790439336411086884584107735010676915
These non-zero integers are the FNV-0 hashes of
the following 32 octets:
chongo <Landon Curt Noll> /\../\
The \'s in the above string are not C-style escape characters.
In C-string notation, these 32 octets are:
"chongo <Landon Curt Noll> /\\../\\"
The following
calc
script was used to compute the offset_basis
for FNV-1 hashes:
hash_bits = insert_the_hash_size_in_bits_here;
FNV_prime = insert_the_FNV_prime_here;
offset_basis = 0;
offset_str = "chongo <Landon Curt Noll> /\\../\\";
hash_mod = 2^hash_bits;
str_len = strlen(offset_str);
for (i=1; i <= str_len; ++i) {
offset_basis = (offset_basis * FNV_prime) % hash_mod;
offset_basis = xor(offset_basis, ord(substr(offset_str,i,1)));
}
print hash_bits, "bit offset_basis =", offset_basis;
NOTE: The above code fragment example is written in the
calc
language, not in C.
FNV-0 Historic note:
The FNV-0 is the historic FNV algorithm that
is now deprecated.
It has an
offset_basis of 0.
Unless the FNV-0 hash is required for historical purposes,
the FNV-1 or FNV-1a
should be used in place of the FNV-0 hash.
Use FNV-1 and FNV-1a hashes, with their non-zero
offset_basis instead.
The FNV-0 hashes all buffers that contain only 0 octets to a
hash value of 0.
The FNV-1 and FNV-1a hash do not suffer from this minor problem.
(top)
Changing the FNV hash size - xor-folding
If you need an x-bit hash where x is not a power of 2, then
we recommend that you compute the FNV hash that is just larger
than x-bits and xor-fold the result down to x-bits.
By xor-folding we mean shift the excess high order bits down
and xor them with the lower x-bits.
For example to produce a 24 bit FNV-1 hash in C
we xor-fold fold a 32 bit FNV-1 hash:
#define MASK_24 (((u_int32_t)1<<24)-1) /* i.e., (u_int32_t)0xffffff */
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;
hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash = (hash>>24) ^ (hash & MASK_24);
To produce a 16 bit FNV-1 hash in C
we xor-fold fold a 32 bit FNV-1 hash:
#define MASK_16 (((u_int32_t)1<<16)-1) /* i.e., (u_int32_t)0xffff */
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;
hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash = (hash>>16) ^ (hash & MASK_16);
To produce a 56 bit FNV-1 hash in C (on a machine
with 64 bit unsigned values)
we xor-fold fold a 64 bit FNV-1 hash:
#define MASK_56 (((u_int64_t)1<<56)-1) /* i.e., (u_int64_t)0xffffffffffffff */
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
u_int64_t hash;
void *data;
size_t data_len;
hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
hash = (hash>>56) ^ (hash & MASK_56);
The above process assumes that you are using an FNV hash
that at most twice as large as the x-bits that you need.
For x < 16, there is no 16 bit (or less) FNV-1
hash to use.
For tiny x < 16 bit values, we recommend using a 32 bit FNV-1
hash as follows:
/* NOTE: for 0 < x < 16 ONLY!!! */
#define TINY_MASK(x) (((u_int32_t)1<<(x))-1)
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;
hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash = (((hash>>x) ^ hash) & TINY_MASK(x));
At the expense of CPU performance, one may use a larger
FNV-1 hash that is normally required in any of the above
xor-folding examples.
This will not produce the same standard output,
but it may provide slightly better dispersion at the expense of more CPU time.
All that is needed is to use a larger FNV-1 hash
and to move the mask outside of the expression
on the final statement.
For example, to produce a 24 bit FNV-1 hash could have used
a 64 bit FNV-1 hash, in a non-standard way, as follows:
/* NOTE: non-standard use of a larger hash */
#define MASK_24 (((u_int64_t)1<<24)-1) /* i.e., (u_int64_t)0xffffff */
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
u_int64_t hash;
void *data;
size_t data_len;
hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
hash = (((hash>>24) ^ hash) & MASK_24);
Replacing a 32 bit FNV-1 hash with a 64 bit FNV-1 hash
during xor-folding might yield better dispersion at the expense
of CPU time.
However using an even larger FNV-1 hash is
almost certainly a waste of CPU time.
If you are going to use this non-standard xor-folding
method, we recommend that you only do it for x < 32 bits, and
only replace the 32 bit FNV-1 hash with a 64 bit FNV-1 hash.
NOTE: One may substitute the FNV-1a hash for the
FNV-1 hash in any of the xor-folding examples.
Some people believe that FNV-1a xor-folding gives then
slightly better dispersion without any impact on CPU performance.
See the FNV-1a hash description for more information.
If you really need an x-bit hash
for x > 1024 bits, send us EMail.
(top)
Changing the FNV hash size - non-powers of 2
The FNV hash is designed for hash sizes that are a power of 2.
If you need a hash size that is not a power of two, then you have
two choices.
One method id called the
lazy mod mapping method
and the other is called the
retry method.
Both involve mapping a range that is a power of 2 onto an
arbitrary range.
- Lazy mod mapping method:
The lazy mod mapping method
uses a simple mod on an n-bit hash
to yield an arbitrary range.
To produce a hash range between 0 and X
use a n-bit FNV hash where n is smallest
FNV hash that will produce values larger than X
without the need for
xor-folding.
For example, to produce a value between 0 and 2142779559
using the
lazy mod mapping method, we select a 32-bit FNV
hash because:
232 > 2142779559
We compute the 32-bit FNV hash value and then perform a final mod:
#define TRUE_HASH_SIZE ((u_int32_t)2142779560) /* range top plus 1 */
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;
hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash %= TRUE_HASH_SIZE;
An advantage of the lazy mod mapping method
is that it requires only 1 more operation:
only an additional mod is performed at the end.
The disadvantage of the lazy mod mapping method
is that there is
a bias against the larger values.
To understand this bias consider the a need
to produce a value between 0 and 999999.
We will compute a 32-bit FNV hash value because:
232 > 999999
We compute the 32-bit FNV hash value using the
and then perform the final mod:
#define TRUE_HASH_SIZE ((u_int32_t)1000000) /* range top plus 1 */
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;
hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash %= TRUE_HASH_SIZE;
The bias introduced by the final mod is slight.
The values 0 through 967295 will be created by
4295 different 32-bit FNV hash values whereas
the values 967296 through 999999 will be created by
only 4294 different 32-bit FNV hash values.
In other words, the values 0 through 967295 will
occur ~1.0002328 times as often as the values 967296
through 999999.
The bias can be larger when the range is nearly as large as the
range of values produced by the FNV hash.
Consider using the lazy mod mapping method
to produce
values between 0 and 9999999999999999999.
We use a 64-bit FNV hash because:
264 > 9999999999999999999
We compute the 64-bit FNV hash value using the
and then perform the final mod:
#define TRUE_HASH_SIZE ((u_int64_t)10000000000000000000) /* range top plus 1 */
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
u_int64_t hash;
void *data;
size_t data_len;
hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
hash %= TRUE_HASH_SIZE;
Here the bias introduced by the final mod is more noticeable.
The values 0 through 9999999999999999999 will be created by
2 different 64-bit FNV hash values whereas
the values 10000000000000000000 through 18446744073709551615 will be created by
only 1 64-bit FNV hash value.
NOTE:
This bias issue may not be of concern to you, but we thought we should
point out this issue just in case you care.
Many applications should / will not care about this bias.
Most applications can use the lazy mod mapping method without
any problems.
Your application, may vary however.
NOTE: One may substitute the FNV-1a hash for the
FNV-1 hash in any of the lazy mod mapping method examples.
Some people believe that FNV-1a lazy mod mapping method gives then
slightly better dispersion without any impact on CPU performance.
See the FNV-1a hash description for more information.
- Retry method:
The retry method also performs a final mod in order
to produce a hash range between 0 and X.
Unlike lazy mod mapping method,
the retry method avoids the bias by
additional computation.
To produce a hash range between 0 and X
use a n-bit FNV hash where n is smallest
FNV hash that will produce values larger than X
without the need for
xor-folding.
For example, to produce a value between 0 and 49999
using the
retry method, we select a 32-bit FNV
hash because:
232 > 49999
Before the final mod 50000 is performed, we check to
see if the 32-bit FNV hash value is one of the upper
biased values.
If it is, we perform additional loop cycles until is below
the bias level.
For example:
#define TRUE_HASH_SIZE ((u_int32_t)50000) /* range top plus 1 */
#define FNV_32_PRIME ((u_int32_t)16777619)
#define FNV1_32_INIT ((u_int32_t)2166136261)
#define MAX_32BIT ((u_int32_t)0xffffffff) /* largest 32 bit unsigned value */
#define RETRY_LEVEL ((MAX_32BIT / TRUE_HASH_SIZE) * TRUE_HASH_SIZE)
u_int32_t hash;
void *data;
size_t data_len;
hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
while (hash >= RETRY_LEVEL) {
hash = (hash * FNV_32_PRIME) + FNV1_32_INIT;
}
hash %= TRUE_HASH_SIZE;
The disadvantage of the retry method
is that it sometimes requires additional calculations.
An advantage of the retry method
it avoids slightly biased values.
For another example, we will
produce a value between 0 and 999999999999
using the
retry method, we select a 64-bit FNV
hash because:
264 > 999999999999
Before the final mod 1000000000000 is performed, we check to
see if the 64-bit FNV hash value is one of the upper
biased value.
If it is, we perform additional loop cycles until it is not.
#define TRUE_HASH_SIZE ((u_int64_t)1000000000000) /* range top plus 1 */
#define FNV_64_PRIME ((u_int64_t)1099511628211)
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
#define MAX_64BIT ((u_int64_t)0xffffffffffffffff) /* largest 64 bit unsigned value */
#define RETRY_LEVEL ((MAX_64BIT / TRUE_HASH_SIZE) * TRUE_HASH_SIZE)
u_int64_t hash;
void *data;
size_t data_len;
hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
while (hash >= RETRY_LEVEL) {
hash = (hash * FNV_64_PRIME) + FNV1_64_INIT;
}
hash %= TRUE_HASH_SIZE;
NOTE: One may substitute the FNV-1a hash for the
FNV-1 hash in any of the retry method examples.
Some people believe that FNV-1a retry method gives then
slightly better dispersion without any impact on CPU performance.
See the FNV-1a hash description for more information.
- To summarize:
When dealing with an application that needs to
generate a hash value over an
arbitrary range, one can do one of the following:
- Change the application to use hash values that range
between 0 and 2n-1.
Use a n-bit FNV hash or
xor-folding if needed.
Pro: Yields the best results in the shortest
amount of CPU time.
Con: Requires source code change to force hash range to be
a power of 2 in size.
- Use the
lazy mod mapping method
if one does
not care about the slight hash bias and does not want (or cannot change)
the hash range.
Pro: Yields the fastest results for a non-power
of 2 range.
Con: Produces a slight bias against larger hash values.
However if one does not care about the slight bias, then there
is no problem using this technique.
- Use the
retry method
if one wants to avoid the
hash bias and does not want / cannot change
the hash range.
Pro: Produces non-biased values for a non-power
of 2 range.
Con: Requires slightly more CPU time in some cases.
(top)
Portable FNV hashing of numeric values
Some people have asked how one might FNV hash objects such as integers,
structures and other non-octet based data.
We recommend that you FNV hash such objects, an octet at a time
in a fixed byte order.
To ensure that the FNV hash values are the same on both Big Endian
and Little Endian CPUs, we recommend that objects you always hash
object such as integers from low to high byte significance order.
To ensure that structures (due to architecture based alignment
differences) are always hashed the same, we recommend that you
FNV hash each element in a structure in order.
If interoperability is not an issue, you can always FNV hash
objects as if they were just an array of octets:
/* WARNING: You really do not want hash structures this way!!! */
struct stuff {
int data; /* 32 bits? 64 bits? */
/* how many octets of padding? */
char byte;
/* how many octets of padding? */
short value; /* 16 bits? */
/* how many octets of padding? */
} foo;
/* WARNING: This FNV hash value may not be portable!!! */
/* WARNING: This FNV hash value may be wrong due to pad byte values!!! */
hash = fnv_64a_buf(&foo, sizeof(foo), FNV1_64_INIT);
The hash above may produce different values on Big Endian 32bit,
Big Endian 64bit, Little Endian 32bit, and Little Endian 64bit
architectures just to name a few.
Worse yet, the padding bytes may be non-zero due to the
way the structure was initialized in memory.
The padding bytes may contain whatever malloc might have picked
up from memory, which could have been set to any value.
You should really, really, really sure that FNV value
portability will NEVER be an issue,
and you should really be sure that the padding octets were
set to 0 before using the above hack.
To put the issue into different terms, consider a collection servers
that use FNV to hash SNMP identifiers (collections of integers).
If all that is needed is for a given server to perform a fast table lookup,
then treating SNMP identifiers as just an array of octets (bytes) to
produce a non-portable FNV value will work provided that any
padding octets were initialized to the same (say 0-filled) value.
However, if a server
needs to send an SNMP identifier FNV hash to another server,
then one will need to generate portable FNV hash values.
Consider two servers use FNV hashing to look up numeric and/or
compound values in a table.
If each server builds their own FNV hash table its own use,
then one does not need to produce portable FNV values.
The simple non-portable method of FNV hashing all of the octets will work fine.
On the other hand, if one server sends a FNV hashes to the other
(instead of sending values to save network bandwidth), then both
servers need to agree on how to they will FNV hash numeric values.
Use of portable FNV values is recommended.
When one of the servers is upgraded from a 32 bit AMD CPU to a
64 bit AMD CPU (potentially changing the memory footprint of values),
you will be glad you used portable FNV values!
You will be better off if use the portable hash method below.
We recommend that you FNV hash integers an octet (byte) at a time.
You should FNV hash them from low to high significance order.
Lets give an example of portable hashing,
where any padding octets are ignored,
using the FNV-1a hash:
hash = offset_basis
for each octet_of_data to be hashed
hash = hash xor octet_of_data
hash = hash * FNV_prime
return hash
as a set of C pre-processor macros.
For clarify, we will ignore gcc optimization
and other performance issues.
The 64 bit FNV-1a hash can be expressed as a set of
C pre-processor macros:
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
/* NOTE: u_int64_t is a 64 bit unsigned type */
/* NOTE: u_int32_t is a 32 bit unsigned type */
/* NOTE: u_int16_t is a 16 bit unsigned type */
/* NOTE: u_int8_t is a 8 bit unsigned type */
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
#define FNV1_64_PRIME ((u_int64_t)1099511628211)
#define FNV_64A_OP(hash, octet) \
(((u_int64_t)(hash) ^ (u_int8_t)(octet)) * FNV1_64_PRIME)
As an example, the 64 bit FNV-1a hash of a C-style string would become:
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
char *string; /* the string to 64 bit FNV-1a hash */
u_int64_t hash; /* will hold the final value of the hash */
char *p;
hash = FNV1_64_INIT;
for (p=string; *p; ++p) {
hash = FNV_64A_OP(hash, *p);
}
The 64 bit FNV-1a hash of an n octet buffer would become:
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
u_int8_t *buffer; /* octet data buffer of n octets to hash */
u_int64_t hash; /* will hold the final value of the hash */
int n; /* length of the buffer in octets (bytes) */
int i;
hash = FNV1_64_INIT;
for (i=0; i < n; ++i) {
hash = FNV_64A_OP(hash, buffer[i]);
}
All of the above operations are the standard, portable ways to hash
a collection of octets.
To hash a 32 bit integer in a portable way, we hash it an octet at a time
from low to high order:
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
u_int32 value; /* 32 bit integer to hash */
u_int64_t hash; /* will hold the final value of the hash */
hash = FNV1_64_INIT;
hash = FNV_64A_OP(hash, value);
value >>= 8;
hash = FNV_64A_OP(hash, value);
value >>= 8;
hash = FNV_64A_OP(hash, value);
value >>= 8;
hash = FNV_64A_OP(hash, value);
Note that the FNV_64A_OP macros above only hash the low octet of value.
By shifting value left 8 bits we effectively hash the value from the
low to high byte significance order.
We can create a general integer hashing macro (and function) as follows:
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
#define FNV_64A_INT(hash, integer) \
fnv_64a_int((u_int64_t)(hash), (u_int64_t)(integer), sizeof(integer))
/*
* fnv_64a_int - hash an integer of up to 8 octets in size
*
* given:
* hash current 64 bit FNV-1a hash value
* integer integer value to hash
* size size of the integer in octets (bytes)
*
* returns:
* 64 bit FNV-1a hash of the integer in low to high byte order
*/
u_int64_t
fnv_64a_int(u_int64_t hash, u_int64_t integer, size_t size)
{
switch (size) {
case 8: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 7: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 6: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 5: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 4: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 3: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 2: hash = FNV_64A_OP(hash, integer);
integer >>= 8;
/*FALLTHRU*/
case 1: hash = FNV_64A_OP(hash, integer);
break;
default:
fprintf(stderr, "fnv_64a_int: BOGUS SIZE: %d\n", size);
exit(1);
}
return hash;
}
Returning to our original struct stuff, we can FNV hash
it in a portable way as follows:
struct stuff {
int data; /* 32 bits? 64 bits? */
/* how many octets of padding? */
char byte;
/* how many octets of padding? */
short value; /* 16 bits? */
/* how many octets of padding? */
} foo;
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
hash = FNV1_64_INIT;
hash = FNV_64A_INT(hash, foo.data);
hash = FNV_64A_INT(hash, foo.byte);
hash = FNV_64A_INT(hash, foo.value);
(top)
Extended FNV type hashing
One can extend the concept of
Portable FNV hashing of numeric values
to the FNV hashing of data types.
Let us say you wish for the following two values, whey and bar
typically hash to different values:
struct curds {
u_int16_t left;
u_int16_t right;
} whey = {0x31, 0x31};
u_int32_t bar = 0x3131;
Or let us say that you want these to numeric values to
typically hash to different values:
int32_t value1 = 23209;
u_int32_t value2 = 23209;
If the Portable FNV hashing method were used,
both whey and bar would usually FNV hash to the same value.
And value1 and value2 would certainly hash to the same value.
Let is say, for some reason, you wanted whey and bar
(and value1 and value2)
usually FNV hash different values even though their bit
patterns in memory look the same.
You can accomplish this by hashing in a FNV type value before
hashing its numeric value.
Consider the following FNV type values enum:
/*
* This is an example of how FNV type values could be
* implemented. This is not an exhaustive list.
*
* The intent of these enum values is that they all bit into a
* single octet (i.e., < 256).
*/
enum fnv_type {
/* Common C types */
FNV_VOID = 0,
FNV_CHAR, FNV_UCHAR, FNV_SHORT, FNV_USHORT, FNV_INT,
FNV_UINT, FNV_LONG, FNV_ULONG, FNV_FLOAT, FNV_DOUBLE, FNV_LONGDOUBLE,
/* Common C type extension */
FNV_LONGLONG, FNV_ULONGLONG,
/* Explicit intXY_t and u_intXY_t typedefs */
FNV_INT8_T, FNV_U_INT8_T, FNV_INT16_T, FNV_U_INT16_T,
FNV_INT32_T, FNV_U_INT32_T, FNV_INT64_T, FNV_U_INT64_T,
/* bit fields by length, ctype foo : 7 ==> BITFLD_7 */
FNV_BITFLD1, FNV_BITFLD2, FNV_BITFLD3, FNV_BITFLD4,
FNV_BITFLD5, FNV_BITFLD6, FNV_BITFLD7, FNV_BITFLD8, FNV_BITFLD9,
FNV_BITFLD10, FNV_BITFLD11, FNV_BITFLD12, FNV_BITFLD13, FNV_BITFLD14,
FNV_BITFLD15, FNV_BITFLD16, FNV_BITFLD17, FNV_BITFLD18, FNV_BITFLD19,
FNV_BITFLD20, FNV_BITFLD21, FNV_BITFLD22, FNV_BITFLD23, FNV_BITFLD24,
FNV_BITFLD25, FNV_BITFLD26, FNV_BITFLD27, FNV_BITFLD28, FNV_BITFLD29,
FNV_BITFLD30, FNV_BITFLD31, FNV_BITFLD32, FNV_BITFLD33, FNV_BITFLD34,
FNV_BITFLD35, FNV_BITFLD36, FNV_BITFLD37, FNV_BITFLD38, FNV_BITFLD39,
FNV_BITFLD40, FNV_BITFLD41, FNV_BITFLD42, FNV_BITFLD43, FNV_BITFLD44,
FNV_BITFLD45, FNV_BITFLD46, FNV_BITFLD47, FNV_BITFLD48, FNV_BITFLD49,
FNV_BITFLD50, FNV_BITFLD51, FNV_BITFLD52, FNV_BITFLD53, FNV_BITFLD54,
FNV_BITFLD55, FNV_BITFLD56, FNV_BITFLD57, FNV_BITFLD58, FNV_BITFLD59,
FNV_BITFLD60, FNV_BITFLD61, FNV_BITFLD62, FNV_BITFLD63, FNV_BITFLD64,
/* pointer to some C type */
FNV_POINTER,
/* ... */
};
/*
* FNV macro to hash FNV type values
*
* NOTE: Assumes that the FNV type value < 256.
*/
#define FNV_64A_TYPE(hash, type) \
FNV_64A_OP((hash), (enum fnv_type)(type))
If we precede the FNV hashing of each data type with an FNV hash
of the octet that represents the FNV type values, we can
distinguish among different data types, even when their bit pattern
in memory is identical.
Example:
struct stuff {
int data;
char byte;
short value;
unsigned long longvalue;
int *words;
void *pointer;
char **list;
u_int16_t a : 3;
u_int16_t b : 2;
u_int16_t c : 4;
} baz;
/*
* NOTE: This is for illustration purposes. Optimization,
* if needed, is left as an exercise for the reader. :-)
*/
hash = FNV1_64_INIT;
/* int data */
hash = FNV_64A_TYPE(hash, FNV_INT);
hash = FNV_64A_INT(hash, baz.data);
/* char byte */
hash = FNV_64A_TYPE(hash, FNV_CHAR);
hash = FNV_64A_INT(hash, baz.byte);
/* short value */
hash = FNV_64A_TYPE(hash, FNV_SHORT);
hash = FNV_64A_INT(hash, baz.value);
/* unsigned long longvalue */
hash = FNV_64A_TYPE(hash, FNV_ULONG);
hash = FNV_64A_INT(hash, baz.longvalue);
/* int *words */
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_INT);
hash = FNV_64A_INT(hash, baz.words);
/* void *pointer */
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_VOID);
hash = FNV_64A_INT(hash, baz.pointer);
/* char **list */
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_CHAR);
hash = FNV_64A_INT(hash, baz.list);
/* u_int16_t a : 3 */
hash = FNV_64A_TYPE(hash, FNV_U_INT16_T);
hash = FNV_64A_TYPE(hash, FNV_BITFLD3);
hash = FNV_64A_INT(hash, baz.a);
/* u_int16_t b : 2 */
hash = FNV_64A_TYPE(hash, FNV_U_INT16_T);
hash = FNV_64A_TYPE(hash, FNV_BITFLD2);
hash = FNV_64A_INT(hash, baz.b);
/* u_int16_t c : 4 */
hash = FNV_64A_TYPE(hash, FNV_U_INT16_T);
hash = FNV_64A_TYPE(hash, FNV_BITFLD4);
hash = FNV_64A_INT(hash, baz.c);
Again, if you do NOT care about distinguishing between
different data types even when you have the same numeric values,
you do not need to use something like the method above.
And if you do not need portable FNV files, then:
hash = fnv_64a_buf(&baz, sizeof(baz), FNV1_64_INIT);
will work just fine.
(top)
FNV source
In the C FNV source below, primes are provided
for 32 bit and 64 bit unsigned integers.
For compilers that do not implement the unsigned long long
type, code is provided to quickly simulate the 64 bit multiply
by the particular FNV_prime.
Nicola Bonelli
has an implementation of FNV using the iovec interface:
Nicola Bonelli
also wrote the following C++ FNV implementation:
Wayne Diamond implemented 32-bit FNV algorithm in
PowerBASIC inline x86 assembly:
FUNCTION FNV32(BYVAL dwOffset AS DWORD, BYVAL dwLen AS DWORD, BYVAL offset_basis AS DWORD) AS DWORD
#REGISTER NONE
! mov esi, dwOffset ;esi = ptr to buffer
! mov ecx, dwLen ;ecx = length of buffer (counter)
! mov eax, offset_basis ;set to 2166136261 for FNV-1
! mov edi, &h01000193 ;FNV_32_PRIME = 16777619
! xor ebx, ebx ;ebx = 0
nextbyte:
! mul edi ;eax = eax * FNV_32_PRIME
! mov bl, [esi] ;bl = byte from esi
! xor eax, ebx ;al = al xor bl
! inc esi ;esi = esi + 1 (buffer pos)
! dec ecx ;ecx = ecx - 1 (counter)
! jnz nextbyte ;if ecx is 0, jmp to NextByte
! mov FUNCTION, eax ;else, function = eax
END FUNCTION
Wayne said:
''Just thought I should let you know that I've ported the 32-bit
FNV algorithm over to inline assembly. It's actually in PowerBASIC
(www.powerbasic.com) format - a compiler I use, but the main function
is all assembly. It could be optimized further in terms of saving a
couple of clock cycles, but it's fairly optimized al ready - only 6
instructions in the main loop, plus 5 setup instructions, and compiles
to just 33 bytes.''
M.S.Schulte sent us these 32-bit FNV-1 and FNV-1a x86 assembler
implementations (written in flat
assembler), half of which were optimized for speed, the other half
were optimized for size:
small_fnv32: ;FNV1 32bit (size: 31 bytes)
; Intel Core 2 Duo E6600: 354.20 mb/s
push esi
push edi
mov esi, [esp + 0ch] ;buffer
mov ecx, [esp + 10h] ;length
mov eax, [esp + 14h] ;basis
mov edi, 01000193h ;fnv_32_prime
next:
mul edi
xor al, [esi]
inc esi
loop snext
pop edi
pop esi
retn 0ch
small_fnv32a: ;FNV1a 32bit (size: 31 bytes)
; Intel Core 2 Duo E6600: 327.68 mb/s
push esi
push edi
mov esi, [esp + 0ch] ;buffer
mov ecx, [esp + 10h] ;length
mov eax, [esp + 14h] ;basis
mov edi, 01000193h ;fnv_32_prime
nexta:
xor al, [esi]
mul edi
inc esi
loop nexta
pop edi
pop esi
retn 0ch
fast_fnv32: ;FNV1 32bit (size: 36 bytes)
; Intel Core 2 Duo E6600: 565.12 mb/s
push ebx
push esi
push edi
mov esi, [esp + 10h] ;buffer
mov ecx, [esp + 14h] ;length
mov eax, [esp + 18h] ;basis
mov edi, 01000193h ;fnv_32_prime
xor ebx, ebx
next:
mul edi
mov bl, [esi]
xor eax, ebx
inc esi
dec ecx
jnz next
pop edi
pop esi
pop ebx
retn 0ch
fast_fnv32a: ;FNV1a 32bit (size: 36 bytes)
; Intel Core 2 Duo E6600: 574.95 mb/s
push ebx
push esi
push edi
mov esi, [esp + 10h] ;buffer
mov ecx, [esp + 14h] ;length
mov eax, [esp + 18h] ;basis
mov edi, 01000193h ;fnv_32_prime
xor ebx, ebx
nexta:
mov bl, [esi]
xor eax, ebx
mul edi
inc esi
dec ecx
jnz nexta
pop edi
pop esi
pop ebx
retn 0ch
(top)
gcc optimization
It has been reported by several people that under the gcc compiler
with -O3 on many AMD & Intel CPUs, that replacing the
FNV_prime multiply with a expression of shifts and adds
will improve the performance.
Limited testing on our part confirmed that one can gain a few %
in speed on an 1.6GHz AMD Athlon using gcc version 3.2.2 with -O3 optimization.
For a 32 bit FNV-1, we used:
while (bp < be) {
/* multiply by the 32 bit FNV magic prime mod 2^32 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
hval *= FNV_32_PRIME;
#else
hval += (hval<<1) + (hval<<4) + (hval<<7) + (hval<<8) + (hval<<24);
#endif
/* xor the bottom with the current octet */
hval ^= (Fnv32_t)*bp++;
}
For a 32 bit FNV-1a, we used:
while (bp < be) {
/* xor the bottom with the current octet */
hval ^= (Fnv32_t)*bp++;
/* multiply by the 32 bit FNV magic prime mod 2^32 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
hval *= FNV_32_PRIME;
#else
hval += (hval<<1) + (hval<<4) + (hval<<7) + (hval<<8) + (hval<<24);
#endif
}
For a 64 bit FNV-1, we used:
while (bp < be) {
/* multiply by the 64 bit FNV magic prime mod 2^64 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
hval *= FNV_64_PRIME;
#else
hval += (hval << 1) + (hval << 4) + (hval << 5) +
hval << 7) + (hval << 8) + (hval << 40);
#endif
/* xor the bottom with the current octet */
hval ^= (Fnv64_t)*bp++;
}
For a 64 bit FNV-1a, we used:
while (bp < be) {
/* xor the bottom with the current octet */
hval ^= (Fnv64_t)*bp++;
/* multiply by the 64 bit FNV magic prime mod 2^64 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
hval *= FNV_64_PRIME;
#else
hval += (hval << 1) + (hval << 4) + (hval << 5) +
hval << 7) + (hval << 8) + (hval << 40);
#endif
}
|