Friday, March 16, 2012

The Finer Points of Digest Finalization

I want to go into a bit more detail regarding the digest finalization process, reiterated below:


(Finish computing dot product, y, above, leaving (x, c) as its most recent value, such that x has been used to compute the last term of y.)

z = (y + (c * 2^32) + x) MOD 2^64
x <- z MOD 2^32
c <- z / 2^32
ITERATE_NO_ZERO_CHECK
ITERATE_NO_ZERO_CHECK
ITERATE_NO_ZERO_CHECK
z = (z + (c * 2^32) + x) MOD 2^64

z is the (final) digest. Done!


It occurred to me the that the above process appears somewhat arbitrary. It is not.

The first step, in which we add the dot product, y, to the current (x,c), is important because we want the LMD of a null file to be "random". Otherwise, imagine how easy it would be to have a null file with an LMD of 0, which appears to be valid. Because y can be any of 2^64 states, z has the same domain.

We pass through the iterator 3 times because, as explained previously, that seems to be optimal from an entropy accrual standpoint, balanced against the need to avoid excessive latency.

In the last stage, we combine z with (x,c) from the beginning of the process. This is critical in order to ensure that z can assume any of 2^64 states; otherwise, it would be constrained by virtue of having passed through the iterator thrice.

No comments:

Post a Comment