Thursday, February 28, 2013

LMD7


LMD7, whose C source code is here, is a kilobit hash which operates on 32-kilobit blocks, and largely resembles LMD6, but it's about twice as fast. (I won't repeat all the discussion of how LMD6 was evolved, but it's worth a read if you want to understand LMD7, because this article will focus only the differences.)

[Sorry about the excessive whitespace which Blogspot injects at random just to be annoying.]

Here's the algo:


j=((2^15)/N)-1
x=X0
c=C0
y=Y0
d=D0
FOR (i=0 to (j-1),i+=2){
  x=x xor y
  c=c xor d
  p=A*(x xor L[N, i])+(c xor L[N, i+1])
  x=p MOD 2^N
  c=p/(2^N)
  q=B*(y xor L[N, i xor ((j+1)/2)])+(d xor L[N, (i+1) xor ((j+1)/2)])
  y=q MOD 2^N
  d=q/(2^N)
}
p=p+(y*(2^N))+d
p=p MOD 2^(2N)
z=p xor M


where:


j is the maximum index of the block, each item of which is N bits wide.

N is half the bit width of the hash, which is 512 because an LMD7 hash is 1024 bits wide.

X0, C0, Y0, and D0 are the N-bit hash seeds, which are intended to be implied by a cryptographic context, and which are neither used for any other purpose nor stored anywhere. In the C source code, they appear in memory in the aforementioned order at p_base on input to the hash. Note that we have 4N bits going into a hash which will ultimately be only 2N bits wide.

x, c, y, and d are also N bits wide. (x, c) and (y, d) function in a manner similar to (x, c) in a Marsaglia oscillator. (x is at the memory address immediately below c.)

p is the first oscillator state, 2N bits wide. It's more complicated than a Marsaglia oscillator because its design is a product of evolution, but it contains the components of one.

A is the first Marsaglia oscillator's multiplier which, in terms of the source code constant LMD7_D0, is (2^512-(LMD7_D0*(2^32))). LMD7_D0=0xD1AEF329.

L is the array of (j+1) N-bit words whose hash to evaluate. (In all cases, L is 2^15 bits long. It's just a question of how wide we consider a word to be, and thus how many words L contains.) Note that L is indexed from 1 through 64 in Mathematica, but from 0 through 63 in the algo above.

q is the second oscillator state, 2N bits wide.

B is the second Marsaglia oscillator's multiplier which, in terms of the source code constant LMD7_D1, is (2^512-(LMD7_D1*(2^32))). LMD7_D1=0xE5467E8F.

M is a xor mask, essentially a cryptographically generated nonce, which will mask the hash. (Encrypting the hash with this mask would appear to be overkill, but it's dirt cheap compared to encrypting and taking the hash of the payload, so why not.)

z is the hash (the digest) of L, which is 2N bits wide.


And here is a Mathematica example, using exactly the same seeds and hash as over here, which also appears in demo.c in the source code. Cut and paste into the Mathematica Kernel command prompt, without any carriage returns that this website might have injected. Then hit Enter to execute it.


A=(2^512-((2^32)*16^^D1AEF329));B=(2^512-((2^32)*16^^E5467E8F));n=512;t=2^n;u=2^(n+n);X0=1;C0=2;Y0=3;D0=4;M=0;j=((2^15)/n)-1;L=Array[#&,j+1];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,((j+1)/2)]+1]]])+BitXor[d,L[[BitXor[i,((j+1)/2)]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=Mod[p+BitShiftLeft[y,n]+d,u];z=BitXor[p,M];Print[BaseForm[z,16]];


The above code, when executed in the Mathematica Kernel, should produce:


23f02b7e5b0d92db672706d46d00b7e19bf25a746dffe8c4afcf7c4d71bdb4df8128af2d41a40605a2c158cb4ea33e776b7da1877f1053bf9fa2d5c40dd01c5a19b4aa63034626d7f6adff9fa871b5709878115ec2e4c5eb1caa4b86d8dea9b28caea6278db03a6b03c6bebaad49cf3279fb724a653225613d9f6422ce63a70e


Thus LMD7 involves 2 oscillators of large mutually prime period -- ((A*(2^511))-1) and ((B*(2^511))-1) -- starting half a block apart and rotating through all words in a block, with feedback moving from oscillator B to oscillator A after each step (but not in the other direction). Because a block is 2^15 bits long, and an LMD7 is 2^10 bits long, that means each oscillator iterates 32 times before the hash is issued.

Thus immediately after oscillator A processes word 0, oscillator B processes word 16; immediately after oscillator A processes word 31, oscillator B processes word 15. Thus in the weakest case, a given word is iterated through one of the oscillators only once, while being iterated 17 times through the other.

The main reason why LMD7 is about twice as fast as LMD6 is that the word width of the former is twice as long as the latter. In other words, while LMD6 only xors X with the plaintext, LMD7 xors both X and C. Normally, this would be unsafe because the high bit of a Marsaglia oscillator is biased. But in this case, the bias of each oscillator is 1 part in O(2^(512-64)), or O(10^134). Is that exploitable? Good luck.

One obvious requirement is that both oscillators come up to essentially maximum entropy in at most 17 iterations. That way, when one oscillator is at its weakest (i.e. a single digesting iteration remaining), the other is sufficiently strong to pick up the slack. To that end, LMD7_D0 and LMD7_D1 have been chosen so that the scintillating entropy profile of the oscillators ends up at 511 bit transitions after 17 iterations, in both cases. In other words, starting from 2 1024-bit random seeds which differ by only a bit, the xor of the 17-iterated versions of these seeds will itself contain 511 bit transitions -- trivially shy of maximum entropy. But there are 2 such oscillators, which after digesting the block, are combined by crossing high and low 512-bit parts, using carry-propagated addition, to yield a 1024-bit hash which is for all practical purposes unbiased. And by the way, this is where the factor of 2^32 comes in A and B: without this factor, the oscillator doesn't entropize fast enough. But with it, we can use simple 32-to-64-bit multiplies to rapidly iterate each oscillator.

In the interest of creating a compelling case for the security of LMD7, perhaps the most important consideration is demonstrating that setting a single 1 bit in a block consisting of all 0s produces a change to the hash of such complexity as to appear random. This "change" is what I call the
"xor compensator" of the 2 hashes, because when this value is xored to the hash of the poisoned block, it morphs into the hash of the legitimate block. Statistically speaking, this means that if we do 2^2N such tests, and thereby generate as many xor compensators, that we should expect to encounter (R*(2^2N)) unique xor compensators, where R is the reflexive density constant.


(By the way, xor is obviously not the only way one could compensate a hash. Addition, modulo multiplication, and an infinite number of other functions are candidates as well. But xor tends to be a good proxy for addition: if the xor compensator shows dangerous skews, then chances are that the addition compensator does so as well, and vice versa. Bias in other compensator spaces are an interesting open question, and I don't pretend to be proving anything in this regard.)


The only minor problem, of course, is that 2N=1024. This forces us into a regime of extrapolatory algogenesis, which is about the best we can do, absent a proof of complexity, which has thus far eluded all hash algos ever invented. We can always to a more thorough job in this regard, but 2 cases are explored below, one with N=8 and the other with N=9. In other words, we are scaling LMD7 down from a 1024-bit hash to a 16-or-18-bit hash, repsectively. The algo is otherwise unchanged, including the number of words per block, 64.

Mainly, we're looking for evidence that setting a single 1 in a block of 0s with 2^2N random seeds creates a xor compansator set which is compliant with R, as described above. Any more complicated change to any more complicated block might lead us to unjustifiably optimistic assumptions about the strength LMD7.

The Marsaglia A and B values in this case will be (t-46) and (t-52), where t=2^8, as you can see in the code below. (By the way, (A*(2^8)-1) and  (A*(2^7)-1) are both prime as required, and similarly for B.) So here's the Mathematica code which will do this (remove any carriage returns injected by this blog website, then paste into the Mathematica Kernel command prompt). (I realize that the code looks horrible, which is because it's only intended for verification purposes; the C is much more elegant, I promise!)


n=8;t=2^n;A=(t-46);B=(t-52);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;F=RandomInteger[j]+1;G=RandomInteger[n-1];L[[F]]=BitShiftLeft[1,G];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[F]]=F;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];


The output might take 10 minutes to generate. It looks like this (but might slightly differ from yours, on account of the RandomInteger[] function being nondeterministic):


w=0
   16
.
.
.

w=f000
      16




R=0.607376
R_ideal=0.632121
population_max=8
population_max_density_log2=13.


(The W's are just progress indicators, and the "16"s are just Mathematica's obnoxious way of telling us that the result is in hexadecimal.) The important part is at the bottom: This was a 16-bit hash (because A and B both have 8 bits), which as you can see from the W's, we've tested 2^16 times. On each test, we hashed a block of 0s against the same block with a randomly selected bit set to 1, starting from the same random seed in both cases. We then xor the resulting hashes. Finally, we set the bit corresponding to this xor-of-hashes (Z) in a table (K) of 2^16 xor compensator population counts, initially all 0s.

When the experiment is done, the fraction of nonzeros we expect to find in the table is R, or 0.632121... As you can see, this hash performed well, but not great, coming in at 0.61. That means that the xor compensator is modestly biased, given a single bit flip in the simplest possible plaintext configuration, which is a block of 0s. You could say that we're measuring the "ground state entropy" of the hash.

As to population_max_density_log2, that's the log2 of ((2^2N)/population_max), or log2(65536/8) in this case. This represents the log2 rarity of the most popular xor compensator. It's subject to a lot of noise, but it functions like a canary in a coal mine: if it's too small, then the most popular xor compensator can be easily guessed by examining test cases involving known seeds. For example, if it were 3, that would mean that 1 in 8 xor compensators were the same value. 13 is probably fine for a 16-bit hash.

In fact, what is the statistically expected value of the maximum population? Combinatorics is informative:


Let Q be the probability of one particular integer in the population map (K, in the code above) being incremented ("hit") T times, i.e. of the corresponding xor compensator occurring T times, assuming that the xor compensators are distributed randomly because the hash is "hard".

Let U=2^(2N), where 2N is the number of bits in the hash.

Then:

Q=((1/U)^T)*((1-(1/U))^(U-T))*(U nCr T)

because:

((1/U)^T) is the probability of being hit T times, at exactly the iteration numbers predicted.

((1-(1/U))^(U-T)) is the probability of not being hit (U-T) times, at exactly the iterations numbers predicted.

(U nCr T) is the number of possible combinations involving T hits over U attempts.

So then:

P=1-((1-Q)^U)

is the probability that at least one integer will be hit exactly T times.

because:

(1-Q) is the probability that a given integer is not hit exactly T times.

((1-Q)^U) is the probability that no integer will be hit exactly T times.

So (1-((1-Q)^U)) is the probability that at least one integer will be hit exactly T times.


If P=0.5, then T is the expected maximum population. Since T is discrete, we don't expect to hit P=0.5 exactly. That's OK, so long as T ends up in a state which is "reasonably probable". T=1000 with N=8, for example, would not be reasonably probable. That would be bad because it would suggest that the xor compensator could easily be guessed based on statistical analysis involving a random set of known seeds.

So in this case, we have T=8 and U=2^16. Find P with Mathematica:


t=8;n=8;u=2^(n+n);p=(1/u)^t*(1-(1/u))^(u-t)*Binomial[u,t];p=1-((1.-p)^u);Print["p=",p];Print["p_reciprocal_log2=",-Log[2.,p]];


which produces:


p=0.449961
p_reciprocal_log2=1.15213


Or if you like WolframAlpha, which I do because I'm a nerd:

http://www.wolframalpha.com/input/?i=1-%28%281-%28%28%281%2F65536%29%5E8%29*%28%281-%281%2F65536%29%29%5E%2865536-8%29%29*Binomial%5B65536%2C8%5D%29%29%5E65536%29

In plain English, the probability of discovering at least one population of 8 in the map is about 44% -- so our result is utterly unremarkable if we assume that the hash is hard. We can't prove the converse, but so far, so good.

p_reciprocal_log2 is just Log[2,1/p], which is just a logarithmic way of looking at p.

Let's kick it up a notch, as Emeril Lagasse would say. Try N=9, meaning an 18-bit hash, with A=(t-17) and B=(t-47), where t=2^9 and the usual primality requirements are satisfied:


n=9;t=2^n;A=(t-17);B=(t-47);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;F=RandomInteger[j]+1;G=RandomInteger[n-1];L[[F]]=BitShiftLeft[1,G];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[F]]=F;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];



...and the results:



R=0.627625
R_ideal=0.632121
population_max=8
population_max_density_log2=15.




Thus R is closer to R_ideal. This isn't an accident. In part, it comes from the fact that we're using less biased oscillators, i.e. A and B values which are proportionally closer to 2^N in with N=9 than N=8. You can imagine how this would work out if the bias were 1 part in O(10^134), as with LMD7: R should be indistinct, in practice, from R_ideal.

And what about population_max_density_log2? This time, N=9 but still T=8:


t=8;n=9;u=2^(n+n);p=(1/u)^t*(1-(1/u))^(u-t)*Binomial[u,t];p=1-((1.-p)^u);Print["p=",p];Print["p_reciprocal_log2=",-Log[2.,p]];


which produces:


p=0.908519
p_reciprocal_log2=0.138411


Which means it's 90% likely that we'll find at least one population of 8 (i.e. T=8) in the map -- a tad high, when we're looking for something nearer 50%. But it turns out that T=9 has a 23% chance of the same. So T=8 is unsurprising.

What this all points to, thus far, is that the xor compensator appears to be randomly distributed, i.e. "mini LMD7" is hard. Extrapolation suggests, then, that LMD7 is also hard, as measured by the same tests.

But it behooves us to pound harder on the most weakly protected bit in the block.

The weakest bit in the block is probably bit ((2^14)-1), as measured on average over a wide variety of seeds. This bit is weakly protected because it's the last bit to be integrated into the B oscillator, and is thus subject to the least number of entropizing iterations after the fact. Additionally, while the state of the B oscillator feeds into the A oscillator (via xor at the top of the loop), the reverse isn't true, so B probably cannot entropize faster than A. We return to the N=8 case, with the difference that we'll only compare a block of 0s to the same block with bit ((2^14)-1) set ("L[[(j+1)/2]]=BitShiftLeft[1,n-1]"):


n=8;t=2^n;A=(t-46);B=(t-52);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#*0&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;L[[(j+1)/2]]=BitShiftLeft[1,n-1];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[(j+1)/2]]=0;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];


The results are somewhat disturbing:


R=0.521194
R_ideal=0.632121
population_max=11
population_max_density_log2=12.5406


R is badly skewed away from R_ideal. And population_max is also elevated, with a P value of 1 part in 6x10^(-4). So the xor compensators do not appear randomly distributed. On the other hand, R is still large enough to create intractability when N=512, and population_max is still small enough to be safe, in the sense that population_max_density_log2 exceeds N, which means that a executing a birthday attack on the hash is still easier than guessing the correct xor compensator, even if the correct answer happens to be the most popular one. In other words, the xor compensators have a density which is (exponentially) larger than the square root of the hash space -- even in this uberweak case.

So do things improve when N increases to 9? Let's try:


n=9;t=2^n;A=(t-17);B=(t-47);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#*0&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;L[[(j+1)/2]]=BitShiftLeft[1,n-1];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[(j+1)/2]]=0;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];


The results, averaged over 2^18 trials, are much more consistent with randomness:


R=0.627911
R_ideal=0.632121
population_max=8
population_max_density_log2=15.


T=8 at N=9 is, as discussed above, utterly unremarkable. So the statistics of the system are indeed starting to whiten (sic) out.

Results may vary, and it's probably true that, given enough computer power, we would observe scintillating entropy on the way to N=512, but it appears that the LMD7 xor compensator space is asymptotically random. In other words, LMD7 would appear to be hard -- even if left unencrypted (which is not recommended, out of an abundance of caution) -- to the extent that its seeds are opaque. But prove me wrong if you can.

No comments:

Post a Comment