Monday, August 19, 2013

Karacell Encryption vs. Press Releases

You might have found this page because you discovered that I coinvented the Karacell encryption algo with Stuart Christmas. So to whom it may concern, I have been misquoted in the press:


“Karacell has gone through independent rigorous testing to prove it is faster and more efficient that [sic] existing standards and we have resolved the concern on mobile power drain.” adds Russell Leidich, Chief Information Scientist.

I never said this, although I do work for Tigerspike and very happily so. We have a brilliant and gutsy CEO (Luke Janssen), and a group of talented engineers behind him. But we do have a bit of a press release problem. (It's a testament to the company's commitment to openness that they won't fire me for posting this here.) I'm not even sure why our press person wrote the above "quote". I suppose that they somehow thought I had approved of being quoted in this manner. I never issued such approval.

This whole press release, moreover, has a snake oil flavor to it, as though we're cryptographic used car salesmen or something. But I take it all in stride, not least because I have no management authority to intervene here. What matters most is whether the algo is, in fact, stronger and more power-efficient than the predominant standard (chiefly, AES), given the same key length, particularly in a quantum computing world. I do believe that to be the case.

And if you want to attack our algo publicly, then I encourage you to do so on the de facto discussion forum for such things. Nothing beats freedom of speech when it comes to the question of encryption strength. Thoughtful criticism is good for the industry. On the other hand, "it's weak because it adds numbers together, just like this other weak algo I read about" isn't helpful.

Let me be clear: Karacell doesn't rely on any mathematical proof of strength. Nor does any other encryption algo. But it does rely on the well-studied Subset Sum problem, which is NP-complete. No such proof is available for any algo, because producing one would probably be tantamount to deciding the famous P vs. NP question.

If you want the source code and documentation -- without the press release jive -- then please visit the Karacell website linked above and have a look at the video and whitepaper. But yes, Tigerspike still has a monetary prize available for cracking it (and yes, they do have the money!), as described on the site. I personally invited a number of published experts in the knapsack problem field (of which Subset Sum is one example) to participate. But the prize is still outstanding. Perhaps it's because everyone thinks that Karacell is snake oil. With our press release posture of late, it's little wonder. But better a strong algo which has a facade of weakness, than the other way round.

Friday, May 17, 2013

Quantum Foam Computers?

It's my understanding that individual universes within a superposed and entangled quantum computer cannot communicate. Were they permitted to, that would amount to Schrödinger's cats (plural) talking to one another in order to decide whether or not to be alive. Alas, we cannot decide which universe we land in; we are guaranteed only that entangled states shall be consistent when the wave function collapses. Is this not the central point of the EPR paradox?

In practice, the user manual for the DWave quantum computer refers to the probability of obtaining a particular result for a given combinatorial problem. The probability appears to be monotonically related to the reciprocal of the potential energy associated with the result. It's not as though universes can chat with one another while in superposition, then force the system to collapse to the global minimum. Such a state may only be reached with luck after several iterations of the same experiment.

But is this really a limitation of quantum physics, or just our understanding thereof? In particular, if the Planck-scale wormholes of quantum foam actually exist, then it should be possible to create a superposed, entangled, and interuniversally connected quantum computer. This would amount to a network of universes, as opposed to merely a group of universes. Such networked "foambits" would seem to be the next logical evolutionary step of the quantum computer.

Alternatively, if as some physicists hypothesize, gravity can leak from one universe into another, then the equivalent of a quantum foam computer might be constructed on a macroscopic scale, allowing gravity waves to propagate information among universes while in superposition. To this end, masses in superposition are known science (granted, not of the black hole magnitude which popularized gravity waves in the first place).

And if the rumors are true that quantum computers can at best square root the complexity of a classical problem, then what additional reduction might be had by virtue of foambits rather than qubits?

I haven't the vaguest proposal as to how to engineer a quantum foam computer, but the concept seems compelling, if the physics pan out.

Thursday, April 25, 2013

Meet the Microtubule: the Putative Biological Quantum Computer with Error Correction

Might we enslave bacteria to produce a quantum computer?

Central to the entire discussion of hashes and cryptography is the question of whether or not a quantum computer is technologically feasible.

Companies such as DWave Systems would suggest that the answer is yes, although there is an ongoing debate as to whether their special-purpose device is rightly classified as a quantum computer.

As usual, it would appear that biology precedes technology: Dr. Stuart Hameroff et al., in the company of Oxford physicist and mathematician Sir Roger Penrose, have discovered that microtubules, the internal information network within individual cells, appear to perform digital computations in a manner consistent with quantum entanglement, with classical inputs and outputs mediated by microtubule-associated protein (MAP). Their hypothesis is rooted in an understanding of the superpositional tendencies of aromatic groups within tubulin, the constituent protein of microtubules, each of which appears to function as a single qubit. Moreover, the system appears to be configured so as to allow millions of neurons to participate in common entanglements involving all of their microtubules, lasting tens of milliseconds before collapsing into some sort of conscious realization. He presented his findings in this video.

If true, the implications are profound. For one thing, neural firing activity might merely amount to the exchange of classical inputs and outputs with the brain's "real" computers, the microtubules within the neurons, as opposed to the neurons themselves. Dr. Hameroff discusses some of these implications in this interview, and does well despite the confusion of the interviewer.

Granted, just because a given molecular system can behave as a quantum computer does not mean that it does. Either way, however, I think Dr. Hameroff is onto something here...

Saturday, March 2, 2013


LMD8, whose source code is here, is a 512-bit hash with 32-kilobit blocks. It's about 30% faster than LMD7 (in addition to the reduced IO time), or better if the loops are unrolled at the expense of instruction cache efficiency.

Pursuant to further analysis of the birthday attack myth, I don't think that a strong hash need be longer than 512 bits (and that's probably overstated even so). To the extent that no hash is optimally strong, of course, we need extra bits to compensate. But it's a fool's game to try to guess how much redundancy is sufficient, in the absence of a solid understanding of the weaknesses of a given hash.

It boils down to this: if you have a 2N-bit hash, then a birthday attack can crack it in O(2^N) operations. However, this process requires memory large enough to store O(2^N) hashes. If your memory size is only 2^S, where S<N, then the complexity expands to O(2^(S+2*(N-S))). (The execution time doesn't expand quite as much, on account of faster communication in smaller memories.)
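To make the memory cost concrete, here's a small Python sketch (my own illustration, not from any LMD source) of a birthday attack against a hash truncated to 2N=32 bits, so N=16: the collision arrives after roughly 2^16 hashes, but only because the table retains every prior result.

```python
# Toy birthday attack: SHA-256 truncated to 2N = 32 bits stands in for a
# "strong" short hash. With N = 16, we expect a collision after
# ~sqrt(pi/2 * 2^32) ~ 8e4 attempts -- but the dict below must hold O(2^N)
# prior hashes, which is exactly the memory requirement discussed above.
import hashlib

def trunc_hash(x: int) -> int:
    """First 32 bits of SHA-256 of an 8-byte encoding of x."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(8, "big")).digest()[:4], "big")

seen = {}
x = 0
while True:
    v = trunc_hash(x)
    if v in seen:
        break  # x and seen[v] hash to the same 32-bit value
    seen[v] = x
    x += 1

print("collision after", x + 1, "hashes; table holds", len(seen), "entries")
```

Note that the attack's working set is the `seen` table itself; shrink the table and the operation count balloons, exactly as the complexity formula above predicts.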

At the extreme where S=0, the memory is only large enough to store 1 working hash (ignoring the readonly hash to which it must be compared), which is minimally sufficient for cracking. A quantum computer comes to mind. In this case, the classical complexity is about O(2^2N), which the quantum computer would then square root to roughly O(2^N), which is no better than the classical complexity, given sufficient memory. 
In principle, O(2^S) qubits would provide a quantum complexity of O(2^(S+(N-S))), which is somewhat better, but probably not much use in practice because we usually have S<<N, especially when dealing with qubits. (So far as I know, multiverses can't communicate with each other, which means that a quantum computer isn't of much use in problems where all the elements need to interact.)

So how long of a hash is enough, assuming that it's strong? Obviously, the answer is subject to massive error, but we can at least turn to physics for a suggestion. Let's assume that you can encode a hash bit on a neutron, and that you can pack them densely, neutron star style, for the purposes of creating storage media. 2^128 of these bits would then weigh about (9x10^(-14)) times as much as Earth. That's like the weight of a modest-sized asteroid -- huge, but not inconceivable, with all the talk of asteroid mining these days. At neutron star density, it would probably fit in a room.

2^256 of them, on the other hand, would consume (3x10^25) Earth masses. Barring some discovery of much denser matter amenable to data storage, it's not happening.
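Both figures are easy to verify; a quick Python check (my assumed constants: neutron mass ~1.675e-27 kg, Earth mass ~5.972e24 kg, one bit per neutron):

```python
# Back-of-envelope verification of the two Earth-mass figures above.
NEUTRON_KG = 1.675e-27   # assumed neutron mass
EARTH_KG = 5.972e24      # assumed Earth mass

r128 = (2**128) * NEUTRON_KG / EARTH_KG
r256 = (2**256) * NEUTRON_KG / EARTH_KG
print(r128)  # ~9.5e-14 Earth masses (the asteroid-scale figure)
print(r256)  # ~3.2e25 Earth masses (not happening)
```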

Bear in mind, however, that all of this is of course predicated on the hash being optimally hard, which is most certainly never the case. And perhaps I'm wrong about quantum computers not being helpful. We could, of course, also use some number of bits that wasn't a power of 2, but that gets yucky. So for now, I think I'm comfortable with a 512-bit hash. In time, maybe I'll feel comfortable with 256 bits as well, and invent LMD9. We'll see.

LMD8 is exactly like LMD7, but LMD8 has N=256. Because there are 64 words in a block with LMD8, instead of 32, the oscillator values LMD8_D0 and LMD8_D1 have been selected to hit maximum entropy after 33 iterations, in order to sufficiently protect the weakest bit in the block, which is still bit ((2^14)-1). Changing this bit when the rest of the block is 0s results in very nearly 256 bit transitions in the xor of the hashes, averaged over many identical seed pairs -- in other words, chaos.




I won't reiterate the algo, because it was covered in the LMD7 blog entry. LMD8_A0 and LMD8_A1 satisfy the usual primality requirements for a long Marsaglia oscillator. Specifically, the period is ((LMD8_A0*(2^0x1FF))-1), and similarly for LMD8_A1. The high bit bias in either case is on the order of 1 part in (10^67) in favor of 0, i.e. negligible even for cryptographic purposes. Here's a sample input for Mathematica Kernel:


...and its output hash:


This is the same output generated by demo.c in the source code.

I don't need to reiterate the whole evolutionary process by which its security has been inferred from LMD7 and ultimately LMD6.

Thursday, February 28, 2013


LMD7, whose C source code is here, is a kilobit hash which operates on 32-kilobit blocks, and largely resembles LMD6, but it's about twice as fast. (I won't repeat all the discussion of how LMD6 was evolved, but it's worth a read if you want to understand LMD7, because this article will focus only on the differences.)

[Sorry about the excessive whitespace which Blogspot injects at random just to be annoying.]

Here's the algo:

FOR (i=0 to (j-1), i+=2) {
  x=x xor y
  c=c xor d
  p=A*(x xor L[N, i])+(c xor L[N, i+1])
  x=p MOD 2^N
  q=B*(y xor L[N, i xor ((j+1)/2)])+(d xor L[N, (i+1) xor ((j+1)/2)])
  y=q MOD 2^N
}
p=p MOD 2^(2N)
z=p xor M


j is the maximum index of the block, each item of which is N bits wide.

N is half the bit width of the hash, which is 512 because an LMD7 hash is 1024 bits wide.

X0, C0, Y0, and D0 are the N-bit hash seeds, which are intended to be implied by a cryptographic context, and which are neither used for any other purpose nor stored anywhere. In the C source code, they appear in memory in the aforementioned order at p_base on input to the hash. Note that we have 4N bits going into a hash which will ultimately be only 2N bits wide.

x, c, y, and d are also N bits wide. (x, c) and (y, d) function in a manner similar to (x, c) in a Marsaglia oscillator. (x is at the memory address immediately below c.)

p is the first oscillator state, 2N bits wide. It's more complicated than a Marsaglia oscillator because its design is a product of evolution, but it contains the components of one.

A is the first Marsaglia oscillator's multiplier which, in terms of the source code constant LMD7_D0, is (2^512-(LMD7_D0*(2^32))). LMD7_D0=0xD1AEF329.

L is the array of (j+1) N-bit words whose hash to evaluate. (In all cases, L is 2^15 bits long. It's just a question of how wide we consider a word to be, and thus how many words L contains.) Note that L is indexed from 1 through 64 in Mathematica, but from 0 through 63 in the algo above.

q is the second oscillator state, 2N bits wide.

B is the second Marsaglia oscillator's multiplier which, in terms of the source code constant LMD7_D1, is (2^512-(LMD7_D1*(2^32))). LMD7_D1=0xE5467E8F.

M is a xor mask, essentially a cryptographically generated nonce, which will mask the hash. (Encrypting the hash with this mask would appear to be overkill, but it's dirt cheap compared to encrypting and taking the hash of the payload, so why not.)

z is the hash (the digest) of L, which is 2N bits wide.
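For readers who prefer running code to pseudocode, here's a toy Python rendering of the loop above at N=8 with 64 words per block, i.e. the "mini LMD7" configuration tested later in this post. Two points are my own reading rather than verbatim pseudocode: (1) the 2N-bit p is written back over (x, c) carry-style (x = low N bits, c = high bits), which the memory layout described above suggests, and symmetrically for (y, d); (2) A and B are the small (t-46)/(t-52) pair from the N=8 experiment, not the 512-bit production constants. Consult the linked C source for the authoritative version.

```python
N = 8
J = 63                          # maximum word index (64 words per block)
T = 1 << N
A, B = T - 46, T - 52           # mini-LMD7 multipliers from the N=8 experiment

def mini_lmd7(L, x, c, y, d, M=0):
    mask = T - 1
    half = (J + 1) // 2         # oscillator B starts half a block apart
    p = q = 0
    for i in range(0, J + 1, 2):
        x ^= y                  # one-way feedback: B's state into A's
        c ^= d
        p = A * (x ^ L[i]) + (c ^ L[i + 1])
        x, c = p & mask, p >> N  # carry-style writeback (my interpretation)
        q = B * (y ^ L[i ^ half]) + (d ^ L[(i + 1) ^ half])
        y, d = q & mask, q >> N
    p %= 1 << (2 * N)
    return p ^ M                # z, the 2N-bit digest

# Avalanche check: flipping one plaintext bit flips roughly half the hash bits.
zeros = [0] * (J + 1)
flipped = list(zeros)
flipped[0] = 1
diff = mini_lmd7(zeros, 1, 2, 3, 4) ^ mini_lmd7(flipped, 1, 2, 3, 4)
print(f"{diff:04x}", bin(diff).count("1"))
```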

And here is a Mathematica example, using exactly the same seeds and hash as over here, which also appears in demo.c in the source code. Cut and paste into the Mathematica Kernel command prompt, without any carriage returns that this website might have injected. Then hit Enter to execute it.


The above code, when executed in the Mathematica Kernel, should produce:


Thus LMD7 involves 2 oscillators of large mutually prime period -- ((A*(2^511))-1) and ((B*(2^511))-1) -- starting half a block apart and rotating through all words in a block, with feedback moving from oscillator B to oscillator A after each step (but not in the other direction). Because a block is 2^15 bits long, and an LMD7 hash is 2^10 bits long, each oscillator iterates 32 times before the hash is issued.
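Python's arbitrary-precision integers make the mutual-primality claim easy to spot-check directly from the LMD7_D0/LMD7_D1 constants defined above (a gcd check only; it does not prove the stronger primality claims):

```python
# Spot-check: the two oscillator periods quoted above share no common factor.
from math import gcd

LMD7_D0 = 0xD1AEF329
LMD7_D1 = 0xE5467E8F
A = (1 << 512) - (LMD7_D0 << 32)
B = (1 << 512) - (LMD7_D1 << 32)

period_a = A * (1 << 511) - 1
period_b = B * (1 << 511) - 1
print(gcd(period_a, period_b))  # 1, i.e. mutually prime periods
```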

Thus immediately after oscillator A processes word 0, oscillator B processes word 16; immediately after oscillator A processes word 31, oscillator B processes word 15. Thus in the weakest case, a given word is iterated through one of the oscillators only once, while being iterated 17 times through the other.

The main reason why LMD7 is about twice as fast as LMD6 is that the word width of the former is twice that of the latter. In other words, while LMD6 only xors X with the plaintext, LMD7 xors both X and C. Normally, this would be unsafe because the high bit of a Marsaglia oscillator is biased. But in this case, the bias of each oscillator is 1 part in O(2^(512-64)), or O(10^134). Is that exploitable? Good luck.

One obvious requirement is that both oscillators come up to essentially maximum entropy in at most 17 iterations. That way, when one oscillator is at its weakest (i.e. a single digesting iteration remaining), the other is sufficiently strong to pick up the slack. To that end, LMD7_D0 and LMD7_D1 have been chosen so that the scintillating entropy profile of the oscillators ends up at 511 bit transitions after 17 iterations, in both cases. In other words, starting from 2 1024-bit random seeds which differ by only a bit, the xor of the 17-iterated versions of these seeds will itself contain 511 bit transitions -- trivially shy of maximum entropy. But there are 2 such oscillators, which after digesting the block, are combined by crossing high and low 512-bit parts, using carry-propagated addition, to yield a 1024-bit hash which is for all practical purposes unbiased. And by the way, this is where the factor of 2^32 in A and B comes in: without this factor, the oscillator doesn't entropize fast enough. But with it, we can use simple 32-to-64-bit multiplies to rapidly iterate each oscillator.

In the interest of creating a compelling case for the security of LMD7, perhaps the most important consideration is demonstrating that setting a single 1 bit in a block consisting of all 0s produces a change to the hash of such complexity as to appear random. This "change" is what I call the "xor compensator" of the 2 hashes, because when this value is xored with the hash of the poisoned block, it morphs into the hash of the legitimate block. Statistically speaking, this means that if we do 2^2N such tests, and thereby generate as many xor compensators, we should expect to encounter (R*(2^2N)) unique xor compensators, where R is the reflexive density constant.

(By the way, xor is obviously not the only way one could compensate a hash. Addition, modulo multiplication, and an infinite number of other functions are candidates as well. But xor tends to be a good proxy for addition: if the xor compensator shows dangerous skews, then chances are that the addition compensator does so as well, and vice versa. Bias in other compensator spaces is an interesting open question, and I don't pretend to be proving anything in this regard.)
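For reference, the reflexive density constant R is (1-(1/e)), or 0.632121...: throw 2^2N balls uniformly at random into 2^2N bins, and that's the expected fraction of nonempty bins. A seeded Python simulation (my own illustration, at 2N=16) confirms it:

```python
# Reflexive density: fraction of bins left nonempty after U throws into U bins.
import random

random.seed(1)          # fixed seed for reproducibility
U = 1 << 16
hit = bytearray(U)
for _ in range(U):
    hit[random.randrange(U)] = 1

print(sum(hit) / U)     # ~0.632, i.e. 1 - 1/e
```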

The only minor problem, of course, is that 2N=1024. This forces us into a regime of extrapolatory algogenesis, which is about the best we can do, absent a proof of complexity, which has thus far eluded all hash algos ever invented. We can always do a more thorough job in this regard, but 2 cases are explored below, one with N=8 and the other with N=9. In other words, we are scaling LMD7 down from a 1024-bit hash to a 16-or-18-bit hash, respectively. The algo is otherwise unchanged, including the number of words per block, 64.

Mainly, we're looking for evidence that setting a single 1 in a block of 0s with 2^2N random seeds creates a xor compensator set which is compliant with R, as described above. Any more complicated change to any more complicated block might lead us to unjustifiably optimistic assumptions about the strength of LMD7.

The Marsaglia A and B values in this case will be (t-46) and (t-52), where t=2^8, as you can see in the code below. (By the way, (A*(2^8)-1) and (A*(2^7)-1) are both prime as required, and similarly for B.) So here's the Mathematica code which will do this (remove any carriage returns injected by this blog website, then paste into the Mathematica Kernel command prompt). (I realize that the code looks horrible, which is because it's only intended for verification purposes; the C is much more elegant, I promise!)


The output might take 10 minutes to generate. It looks like this (but might slightly differ from yours, on account of the RandomInteger[] function being nondeterministic):




(The W's are just progress indicators, and the "16"s are just Mathematica's obnoxious way of telling us that the result is in hexadecimal.) The important part is at the bottom: This was a 16-bit hash (because A and B both have 8 bits), which as you can see from the W's, we've tested 2^16 times. On each test, we hashed a block of 0s against the same block with a randomly selected bit set to 1, starting from the same random seed in both cases. We then xor the resulting hashes. Finally, we set the bit corresponding to this xor-of-hashes (Z) in a table (K) of 2^16 xor compensator population counts, initially all 0s.

When the experiment is done, the fraction of nonzeros we expect to find in the table is R, or 0.632121... As you can see, this hash performed well, but not great, coming in at 0.61. That means that the xor compensator is modestly biased, given a single bit flip in the simplest possible plaintext configuration, which is a block of 0s. You could say that we're measuring the "ground state entropy" of the hash.

As to population_max_density_log2, that's the log2 of ((2^2N)/population_max), or log2(65536/8) in this case. This represents the log2 rarity of the most popular xor compensator. It's subject to a lot of noise, but it functions like a canary in a coal mine: if it's too small, then the most popular xor compensator can be easily guessed by examining test cases involving known seeds. For example, if it were 3, that would mean that 1 in 8 xor compensators were the same value. 13 is probably fine for a 16-bit hash.

In fact, what is the statistically expected value of the maximum population? Combinatorics is informative:

Let Q be the probability of one particular integer in the population map (K, in the code above) being incremented ("hit") T times, i.e. of the corresponding xor compensator occurring T times, assuming that the xor compensators are distributed randomly because the hash is "hard".

Let U=2^(2N), where 2N is the number of bits in the hash.


Q=((1/U)^T)*((1-(1/U))^(U-T))*(U nCr T)


((1/U)^T) is the probability of being hit T times, at exactly the iteration numbers predicted.

((1-(1/U))^(U-T)) is the probability of not being hit (U-T) times, at exactly the iteration numbers predicted.

(U nCr T) is the number of possible combinations involving T hits over U attempts.

So then:

P=(1-((1-Q)^U))

is the probability that at least one integer will be hit exactly T times.


(1-Q) is the probability that a given integer is not hit exactly T times.

((1-Q)^U) is the probability that no integer will be hit exactly T times.

So (1-((1-Q)^U)) is the probability that at least one integer will be hit exactly T times.

If P=0.5, then T is the expected maximum population. Since T is discrete, we don't expect to hit P=0.5 exactly. That's OK, so long as T ends up in a state which is "reasonably probable". T=1000 with N=8, for example, would not be reasonably probable. That would be bad because it would suggest that the xor compensator could easily be guessed based on statistical analysis involving a random set of known seeds.

So in this case, we have T=8 and U=2^16. Find P with Mathematica:


which produces:


Or if you like WolframAlpha, which I do because I'm a nerd.

In plain English, the probability of discovering at least one population of 8 in the map is about 44% -- so our result is utterly unremarkable if we assume that the hash is hard. We can't prove the converse, but so far, so good.
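As a cross-check, the same formulas evaluate easily in Python (floating point rather than Mathematica's exact arithmetic, but plenty accurate here); the result comes out at just about 45%, consistent with the "about 44%" figure:

```python
# P that at least one of U bins is hit exactly T times in U throws,
# per the Q and P formulas above.
from math import comb

def p_max_pop(U: int, T: int) -> float:
    q = comb(U, T) * (1 / U)**T * (1 - 1 / U)**(U - T)
    return 1 - (1 - q)**U

print(p_max_pop(1 << 16, 8))  # ~0.45: utterly unremarkable, as stated
```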

p_reciprocal_log2 is just Log[2,1/p], which is just a logarithmic way of looking at p.

Let's kick it up a notch, as Emeril Lagasse would say. Try N=9, meaning an 18-bit hash, with A=(t-17) and B=(t-47), where t=2^9 and the usual primality requirements are satisfied:


...and the results:


Thus R is closer to R_ideal. This isn't an accident. In part, it comes from the fact that we're using less biased oscillators, i.e. A and B values which are proportionally closer to 2^N with N=9 than with N=8. You can imagine how this would work out if the bias were 1 part in O(10^134), as with LMD7: R should be indistinguishable, in practice, from R_ideal.

And what about population_max_density_log2? This time, N=9 but still T=8:


which produces:


This means it's 90% likely that we'll find at least one population of 8 (i.e. T=8) in the map -- a tad high, when we're looking for something nearer 50%. But it turns out that T=9 has a 23% chance of the same. So a maximum population of 8 is unsurprising.
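The same binomial formula from the N=8 discussion, evaluated in Python at U=2^18, reproduces both of these probabilities:

```python
# P that at least one of U bins is hit exactly T times in U throws,
# now for the 18-bit hash (U = 2^18).
from math import comb

def p_max_pop(U: int, T: int) -> float:
    q = comb(U, T) * (1 / U)**T * (1 - 1 / U)**(U - T)
    return 1 - (1 - q)**U

print(p_max_pop(1 << 18, 8))  # ~0.91, the "90% likely" figure
print(p_max_pop(1 << 18, 9))  # ~0.23, the 23% chance for T=9
```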

What this all points to, thus far, is that the xor compensator appears to be randomly distributed, i.e. "mini LMD7" is hard. Extrapolation suggests, then, that LMD7 is also hard, as measured by the same tests.

But it behooves us to pound harder on the most weakly protected bit in the block.

The weakest bit in the block is probably bit ((2^14)-1), as measured on average over a wide variety of seeds. This bit is weakly protected because it's the last bit to be integrated into the B oscillator, and is thus subject to the least number of entropizing iterations after the fact. Additionally, while the state of the B oscillator feeds into the A oscillator (via xor at the top of the loop), the reverse isn't true, so B probably cannot entropize faster than A. We return to the N=8 case, with the difference that we'll only compare a block of 0s to the same block with bit ((2^14)-1) set ("L[[(j+1)/2]]=BitShiftLeft[1,n-1]"):


The results are somewhat disturbing:


R is badly skewed away from R_ideal. And population_max is also elevated, with a P value of about 6x10^(-4). So the xor compensators do not appear randomly distributed. On the other hand, R is still large enough to create intractability when N=512, and population_max is still small enough to be safe, in the sense that population_max_density_log2 exceeds N, which means that executing a birthday attack on the hash is still easier than guessing the correct xor compensator, even if the correct answer happens to be the most popular one. In other words, the xor compensators have a density which is (exponentially) larger than the square root of the hash space -- even in this uberweak case.

So do things improve when N increases to 9? Let's try:


The results, averaged over 2^18 trials, are much more consistent with randomness:


T=8 at N=9 is, as discussed above, utterly unremarkable. So the statistics of the system are indeed starting to whiten (sic) out.

Results may vary, and it's probably true that, given enough computer power, we would observe scintillating entropy on the way to N=512, but it appears that the LMD7 xor compensator space is asymptotically random. In other words, LMD7 would appear to be hard -- even if left unencrypted (which is not recommended, out of an abundance of caution) -- to the extent that its seeds are opaque. But prove me wrong if you can.

Friday, February 22, 2013

Harder than Diehard: The Scintillating Entropy Test

With apologies to the late great George Marsaglia, whose Diehard Tests are a cornerstone in the quest for credible pseudorandomness...

The Second Law of Thermodynamics, which has more to do with math than physics, states that "the entropy of an isolated system never decreases", to quote Wikipedia. But this law applies to physical systems under the classical assumption of continuity among infinitely many particles, as distinct from quantization among finitely many particles. A Marsaglia oscillator is definitely the latter variety, its state space being constrained by the number of bits in its iterand.

We all know that such finite discrete systems must eventually fall into a limit cycle. Indeed, one of the beauties of Marsaglia oscillators is that their limit cycle lengths can be predicted exactly without actually running an entire cycle.

So what this means is that if we start with a value on the limit cycle, we will eventually return to the same value, given enough time. Returning to the thermodynamic metaphor, it's as though you have an egg on the table. It then rolls off and splats on the floor. The next day, you find that the egg has suddenly cleaned itself up and reassembled itself on the table.

What would one expect to see on the way to the egg splatting? First, the speed of the egg would increase. Then, as it hit the floor, cracks would spread throughout the shell, migrating from the point of impact, wider and wider, until the shell loses structural integrity, and the yolk spews out all over the place. In other words, a monotonic increase in entropy from start to finish, straight in tune with the Second Law.

We expect to see the same in a Marsaglia oscillator. The trouble is, we know that, eventually, the system will return to its original seed, provided that the seed is indeed on a limit cycle (which is usually the case). How does this occur? Does the egg go from complete splat to happily sitting back on the table, instantaneously? Not exactly. Believe it or not, what we observe is entropy scintillation. Just like a star on the horizon, which rapidly dims and flares with changing atmospheric phenomena, the entropy comes, and goes, then comes back again. (I'm not referring to the fact that the scintillation of a star can be an entropy source, which is clearly true; I'm saying that the degree of entropy itself is scintillating.)

We can see this in practice. Here's one good way to measure entropy accrual in any N-bit integer oscillator:

1. Pick a random N-bit seed from a uniform distribution. Call it S0.

2. Copy S0 to S1.

3. Flip a bit of S1 at random, so S0 and S1 differ by a bit.

4. Iterate S0 and S1 separately, but after each stage, take their xor. (This is called the "xor compensator".) Count the number of 0-to-1 or 1-to-0 transitions, T, in the xor. (Consider the N bits to be arranged in a loop, so that the maximum value of T is N, not (N-1).) [EDIT: On second thought, it's probably better to think of the xor compensator as having bit (negative 1) as 0. That way, (all 1s) is considered to have 1 transition, which is more entropic than (all 0s), which has 0 transitions. (And moreover, the number of transitions can now be odd as well as even.) But this doesn't change the results much, apart from making them a little more accurate.]

So after step 3 and before we iterate at all, we expect to find 2 transitions in the xor, namely, one 0-to-1 and one 1-to-0, because the xor must contain a single 1 bit, corresponding to the bit that was flipped in #3. After each iteration of this oscillator, we should expect to find more and more transitions. If we run many trials from #1 onwards, then, after a certain number of iterations, the average number of transitions should be N/2. (This is easily demonstrated by substituting a true random number generator for the pseudorandom iterator.) This comes from the fact that a TRNG has a 50/50 chance of generating the next bit different from the current bit.

In a perfect TRNG, we hit 50/50 after the first iteration, because there is no iterator! With a pseudorandom iterator, however, we expect to have to "rev it up", like a car starting on a cold day, only approaching the N/2 threshold after so-many iterations. The natural implication is that, if we simply iterate a few times between producing numbers, instead of just once, that we can approach the N/2 limit as closely as desired.

Unfortunately, Marsaglia oscillators (and, I suspect, various other wannabe random iterators) don't behave like this. Yes, they do approach the N/2 limit in 10 or 20 iterations. But then they back off, while the egg puts itself (somewhat) back together again! What we have, most certainly, is scintillating entropy. This is a smoking gun for pseudorandomness.
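The 4-step test above is easy to reproduce. Here's a compact Python version (my own illustration) applied to a 64-bit multiply-with-carry oscillator; the multiplier 0xFFFFDA61 (4294957665) is one that appears in Marsaglia's own postings, and full entropy here means ~32 circular bit transitions on average since N=64:

```python
import random

A = 0xFFFFDA61              # 32-bit multiply-with-carry multiplier (Marsaglia)
N = 64                      # oscillator state width in bits

def step(x, c):
    """One MWC iteration: t = A*x + c; new x = low 32 bits, new c = carry."""
    t = A * x + c
    return t & 0xFFFFFFFF, t >> 32

def transitions(v):
    """Circular 0-to-1/1-to-0 transition count of the N-bit value v (step 4)."""
    rot = ((v << 1) | (v >> (N - 1))) & ((1 << N) - 1)
    return bin(v ^ rot).count("1")

random.seed(7)
TRIALS, ITERS = 2000, 21
avg = [0.0] * ITERS
for _ in range(TRIALS):
    x0, c0 = random.getrandbits(32), random.getrandbits(32)   # step 1
    s1 = (c0 << 32 | x0) ^ (1 << random.randrange(N))         # steps 2 and 3
    x1, c1 = s1 & 0xFFFFFFFF, s1 >> 32
    for it in range(ITERS):                                   # step 4
        avg[it] += transitions((c0 << 32 | x0) ^ (c1 << 32 | x1)) / TRIALS
        x0, c0 = step(x0, c0)
        x1, c1 = step(x1, c1)

print([round(a, 1) for a in avg])  # 2.0 at iteration 0, climbing toward ~32
```

The printed profile shows the "rev up" phase clearly; whether you see the subsequent scintillation as starkly as in the 512-bit data below depends on the oscillator and on how many trials you average.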

Let's look at the data. I happen to be investigating a 512-bit Marsaglia oscillator with B=(2^256) and A=(B-0xE66FAD12). So perfect entropy means averaging 256 transitions in the xor of the iterands. (Too many transitions are just as bad as too few: 01010101... is not very entropic!) However, that's not what happens, as you can see below. Entropy maxes out after 10 iterations, then backs off, only to return 6 iterations later:

0: 2
1: 11.0839
2: 37.1358
3: 68.3862
4: 96.3759
5: 125.272
6: 154.67
7: 181.451
8: 205.462
9: 232.128
10: 255.925
11: 264.856
12: 265.29
13: 263.62
14: 259.082
15: 257.353
16: 256.811
17: 255.263
18: 257.525
19: 255.262
20: 249.468
21: 249.762
22: 253.123
23: 256.236
24: 258.244

(The "2" next to "0" (iterations) is just a sanity check. It's correct, meaning that we do indeed always begin with 2 seeds which differ by a single bit.) Now perhaps you're thinking "These numbers are all pretty close to 256 after the 10th iteration. The error is just noise." But you would be wrong: I seed the program with the Jytter TRNG every time, but I get (very nearly) the same numbers, with maximum entropy after iterations 10, 16, 19, and so on. Remember that I'm starting from random values. It's incredible that this pattern exists at all, let alone so many standard deviations out after millions of tests starting from different seeds. What you see above is the Marsaglia system genuinely reversing its entropy, to some extent. (Yes, it's true that my definition of entropy as bit transitions is by no means the ultimate definition, but I think you'll see some form of scintillation, which persists when averaged over many seeds, no matter how you choose to quantify entropy.)

But this should not be surprising, because after all, we know that the oscillator must eventually return to its point of origin, to the extent that such a point is on the limit cycle. It would be more surprising if it did so as a singular event, as opposed to the culmination of entropy scintillation throughout the entire cycle.

A TRNG should hit the N/2 limit after a single iteration. Any oscillator which fails this test, remaining defiantly aloof from N/2 despite many trials, is a pseudorandom generator or a broken TRNG. The tantalizing hypothesis is that this remains true, regardless of how many "revving" iterations the generator is allowed. (A really smart PRNG would know exactly where its "entropic nodes" were, and produce numbers at only those points, for example, 10 and 16 and 19 above. But then we would need to worry about correlations between iterations 10 and 16, etc., which is why this test is so diehard.)

I wonder if this has any security ramifications to existing infrastructure.

But more to the point, I wonder if this has physical equivalents. Perhaps physical systems can oscillate in and out of chaos, despite having essentially no net energy exchange with the environment. Meteorologists say that there are clouds in the sky because energy from the sun drives the processes which form them. But could clouds exist if the atmosphere were a closed system? Could we oscillate between periods of uniformly distributed gases and organized thunderstorms? If we view the entire atmosphere as a cellular automaton, of the sort described in A New Kind of Science, then I think the answer would be yes...

Friday, February 15, 2013

The Utility of Extrapolatory Algogenesis

As I sort of implied in this post, I think there's something missing in the discussion of allegedly secure systems, whether related to cryptography or authentication: extrapolation. (This technique is also applicable to a wide variety of other algos having nothing to do with security.)

Let me explain what I mean by this. I think most cryptographic algogenesis occurs empirically, in other words, we change our evolving algo to do something that makes things look more random, then run some correlation analyses on "these" integers vs. "those" integers. Finally, when the Poisson distributions look good, it must be a winner!

The whole problem of course, is that we humans are idiots. We see entropy in everything we don't understand. There may be some very simple way to crack these systems, but we can't see the forest because the trees are blocking our view.

Wouldn't it be cool if we could see the entire forest and how it interacts with all of the trees, in some exhaustive sense? In other words, look at entire classes of attacks, all at once? For instance, when I change a single bit of a message, then, given every possible hash seed, what is the set of possible xor compensations which will legitimize the hash? There are plenty of other, similar questions, but they all ask the same basic thing: what are all possible ways to do operation X, given all possible seeds or keys?

Unfortunately, even with a quantum computer on hand, you would be hard pressed to answer any such question in a useful way because the state space is so large as to defy gathering statistics. In other words, the space may be much narrower than you think, but for practical purposes, 2^100 might as well be 2^1000.

On the other hand, extrapolation is a very powerful tool. More often than not, it seems, the properties of large mathematical systems can be extrapolated from small ones, even when we don't fully understand them. This isn't always true, of course. The history of mathematics is peppered with extrapolations that failed in spectacular fashion. But such eventualities would appear to be the exception.

What I'm advocating, given that by definition cryptosystems are too vast to exhaustively analyze, is that we rely on the method of extrapolatory algogenesis to produce them. It means what it sounds like: extrapolate the properties of a big algo from a small one. Want to produce a 512-bit asymmetric key exchange algo? Fine. Then start with an 8-bit variety, and study every possible key pair. Then move to 9 bits, then 10, etc., until your cloud computing bill is threatening to bankrupt you. At each stage, ask key questions, such as the one above, which test the strength of the system in a variety of commonsense ways. The answer to each question reveals a certain property of the algo, for instance, what is the distribution of the various limit cycle lengths involved?

Next, create a spreadsheet, showing the various properties with an 8-bit seed or key, then 9 bits, etc. Is the complexity scaling exponentially with the number of bits, or more slowly than that? How noisy is the data? Can you use statistical methods, perhaps on the raw numbers or their logarithms, in order to extrapolate out to your 512-bit target? Once you do the extrapolation, what does it tell you? Is your algo "hard" enough?
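
As a toy illustration of that fitting step, here's a Python sketch (not taken from any actual cryptosystem): the measured "property" is the classic birthday statistic, the expected number of draws from an ideal n-bit random oracle before the first repeat, whose log2 should grow with a slope of about half a bit per bit of width. That slope, once measured at small sizes, is exactly the kind of trend you would extrapolate out to a 512-bit target:

```python
import math
import random

def draws_until_repeat(n, rng):
    # draw n-bit values from an idealized "oracle" until one repeats
    seen = set()
    while True:
        v = rng.getrandbits(n)
        if v in seen:
            return len(seen) + 1
        seen.add(v)

def ls_slope(xs, ys):
    # ordinary least-squares slope of ys against xs
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(1)
ns = [8, 10, 12, 14, 16]
log2_cost = [math.log2(sum(draws_until_repeat(n, rng) for _ in range(400)) / 400)
             for n in ns]
slope = ls_slope(ns, log2_cost)   # ~0.5: cost scales like 2^(n/2)
```

With the slope and intercept in hand, the extrapolated cost at n=512 is just a line evaluation, which is the whole point of the exercise.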

There's no guarantee that this will work. The extrapolation might be invalid, or, more likely, you will fail to ask all the "right" questions. But what this process can do, is reveal some hidden weaknesses. You can then modify the algo, extrapolate the properties again, and iterate as such until a satisfactory algo is obtained.

Once you have this 512-bit masterpiece, then what? For one thing, you can fall back on the old methods of differential cryptanalysis -- mainly, looking for Poisson distributions where you expect to find them, of the quality that a perfect random number generator would produce. Compare "these" integers to "those" integers, ad nauseam.

The result can only be better than what the last step alone can produce. Can you improve your own favorite algo using this technique?

Landauer's Principle vs. Classical Computability

In the quest for a more secure hash or encryption algorithm, the question naturally arises, how many key bits are enough? Is there a key which is physically impossible to crack?

The short answer is "it depends on whether P=NP". Even then, some algos will be weaker than others, meaning that they would require longer keys to achieve comparable security.

There is, however, a physical limitation here which is informative, namely, the number of bits which we could possibly flip on the way to discovering a solution. Landauer's Principle tells us that the minimum amount of energy required to flip a bit (at least, in a classical computer) is:

kT ln 2

where "where k is the Boltzmann constant (approximately 1.38×10−23 J/K), T is the temperature of the circuit in kelvins, and ln 2 is the natural logarithm of 2 (approximately 0.69315)", according to Wikipedia.

But what about this "circuit temperature"? Considering that the average temperature of the universe is not less than that of the microwave background, or about 3 kelvin, we already have a lower limit on the energy expenditure required, which is about 3x10^(-23) joules. (You could use a cryostat to achieve a lower temperature, but it wouldn't help you, because you would just heat up the universe elsewhere.)

And just how much energy is available to expend? Definitely not more than the total energy of the universe, which WolframAlpha puts at 2x10^69 J. (This value includes the energy equivalent of the mass of the universe, via Einstein's famous equation, E=mc^2. Nonetheless, it's obviously subject to considerable error, and probably fails to account for the vacuum energy, dark energy, and who knows what else. Still, this is a logarithmic analysis, so we can afford to be somewhat cavalier.)

The ratio of these energies is then about 7x10^91. The log base 2 of this value is about 305.

What this means, then, is that the longest key need only be 305 bits, because we couldn't count to a larger number, given all the energy in the universe. This upper bound is only valid, of course, if we assume that the fastest way to crack a given algo is to try every key. (We would likely need to try only half the keys, so the key should be 306 bits long. Whatever.)
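
The arithmetic above is easy to check. Here's a sketch using the figures quoted in this post (the 2x10^69 J total is WolframAlpha's rough estimate, not a precise value):

```python
import math

k = 1.380649e-23                # Boltzmann constant, J/K
T = 3.0                         # roughly the microwave background temperature, K
E_flip = k * T * math.log(2)    # Landauer limit per bit flip, ~2.9e-23 J
E_universe = 2e69               # rough total energy of the universe, J
max_bits = math.log2(E_universe / E_flip)   # ~305
```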

The other question is whether Landauer's Principle pertains to quantum computers, and more importantly whether a useful quantum computer can be constructed. I suspect that the answers are no (because quantum computations are required to be reversible, whereas the above analysis pertains to the opposite, and fundamentally because the Heisenberg Uncertainty Principle allows quantum systems to "borrow" energy) and yes (because it's "only" an engineering problem). So perhaps this upper bound of 306 bits is just a thermodynamic fantasy.

At best, it appears from the literature that a quantum computer can square root the complexity of a given algo (and sometimes cannot help at all). If this concept relates to energy expenditure as well, then perhaps the equivalent limit for quantum computation would be 612 bits. But I have no idea...

Tuesday, February 12, 2013

The Birthday Attack Myth

[EDIT: Rewritten because the first version sucked.]

In cryptographic circles, it's often said that a hash is at best as strong as the square root of the size of its state space. In other words, assume that we have a hash consisting of 2N bits. Then the number of possible hashes is at most H=2^(2N). The difficulty of "cracking the hash" -- in other words, finding any 2 different messages with the same hash, known as a "hash collision" -- is then no harder than O(2^N). This is the fundamental theory behind a birthday attack. (This might allow someone to "prove" that bogus credentials were legitimate, for instance.)

The birthday attack myth is that this complexity estimate applies to the execution time (in processor-seconds), and not only the operation count. It does not, for the reasons that follow.

The first problem is one of practicality: the attack only predicts the creation rate of messages with equal hashes -- not the creation rate of messages with a specific hash, which would cost O(H), assuming an optimally hard hash. So this is not a useful tool for modifying existing authenticated messages whilst ensuring that the hash still works out. Granted, to the extent that (2^N) messages are available in the wild as targets for malicious modification, then generating an additional (2^N) fake messages would be moderately likely to result in one of them having the same hash as one of the originals, the exact probability being given in the article linked above.

The second problem goes to the heart of the complexity question: interprocessor communication. There are 2 extremes to consider, with other configurations lying somewhere in between:

At the serialized extreme, assume we have a single processor. The fastest way to solve the problem is probably to sort the list of (2^N) hashes (in O(2^N) time, with a radix sort). Then we can scan through the sorted hashes, looking for one which occurs twice, again in O(2^N) time.
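
Here's a minimal sketch of that serialized attack in Python, with a toy 16-bit "hash" standing in for the real thing. Python's built-in comparison sort stands in for the radix sort the text assumes, since only the sort-then-scan logic matters here:

```python
import random

def find_collision(hashes):
    # sort indices by hash value, then scan adjacent entries for a duplicate
    order = sorted(range(len(hashes)), key=hashes.__getitem__)
    for a, b in zip(order, order[1:]):
        if hashes[a] == hashes[b]:
            return a, b
    return None

rng = random.Random(0)
# birthday setting: 2^12 samples of a 16-bit toy "hash" collide with
# probability ~1 - exp(-(2^12)^2 / 2^17), i.e. essentially always
samples = [rng.getrandbits(16) for _ in range(1 << 12)]
pair = find_collision(samples)
```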

At the (classically) parallelized extreme, assume we have a processor for each hash, and that each processor is connected to every other. Then the hashes could sort themselves by moving to their expected position in the sort, and adjusting dynamically. A given hash might find its proper place in the list in O(N) time (probably worse on account of network congestion). In any event, this is certainly much faster than the serial case, but the computational complexity is actually greater, because we're using many processors inefficiently in order to run faster than one processor used efficiently. It's O(N*2^N), at best.

Thus we can't expect to do better at minimizing the computational complexity than to stick with the serial case. But that's fine because, as the birthday attack predicts, it's O(2^N), right?

Well, not exactly. The operation count is indeed O(2^N). But what if the time that an operation takes is itself a function of N? Then the execution time must be disproportionately larger.

The problem is one of memory size. The larger the number of hash samples, the larger the memory, in a physical sense. Memory latency is a function of the time it takes for a signal to traverse a 3D memory, which is O((2^N)^(1/3)), assuming that all the samples are packed into the smallest possible space, which is approximately a sphere.

Thus the execution time is O(2^N) times O((2^N)^(1/3)), which is O((2^N)^(4/3)). (Parallelizing the system will reduce the "wall clock" execution time, but not the execution time measured in processor-seconds. Moreover, the same (2^N)^(1/3) factor applies to the parallel case as well, because hashes must fly a certain distance before arriving at their sorted locations.)

By the way, given that the rate of radiation of a sphere is proportional to its surface area rather than its volume, the maximum degree of parallelism would occur with all the processors packed tightly on the surface of a sphere. Any attempt at tighter packing (using the interior of the sphere) whilst running all processors in parallel could only be sustained at such a low rate of computation as to render the topology performance-equivalent to the surface case. (This thermodynamic point bears uncanny resemblance to the Holographic Principle's implication that the information content of a black hole is (merely) proportional to its surface area.) The upshot is that the interprocessor communication latency given above is underestimated, in the regime of maximum-efficiency packing of processors executing at maximum sustainable speed.

As to quantum computers, I was originally hopeful that Grover's algo could reduce the execution time. On second thought, I don't think so. The problem is that the hashes must either be loaded into the quantum computer from classical media or they must be generated from candidate messages produced inside quantum superposition. But multiverses in a superposition can't communicate, so far as I know. So in each case, we would have to search for each hash via classical media. So there would be no complexity reduction in either case.

The point of all this is simply that the computational complexity of a birthday attack is exponentially less than its predicted execution time in processor-seconds, and that this would appear to hold true even in the quantum computing regime. Because the latter metric is what matters in the physical world, the implication is that hashes are actually stronger than supposed, independent of the question of weaknesses unrelated to birthday attacks.

Saturday, February 2, 2013

The Marsaglia Limit Cycle Theorem

Or something like that. This is really the "Marsaglia Limit Cycle Theorem for LMD2 and Later". I don't know what else to name it. Anyway, here's the proof that you always end up in the bright region, as opposed to the dark region, when considering Marsaglia oscillators which have (2^(N-1))<=A<(2^N), with X and C both being N bits, implying an oscillator state of 2N bits. I could probably make it more succinct, but I think you'll get the idea.

Again, we're talking about multiply-with-carry oscillators, which I prefer to call "Marsaglia oscillators".

Consider again S, the seed of cycle 2. The theorem says that S is the maximum integer which lives on the steady state part of the Marsaglia oscillator's cycle, in other words, the largest integer that you can produce more than once when starting from an arbitrary integer.


First of all, it should be clear that any integer I on [0, S] will iterate to another integer on [0, S]. The reason is that the maximum X value in this range is ((2^N)-1). If you multiply that by A, you get ((A*(2^N))-A). But C is by definition at most (A-1), so you can never actually get the new C greater than (A-1). So if we start in the bright region, we stay there.

Now in the dark region, all the integers are on [S+1, (2^2N)-1]. If we iterate the maximum value on this interval, which happens to be the one which produces the maximum possible output, we get ((A*((2^N)-1))+((2^N)-1)), or ((A+1)*(2^N)-(A+1)). Thus the maximum new C value is just A. We're almost into the bright region.

Let's iterate again. If the maximum possible X value after the first iteration is ((2^N)-1), then AX is going to be (A*((2^N)-1)), or ((A*(2^N))-A). If the maximum new C value after the first iteration is A, then it looks like we can end up with (A*(2^N)), after adding it to AX. But we can't. Because when that first new C value is equal to A, its associated X is at most ((2^N)-(A+1)). Since A is at least (2^(N-1)), as it is in all the LMDx oscillators after LMD1, X must be less than A. Which means that after the second iteration, the maximum new C is now (A-1). Thus we're in the bright region, after at most 2 iterations.
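
The whole argument is easy to verify exhaustively at a toy size. This Python sketch picks N=5 and A=29 (arbitrary choices, subject only to (2^(N-1))<=A<(2^N)) and checks both halves of the theorem over all 2^(2N) states:

```python
N = 5
A = 29                    # arbitrary, with 2^(N-1) <= A < 2^N
S = A * (1 << N) - 1      # the cycle-2 fixed point, ((A-1)*2^N)+((2^N)-1)

def step(s):
    # one Marsaglia iteration on the packed 2N-bit state s = (c<<N)|x;
    # the new state (c'<<N)|x' is exactly the integer A*x + c
    x, c = s & ((1 << N) - 1), s >> N
    return A * x + c

# the bright region [0, S] maps into itself...
bright_closed = all(step(s) <= S for s in range(S + 1))
# ...and every dark state lands in the bright region within 2 iterations
dark_escapes = all(step(step(s)) <= S for s in range(S + 1, 1 << (2 * N)))
```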

Friday, February 1, 2013

Information Loss in Marsaglia Oscillators

Now that you've seen the pretty pictures in the previous post, it should be intuitively clear how many bits of information are lost when one iterates a uniformly distributed random number through a Marsaglia oscillator an infinite number of times (and, I think, at most twice). In simple terms, we lose the dark region shown in the images. How many bits is that worth?

Consider S, the seed of cycle 2:


Recall that S iterates back to itself. I strongly suspect that every integer greater than S is in the dark region. It seems intuitive if not obvious, and agrees with my observations, but I admit that I don't have a proof. [EDIT: Now I do.] This all looks to be quite simple, but perhaps we have another Riemann Hypothesis on our hands. (I doubt it, but arrogance makes fools of mathematicians.)

If you accept this, then it's easy to calculate the average information loss L, in bits, of starting with a randomly chosen seed (x, c) and then iterating the oscillator until it hits a limit cycle (including the trivial cycles 1 and 2):

L=log2((all possible seeds)/(all residual seeds))

To the extent that (2^(N-1)) <= A < (2^N), then we have 0 < L <= 1. Since L is independent of N, this means that we can render the loss negligible by setting N sufficiently large. All this seems to be a reasonable compromise in light of the apparent quality of randomness provided.
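
At a toy size you can verify the formula by brute force. In this Python sketch, N=5 and A=29 are arbitrary picks with (2^(N-1))<=A<(2^N); the residual seeds are counted directly by iterating every state twice:

```python
import math

N, A = 5, 29              # arbitrary, with 2^(N-1) <= A < 2^N

def step(s):
    # one Marsaglia iteration on the packed 2N-bit state s = (c<<N)|x
    return A * (s & ((1 << N) - 1)) + (s >> N)

# brute force: the states still reachable after two iterations from anywhere
residual = {step(step(s)) for s in range(1 << (2 * N))}
L_brute = math.log2((1 << (2 * N)) / len(residual))
L_formula = N - math.log2(A)   # log2(all possible seeds / all residual seeds)
```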

Visualizing Marsaglia Oscillators

In the question of assessing the pseudorandom quality of a Marsaglia multiply-with-carry oscillator with 2N bits of state, of the sort underlying LMDx after LMD1 [EDIT:  As in, (2^(N-1))<=A<(2^N)], it occurred to me that pictures might be of some use.

To do this, I took various A values that are possible with N=9. (By "possible", I mean that the period (P=(A*(2^(N-1))-1)) is a Sophie Germain prime.) Then I created a map of the entire cycle space. It's pretty clear that only 4 cycles exist in such an oscillator:

1. 0 maps to itself.

2. ((A-1)*2^N)+((2^N)-1) maps to itself. This is a "nuisance cycle" because it's easily overlooked and threatens to shut down the oscillator if chosen as a seed.

3a. One of the big cycles has period P.

3b. There are some integers greater than or equal to (A*(2^N)) which eventually find their way onto cycle 3a, but never return to 3b due to limits on the maximum possible output value within 3a.

4a. The other big cycle has period P as well.

4b. Analogous to cycle 3b, there are points which degenerate onto cycle 4a as well.

The number of integers in 3b and 4b is empirically similar, but not equal. I can't explain the difference.

You can see all this in the images below, which have A values of 495, 465, 459, 402, 390, 369, 297, and 270, respectively. There is a bright area, which contains 3a and 4a, and a dark area, which contains 3b and 4b. The upper left and lower right corners of the bright area are white pixels indicating cycles 1 and 2. The rest is red and blue, for cycles 3 and 4.

These images show clearly why these oscillators suffer from bias in the high bit: once you jump from the dark to the bright region, you can't return. One way to deal with that is to engineer A so that the dark area has negligible size.

And I don't know about you, but I'm seeing squarish cells and somewhat nonrandom lower frequency oscillations in the images. Still, it looks very solid for such a simple function. (They are losslessly encoded 512x512 GIF images, so there are no JPEG artifacts here. Right-click and "open in new window" if your browser is scaling them in some annoying way, which mine appears to be doing.)

Thursday, January 31, 2013

The Reflexive Density Theorem

This proof solidifies some of the earlier observations of the reflexive density constant, a necessary-but-not-sufficient requirement in the evaluation of random number generator security.

Given T, an unbiased true random number generator, which can produce 1 of N unique integers; and S, a set consisting of N such sample outputs. Then the expected number E of unique integers in S, in the limit that N approaches infinity, is given by:

E = (1-1/e)N
E = RN

Here is the proof:

Construct a bitmap B, consisting of N 0s. Obtain S by performing N invocations of T. For each integer in S, set a bit in B corresponding to the sorted order of all possible outputs of T; that is, bit 0 would correspond to the least such integer, and bit (N-1) would correspond to the greatest. (In practice, the possible outputs would be 0 through N-1, where N is a power of 2. But that's not required here.)

Because T is unbiased, the probability of any given bit ending up as 1 is uniform throughout B. So consider one of these bits in isolation.

After the first invocation of T, the odds that this bit is 1 are just 1/N. Put another way, the odds that it's 0 are (1-1/N).

After the second invocation, the odds that it's still 0 are the odds that it was still 0 after the first invocation, times another factor of (1-1/N). Thus the odds that it's still 0 are ((1-1/N)^2).

We can continue this logic over all N invocations of T. The odds that this bit ends up as 0 are then ((1-1/N)^N). Thus, in the limit of infinite N, the odds R that this bit is 1 are just 1 minus this quantity:

R = lim(N->infinity,1-((1-1/N)^N))
R = 1-1/e
R ~ 0.6321205588285576784044762298385391325541888689682321

This last step was computed here from everything that WolframAlpha knows about math. Due to the symmetry of the problem, the "oneness" of any such bit applies to the whole of B, so RN is indeed the expected number of 1s.
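
A quick Monte Carlo check of the theorem, sketched in Python: draw N samples from an N-value space and count the distinct ones; the fraction should approach (1-1/e), or about 0.6321.

```python
import random

def distinct_fraction(N, rng):
    # N draws from an N-value space; return the fraction of distinct values
    return len({rng.randrange(N) for _ in range(N)}) / N

rng = random.Random(2)
frac = distinct_fraction(100000, rng)   # ~0.632
```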

It turns out that the reflexive density constant shows up in lots of places. For example:

1. Neural network stability at equation 6.27.

2. The exponential cumulative distribution function.

3. The Poisson distribution, if we were to add up the 1s at each bit of B, instead of ORing them.

4. The Random Arrival Time Best-Choice Problem on p. 499.

5. Some other discussion about it which shows that I wasn't the first to prove the reflexive density theorem, although hopefully you find my proof to be intuitive.

Wednesday, January 30, 2013

Mathematica Source Code for LMD4, LMD5, and LMD6

It's been brought to my attention that the C source code provided in this blog post isn't exactly well commented. I apologise for that. I think the most instructive way to show what's going on, however, is to use Wolfram Mathematica, specifically the Mathematica Kernel, which is a command line interface for math nerds. I have used this wonderful piece of software to verify the functionality of these hashes. But in the interest of reproducible research, you should be able to do the same.

So let's do an example. I'll give you the exact code you need to cut and paste in, and the expected output. Here again is the algo which underlies all 3 hashes:

FOR (i=0 to j){
  x=x xor y
  p=A*(x xor L[N, i])+c
  x=p MOD 2^N
  c=p DIV 2^N
  q=B*(y xor L[N, i xor ((j+1)/2)])+d
  y=q MOD 2^N
  d=q DIV 2^N
}
p=p MOD 2^(2N)
z=p xor M

And here again is a summary of what stuff means, in order of appearance above:

j is the maximum index of the block, each item of which is N bits wide.

N is half the bit width of the hash: 128 for LMD4, 256 for LMD5, or 512 for LMD6.

X0, C0, Y0, and D0 are the N-bit hash seeds, which are intended to be sourced from a cryptographic pseudorandom algo, and which are not used for any other purpose, nor ever used again. In the C source code, they appear in memory in the aforementioned order at p_base on input to the hash. Note that we have 4N bits going into a hash which will ultimately be only 2N bits wide.

x, c, y, and d are also N bits wide. (x, c) and (y, d) function in a manner similar to (x, c) in a Marsaglia oscillator. (x is at the memory address immediately below c.)

p is the first oscillator state, 2N bits wide. It's more complicated than a Marsaglia oscillator because its design is a product of evolution, but it contains the components of one.

A is the first Marsaglia oscillator's multiplier. For example, this is (2^128-2^125-2^110-2^100) in the case of LMD4.

L is the array of (j+1) N-bit words whose hash is to be evaluated. (In all cases, L is 2^15 bits long. It's just a question of how wide we consider a word to be, and thus how many words L contains.) Note that L is indexed starting from 1 in Mathematica, but from 0 in the algo above.

q is the second oscillator state, 2N bits wide.

B is the second Marsaglia oscillator's multiplier.

M is a xor mask, essentially a cryptographically generated nonce, which will mask the hash. (Encrypting the hash with this mask would appear to be overkill, but it's dirt cheap compared to encrypting and taking the hash of the payload, so why not.)

z is the hash output (the digest) of L, which is 2N bits wide.

In this example, we'll set things up like this:

L={1, 2, 3... (j+1)}

(M is 0 because nothing exciting happens with it anyway.) So what we're doing is just taking the hash of a series of incrementing integers, using seeds that also happen to be incremental. If you want to do something more complicated, you can modify the code to do so. Just Google around for the primitives I'm using, for example "mathematica array" if you want to learn how Array[] works.

OK let's write all this in a way that you can paste into Mathematica. (They probably have a free trial if you don't own it.) Note that all these text strings are intended to be interpreted as a single line, so if your text editor inserts line breaks, then remove them before pasting (Ctrl+V) into the Mathematica Kernel.

After the fact, you can print out individual values, for example the data block, L. (Just type "L".)

LMD4 Input


LMD4 Output


LMD5 Input


LMD5 Output


LMD6 Input


LMD6 Output


The encryption engine is left explicitly unspecified because these hashes are designed to satisfy the requirements for high entropy, high bit flip sensitivity, and irreversibility -- nothing more than that. LMD6 is used in the Karacell encryption algo, for instance. Karacell is the main motivation for putting these examples on the Web. Now you have readily accessible reference code, so that you can try to find serious weaknesses in them. Good luck!