Friday, February 15, 2013

The Utility of Extrapolatory Algogenesis

As I sort of implied in an earlier post, I think there's something missing in the discussion of allegedly secure systems, whether related to cryptography or authentication: extrapolation. (This technique is also applicable to a wide variety of other algos having nothing to do with security.)

Let me explain what I mean by this. I think most cryptographic algogenesis occurs empirically. In other words, we change our evolving algo to do something that makes things look more random, then run some correlation analyses on "these" integers vs. "those" integers. Finally, when the Poisson distributions look good, it must be a winner!

The whole problem, of course, is that we humans are idiots. We see entropy in everything we don't understand. There may be some very simple way to crack these systems, but we can't see the forest because the trees are blocking our view.

Wouldn't it be cool if we could see the entire forest and how it interacts with all of the trees, in some exhaustive sense? In other words, look at entire classes of attacks, all at once? For instance, when I change a single bit of a message, then, given every possible hash seed, what is the set of possible xor compensations which will legitimize the hash? There are plenty of other, similar questions, but they all ask the same basic thing: what are all possible ways to do operation X, given all possible seeds or keys?
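To make that question concrete, here is a minimal sketch of how it could be answered exhaustively at toy scale. Everything in it is an assumption made for illustration: `toy_hash` is a made-up 8-bit seeded hash (not any real algo), the message is just two bytes, and "compensation" means an xor mask applied to the second byte after one bit of the first byte has been flipped.

```python
def toy_hash(seed, msg):
    """Hypothetical 8-bit seeded hash, invented purely for illustration."""
    h = seed & 0xFF
    for b in msg:
        # Mix in one message byte; 167 and 13 are arbitrary constants.
        h = ((h ^ b) * 167 + 13) & 0xFF
    return h


def compensations(msg, flip_bit):
    """For every seed, list the xor masks on byte 1 that restore the
    original hash after flip_bit has been flipped in byte 0."""
    table = {}
    tampered0 = msg[0] ^ (1 << flip_bit)
    for seed in range(256):
        target = toy_hash(seed, msg)
        table[seed] = {
            mask
            for mask in range(256)
            if toy_hash(seed, (tampered0, msg[1] ^ mask)) == target
        }
    return table


table = compensations((0x41, 0x42), flip_bit=0)
distinct_masks = set().union(*table.values())
print("masks per seed:", sorted({len(m) for m in table.values()}))
print("distinct masks over all seeds:", len(distinct_masks))
```

For this particular toy, every seed admits exactly one compensating mask, because each mixing step is invertible. That is the kind of structural fact the exhaustive view hands you for free, and the kind you would want to watch as the bit width grows.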

Unfortunately, even with a quantum computer on hand, you would be hard pressed to answer any such question in a useful way because the state space is so large as to defy gathering statistics. In other words, the space may be much narrower than you think, but for practical purposes, 2^100 might as well be 2^1000.

On the other hand, extrapolation is a very powerful tool. More often than not, it seems, the properties of large mathematical systems can be extrapolated from small ones, even when we don't fully understand them. This isn't always true, of course. The history of mathematics is peppered with extrapolations that failed in spectacular fashion. But such eventualities would appear to be the exception.

What I'm advocating, given that by definition cryptosystems are too vast to exhaustively analyze, is that we rely on the method of extrapolatory algogenesis to produce them. It means what it sounds like: extrapolate the properties of a big algo from a small one. Want to produce a 512-bit asymmetric key exchange algo? Fine. Then start with an 8-bit variety, and study every possible key pair. Then move to 9 bits, then 10, etc., until your cloud computing bill is threatening to bankrupt you. At each stage, ask key questions, such as the one above, which test the strength of the system in a variety of commonsense ways. The answer to each question reveals a certain property of the algo. For instance, what is the distribution of the various limit cycle lengths involved?
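As a rough sketch of the limit cycle question at small bit widths, the code below walks every state of an n-bit update function and tallies the lengths of the cycles it finds, then repeats that over every key. The update function `toy_step` is a hypothetical stand-in invented for this example; swap in the scaled-down version of whatever algo you are actually studying.

```python
from collections import Counter


def cycle_length_distribution(step, n_bits):
    """Return Counter {cycle length: number of distinct cycles} for a
    function `step` mapping n-bit states to n-bit states."""
    size = 1 << n_bits
    owner = [0] * size      # 0 = unvisited, else id of the walk that got there first
    position = [0] * size   # step index at which the owning walk reached the state
    cycles = Counter()
    for start in range(size):
        if owner[start]:
            continue
        walk_id = start + 1
        x, steps = start, 0
        while owner[x] == 0:
            owner[x] = walk_id
            position[x] = steps
            steps += 1
            x = step(x)
        # A new cycle is found only if this walk ran into itself.
        if owner[x] == walk_id:
            cycles[steps - position[x]] += 1
    return cycles


def toy_step(key, n_bits):
    """Hypothetical keyed n-bit update function, for illustration only."""
    mask = (1 << n_bits) - 1
    return lambda x: (x * x + key) & mask


def survey(n_bits):
    """Aggregate the cycle length distribution over every possible key."""
    total = Counter()
    for key in range(1 << n_bits):
        total.update(cycle_length_distribution(toy_step(key, n_bits), n_bits))
    return total


if __name__ == "__main__":
    for n_bits in range(4, 9):
        dist = survey(n_bits)
        print(n_bits, "bits: longest cycle", max(dist), "over", sum(dist.values()), "cycles")
```

The cost is roughly 4^n steps per bit width (2^n keys times 2^n states), which is why the survey stops at small widths and the rest is left to extrapolation.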

Next, create a spreadsheet, showing the various properties with an 8-bit seed or key, then 9 bits, etc. Is the complexity scaling exponentially with the number of bits, or more slowly than that? How noisy is the data? Can you use statistical methods, perhaps on the raw numbers or their logarithms, in order to extrapolate out to your 512-bit target? Once you do the extrapolation, what does it tell you? Is your algo "hard" enough?
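The spreadsheet step can be sketched as an ordinary least-squares fit on the logarithms. The code below assumes the property being tracked grows roughly exponentially in the bit count, which is itself something to check rather than take for granted. The numbers fed in are synthetic placeholders generated from an exact 2^(bits/2) law, not measurements of any real algo, and the function names are made up for this example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_log2(bits, values, target_bits):
    """Fit log2(value) against bit count and extend the line to target_bits.

    Returns (slope, predicted log2 of the value at target_bits)."""
    logs = [math.log2(v) for v in values]
    slope, intercept = fit_line(bits, logs)
    return slope, slope * target_bits + intercept


# Synthetic placeholder data: a property that grows exactly as 2^(bits/2).
bits = list(range(8, 17))
values = [2 ** (0.5 * b) for b in bits]

slope, predicted_log2 = extrapolate_log2(bits, values, target_bits=512)
print(f"bits of work gained per key bit: {slope:.3f}")
print(f"extrapolated log2 at 512 bits:   {predicted_log2:.1f}")
```

A slope near 1 would mean the property scales with the full key space, while a slope well below 1 means it is growing more slowly than the bit count suggests. With real, noisy data it is worth looking at the residuals of the fit before trusting a line stretched from 16 bits out to 512.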

There's no guarantee that this will work. The extrapolation might be invalid, or, more likely, you will fail to ask all the "right" questions. But what this process can do is reveal some hidden weaknesses. You can then modify the algo, extrapolate the properties again, and iterate in this way until a satisfactory algo is obtained.

Once you have this 512-bit masterpiece, then what? For one thing, you can fall back on the old methods of differential cryptanalysis -- mainly, looking for Poisson distributions where you expect to find them, of the quality that a perfect random number generator would produce. Compare "these" integers to "those" integers, ad nauseam.

The result can only be better than what the last step alone can produce. Can you improve your own favorite algo using this technique?
