MD5 quine, fixed point
MD5 quines
Sometimes I let my mind wonder and I get crazy questions. Today was a good example, I encountered a MD5 hash and I started to wonder, would there be a hash which would (when hashed again) be the same?
Thus: MD5(x) = x
This would be a kind of MD5 quine, when fed into the algorithm you get the original value back. This is actually called an MD5 fixed point.
Information I’ve found
So I started investigating, soon I discovered this website about collisions. Its well known that all hashing algorithms must have collisions, you can’t always produce a unique hash for input larger then the output of course.
The output of the MD5 sum is 128 bit (16 byte) long, so the input should also have the same length. But the MD5 algorithm is defined to have 512 bits as input. This isn’t really a problem because the algorithm will extend smaller input with padding. Lets assume the MD5 sum of any input is uniformly distributed over all possible sums, then the probability that a 128-bit string is a fixed point is 1/2^128. This isn’t a crazy assumption because all hashing-algorithms are designed to distribute the output as uniformly as possible to avoid collisions.
So, the probability that no 128-bit string is a real fixed point is (1 – 1/2^128)^(2^128). The probability that there IS a fixed point is 1 – (1 – 1/2^128)^(2^128).
Since the limit as N goes to infinity of (1 – 1/N)^N is 1/e, and 2^128 is most certainly a very large number, this probability is almost exactly 1 – 1/e = 63.21%.
But, of course, there is no randomness involved here – there either is a fixed point or there isn’t. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace – if MD5 sums were 32 bits or 1024 bits, the answer would be the same).
Looking for the fixed point
I’ve just implemented a small program to look for these hashes, even though I know it will take millions of years to check all the numbers. But you never know, I might get lucky ;-)
The first algorithm I created took a single random String as input and kept applying the algorithm to the output. Eventually it will:
1. Go into a loop
2. Find a fixed point (which is a loop of size 1)
The weird thing is, if it ends up in a loop of size 1… I’ve found two things. Not only a md5 fixed point, which creates itself after applying the algorithm. But also an input-value that produces this md5 as output, a collision!
Graph
Another interesting thing would be a complete graph of all md5 answers. Which loops can we find, which md5 has most collisions etc. But this would take eternity to calculate, even using all the machines in the world.
Open questions
- Are there loops? (It is possible there aren’t any loops at all…?)
- Are there loops of size 1, a.k.a. fixed points?
- Which/how many 128 bit combinations can’t be created with the input values?
- Which/how many collisions will you get with all the input values?
More thoughts, errors, solutions..?
6 Responses to MD5 quine, fixed point
Leave a Reply Cancel reply
Latest tweet
- RT @olly_richards: Oh, it's 22 years since Jim Henson died. Which means, The Saddest Photo In The World: http://t.co/Ua6qOsn4
Archives



Maybe you could also consult running a query on a Rainbow Table?
Interesting. Your approach reminds me of Collatz’s Sequence.
Actually, theoretically there can be multiple fixed points. But we need some clever algorithm to find that.
I don’t think you will find a loop.
Hashes are like irrational numbers, always new values.
But if you can find the nth term of an irrational number then you can find the expression that computes the hash value and find the nth term.
The problem of md5 quines is dealt with on this page http://www.madore.org/~david/computers/quine.html
he even gives the source code of one
iamreddave: The link is very interesting but not relevant. I’m not talking about programs that produce their own MD5 hashcode (which is what he does in the link). But a hashcode which, hashed again, turns into its own value.