Wednesday, February 8, 2012

TEncryption Hash Algorithm Part 2

In 2009, I made a post about turning a 4-bit hash algorithm into an 8-bit hash algorithm.

My idea was:
"My Password" --> 1100 --> {"11", "00"} --> {0100, 1011} --> 01001011


However, after reading it over, I realized that there was a gaping flaw. The probability of a collision wouldn't change!


Let's say "Password a" and "Password b" both hash to a 4-bit value of 0101. They would both hash to the same 8-bit value in my implementation. This is a major issue, because in theory, an 8-bit hash algorithm should have significantly less collisions (2^4 / 2^8 = 1/16 less chance of a collision).


So how to avoid this issue? Simple. Don't use the 4-bit hash as the basis for the 8-bit hash. Rather, use permutations of the password as the basis for the 8-bit hash. This can include reversing the password, truncating the password, or rearranging the password. This will, in theory, decrease the number of collisions.  


How about an example?

Previous implementation with collision:


"Password a" --> 1110 --> {"11", "10"} --> {1100, 1011} --> 11001011
"Password b" --> 1110 --> {"11", "10"} --> {1100, 1011} --> 11001011


New implementation with the same possible collision:


"Password a" --> {"Password a", "a drowssaP"} --> {1110,0110} --> 11100110
"Password b" --> {"Password b", "b drowssaP"} --> {1110,1000} --> 11101000


The point is just because one permutation of a password results in a collision with another password, with a good hash function, there is no indication that another permutation will also result in a collision.


Still, though, concatenation means that precomputing hashes of unsalted, known, commonly used passwords will help you figure out half the resulting hash is problematic. In the above example, the first four bits are 1110 in both passwords because of the collision. So to overcome that issue, you can salt the passwords with the username, or file name, or whatever, which should really be done anyway.


New implementation with salting


"Password a" --> {"Username, Password a", "Username, a drowssaP"} --> {1111,0110} --11100110
"Password b" --> {"Username, Password b", "Username, b drowssaP"} --> {0110,1000} -->01101000


New implementation with salting
(assuming "Username, Password ..." results in collision)

"Password a" --> {"Username, Password a", "Username, a drowssaP"} --> {0110,0110} --> 11100110
"Password b" --> {"Username, Password b", "Username, b drowssaP"} --> {0110,1000} --> 11101000

The only advantage of this is that an attacker could have precomputed that "Password a" hashes to 1110 if "Password a" is a commonly used password, but precomputing wouldn't help when salting.
Another "attack" on this idea is the fact that there might not actually be 2^8 unique hashes. This is because some 8-bit hashes might be impossible to construct from existing 4-bit hashes, and hence an attacker can significantly limit the search space if implausible hashes were to be precomputed. This, however, would be very difficult to do and would take a bit more brain power to figure out than I have available right now, although I recognize the theoretic possibility.