Can Transformers Compute the Hash of a String?
The answer should be NO, but we can give it a shot
Hypothetically, let us assume that I do not know how MD5, or string hashing in general, works. As a Machine Learning Engineer, I see a string go in and a string come out, so to me it is a text-to-text translation problem. I can pick either a GPT-like decoder or a T5-like encoder-decoder to solve arbitrary (?) text-to-text problems. So let us give it a shot and see how much domain knowledge we need, or whether we can solve this at all.
Experiment: Calculating MD5 of a random string
In the first experimental setup, we just want to compute the MD5 hash of a randomly generated string. We know both the input text sequence and the output string, so the task is fully supervised.
Here is the output of 10 randomly generated sequences and their hashes:
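Pairs like these could be generated with a few lines of standard-library Python. This is a minimal sketch, not the code used for the experiment: the alphabet, the length range, and the helper names `random_string` and `make_pair` are assumptions for illustration, since the post does not state how its strings were sampled.

```python
import hashlib
import random
import string

# Assumed alphabet and length range; the original sampling scheme is not given.
ALPHABET = string.ascii_letters + string.digits


def random_string(rng: random.Random, min_len: int = 8, max_len: int = 32) -> str:
    """Draw a random string of letters and digits (hypothetical helper)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_pair(rng: random.Random) -> tuple[str, str]:
    """Return (input text, 32-character hex MD5 digest) as one training pair."""
    text = random_string(rng)
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return text, digest


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so that runs are repeatable
    for _ in range(10):
        text, digest = make_pair(rng)
        print(f"{text} -> {digest}")
```

Because pairs are cheap to compute, a setup like this can produce fresh examples for every batch instead of relying on a fixed dataset.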
Next, let us define a pretty tiny model (I opted for the T5 implementation from the transformers library):
The model is tiny (21 MB). My assumption was that the problem is not too complex, so a tiny model should work.
Once training starts, the model tries to mimic the output distribution, but the predictions are way off:
I also decided to track a few more metrics other than loss:
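The post does not say which extra metrics were tracked. As an assumption, two natural candidates for this task are the exact-match rate and the per-character accuracy over the 32 hex characters of the digest. The function names below are hypothetical. A model that guesses hex characters uniformly at random would be expected to score about 1/16 on per-character accuracy, which gives a baseline to compare against.

```python
def char_accuracy(pred: str, target: str) -> float:
    """Fraction of target positions where the prediction has the same character.

    Positions missing from a too-short prediction count as wrong; extra
    characters in a too-long prediction are ignored by this metric.
    """
    if not target:
        return 0.0
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / len(target)


def exact_match(pred: str, target: str) -> bool:
    """True only if the whole predicted digest equals the target digest."""
    return pred == target


def batch_metrics(preds: list[str], targets: list[str]) -> dict[str, float]:
    """Average both metrics over a batch of (prediction, target) pairs."""
    n = len(targets)
    pairs = list(zip(preds, targets))
    return {
        "char_accuracy": sum(char_accuracy(p, t) for p, t in pairs) / n,
        "exact_match": sum(exact_match(p, t) for p, t in pairs) / n,
    }
```

Per-character accuracy is the more informative of the two here, because exact match stays at zero until the model gets all 32 characters right.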
For the first ~8k iterations, things seemed to improve and generally move in the right direction, even though the output was still garbage. Then, all of a sudden, the loss dropped faster and became very noisy, and the rest of the metrics got worse:
After around 25k iterations, nothing seemed to improve much, so I decided to stop:
The next experiment is to scale up the model. Maybe the model is not complex enough to learn the complexity of the MD5 hashing function:
This new model is ~350 MB (~16x bigger). But even after 50k iterations it did not converge; instead, the loss went up after a while:
It is clear that this is a very difficult task for transformers. The next experiment could be to feed bits instead of bytes and see whether the transformer can learn the pattern. If so, we could also go in the reverse direction, i.e. predict the string from the hash.
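One way the bit-level encoding could look, as a sketch: each input byte becomes eight '0'/'1' tokens, and the 16-byte MD5 digest becomes 128 of them. The helper names and the most-significant-bit-first ordering are assumptions, not something this post specifies.

```python
import hashlib


def bytes_to_bits(data: bytes) -> str:
    """Render bytes as a '0'/'1' string, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits; the length must be a multiple of 8."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def make_bit_pair(text: str) -> tuple[str, str]:
    """Return (input bits, digest bits) for one string (hypothetical helper)."""
    raw = text.encode("utf-8")
    digest = hashlib.md5(raw).digest()  # 16 raw bytes rather than 32 hex chars
    return bytes_to_bits(raw), bytes_to_bits(digest)
```

Swapping source and target in `make_bit_pair` would give the training pairs for the reverse direction, from hash to string.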








