Stanford Crypto 1 course

Perils of the Many Time Pad

I've been doing the entry level crypto MOOC from Stanford/Coursera by Dan Boneh.

This is my solution to the first of the optional programming questions, but using some different data. The question provides several ciphertexts that have been encrypted with the same key.

The encryption is simple XOR with a random key, but the same key is used each time. The question shows the error of using a one time pad more than once.

There is a hint provided,

"Hint: XOR the ciphertexts together, and consider what happens when a space is XORed with a character in [a-zA-Z]. "

My solution involved two parts.

First noting the helpful hint. Ascii space is 0x20 and characters "A-Z" are sequential between 0x41 and 0x5a, and "a-z" between 0x61 and 0x7a. Therefore " " $\oplus$ "A" = "a" ( and because of the nature of XOR the reverse is true also )

First XOR each message against each other

  • for ${c_i},{c_j} \in C$ calculate ${c_i \oplus c_j}$, noting that $c_i \oplus c_j = m_i \oplus m_j$

This gives $n(n-1)/2$ new texts to analyze. I compared all the XOR of particular ciphertext. For each decryption if a printable character at a particular location appeared more times that another character, I would assume that was the best decryption and choose that for the initial guess of that messages plaintext.

Analyse the resultant plaintexts, and use simple word analysis and guessing to improve them.

  • encryption algorithm is $E(k,m_i) = m_i \oplus k$,
  • decryption is $D(k,m_i)=c_i \oplus k$
  • the key is simply $k=c_i \oplus m_i$

So I manually iterate around

  • use a guess to generate a new key.
  • use that to decrypt the other messages.
  • Improve the guesses making lexical choices.

I attach my python code for each part here programming1-1.py and programming1-2.py

Ways to improve this would be using bigram trigram probabilities, and checking a key only decrypts a cipher text to printable characters.