Strategy

This breakdown of our deciphering process shows how we used Alteryx at HackCU.

First Step

Characterize letter frequencies:

  • Regular expressions were used to parse the original ciphertext into rows of two-character pairs.
  • A Frequency Table node was then applied to these rows to determine the relative frequency of each character pair.
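Outside of Alteryx, the same two steps can be sketched in Python. This is only an illustration, not the workflow we submitted: the function name `pair_frequencies` is hypothetical, and the sketch assumes that whitespace is not part of any pair and that a trailing odd character can be ignored.

```python
import re
from collections import Counter


def pair_frequencies(ciphertext: str) -> dict[str, float]:
    """Return the relative frequency of each two-character pair."""
    # Assumption: whitespace is not part of a pair, so drop it first.
    compact = re.sub(r"\s+", "", ciphertext)
    # Non-overlapping two-character chunks; a trailing odd character is ignored.
    pairs = re.findall(r"..", compact)
    counts = Counter(pairs)
    total = sum(counts.values())
    # most_common() orders the pairs from most to least frequent.
    return {pair: n / total for pair, n in counts.most_common()}
```

Calling `pair_frequencies("abab cd")` gives `ab` a relative frequency of 2/3 and `cd` a relative frequency of 1/3.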

Second Step

Compare known letter frequencies

  • A Record ID node was used to structure our data prior to comparison.
  • The relative frequencies of English letters were compared to the histogram created from our two-character pairs.
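A rough way to express this comparison in code is to line up the pairs and the English letters by frequency rank. The sketch below is an assumption about how such a comparison could look, not a copy of our workflow: `guess_mapping` is a hypothetical name, and the letter ordering is an approximate one (published tables differ slightly).

```python
from collections import Counter

# Approximate ordering of English letters from most to least frequent.
# Published tables differ slightly, so treat this ordering as an assumption.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(pairs: list[str]) -> dict[str, str]:
    """Map each pair to a letter by matching frequency ranks."""
    # Most frequent pair first; ties keep their order of first appearance.
    ranked = [pair for pair, _ in Counter(pairs).most_common()]
    # zip stops at the shorter sequence, so at most 26 pairs get a letter.
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))
```

A mapping produced this way is only a first guess. It has to be corrected by hand, especially here, where the plaintext is Pig Latin rather than ordinary English.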

What does Pig Latin do to letter frequencies?

Translating English into Pig Latin skews the usual letter frequencies, because every word becomes a rearranged version of itself with the letters A and Y appended.

As a result, the frequencies we observed in the ciphertext do not line up perfectly with what a normal English frequency analysis would predict.
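The size of the shift is easy to estimate, because moving letters within a word does not change any counts. Only the appended suffix does. The sketch below (hypothetical function name `suffix_shift`) assumes a plain "ay" suffix on every word and ignores the extra letter that some dialects add to vowel-initial words.

```python
from collections import Counter


def suffix_shift(plaintext: str, suffix: str = "ay") -> Counter:
    """Letter counts after appending `suffix` to every word.

    Moving a consonant cluster only reorders letters within a word, so the
    suffix is the sole source of change in the counts.
    """
    words = plaintext.lower().split()
    counts = Counter(ch for word in words for ch in word if ch.isalpha())
    for ch in suffix:
        # Each word contributes one extra copy of every suffix letter.
        counts[ch] += len(words)
    return counts
```

For the three-word text "the cat sat", the count of A rises from 2 to 5 and the count of Y rises from 0 to 3, while every other letter is unchanged.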

A note about Pig Latin Dialects

According to the internet, Pig Latin can take on a few different forms. Some dialects have different rules about what happens to words that start with vowels.

Some say: Ocean --> Ocean way

Others say: Ocean --> Ocean ay

After some analysis of our data and a few attempts at decoding the ciphertext, it became obvious that this particular text uses a dialect in which words starting with vowels are translated in the following way:

Ocean --> Ocean yay

After making this discovery, it became trivial to fill in the mappings between two-character pairs and actual Pig Latin letters.
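For reference, the "yay" dialect can be written down as a small encoder. This is a minimal sketch under stated assumptions, not code taken from our repository: the name `to_pig_latin` is hypothetical, the input is a single lowercase word, Y is always treated as a consonant, and the suffix is joined directly to the word.

```python
VOWELS = set("aeiou")  # assumption: "y" always counts as a consonant


def to_pig_latin(word: str) -> str:
    """Encode one lowercase word, using "yay" for vowel-initial words."""
    if not word:
        return word
    if word[0] in VOWELS:
        return word + "yay"
    # Find the first vowel; everything before it is the consonant cluster.
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[i:] + word[:i] + "ay"
    # No vowel at all (for example "hmm"): just append the suffix.
    return word + "ay"
```

With these rules, `to_pig_latin("ocean")` returns `"oceanyay"` and `to_pig_latin("string")` returns `"ingstray"`.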

But what about data loss?!?

Digging into things, we discovered that Pig Latin has some fatal encoding flaws. When translating words that begin with a consonant cluster, multiple characters must be moved to the back of the word to create the Pig Latin equivalent. This creates a problem when translating back from Pig Latin, since nothing marks whether the original word began with one consonant or several.

In other words, translating words to and from Pig Latin is not a one-to-one relationship.
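The ambiguity shows up as soon as you try to write a decoder. The sketch below lists every spelling that could have produced a given Pig Latin word. It is illustrative only: `decode_candidates` is a hypothetical name, and it assumes the "yay" dialect, lowercase input, and Y treated as a consonant.

```python
VOWELS = set("aeiou")  # assumption: "y" always counts as a consonant


def decode_candidates(pig: str) -> list[str]:
    """List every spelling that could encode to `pig` under the "yay" dialect."""
    if not pig.endswith("ay"):
        return []
    candidates = []
    stem = pig[:-2]
    # Vowel-initial original: the whole suffix was "yay".
    if stem.endswith("y") and stem[:-1] and stem[0] in VOWELS:
        candidates.append(stem[:-1])
    # Consonant-initial original: peel 1, 2, 3... trailing consonants off
    # the stem and move them back to the front.
    for k in range(1, len(stem)):
        cluster, rest = stem[-k:], stem[:-k]
        if any(ch in VOWELS for ch in cluster):
            break  # a cluster can never contain a vowel
        if rest[0] in VOWELS:
            candidates.append(cluster + rest)
    return candidates
```

For `"aptray"` the candidates include both `"rapt"` and `"trap"`, and for `"earyay"` they include both `"ear"` and `"year"`. Without a dictionary or surrounding context, the decoder has no way to pick between them.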

To investigate the impact of this issue, we wrote a Python script to enumerate all of the words in the original text that conflict due to consonant clusters.
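One possible way to enumerate such conflicts is to encode every word and group the words that share an encoding. The sketch below is an assumption about the approach, not our actual script (see the repository for that). The names `encode` and `find_collisions` are hypothetical, and the sketch splits on apostrophes instead of handling them specially.

```python
import re
from collections import defaultdict


def encode(word: str) -> str:
    """Encode a lowercase word in the "yay" dialect ("y" is a consonant)."""
    cluster = re.match(r"[^aeiou]*", word).group()
    if not cluster:
        return word + "yay"
    return word[len(cluster):] + cluster + "ay"


def find_collisions(text: str) -> dict[str, set[str]]:
    """Group the distinct words of `text` that share a Pig Latin encoding."""
    groups = defaultdict(set)
    # Assumption: words are runs of letters; apostrophes act as separators.
    for word in re.findall(r"[a-z]+", text.lower()):
        groups[encode(word)].add(word)
    # Keep only encodings produced by more than one distinct word.
    return {pig: words for pig, words in groups.items() if len(words) > 1}
```

Run on the sentence "The trap left her rapt; an ear a year", this reports two collisions: `aptray` for trap and rapt, and `earyay` for ear and year.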

Results:

Counting the occurrences, we found a total of 118 words with conflicts in the provided text. If you account for positioning of apostrophes, this number goes up to 186 words!

Take a look at the gists below to see all collisions:

Tooling

We used a lot of Python scripts to help solve the initial encoding of characters. All Python files can be found at github.com/mguida22/alteryx-crypto, and the Alteryx workflow was submitted by email.

Contributors

  • Ben Williams
  • Michael Guida
  • Andrew Gentry
  • Travis Benson
  • Collin Berg
