Encrypt and decrypt text using a custom alphabet substitution
A simple substitution cipher is a classic encryption technique where each letter in the plaintext is replaced with a different letter (or symbol) according to a fixed mapping. This mapping is defined by a key, which is essentially a rearranged version of the alphabet.
Unlike the Caesar cipher which uses a fixed shift for all letters, simple substitution uses a completely custom mapping. This significantly increases the number of possible keys from just 26 (for Caesar) to approximately 4 × 10²⁶ (the factorial of 26), making simple brute force attacks much more difficult.
In a simple substitution cipher:
If we use the substitution key: QWERTYUIOPASDFGHJKLZXCVBNM
Standard alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ
To encrypt the word "HELLO":
So "HELLO" encrypts to "DTGGP"
There are several ways to create a substitution key:
Generate a completely random permutation of the alphabet. This is the most secure approach but makes the key difficult to remember.
Start with a keyword, remove duplicates, then add the remaining letters of the alphabet in order. For example, with the keyword "CIPHER":
CIPHERABDFGJKLMNOQSTUVWXYZ
A special case where you simply reverse the alphabet:
ZYXWVUTSRQPONMLKJIHGFEDCBA
Another special case where you shift the entire alphabet by a fixed number. With a shift of 3:
DEFGHIJKLMNOPQRSTUVWXYZABC
While simple substitution ciphers have a large key space, they are vulnerable to frequency analysis. In any language, certain letters appear more frequently than others. By analyzing these frequencies in the ciphertext, an attacker can often deduce the mapping without knowing the key.
For example, in English, 'E' is the most common letter, followed by 'T', 'A', and 'O'. If a cryptanalyst finds that 'X' is the most frequent letter in the ciphertext, they might guess that 'X' corresponds to 'E'.
Simple substitution ciphers are among the oldest known encryption methods. They were widely used for secure communication throughout history:
Despite its large key space, the simple substitution cipher can be broken using several techniques:
By analyzing the frequency of characters in the ciphertext and comparing them to known language patterns, cryptanalysts can often deduce the key. This works because the underlying letter frequencies remain preserved in the ciphertext.
Certain letter combinations (digraphs and trigraphs) appear commonly in languages. For example, in English, 'TH', 'ER', 'ON', and 'AN' are common pairs. Identifying these patterns helps in guessing parts of the key.
If spaces are preserved in the ciphertext, analysts can identify short words which are often common words like "a", "an", "the", "to", etc. These provide valuable clues to several letters at once.
While simple substitution ciphers are no longer used for secure communications, they still have several modern applications:
The simple substitution cipher represents an important milestone in the evolution of cryptography. While vulnerable to statistical analysis, it was a significant improvement over earlier methods and remained useful for many centuries. Its large key space makes it resistant to brute force attacks, but its preservation of language patterns led to the development of frequency analysis techniques that eventually rendered it insecure.
Today, it serves as an excellent introduction to cryptographic principles and remains a fascinating piece of cryptographic history. Modern encryption has moved far beyond simple substitution, but understanding these foundational techniques helps us appreciate the ongoing battle between cryptographers and cryptanalysts throughout history.