
Encryption is used to protect data (Oleksandr Pupko, iStockphoto)

Encryption

Let's Talk Science

Summary

Learn about encryption, the branch of mathematics involved in keeping data safe.

Introduction 

At the heart of any domain of science is the testing of a theory. This can only be done when comparisons can be made in a language all scientists understand. This language is mathematics. Mathematics enables groups of scientists who don’t necessarily speak the same language to work on a common problem and communicate in a way that they can all understand.

There are many different branches in mathematics, each with many unique and fascinating aspects. In this backgrounder you will explore ways in which we use mathematics to keep our data safe. This is known as encryption. We hope this gets you to think of mathematics in a whole new light!

Encryption

Whether you are a secret agent or a student buying stuff over the Internet, it is important to keep your information secret while you are transmitting it. For example, if you wanted to send a message about a surprise birthday party to a friend by email, it would be good to re-write it in a secret way that only you and your friend understand. Encryption is the term for turning a readable message into a secret one. Decryption is the term for turning a secret message back into a readable one. Encryption and decryption have been vitally important during wartime. For example, in World War II, encryption and decryption centres were set up in many countries. One such centre was Bletchley Park in England. There, cryptologists worked hard to decrypt messages from the countries they were fighting.

A famous cryptography machine used by Germany during World War II was called the Enigma machine. Enigma machines used a keyboard, a set of rotating disks and electric circuits to create the secret codes. The electrical pathway from each letter on the keyboard to the output letter changed with every key press, so it was a tough code to break. Eventually, information about the machines and the hard work of Allied cryptologists enabled the Allies to read the German coded messages. This ability to find out what the Germans were up to, without their knowledge, helped the Allies win the war.

An Enigma machine in use, 1943 (Source: Bundesarchiv, Bild 183-2007-0705-502 / Walther / CC-BY-SA 3.0 via Wikimedia Commons).

Substitution Codes

The Enigma machine worked by substituting one letter with another. This is called a substitution code. A substitution code is a code in which each letter in a word gets replaced by a different letter or symbol. For example, if you wanted to send your bank password - “hotdog” - to a friend in a secure way, you could use a substitution code and rewrite it as “©¢¿£¢>” before sending it to your friend. Your friend would only be able to decode this message if he or she knew how you were substituting the letters. 

Question 1:

Given the example above (“hotdog” is coded as “©¢¿£¢>”), what information would your friend need in order to decode the message? This information is known as the encryption key. The answer is at the end of the Backgrounder.

Another way to do a substitution code is to replace each letter of the message with a letter that is located a fixed distance from that letter in the alphabet. For example the word DANGER becomes GDQJHU if we use the corresponding letter located three positions further along in the alphabet. This kind of encryption is called the Caesar Shift Cipher because Julius Caesar used it in Ancient Rome in relaying his messages.
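The shifting rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `caesar_shift` is made up for this example, and it assumes the message uses the 26-letter English alphabet, with spaces and punctuation left unchanged.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Replace each letter with the one `shift` positions further along the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            # The % 26 wraps around, so a letter shifted past Z starts again at A.
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # spaces and punctuation stay as they are
    return "".join(result)


print(caesar_shift("DANGER", 3))   # GDQJHU
print(caesar_shift("GDQJHU", -3))  # DANGER
```

Shifting by a negative number undoes the code, so the size of the shift is all a friend needs to know to read the message.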

Question 2:

Create a Caesar Shift Cipher for “The troops are approaching” and get a friend to solve it by giving him/her your encryption key. A potential answer is at the end of the Backgrounder.

In the “hotdog” example, if someone intercepted the message, that person would have to figure out what substitution code you were using. For example, which letter does “¢” stand for? The person might have to try all the possible combinations before finding one that makes sense. However, there are some tricks that could make the job easier. A clever interceptor could use clues like the number of times a symbol appears in the secret message. Since the letter “e” is the most common letter in the English language, the most common symbol in a secret message probably stands for “e”. Figuring out a message this way is called Frequency Analysis. The trouble with frequency analysis is that it only works well when the message is long enough for the symbol counts to show a clear pattern.
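Here is a rough sketch of frequency analysis in Python, applied to a Caesar Shift Cipher. It assumes the most common letter in the secret message stands for “e”. The sample sentence was chosen for this example because it contains many e's; the guess can easily be wrong for short or unusual messages. The function names are made up for illustration.

```python
from collections import Counter


def caesar_shift(text: str, shift: int) -> str:
    """Shift lowercase letters along the alphabet; leave everything else alone."""
    return "".join(
        chr((ord(ch) - ord("a") + shift) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


def guess_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the most common letter stands for 'e'."""
    letters = [ch for ch in ciphertext.lower() if "a" <= ch <= "z"]
    most_common_letter, _ = Counter(letters).most_common(1)[0]
    return (ord(most_common_letter) - ord("e")) % 26


plaintext = "meet me near the green tree where we see eleven geese"
ciphertext = caesar_shift(plaintext, 3)

shift = guess_shift(ciphertext)
print(shift)                             # 3
print(caesar_shift(ciphertext, -shift))  # the original sentence
```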

So, how can you prevent someone from using frequency analysis to decode your secret messages? One possible solution to this problem is to use different substitution codes for different parts of the message. For example, in the first sentence of the message, “e” could be coded to “%”, and in the second sentence, “e” could be coded to “*”. If you change codes often enough, as the Germans did with the Enigma machine, it will be very difficult to find patterns.
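One way to picture changing codes is sketched below in Python. As an assumption made for this example, the code changes with every letter instead of every sentence, by cycling through a list of Caesar shifts. The function name `changing_shift` and the list of shifts are made up for illustration.

```python
from itertools import cycle


def changing_shift(text: str, shifts: list[int]) -> str:
    """Shift each lowercase letter by the next number in `shifts`, repeating the list."""
    next_shift = cycle(shifts)
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + next(next_shift)) % 26 + ord("a")))
        else:
            out.append(ch)  # spaces and punctuation do not use up a shift
    return "".join(out)


# The same letter "e" is coded three different ways.
print(changing_shift("eee", [1, 2, 3]))  # fgh
```

Because the same letter no longer maps to the same symbol every time, simply counting symbols tells an interceptor much less.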

Encryption Keys

The information that only you and the person reading your code know is called the encryption key of the code. This is because it allows you to ‘unlock’ its secrets. For example, in a substitution code, the key is the substitution pattern you agree on with your friend.

One simple type of encryption using keys is called Symmetric Key encryption. In this method, a key is used to encrypt the message and the same key is used to decrypt the message. The drawback of this method is that both people must share the same key ahead of time, and anyone else who gets hold of the key can read every message.

Symmetric key encryption (©2020 Let’s Talk Science). 

The original message is in plaintext. It is then encrypted with the symmetric key, which converts it to ciphertext. The ciphertext is decrypted with the same symmetric key. Now the message can be read as plaintext.
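The plaintext-to-ciphertext-to-plaintext round trip can be sketched in Python. This toy example combines each byte of the message with a byte of the key using XOR, a technique chosen here only for illustration; it is not how real symmetric encryption is done and it is not secure. The function name, key and message are made up.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Combine every byte of `data` with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"secret"                       # shared by sender and receiver
plaintext = b"The party is on Friday"

ciphertext = xor_with_key(plaintext, key)   # encrypt with the key
recovered = xor_with_key(ciphertext, key)   # decrypt with the same key

print(ciphertext.hex())
print(recovered.decode())
```

The same function and the same key are used in both directions, which is what makes the method symmetric.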

A more sophisticated method is Public-key Encryption. In this method, two different keys are used, one key to encrypt the message and a different key to decrypt the message. The key used to encrypt the message is called the public key. The key used to decrypt the message is called the private key (see the figure below). These keys are mathematically linked in such a way that it is virtually impossible to guess the private key given the public one. Public-key encryption is often a safer choice than symmetric-key encryption because the private key never has to be shared: only the intended reader knows it.

Public-key encryption (©2020 Let’s Talk Science). 

The original message is in plaintext. It is then encrypted with a public key and converted into ciphertext. The ciphertext is decrypted using a private key. Now the message can be read as plaintext.
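To give a feel for how two keys can be mathematically linked, here is a toy Python sketch based on RSA, one well-known public-key method. The text above does not name a specific method, so RSA is an assumption made for this example. The prime numbers are tiny so the arithmetic is easy to follow; real keys use numbers hundreds of digits long, and the function names are made up.

```python
# Two small secret primes (real systems use enormous ones).
p, q = 61, 53
n = p * q                   # 3233, shared by both keys
phi = (p - 1) * (q - 1)     # 3120, used only to build the keys

e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent: (e * d) % phi == 1 (needs Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def encrypt(message: int, key: tuple[int, int]) -> int:
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple[int, int]) -> int:
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65                              # a message written as a number smaller than n
ciphertext = encrypt(message, public_key)
print(ciphertext)                         # 2790
print(decrypt(ciphertext, private_key))   # 65
```

Anyone may know `public_key`, but working out `d` from it requires splitting `n` back into its two primes, which is extremely hard when the primes are large.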

Compression Codes

Another type of code studied by cryptologists is the compression code. Compression codes are ways to re-write information using fewer characters, so that it takes less space to store on a computer. Compression codes also help to transmit things more quickly over the Internet. Take the following English message, for example:

Welcome to New Brunswick!

We could probably remove some of the characters in this message and still be able to read it. For example, if we remove all of the vowels in the message,

Wlcm t Nw Brnswck!

we get a shorter message whose meaning is still fairly clear. This message needs only 18 characters instead of 25, or about three-quarters of the original space, to store on a computer. So, removing all of the vowels in an English message is a simple kind of compression code. Other kinds of compression codes have been developed for other kinds of data; for example, JPEG is a method for compressing images. JPEG is an acronym for Joint Photographic Experts Group, which is the name of the group that created this method of compression.
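The vowel-removal idea is easy to try in Python. This is a small sketch with a made-up function name; note that it only shortens the message, and a reader has to guess the missing vowels to get the original back.

```python
def remove_vowels(message: str) -> str:
    """Drop every a, e, i, o and u (upper or lower case) from the message."""
    return "".join(ch for ch in message if ch.lower() not in "aeiou")


original = "Welcome to New Brunswick!"
shorter = remove_vowels(original)

print(shorter)                      # Wlcm t Nw Brnswck!
print(len(original), len(shorter))  # 25 18
```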

Morse Code 

Morse code is a code that allows us to transmit text electrically. It was first demonstrated by Samuel Morse, its inventor, in 1844. He used it with his electric telegraph machine to transmit information over telegraph lines, which are wires for transmitting electrical signals. International Morse Code is a modern variation of Morse’s original code. In it, each letter of the alphabet is encoded as a different pattern of dots and dashes. Common letters are represented by short sequences; for example, a single dot stands for “E”. Less common letters are represented by long sequences; for example, “dash-dash-dot-dash” stands for “Q”.

International Morse Code (Source: Public domain image via Wikimedia Commons).

This is one way of making the transmission as short as possible, since the short sequences will be used a lot, and the long sequences only rarely. Therefore, the Morse code is a kind of compression code.
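A lookup table is enough to sketch International Morse Code in Python. In this illustrative example, a period stands for a dot, a hyphen stands for a dash, and letters are separated by single spaces; the function names are made up, and only the 26 letters are included.

```python
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
LETTERS = {pattern: letter for letter, pattern in MORSE.items()}


def to_morse(word: str) -> str:
    """Encode one word, putting a space between letters."""
    return " ".join(MORSE[letter] for letter in word.upper())


def from_morse(code: str) -> str:
    """Decode one word whose letters are separated by spaces."""
    return "".join(LETTERS[pattern] for pattern in code.split())


print(to_morse("QUEEN"))              # --.- ..- . . -.
print(from_morse("--.- ..- . . -."))  # QUEEN
```

In the output, the common letter “E” takes one symbol while the rare letter “Q” takes four.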

Question 3: 

Decode this word in International Morse code. The answer is at the end of the Backgrounder.

•••• • •-•• •-•• ---

Run-length Coding

Have you ever seen a fax machine (short for facsimile machine)? 

A fax machine from the 1990s (Source: Jonnyt [public domain] via Wikimedia Commons).

There is probably one in the office at your school or you may have seen one in an office building. A fax machine is a machine that is able to scan images and text and then send that information digitally over a telephone line. Since the images and text sent by a fax machine consist of black and white parts, a fax machine essentially divides the image into a fine grid, and then sends information indicating whether each square in the grid is black or white.

For example, let’s say you wanted to send the image in the figure below. It is actually part of a message, called the Arecibo Message, that was sent into space using a radio telescope.

Section of the Arecibo Message (©2020 Let’s Talk Science).

Let’s say 0 represents a white square, and 1 represents a black square. To send the first line of the picture in the figure above, we would send a message saying 000000001111100000000, which is 21 characters long. Let’s say each character takes one second to transmit. The description of the first line of the picture would take 21 seconds to transmit. But is there a shorter way to transmit a description of this line of the picture? Since the pictures sent by fax machines often contain alternating long blocks of either white or black pixels, we can send a description of these blocks. We only need to send a message saying how many pixels are contained in each block. This is called run-length coding. In the picture above, we could encode the first line as 8,5,8, which takes only 3 numbers. Assume that you always start with white.
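Run-length coding of one line of the picture can be sketched in Python as follows. The function names are made up for illustration. As one possible way to handle a line that begins with a black square, this sketch reports a white block of length 0 first, so the "always start with white" rule still holds.

```python
def run_length_encode(row: str) -> list[int]:
    """Turn a row of 0s (white) and 1s (black) into block lengths, starting with white."""
    runs = []
    current = "0"   # by convention the first block counted is white
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)   # the block just ended
            current = pixel
            count = 1
    runs.append(count)           # the final block
    return runs


def run_length_decode(runs: list[int]) -> str:
    """Rebuild the row of 0s and 1s from the block lengths."""
    pixels = []
    colour = "0"
    for length in runs:
        pixels.append(colour * length)
        colour = "1" if colour == "0" else "0"   # blocks alternate white, black, white...
    return "".join(pixels)


first_line = "000000001111100000000"
print(run_length_encode(first_line))   # [8, 5, 8]
print(run_length_decode([8, 5, 8]))    # 000000001111100000000
```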

Question 4: 

Encode the rest of the picture above in both regular coding (0s and 1s) and run length coding. The answer is at the end of the Backgrounder.

Answers

Question 1: Given the example above (“hotdog” is coded as “©¢¿£¢>”), what information would your friend need in order to decode the message? This information is known as the encryption key.

Answer:

Your friend would need to know that h = ©, o = ¢, t = ¿, d = £ and g = >. 

Question 2: Create a Caesar Shift Cipher for “The troops are approaching” and get a friend to solve it by giving him/her your encryption key.

Answer:

This will depend on the encryption key used. If you used a Caesar Shift Cipher with a shift of three letters, the ciphertext would read, “Wkh wurrsv duh dssurdfklqj.”

Question 3: 

Decode this word in International Morse code. •••• • •-•• •-•• ---

Answer:

•••• = h, • = e, •-•• = l, --- = o, therefore the word is “hello”

Question 4: Encode the rest of the picture above in both regular coding (0s and 1s) and run length coding.

         Regular coding           Run-length coding
Line 2:  000000111111111000000    6, 9, 6
Line 3:  000011100000001110000    4, 3, 7, 3, 4
Line 4:  000110000000000011000    3, 2, 11, 2, 3
Line 5:  001101000000000101100    2, 2, 1, 1, 9, 1, 1, 2, 2
Line 6:  011001100000001100110    1, 2, 2, 2, 7, 2, 2, 2, 1
Line 7:  010001010000010100010    1, 1, 3, 1, 1, 1, 5, 1, 1, 1, 3, 1, 1
Line 8:  010001001000100100010    1, 1, 3, 1, 2, 1, 3, 1, 2, 1, 3, 1, 1
Line 9:  000001000101000100000    5, 1, 3, 1, 1, 1, 3, 1, 5
Line 10: 000001000010000100000    5, 1, 4, 1, 4, 1, 5
Line 11: 000001000000000100000    5, 1, 9, 1, 5

 
