Bubble babble is an encoding method in which binary information is represented as pseudo-words made from alternating vowels and consonants. It is mostly used to represent cryptographic fingerprints. Created by Antti Huima, the bubble babble method makes it easier for people to remember and repeat important information. The encoding also has a built-in checksum and redundancy, which make it possible to detect many transcription errors.
The method was devised to help people pronounce and remember hexadecimal codes in a more natural way. It is sometimes necessary to validate a key verbally, over the phone or through some other channel. The hash values of public keys and of important certificates are called thumbprints or fingerprints. These fingerprints are vital in verifying sensitive data for security reasons. They are typically represented as long strings of hexadecimal digits.
It is difficult for people to pronounce, remember, and repeat long strings of hexadecimal digits reliably, for instance over the phone. Huima created the encoding to address this problem by turning hard-to-remember binary data into more memorable pseudo-words. The term itself is a pun on the classic video game Bubble Bobble. When hexadecimal digits are encoded using the bubble babble encoding method, the generated words resemble babbling or gibberish.
The encoding also comes in handy when a trusted copy of an encryption key has been lost or is otherwise unavailable. In this case, the key fingerprint has to be verified against the user's recollection of the original. Fingerprints encoded using bubble babble encoding are often easier to recall than their hexadecimal versions. In a critical situation, this can make the difference between verifying a fingerprint and having to accept it without authenticating it in any way.
To detect transmission errors or invalid encodings, the bubble babble encoding method includes a checksum. Markers that represent the start and end of the encoded string are also incorporated into the encoding. For every two bytes in the input sequence, the output contains five letters and a dash. One of the advantages of this method is that it increases the length of the encoded information only modestly: two bytes become six characters, compared with four hexadecimal digits.
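The length arithmetic can be illustrated with a short Python sketch. It assumes the layout described in Huima's specification as understood here: a single-letter marker at each end, six characters (five letters and a dash) per full pair of input bytes, and a final three-letter group that holds either the last odd byte or the closing checksum. The function names are illustrative only and do not come from any library.

```python
def bubble_babble_length(num_bytes: int) -> int:
    """Number of characters in the encoded form of num_bytes input bytes.

    Assumed layout: start marker + 6 characters per full byte pair
    + a final 3-letter group + end marker.
    """
    full_pairs = num_bytes // 2
    return 1 + 6 * full_pairs + 3 + 1


def hex_length(num_bytes: int) -> int:
    """Number of characters in plain hexadecimal, without separators."""
    return 2 * num_bytes


# Compare the two representations for common fingerprint sizes (in bytes).
for size in (16, 20, 32):
    print(size, hex_length(size), bubble_babble_length(size))
```

Under these assumptions a 20-byte fingerprint takes 40 hexadecimal digits or 65 bubble babble characters, and the dashes in the latter already split the output into short words.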
The pseudo-words are built from two lookup tables. In the vowel table, the numbers zero to five are mapped to vowels as 0-a, 1-e, 2-i and so on. In the consonant table, the integers zero to 16 are mapped to consonants as 0-b, 1-c, 2-d and so on. Every vowel in the resulting pseudo-word carries about 0.58 bits of redundancy, because a vowel has six possible values (about 2.58 bits) but holds only two bits of data. The checksum information would be around 4,640 bits for a 1,000-word string, which is helpful to detect errors like flipped bits.
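A minimal Python sketch of an encoder follows. It is not a reference implementation. The two tables match the mappings described above, but the full table contents, the use of x as the start and end marker, and the running checksum (starting at 1 and updated with the constants 5 and 7 modulo 36) are taken from Huima's specification as understood here rather than from this article, so treat them as assumptions. The names VOWELS, CONSONANTS and bubble_babble are illustrative.

```python
VOWELS = "aeiouy"                 # values 0-5
CONSONANTS = "bcdfghklmnprstvzx"  # values 0-16


def bubble_babble(data: bytes) -> str:
    """Encode bytes as bubble babble pseudo-words (sketch, see assumptions)."""
    checksum = 1          # assumed initial checksum value
    out = ["x"]           # assumed start marker

    # Each full pair of bytes becomes: vowel, consonant, vowel, consonant, '-', consonant.
    for i in range(0, len(data) - 1, 2):
        b1, b2 = data[i], data[i + 1]
        out.append(VOWELS[((b1 >> 6) + checksum) % 6])       # top 2 bits mixed with checksum
        out.append(CONSONANTS[(b1 >> 2) & 15])               # middle 4 bits
        out.append(VOWELS[((b1 & 3) + checksum // 6) % 6])   # low 2 bits mixed with checksum
        out.append(CONSONANTS[b2 >> 4])                      # high nibble of second byte
        out.append("-")
        out.append(CONSONANTS[b2 & 15])                      # low nibble of second byte
        checksum = (checksum * 5 + b1 * 7 + b2) % 36         # assumed update rule

    if len(data) % 2:
        # Odd length: the last byte fills the final three-letter group.
        last = data[-1]
        out.append(VOWELS[((last >> 6) + checksum) % 6])
        out.append(CONSONANTS[(last >> 2) & 15])
        out.append(VOWELS[((last & 3) + checksum // 6) % 6])
    else:
        # Even length: the final group carries only the checksum.
        out.append(VOWELS[checksum % 6])
        out.append(CONSONANTS[16])
        out.append(VOWELS[checksum // 6])

    out.append("x")       # assumed end marker
    return "".join(out)


print(bubble_babble(b"Pineapple"))  # xigak-nyryk-humil-bosek-sonax
```

Because the checksum is folded into every vowel, changing one input byte also changes the vowels of the groups that follow it, which is how a listener can notice that something was misheard.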