Encipherment process (take 1)

Voynich Manuscript

According to my latest cipher theory, here is a rough outline of what the encipherment process could be:

  1. (Optional) Prepare your plaintext by removing some letters. Helps to save time.
  2. (Optional) Split into blocks of equal length. Helps to reduce errors.
  3. Convert each letter into a number with simple substitution.
  4. Do some mathemagics to the numbers with Pascal’s Triangle (exact details are a trade secret). You now have Voynichese! If you split the plaintext into blocks, each block now corresponds to a line of ciphertext. But some lines may have ended up with wildly different lengths, so…
  5. (Optional) Pad out lines with filler at the beginning or end to make them equal length. Helps to make the result look nicer and harder to decipher.
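
The non-secret steps above (1, 2, 3 and 5) can be sketched in Python. This is only an illustration under stated assumptions: the choice of vowels as the removed letters, the a=1 … z=26 substitution table, the block size, and the filler value 0 are all made up for the example, and `secret_step` is a hypothetical placeholder that does nothing, because the Pascal’s Triangle step is not described here.

```python
VOWELS = set("aeiou")  # assumption: the letters removed in step 1 are the vowels


def strip_letters(text, drop=VOWELS):
    """Step 1 (optional): keep only letters, minus the ones in `drop`."""
    return "".join(c for c in text.lower() if c.isalpha() and c not in drop)


def split_blocks(text, size):
    """Step 2 (optional): split into blocks of `size` letters (the last may be shorter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def to_numbers(block):
    """Step 3: simple substitution, using an assumed table a=1 ... z=26."""
    return [ord(c) - ord("a") + 1 for c in block]


def secret_step(numbers):
    """Step 4 placeholder (hypothetical): the real transformation is not published,
    so this just returns the numbers unchanged."""
    return list(numbers)


def pad_line(line, width, filler=0):
    """Step 5 (optional): append filler at the end until the line has `width` items."""
    return line + [filler] * (width - len(line))


plain = "the quick brown fox"
blocks = split_blocks(strip_letters(plain), 4)       # ['thqc', 'kbrw', 'nfx']
lines = [secret_step(to_numbers(b)) for b in blocks]
width = max(len(line) for line in lines)
padded = [pad_line(line, width) for line in lines]
print(padded)  # [[20, 8, 17, 3], [11, 2, 18, 23], [14, 6, 24, 0]]
```

Padding here is only added at the end of a line; padding at the beginning would work the same way with the filler placed in front.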

Pascal’s Triangle and the Voynich Manuscript

I’ve been toying with the idea of using Pascal’s Triangle to make a cipher that produces statistics similar to those of the Voynich Manuscript’s text “system”. My ideas are still preliminary, but I’m pleased to note that so far I’ve devised something (relatively) simple with short words, binomial word lengths, strong word structure, lines as semantic units, a lack of repeated sequences, and word-adjacent repetition. I haven’t had the time to really dig in, quantify any of these, and compare them with the VMS text, but at first glance it appears fairly close.
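
As background for the phrase “binomial word lengths”: a row of Pascal’s Triangle, divided by its sum, is a binomial distribution with p = 1/2. The Python sketch below only illustrates that general fact; it is not the cipher itself, and the function names and the choice of row 9 are assumptions made for the example.

```python
def pascal_row(n):
    """Return row n of Pascal's Triangle, i.e. C(n, 0) ... C(n, n)."""
    row = [1]
    for k in range(n):
        # C(n, k+1) = C(n, k) * (n - k) / (k + 1); the division is always exact
        row.append(row[-1] * (n - k) // (k + 1))
    return row


def binomial_pmf(n):
    """Normalise row n so it sums to 1: the Binomial(n, 1/2) distribution."""
    row = pascal_row(n)
    total = sum(row)  # equals 2**n
    return [c / total for c in row]


print(pascal_row(4))  # [1, 4, 6, 4, 1]
for k, p in enumerate(binomial_pmf(9)):
    print(k, round(p, 4))
```

A word-length histogram is called binomial when its bars follow this kind of shape: rare at the extremes, peaked in the middle.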

Example

For example, here is a phrase enciphered with an early version of the cipher and written in EVA transcription (deliberately seeded so that every word ends in -n):

chiain chiin dain choiin shoin shoiin chiiin chiin shn dain chiiin diin in.

Here is the same phrase again:

potir chiin dain shoedy shoin shoiin chols sheey toy chddy chiiin ooli aiim.

Here is the same phrase yet again:

fodar choiin shn sheey diiin diin choli shoedy chedy ty choin shels daiim.

Here is the same phrase without vowels:

shedy shoey sheyi shoiin choiin chtchar cheli shoyiiim.

Features

Interesting things about my system (so far):

  • Word context is highly important and affects all content.
  • The same plaintext sequence is almost guaranteed to end up completely different every time it is included. This applies to individual words too. For a word of length n enciphered twice, the probability that its two ciphered versions will match is approximately 1/(2^18n), with a few caveats here and there. I wish WordPress could embed formulae easily (can someone please tell me how in the comments below?).
  • Multiple appearances of the same ciphertext sequence are almost guaranteed to be completely unrelated. This applies to individual words and similar sequences like Timm Pairs. The probability is similar to the one mentioned above. However, if they are at the very beginning or end of lines they might be a bit related. If they are labels (i.e. enciphered outside a line) they become much more similar.
  • Blank spaces in words are meaningful. What do I mean by this? All words actually store ten letters of information, but one letter of the alphabet is an invisible glyph (we’ll call it “_”), giving the appearance of different word lengths. For example (not a real example), fodar might actually be f _ _ o d _ a _ _ r. The system allows us to unambiguously reconstruct the original ten-letter sequence with ease. This allows words to store more information than their visible length would suggest.
  • Similar words that appear next to each other (Bad Romance sequences) are an unintended side effect. They store just as much information as any other sequence because of their context.
  • It allows for a total of 9^4=6561 unique words, though this can be adjusted with some tricks and workarounds. Stolfi counted a total of 6525 unique words in the Voynich Manuscript.
  • If this were confirmed to be the system behind the Voynich Manuscript’s text, I would still have very little idea of how to decipher it.
  • Update: At certain points you could pack filler at the beginning or end of lines to make them equal length and make the system a bit more secure. In the Voynich Manuscript itself, some see evidence of meaningless filler material at the beginning or end of some lines.
  • Update 2: It also accounts for the findings that the first two letters of each word are more predictable than the rest, and that there is some mild correlation between the end of one word and the start of the next.
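
The invisible-glyph idea and the word-count arithmetic from the list above can be checked with a short Python sketch. It shows only the easy direction (ten slots to visible word); going back from the visible word to the ten slots relies on the parts of the system that are not described here. The names `BLANK`, `SLOTS` and `visible_form` are hypothetical, and the slot layout is the made-up “fodar” illustration from the text, not a real example.

```python
BLANK = "_"   # stand-in for the invisible glyph
SLOTS = 10    # every word stores ten letters of information


def visible_form(slots):
    """Drop the invisible glyph from a ten-slot word, leaving what a reader sees."""
    if len(slots) != SLOTS:
        raise ValueError(f"expected {SLOTS} slots, got {len(slots)}")
    return "".join(g for g in slots if g != BLANK)


print(visible_form("f__od_a__r"))  # fodar
print(9 ** 4)                      # 6561
```

Two different slot layouts can collapse to the same visible word (for instance, moving a blank), which is why the visible text alone is not enough to rebuild the ten slots in this sketch.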