DHQ: Digital Humanities Quarterly
Editorial

The Application of Latent Semantic Analysis to the Voynich Manuscript

Lisa Fagin Davis <lfd_at_themedievalacademy_dot_org>, Medieval Academy of America

Abstract

The Voynich Manuscript (VM; Yale University, Beinecke Rare Book & Manuscript Library, MS 408) is a medieval manuscript likely written in the 15th century. It is written in an unknown language or code, using an unidentified set of symbols that has yet to be deciphered. The codex also contains many strange and fantastical images of plants and people, along with cosmological and zodiac illustrations, the meaning of which is likewise unknown. One of the main avenues of research into the VM is to examine how its text behaves relative to known texts, which can provide insight into whether the mysterious writing contains decipherable text. In this paper, we explore the coherence and flow of the manuscript using Latent Semantic Analysis (LSA). LSA may help ascertain whether the text of the VM shows evidence of a coherent flow of topical content, by comparing text samples that are near each other, farther apart, or on either side of a section or page break. The advantage of this strategy is that LSA can be applied without knowing the meaning of the text. We expect portions of text that are near each other to have a relatively high similarity score, that is, to be potentially semantically related. We also expect the similarity score between adjacent text blocks to be lower at anticipated topic breaks (pages or sections), as these breaks seem to represent a change in topic. Both patterns are observed in the control manuscript studied as a proof-of-concept experiment. The patterns then observed in several sections of the VM suggest that there may be an overall coherence to the text.
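The comparison of adjacent text blocks described above can be illustrated with a minimal sketch. It is not the pipeline used in the paper: the tokens are invented placeholders rather than a VM transliteration, the function name `lsa_adjacent_similarity` and the dimension `k` are hypothetical, and raw term counts stand in for whatever term weighting a full study would apply. The sketch builds a term-by-block count matrix, reduces it with a truncated singular value decomposition, and reports the cosine similarity between each block and the next.

```python
import numpy as np


def lsa_adjacent_similarity(blocks, k=2):
    """Cosine similarity between consecutive text blocks in a k-dimensional LSA space."""
    # Vocabulary: every distinct whitespace-separated token is a term.
    vocab = sorted({tok for block in blocks for tok in block.split()})
    index = {tok: i for i, tok in enumerate(vocab)}

    # Term-by-block count matrix (rows = terms, columns = blocks).
    counts = np.zeros((len(vocab), len(blocks)))
    for j, block in enumerate(blocks):
        for tok in block.split():
            counts[index[tok], j] += 1

    # Truncated SVD: keep only the k largest singular values.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    block_vecs = (s[:k, None] * vt[:k]).T  # one row per block

    # Normalise each block vector, guarding against all-zero rows.
    norms = np.linalg.norm(block_vecs, axis=1)
    norms[norms == 0] = 1.0
    unit = block_vecs / norms[:, None]

    # Similarity of each block with the block that follows it.
    return [float(unit[i] @ unit[i + 1]) for i in range(len(blocks) - 1)]


# Invented tokens only: two blocks on one "topic", then two on another.
toy_blocks = [
    "aa bb cc aa",
    "aa bb cc dd",
    "xx yy zz",
    "xx yy zz ww",
]
sims = lsa_adjacent_similarity(toy_blocks, k=2)
print(sims)
```

In this toy input the first two blocks share most of their vocabulary, as do the last two, while the middle pair shares none. The first and last similarity scores are therefore high and the middle score is close to zero, which is the dip one would expect at a topic break.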

