DHQ: Digital Humanities Quarterly
Editorial
ITALMUD: a Computational Terminology of the Italian Translation of the Babylonian Talmud
DOI: pending
1. Introduction
The Babylonian Talmud is one of the fundamental texts of the Jewish tradition, the
result of a complex interweaving of legal, narrative, and sapiential discussions #strack1996. This text presents terminological peculiarities that make it difficult to translate
into other languages: it is in fact written mainly in Aramaic, but it includes Hebrew
terms, linguistic borrowings, and a vast lexical repertoire with a high degree of
polysemy and contextual variability #schäfer2007. The exact interpretation of Talmudic terms often depends on the immediate context
and the cultural and legal implications implicit in the discussion.
From a linguistic point of view, the main terminological difficulties are related
to semantic ambiguity, widespread polysemy, variations in meaning according to contexts
of use, and frequent implicit references to concepts or cultural and legal institutions
specific to the rabbinic world #satlow2006.
Translating a text of such complexity (and vastness) constitutes a challenge; from
a terminological point of view, translators are required not only to have considerable
linguistic skills, but also a profound knowledge of Jewish culture and norms.
In this context, translation cannot be reduced to the substitution of words across
languages. It requires the transposition of concepts, legal categories, ritual practices,
and culturally embedded references from the rabbinic tradition into another linguistic
and interpretive framework. Terminological work is therefore essential, since it helps
preserve the specificity of the source tradition while making its concepts accessible
to readers of the target language.
To date, there are few complete translations of the Babylonian Talmud. One of the
most renowned is the translation edited by Rabbi Adin Steinsaltz (#steinsaltz1965)[1], rendered in modern Hebrew with commentary. The Schottenstein Edition (#schottenstein1990), published in English by ArtScroll between 1990 and 2005, is widely recognized for
its accuracy and popularity in the English-speaking world. ArtScroll has
also published a complete translation of the Talmud in French #safra2011. Another noteworthy translation is the Goldschmidt Talmud (Goldschmidt [ed.], 1897-1936),
a German translation edited by Lazarus Goldschmidt between 1897 and 1936.
A complete Italian translation of the Babylonian Talmud has been underway since 2012
within the framework of the “Traduzione del Talmud Babilonese”
project[2].
The role of technology has proven fundamental for this highly interdisciplinary project:
without specially developed computational tools, it would have been extremely difficult
to coordinate dozens of translators and reviewers. Indeed, the computer-assisted translation
(CAT) tool Traduco (#giovannetti2016) was essential in ensuring terminological consistency, coherence, and accuracy throughout
the translation process.
The aim of the work presented in this article is to develop ITALMUD, a multilingual
terminological resource for the Babylonian Talmud — pivoting on Italian — using the
Linked Data paradigm (#bizer2009) and computational lexical resources. The focus is therefore not on the vocabulary
of the Babylonian Talmud as a whole, but on the set of specialised terms documented
in the expert glossaries produced during the Italian translation process. The significance
of ITALMUD lies not only in the creation of a specialised resource for Talmudic studies,
but also in the methodological problem it addresses. Many digital humanities projects
generate rich terminological knowledge in the form of glossaries, annotations, translation
notes, and expert commentaries; however, such knowledge often remains difficult to
query systematically or to connect with other linguistic resources. ITALMUD shows
how this kind of expert knowledge can be converted into Linked Open Data and connected
to a computational lexicon, thereby making implicit semantic relations explicit. This
makes it possible to ask questions that go beyond simple word lookup — for example,
to retrieve terms connected by broader semantic domains, to compare translation choices,
or to identify conceptual patterns across a large and multilingual textual tradition.
The effort is therefore worthwhile because it turns a glossary into an instrument
for discovery: a resource that supports both close philological work and scalable
computational exploration.
The rest of the article is organised as follows. A brief overview of previous works
on Talmudic terminology is provided in Section 2.
As detailed in Section 3.1, the Linked Data approach offered a way
to formalize and standardize the resource, foster its shareability (#chiarcos2020), and significantly aid in managing the terminological variety and complexity within
the Talmudic domain.
The resource was built using data extracted from a series of glossaries produced during
the translation process and subsequently converted into a formal representation compliant
with the aforementioned standards, as described in Section 3.2.
It was also decided to link the resulting terminological resource to a computational
lexicon of the Italian language, to be used as a lexical superstructure, in order
to systematically connect and more clearly define Talmudic terms through an established
lexical framework (see Section 3.3). This approach made it possible to embed the terminology
within the dense network of semantic relationships that constitutes the lexicon, thereby
enhancing its usability.
In Section 4, the resource will be
described from a quantitative point of view, including some statistics on the various
types of linking with the Italian lexicon; Section 5, on the other hand, will describe
how to access the resource and some possible uses of it. The last section, finally,
will conclude the article with some reflections on the work done and its future prospects.
2. Related Works
Freely available lexical resources related to the religious texts of Judaism are relatively
scarce. As for the Hebrew language, the only resource that, to the best of our knowledge,
is freely accessible is Jastrow’s Dictionary[3], hosted by Sefaria[4], which includes
English definitions of approximately 30,000 Hebrew and Aramaic words used in various
rabbinic texts, including the Babylonian Talmud. This digital dictionary is based
on the work of the scholar Marcus Jastrow, whose “A Dictionary of the Targumim, the Talmud Bavli and Yerushalmi, and the Midrashic
Literature” (#jastrow1926) remains an authoritative source for understanding the Aramaic lexicon in rabbinic
Jewish literature.
For Hebrew, we also cite the Historical Dictionary Project of the Academy of the Hebrew
Language, initiated in the 1950s. This project aims to compile an authoritative historical
dictionary encompassing the entire Hebrew lexicon, tracing the meanings and morphology
of words from their earliest attestations to their most recent usage. The project
has developed a comprehensive computerized database, known as Ma'agarim[5], which serves
as a unique resource for researchers. The project is documented online by the Academy
of the Hebrew Language[5]. The database is freely accessible online but cannot be downloaded.
For Aramaic, we cite the Comprehensive Aramaic Lexicon (CAL)[6], a digital project hosted
by the Hebrew Union College-Jewish Institute of Religion. Initiated in the 1980s,
CAL provides a text base of Aramaic texts across all dialects from the 9th century
BCE to the 13th century CE. It offers a searchable dictionary (which cannot, however,
be downloaded) and access to text corpora.
Regarding research involving Italian lexical resources, several studies conducted
within the Babylonian Talmud Translation Project into Italian are mentioned below.
These studies have addressed various aspects, ranging from the computational representation
of Talmudic terms to the ontological modelling of rabbinic figures mentioned in the
text. Specifically, we highlight the contribution on extraction, representation, and
use of the Babylonian Talmud terminology (#giovannetti2020) and the definition of an ontology dedicated to the Talmudic masters (Giovannetti
et al., 2021). Furthermore, experiments were conducted on graphical visualization of
Talmudic terms to enhance accessibility for scholars and non-specialist users #marchi2022. Additional works explored the application of the OntoLex-Lemon model to formally
link the Talmudic text to lexical entries (#sciolette2024), and structured methodologies were proposed for building integrated computational
resources (terminological and conceptual) that are shareable and interoperable, derived
from religious texts, including Hebrew texts #saponaro2022.
Taken together, these works show that Talmudic terminology constitutes not only a
specialised domain of Jewish studies, but also a useful testing ground for broader
questions concerning the computational representation, publication, and reuse of expert
knowledge in multilingual textual traditions.
3. The Construction of the Resource
The terminological resource was built based on a selection of tractates from the Babylonian
Talmud translated into Italian. This translation effort is part of the Babylonian
Talmud Translation Project, introduced in Section 1. The project’s goal is to create
a digital Italian version of the Babylonian Talmud — an essential work in Jewish tradition
that encompasses a wide range of knowledge, from law and science to philosophy and
daily life.
The translation process involves a dedicated team of translators, subject matter experts,
and editorial reviewers, all collaborating synergistically through the use of the
aforementioned Traduco CAT tool. In this respect, the construction of ITALMUD illustrates
a recurrent challenge in digital humanities projects: how to transform knowledge produced
during collaborative scholarly work into structured data that can be preserved, queried,
and reused beyond the immediate context in which it was created.
The reference corpus used as the source for the terminological data was compiled from
all the tractates that had been translated and published at the time of writing this
article, amounting to ten in total, namely Berakhòt, Rosh haShanà, Ta’anit, Qiddushìn, Chaghigà, Betzà, Meghillà, Sukkà, Mo’èd Qatàn, and Sotà.
More specifically, the terms were selected from the glossaries included in the ten
tractates under consideration. During the translation process, in fact, Talmudic technical
terms — as well as any other terms that the translators deemed it useful to accompany
with a definition to facilitate the reading of the Talmud — were added to thematic
glossaries. Of the seven glossary categories created, three were taken into account
for the construction of the present resource, i.e., “Concept”, “Linguistics”, and
“Nature”[7]. These categories were selected because they contain the entries with the
strongest terminological relevance for the purposes of the present resource.
Each glossary entry contains the following information: the category (e.g., “Concept”),
the entry label (e.g., “Sinedrio”), the original Hebrew term (e.g., “סַנְהֶדְרִין”),
its transliteration (“Sanhedrìn”), its translation (used for terms that were not translated
into Italian but left in their transliterated form — for example, “Tevilà,” translated
as “immersione”), and finally, its definition.
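The entry structure described above can be sketched as a small record type. This is only an illustration: the class name `GlossaryEntry`, its field names, and the helper `is_selected` are assumptions introduced here, not the schema actually used in the project. Values that the text does not report are left as `None`, and the category given for “Tevilà” is assumed.

```python
from dataclasses import dataclass
from typing import Optional

# The three glossary categories retained for the resource, as named in the text.
SELECTED_CATEGORIES = {"Concept", "Linguistics", "Nature"}


@dataclass(frozen=True)
class GlossaryEntry:
    """Hypothetical record mirroring the fields of one glossary entry."""
    category: str
    label: str
    hebrew: Optional[str] = None
    transliteration: Optional[str] = None
    # Filled only for terms left in transliterated form in the Italian text.
    translation: Optional[str] = None
    definition: Optional[str] = None


def is_selected(entry: GlossaryEntry) -> bool:
    """True if the entry belongs to one of the three retained categories."""
    return entry.category in SELECTED_CATEGORIES


# Values taken from the examples in the text; the definition is not reported.
sinedrio = GlossaryEntry(
    category="Concept",
    label="Sinedrio",
    hebrew="סַנְהֶדְרִין",
    transliteration="Sanhedrìn",
)

# The category of this entry is an assumption made for illustration.
tevila = GlossaryEntry(category="Concept", label="Tevilà", translation="immersione")
```

Keeping `translation` optional reflects the fact that this field is used only for terms left untranslated in the Italian text.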
Once the glossary entries from which to extract the terms had been identified, it
was necessary to define a model capable of accommodating and formalising these terms.
As previously mentioned, the Linked Data paradigm was adopted as a reference.
3.1 The Model of Reference
The paradigm of Linked Open Data (LOD) (#berners_lee2006) today constitutes one of the methodological foundations for the management, publication,
and interoperability of structured data within the context of the Semantic Web #bizer2009. It enables the transformation of data into knowledge that is accessible, shareable,
and reusable on a global scale, fostering an open and interconnected information ecosystem.
In accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable)
(#wilkinson2016), the LOD paradigm promotes traceability and methodological transparency, ensuring
the accessibility and reusability of informational resources in domains such as scientific
research, education, and technological innovation. The LOD approach, in addition to
improving data quality and usability, enhances their scientific, cultural, and social
significance, contributing to making knowledge — including linguistic knowledge —
an open, shared, and dynamically integrable resource within the global digital landscape.
From this perspective, the Linguistic Linked Open Data[8] cloud (LLOD) represents a
collaborative initiative promoted by members of the Open Linguistics Working Group
(#chiarcos2020), with the aim of building a coherent subset of open and interconnected data specifically
dedicated to linguistic resources #mccrae2016. The adoption of the Linked Data paradigm in the linguistic domain enables mechanisms
of semantic interoperability, while also promoting the integration of linguistics
with other scientific and applied fields already employing this approach, such as
geography (#goodwin2008), biomedicine (#ashurner2000), and public administration[9].
In the context of terminology, the adoption of Linked Open Data (LOD) enables the
description not only of terms and their definitions, but also of the full range of
semantic, morphological, etymological, and pragmatic relations that connect them,
as well as their usage contexts, sources of attestation, and equivalents in other
languages. To this end, standard languages such as RDF[10] (Resource Description Framework),
OWL[11] (Web Ontology Language), and SPARQL[12] (a query language for RDF data) are employed,
along with internationally recognised and shared ontological vocabularies, such as
OntoLex-Lemon (#mccrae2017) for the representation of lexical resources, SKOS[13] (Simple Knowledge Organization
System, for the structuring of concepts), and LexInfo[14] (for the encoding of grammatical
and syntactic properties). The OntoLex-Lemon model has become the de
facto standard for representing lexical data as Linked Data on the Semantic
Web. Although the model was originally conceived for describing
the lexicalisation of concepts represented in a formal ontology, several
methodologies have since been proposed, both for applying Linked Data principles to terminologies
(#cimiano2015; #bosque2015), including services that generate LD terminologies from corpora such as termItUp
(#martin2022), and for converting terminologies represented in the ISO standard TBX/XML format
into OntoLex-Lemon, such as Term-à-LLOD #dibuono2020, TBX2RDF (#cimiano2015; #montiel_ponsoda2015), and LemonizeTBX #bellandi2023.
To make the modelling approach more explicit for non-specialist readers, Figure 1
presents a simplified view of the core OntoLex-Lemon components used in this work.
Rather than treating a term as a simple label, the model distinguishes between the
lexical entry, its written forms, and its senses. It also makes it possible to associate
a lexical sense with a conceptual reference and to establish semantic links with external
lexical resources.
This distinction is particularly important for ITALMUD, where each glossary item may
involve several layers at once: a lexical entry, one or more written forms, a culturally
specific sense, and, when relevant, a conceptual reference. The same structure also
provides the basis for linking ITALMUD senses to external resources such as the Italian
computational lexicon CompL-it.
More generally, this modelling step shows how domain-specific knowledge, often embedded
in glossaries or editorial notes, can be translated into a form that supports computational
analysis and scholarly reuse.
3.2 From Glossaries to Terminology
The glossaries compiled for the translation of the Babylonian Talmud are resources
manually curated by domain experts over a span of ten years. This extended time frame
allowed for the development of resources that are rich in information, highly accurate,
and thoroughly validated. However, like any manually created dataset, they also exhibit
reworkings, duplicate entries, annotation inconsistencies, and certain types of ambiguity,
such as, for example, the presence of two forms — singular and plural — in the field
“name” of the glossary entry, or the presence of separators, in the same field, for
different forms. While such issues may be negligible in the context of human consultation,
they become critical for computational processing. This situation is common in many
scholarly projects: resources designed for human consultation often contain precisely
the richness that makes them valuable, but they require an additional layer of modelling
before they can become computationally actionable.
Accordingly, the glossaries underwent manual preprocessing to ensure data homogeneity.
The primary interventions focused on the extraction and analysis of alternative forms
of the lemma. Each glossary entry was subsequently mapped and converted into
a Lexical Entry following the OntoLex-Lemon model, as described in the previous section.
Lexical entries were classified into two distinct categories: Word entries, representing
single words, and Multiword entries, comprising compound words, fixed expressions,
and complex phrases. From a theoretical perspective, the Multiword category was designed
to encompass a wide range of linguistic phenomena, which were later subdivided into
more specific syntactic classes. Where possible, the various forms associated with
a lemma were extracted, particularly when such forms were explicitly provided by domain
experts either in the entry label or within its definition.
In fact, both of these fields — entry name and definition — were used in the glossaries
not only to convey semantic information, but also to include grammatical annotations
such as plural forms.
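The extraction of alternative forms from an entry name can be sketched as follows. The preprocessing in the project was manual, so this is not the authors' procedure: the function name `split_forms`, the set of separator characters, and the raw value used in the example are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Assumed separator characters; the glossaries' actual conventions are not specified.
SEPARATORS = "/,;"


def split_forms(name: str, separators: str = SEPARATORS) -> Tuple[str, List[str]]:
    """Split a raw entry name into (lemma, other_forms).

    The first form is taken as the lemma; any further forms are returned
    in their original order as candidate alternative forms.
    """
    pattern = "[" + re.escape(separators) + "]"
    parts = [part.strip() for part in re.split(pattern, name)]
    parts = [part for part in parts if part]
    if not parts:
        raise ValueError("entry name contains no form")
    return parts[0], parts[1:]


# Hypothetical raw value holding a singular and a plural form in one field.
lemma, other_forms = split_forms("levita / leviti")
```

Treating the first form as the lemma is itself a simplification; in practice each case would still need to be checked by a domain expert.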
Once lemmas were identified as canonical entries and the alternative forms were attached
to them through the ontolex:otherForm property, in accordance with the OntoLex-Lemon model, all
data were tagged with Part-of-Speech (PoS) information and morphological features,
specifically gender and number. Furthermore, wherever feasible, entries and forms
in Aramaic were identified and distinguished. Cross-references between glossary entries
were formalized using the rdfs:seeAlso property.
LexInfo, the reference linguistic ontology for the OntoLex-Lemon model, was adopted
both for the syntactic classification of phrases and for the attribution of PoS tags
and morphological traits. In cases where it was not possible to assign a specific
syntactic class to a multiword unit, the entry was left underspecified and labelled
simply as a multiword.
Regarding the handling of duplicates, the selection process was based on criteria
agreed upon with domain experts: in cases of equivalent informational content, preference
was given to the most recently annotated (and therefore updated) entry with the greatest
number of annotated contexts. In cases where the contexts associated with duplicate
candidates differed, a merging operation was performed to consolidate all contextual
annotations under a single entry.
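The duplicate-handling rule can be sketched as below. How the two criteria (recency and number of contexts) were combined is not spelled out in the text; the sketch assumes recency is compared first and the number of contexts breaks ties. The names `Candidate` and `resolve_duplicates`, and the dates and context identifiers in the example, are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical duplicate candidate for one glossary entry."""
    label: str
    annotated_on: date
    contexts: FrozenSet[str]


def resolve_duplicates(candidates: List[Candidate]) -> Candidate:
    """Keep one entry per group of duplicates.

    Assumption: the most recently annotated candidate wins, with the number
    of annotated contexts as tie-breaker. The contexts of all candidates are
    merged under the entry that is kept.
    """
    if not candidates:
        raise ValueError("no candidates to resolve")
    keeper = max(candidates, key=lambda c: (c.annotated_on, len(c.contexts)))
    merged = frozenset().union(*(c.contexts for c in candidates))
    return replace(keeper, contexts=merged)


# Illustrative values only.
older = Candidate("Sinedrio", date(2015, 1, 1), frozenset({"ctx-1"}))
newer = Candidate("Sinedrio", date(2020, 1, 1), frozenset({"ctx-2", "ctx-3"}))
kept = resolve_duplicates([older, newer])
```

When the candidates share the same contexts, the union leaves the kept entry unchanged, so both cases described above are covered by the same function.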
From this preprocessing of the 887 glossary entries, a total of 828 entries were obtained,
distributed as follows: 461 entries of the “concept” type, 276 of the “linguistics”
type, and 91 of the “nature” type.
Finally, the lexical data extracted from glossaries and mapped to OntoLex-Lemon were
converted into RDF: the resulting resource, which we called ITALMUD, is a multilingual,
open, and extensible terminology compliant with the LOD paradigm.
3.3 The Linking to the Italian Lexicon
The set of terms contained in the resource essentially constitutes a list of Italian,
Hebrew, and Aramaic terms, linked by translation relations. What is lacking, therefore,
is a semantic network of relations among the terms themselves.
What characterizes a computational terminology, however, is its capacity to formally
define semantic relations among the linguistic entities it contains. Such a structure
enables the resource to be exploited as a representation of the underlying domain,
allowing for advanced queries both on the resource itself and on the texts to which
it is linked.
In light of this, we decided, as a first step, to address the lack of relations among
the ITALMUD terms by anchoring its Italian entries to the words of an Italian computational
lexicon, CompL-it, in which the words, through their respective senses, are already
connected to each other by a dense network of relations.
3.3.1 The Computational Lexicon for Italian: CompL-it
CompL-it (#sciolette2024) is a computational lexicon for contemporary Italian, developed in accordance with
the OntoLex-Lemon model. The construction of this resource involved the integration
of three distinct data sources: (1) M-GLF (MAGIC-Generated Lemmatized Forms), a collection
of lemmatized entries enriched with morphological information produced by the MAGIC
morphological analyzer (#battista1999; #pirrelli2000); (2) a selection of Italian language treebanks — including the Italian Stanford
Dependency Treebank (ISDT), the Venice Italian Treebank (VIT), ParallelTut (ParTut),
and ParlaMint-It — accessible via the Universal Dependencies (UD) repository; and
(3) the LexicO computational lexicon (#sciolette2023), which serves as the foundational component of the resource, particularly from a
modeling perspective. For a detailed account of the resource construction process
and the characteristics of the underlying model, the reader is referred to the previously
cited works #sciolette2024, #sciolette2023.
Below, a sample entry from the resource is briefly described. In this example, the
Italian noun arca (ark) is modelled as a lexical entry with two distinct senses: one meaning “wooden
chest” and the other “monumental sarcophagus”. Each form of the word (arca and its plural arche) is encoded with grammatical information such as gender and number. The lexical senses
are semantically enriched with definitions, usage examples, and links to two other
senses, with two different relations (i.e., hyponym and usedFor).
lex:MUSarcaNOUN a ontolex:Word;
    rdfs:label "arca"@it;
    lexinfo:partOfSpeech lexinfo:noun;
    ontolex:sense lex:USem63144arca;
    ontolex:sense lex:USem63145arca;
    ontolex:canonicalForm lex:arca_MUSarcaNOUN_noun_singular-feminine;
    ontolex:otherForm lex:arche_MUSarcaNOUN_noun_plural-feminine .

lex:arca_MUSarcaNOUN_noun_singular-feminine a ontolex:Form;
    lexinfo:number lexinfo:singular;
    lexinfo:gender lexinfo:feminine;
    lexinfo:degree lexinfo:positive;
    ontolex:writtenRep "arca"@it .

lex:arche_MUSarcaNOUN_noun_plural-feminine a ontolex:Form;
    lexinfo:number lexinfo:plural;
    lexinfo:gender lexinfo:feminine;
    lexinfo:degree lexinfo:positive;
    ontolex:writtenRep "arche"@it .

lex:USem63144arca a ontolex:LexicalSense;
    skos:definition "cassa di legno";
    lexinfo:senseExample "l'arca delle tavole della legge" .

lex:USem63145arca a ontolex:LexicalSense;
    skos:definition "sarcofago monumentale" .

lex:USem63144arca compl-it:usedFor lex:USemD883contenere .
3.3.2 Linking
At the current stage of development, 227 ITALMUD terms have been linked to CompL-it.
This figure does not represent the lexical coverage of the Babylonian Talmud in CompL-it,
nor the proportion of Talmudic words that occur in the Italian lexicon. Rather, it
refers to the subset of curated ITALMUD terminological entries for which a validated
semantic connection with CompL-it has so far been established. Although this linking
is still partial, it already enables ITALMUD terms to be explored through the semantic
network of a general-language computational lexicon. The value of the operation therefore
lies not in simple lexical overlap, but in the possibility of using CompL-it as a
semantic infrastructure through which specialised Talmudic terminology can be queried,
compared, and progressively enriched. The intersection between Italian Talmudic terms
and the CompL-it lexicon accounts for approximately 10% of the whole ITALMUD dataset.
Specifically, the following types of linking have been identified:
i) linking between senses of the same word, which can be further divided into two cases:
a) the lemma form in ITALMUD and CompL-it is identical, e.g., bietola (chard);
b) the lemma form in ITALMUD does not match that in CompL-it, e.g., vetriolo (vitriol) in ITALMUD appears as Qanqantòm (a transliteration of the Hebrew word קַנְקַנְתּוֹם);
ii) linking between senses of different words, based on semantic relations (such as
hyponymy);
iii) linking between a lexical entry in CompL-it and a sense in ITALMUD, which can
thus constitute an additional lexical sense of the entry. This also occurs in two
cases:
a) the lemma forms in both CompL-it and ITALMUD are identical, e.g., abisso (abyss);
b) the lemma forms differ between the two resources, e.g., racimolo (raceme), which in ITALMUD appears under the term Lèqet (the transliteration of the Hebrew word לֶקֶט).
In alignment with the principles of Linked Open Data, it was decided to link ITALMUD
to CompL-it by adopting standard linking strategies. The vocabulary selected for the
relations between senses with a certain degree of equivalence is SKOS, which is widely
used within the LOD community and in the domain of computational terminologies. For
all other types of relations (such as hyponymy), it was decided to use the CompL-it
vocabulary, which includes many properties mapped to LexInfo, the main linguistic
ontology for OntoLex-Lemon, as well as numerous custom properties, specific to the
CompL-it model, which could not be mapped to LexInfo.
To handle the first type of linking introduced above, which connects
senses of the same word, three different SKOS relations were used.
A first example, to which the relation skos:exactMatch was applied, is
the term manna, understood as the miraculous food sent by God to the Israelites during their journey
through the desert; this sense is also present in CompL-it, albeit with a less detailed
definition.
However, cases of perfect equivalence are relatively rare, due to the complexity of
the subject matter, which often calls for a cautious approach in order to avoid overly
strong assertions. For this reason, the relation most frequently preferred
was skos:closeMatch. This property allows the expression of conceptual similarity
and limited interchangeability in specific practical contexts, without implying full
logical identity or triggering strong semantic inferences, as would be the case with
OWL properties such as owl:sameAs or owl:equivalentClass. An example of this case
is the entry Androgino (androgynous), which in ITALMUD includes additional nuances in its definition compared
to the sense described in CompL-it, although both refer to the same underlying concept.
In cases where the senses are more loosely semantically related, the skos:relatedMatch
property is used. This relation expresses a meaningful semantic association between
concepts, without implying interchangeability or logical equivalence, as in the case
of levita (Levite), a member of the tribe of Levi.
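The choice among the three SKOS relations can be sketched as a simple mapping from the judged degree of equivalence to a property, followed by the serialisation of one link as a Turtle triple. The labels "exact", "close", and "related", the function name `sense_link`, and the prefixed identifiers in the example are assumptions made for illustration; they are not the identifiers used in the published linkset.

```python
# Assumed labels for the degree of equivalence judged by the annotator.
SKOS_PROPERTY = {
    "exact": "skos:exactMatch",
    "close": "skos:closeMatch",
    "related": "skos:relatedMatch",
}


def sense_link(italmud_sense: str, complit_sense: str, degree: str) -> str:
    """Format one sense-to-sense link as a Turtle triple.

    Both arguments are expected to be prefixed names; the prefixes
    themselves are assumed to be declared elsewhere in the file.
    """
    try:
        prop = SKOS_PROPERTY[degree]
    except KeyError:
        raise ValueError(f"unknown degree of equivalence: {degree!r}") from None
    return f"{italmud_sense} {prop} {complit_sense} ."


# Hypothetical identifiers for the two senses of "manna".
print(sense_link("italmud:sense_manna", "lex:sense_manna", "exact"))
# prints: italmud:sense_manna skos:exactMatch lex:sense_manna .
```

Recording the degree as an explicit label keeps the annotator's judgement separate from its serialisation, so a link can later be weakened or strengthened without touching the identifiers.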
As for the second type of linking, finding the most appropriate “anchoring”
of Talmudic words to the Italian lexicon is an extensive undertaking.
Indeed, the CompL-it vocabulary of relations is exceptionally rich and allows
for a very fine-grained semantic description of entries. A large-scale project aimed
at mapping and establishing links between the entries of the two resources will be
the subject of future work. By way of example, a possible linking case is provided
involving the entry shofar, defined as “a horn of a kosher animal (usually a ram) that was blown on various occasions [...].”
The sense of the word shofar, in ITALMUD, could be linked via the hyponymy relation to the sense of the word corno (horn) in CompL-it, defined as “an ancient wind instrument,” and through the relation madeOf to the sense of corno as “a horn-like or bony protrusion found on the head of many animals.” It may also be connected to the sense of suonare (to blow/play) via the Object of Activity relation, which describes the action for which an object is used.
As for the third type of linking, it involves creating a new sense corresponding to
the meaning of the entry in ITALMUD, which does not match any of the senses of the
entry in CompL-it. An example is racimolo, which in CompL-it has only the sense of
“a small bunch of grapes,” whereas in ITALMUD it also means “Ears of grain fallen in the field during harvesting, which under certain conditions
are the rightful property of the poor (Lev. 19:9–10, 23:22).” Consequently, the ITALMUD sense is linked to the CompL-it entry using the sense relation,
as specified in the OntoLex-Lemon model. An illustrative schematic of these cases
is shown in Figure 2.
Figure 2.
The three types of linking used to connect ITALMUD to CompL-it. In the first two cases,
senses are linked to senses, while in the last case, a sense of ITALMUD is linked
to a lexical entry of CompL-it, “abisso”, which in the lexicon has only one sense.
Each of the CompL-it words mentioned is itself connected to other words through additional
relations. This enables the construction of highly sophisticated queries (see Section
5).
4. ITALMUD in Numbers
The trilingual terminological resource produced through this work comprises a total
of 1,613 Talmudic terms, including 823 terms in Italian, 773 in Hebrew, and 17 in
Aramaic. Of the Italian terms, 782 have a Hebrew translation, 17 have an Aramaic one, and
4 have both a Hebrew and an Aramaic translation.
From the point of view of lexical entry type, 1,242 are classified as “Word” (625
in Italian, 601 in Hebrew, and 16 in Aramaic), and 371 as “Multiword Expression” (198
in Italian, 172 in Hebrew, and 1 in Aramaic). A schematic representation of the distribution
of terms by language and type is shown in Figure 3.
Figure 3.
Distribution of lexical entries by language and type. The chart compares the number
of entries classified as Word and Multiword Expression across Italian, Hebrew, and
Aramaic. Italian and Hebrew show a predominance of Word entries, while Aramaic includes
only a small number of terms.
As for the PoS of the Italian terms, as noted in Section 3.2, no syntactic categories
were assigned to multiword expressions. Of the remaining 625 Italian terms, 606 are
nouns, 12 adjectives, 4 interjections, 2 adverbs, and 1 conjunction. The PoS of the
Hebrew and Aramaic terms, of course, are the same as the Italian “Word” terms linked
to them by translation relations. However, there are 11 Italian multiword expressions
with Hebrew or Aramaic single-word translations, all of which have been assigned the
PoS “Noun”, as they are all nouns.
The total number of terms that have so far been linked to the CompL-it lexicon is
227. Of these, 158 terms have been connected, via their respective senses, to senses
in the CompL-it lexicon through three types of SKOS relations: skos:exactMatch (68
links), skos:closeMatch (72 links), and skos:relatedMatch (18 links). The remaining
69 terms have been linked, again via their respective senses, to lexical entries in
CompL-it using the ontolex:isSenseOf property. The connection between ITALMUD and
CompL-it has been technically implemented through a dedicated linkset, used as a “bridge”
resource.
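The figures reported in this section can be cross-checked with a few sums. The snippet below only restates numbers given in the text; the variable names are illustrative and it does not query the resource itself.

```python
# Counts as reported in this section.
terms_by_language = {"Italian": 823, "Hebrew": 773, "Aramaic": 17}
words = {"Italian": 625, "Hebrew": 601, "Aramaic": 16}
multiwords = {"Italian": 198, "Hebrew": 172, "Aramaic": 1}
italian_word_pos = {
    "noun": 606,
    "adjective": 12,
    "interjection": 4,
    "adverb": 2,
    "conjunction": 1,
}
skos_links = {"skos:exactMatch": 68, "skos:closeMatch": 72, "skos:relatedMatch": 18}
is_sense_of_links = 69

total_terms = sum(terms_by_language.values())
total_words = sum(words.values())
total_multiwords = sum(multiwords.values())
total_linked = sum(skos_links.values()) + is_sense_of_links

# Each language total should equal its Word and Multiword counts combined.
per_language_consistent = all(
    words[lang] + multiwords[lang] == terms_by_language[lang]
    for lang in terms_by_language
)
```

With these values, the totals come to 1,613 terms, 1,242 Word entries, 371 Multiword entries, and 227 linked terms, matching the figures stated above.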
5. Use and Applications
ITALMUD is an open and freely downloadable resource[16]. The linkset connecting it to
the CompL-it lexicon is also available for download at the same address. The resource
can additionally be queried via SPARQL through a dedicated web interface[17]. Turtle,
a serialisation format for RDF data designed to be both readable and compact,
was chosen for representing the resource. Turtle expresses information as “subject–predicate–object”
triples and allows the use of prefixes to shorten URIs[18], making the content more
readable. It is a W3C standard widely used in the context of Linked Data and terminological
resources to describe knowledge in a formal, shareable, and interoperable way.
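The effect of prefixes can be illustrated with a short Python sketch that expands prefixed names into full URIs. The `ontolex` and `skos` namespaces are the standard W3C ones; the `italmud` and `complit` namespaces and the sense identifiers are placeholders assumed for the example and do not reproduce the actual URIs of the resources.

```python
# Prefix table: ontolex and skos are the standard namespaces; the other
# two are placeholder namespaces assumed for this example only.
PREFIXES = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "italmud": "http://example.org/italmud/",
    "complit": "http://example.org/complit/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'skos:exactMatch' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


# One subject-predicate-object triple written with prefixed names.
triple = ("italmud:sense_a", "skos:exactMatch", "complit:sense_1")
expanded = tuple(expand(name) for name in triple)

for short, full in zip(triple, expanded):
    print(f"{short:24} -> {full}")
```

The same triple written without prefixes would repeat the full namespace in every position, which is what makes unabbreviated RDF hard to read.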
Below is an example of combined access to the three resources — ITALMUD, CompL-it,
and the linkset connecting them — aimed at demonstrating their potential use in the
study of Talmudic terminology.
The following query, based on the data currently available in the resources, illustrates
how semantic modelling changes the scale and nature of inquiry. Instead of manually
comparing isolated glossary entries, users can formulate repeatable queries over a
network of lexical and terminological relations. In this case, the query concerns
the search for Talmudic terminology related to the concept of “container”. A possible
formulation in natural language might be as follows:
Retrieve all Talmudic terms that are related to containers.
By leveraging the connection established between ITALMUD and CompL-it, it is possible
to answer this query by first searching within the language lexicon for the lexical
senses of the word “contenitore” (container), its hyponyms (including indirect ones[19]), and/or the senses linked
by a usage relation to the senses of the word “contenere” (to contain). Then, the bridge resource (the linkset) can be used to identify
which of the resulting lexical senses have been linked to terms in ITALMUD. Technically,
answering this request requires the following operations:
- search in CompL-it for the senses of the word “contenitore” (there is only one) and the word “contenere” (which has three senses);
- retrieve, via the lexinfo:hyponym relation, the hyponym senses of the sense of “contenitore” (337 in total, of which 160 are direct and 177 indirect), and, via the compl-it:usedFor relation, the senses linked to the three senses of “contenere” (233); the union of the two sets, with shared senses counted only once, amounts to 390 senses;
- search the linkset to see whether any senses of Talmudic terms are linked — through one of the three sense relations (cf. 3.3.2) — to at least one of the 390 CompL-it senses retrieved in the previous step.
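The three operations listed above can be sketched in Python over a small in-memory graph. This is only an illustration of the logic, not the SPARQL query offered as a preset: all identifiers and the toy data are invented, and the direction in which the hyponymy and usage relations are stored is an assumption of the example.

```python
# Toy data; every identifier below is hypothetical.
senses_of = {
    "contenitore": {"c:contenitore_1"},
    "contenere": {"c:contenere_1", "c:contenere_2", "c:contenere_3"},
}

# Direct hyponyms of each CompL-it sense (assumed storage direction).
hyponyms = {
    "c:contenitore_1": {"c:sacco_1", "c:recipiente_1"},
    "c:recipiente_1": {"c:tazza_1"},
}

# Usage relation: (sense, verb sense it is used for).
used_for = [("c:tazza_1", "c:contenere_1"), ("c:bisaccia_1", "c:contenere_2")]

# Linkset: (ITALMUD sense, relation, CompL-it sense).
SENSE_RELATIONS = {"skos:exactMatch", "skos:closeMatch", "skos:relatedMatch"}
linkset = [
    ("t:tasqa_sense", "skos:exactMatch", "c:sacco_1"),
    ("t:disaqqaya_sense", "skos:closeMatch", "c:bisaccia_1"),
    ("t:peyale_sense", "skos:exactMatch", "c:tazza_1"),
    ("t:unrelated_sense", "skos:exactMatch", "c:ortica_1"),
]


def hyponym_closure(roots):
    """Collect all direct and indirect hyponyms of the given senses."""
    found, stack = set(), list(roots)
    while stack:
        for child in hyponyms.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def container_terms():
    # Step 1: senses of the noun and of the verb.
    noun_senses = senses_of["contenitore"]
    verb_senses = senses_of["contenere"]
    # Step 2: hyponyms of the noun sense, united with the senses used for
    # the verb senses (a sense found by both routes is counted once).
    candidates = hyponym_closure(noun_senses) | {
        sense for sense, verb in used_for if verb in verb_senses
    }
    # Step 3: ITALMUD senses linked to at least one candidate sense.
    return sorted(
        {src for src, rel, tgt in linkset if rel in SENSE_RELATIONS and tgt in candidates}
    )


print(container_terms())
```

On this toy graph the function returns the three ITALMUD senses linked to container-related CompL-it senses and leaves out the unrelated one; on the real resources the same logic is what yields the terms discussed next.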
This query, available as a preset through the aforementioned query interface, returns
12 Talmudic terms related to containers — such as tasqà (sack), disaqqayà (saddlebag), and peyalè (cup) — along with their respective Hebrew translations, the definition, and additional
information.
To emphasize the multilingual nature of the resource, we have also added to the SPARQL
interface a preset query that starts from the Hebrew term “סִרְפָּד” (nettle), which
can be replaced with another term by editing the query directly. The query
retrieves from the database the related terms, in particular the co-hyponyms: these are
other types of plants, that is, hyponyms of the term “pianta” (plant), which,
in CompL-it, is the hypernym of “ortica” (nettle).
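The logic of this second preset can be sketched in the same way. The Python example below starts from the Hebrew form, follows the link to a CompL-it sense, moves up to its hypernym, and collects the other Talmudic senses linked to hyponyms of that hypernym. The data and identifiers are invented placeholders, and the simplification of one link and one hypernym per sense is an assumption of the example.

```python
NETTLE = "סִרְפָּד"  # Hebrew starting term (nettle)

# Toy data; every identifier below is hypothetical.
hebrew_to_sense = {NETTLE: "t:nettle_sense"}

# ITALMUD sense -> linked CompL-it sense (one link per sense, for simplicity).
sense_links = {
    "t:nettle_sense": "c:ortica_1",
    "t:plant_x_sense": "c:plant_x_1",
    "t:plant_y_sense": "c:plant_y_1",
    "t:cup_sense": "c:tazza_1",
}

# CompL-it sense -> its direct hypernym.
hypernym_of = {
    "c:ortica_1": "c:pianta_1",
    "c:plant_x_1": "c:pianta_1",
    "c:plant_y_1": "c:pianta_1",
    "c:tazza_1": "c:contenitore_1",
}


def talmudic_cohyponyms(hebrew_form):
    """Return the ITALMUD senses whose linked sense shares a hypernym with the start term."""
    start = hebrew_to_sense[hebrew_form]
    parent = hypernym_of[sense_links[start]]
    return sorted(
        sense
        for sense, linked in sense_links.items()
        if sense != start and hypernym_of.get(linked) == parent
    )


print(talmudic_cohyponyms(NETTLE))
```

Replacing `NETTLE` with another Hebrew form present in the table corresponds to editing the starting term in the preset query.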
As these examples suggest, the integration of the produced terminological resource
(beyond its intrinsic value) with a computational lexicon such as CompL-it is not
merely an exercise in data linking, but opens up new perspectives in the study of
Talmudic terminology. Thanks to the alignment between lexical senses and Talmudic
terms, it becomes possible to carry out terminological research that would otherwise
require lengthy manual comparisons across heterogeneous sources, but can now be resolved
through formalised and repeatable semantic querying.
The use of the RDF formalism and the Turtle format also ensures that each term and
each relation are expressed in an interoperable way and in compliance with W3C standards,
facilitating the reuse of data in international and multidisciplinary research contexts.
The ability to simultaneously query the three resources — the Talmudic terminological
repertoire, the computational lexicon of the Italian language, and the linkset connecting
them — will make it possible, especially as the resources grow, to accurately map
shared or divergent semantic areas, to monitor lexical variation across different
periods and traditions, and to compare the evolution of concepts with their treatment
in modern languages.
6. Conclusions
This article has presented ITALMUD, a multilingual terminological resource based on
the glossaries developed within the framework of the Italian translation of the Babylonian
Talmud project. Through the adoption of the OntoLex-Lemon model and the principles
of Linked Open Data, the Talmudic terms have been formalised and additionally linked
to a computational lexicon of contemporary Italian (CompL-it). The overall work included
phases of normalisation, linguistic modelling, and the creation of a dedicated linkset,
transforming the originally “flat” list of Talmudic terms into a resource that can
be queried along semantic pathways inherited from the language lexicon. For this reason,
ITALMUD should also be understood as a methodological case study in how specialised,
multilingual, and culturally situated terminologies can be made computationally accessible
without detaching them from the expert knowledge on which they depend.
In addition to the outcome achieved with the release of the ITALMUD resource, we believe
that the methodology presented — anchoring a specialised terminology to a computational
lexicon of general language — can be fruitfully applied in other domains as well.
Linking terms to a structured lexicon not only makes implicit semantic relations explicit,
but also enhances access to the texts in which such terms occur. This enables scholars
and researchers to carry out targeted investigations, identify conceptual variants,
disambiguate meanings, and accurately trace terminological evolution within complex
textual traditions.
The future development of this work will primarily involve the expansion of the ITALMUD
resource, in parallel with the translation of new tractates of the Talmud and the
production of additional glossary entries to be converted into terms. Alongside this
expansion of the terminology, semantic relations will also be extended, both through
the linking approach already outlined in this article (further expanded to include
additional semantic relations offered by CompL-it) and by defining semantic relations
among the Talmudic terms themselves.
It will be possible to directly involve Talmud scholars in the expansion of the ITALMUD
resource by providing them with access to the collaborative web tool Maia (#giovannetti2024), designed for the construction of lexical and terminological resources based on
the OntoLex-Lemon model and for linking them to annotated corpora.
A shared platform of this kind, based on Linked Open Data, would foster collaboration
among specialists, ensuring terminological consistency, traceability of translation
choices, and interoperability with other resources. It would also allow for the continuous
updating of terminology and semantic access to texts, facilitating interdisciplinary
research and comparative approaches across languages and traditions.
Funding
This work was supported by the TALMUD project and carried out within the scientific
collaboration between S.c.a r.l. Progetto Traduzione Talmud Babilonese and Cnr-Istituto
di Linguistica Computazionale “A. Zampolli”.
Notes
[1]




