Several attempts have been made to study the linguistic features of the Arabic translations of the Gospels in order to identify their different textual traditions. Most of these projects use verbal agreement between texts to define the identity of each tradition. In some cases, other techniques were applied on a smaller scale to selected texts. The limitations of these attempts lie in the fact that the studies used only a small number of readings selected from only a few manuscripts, and that the selection was neither formalized nor automated.
To fill these gaps, our project offers automated linguistic corpus processing features. All transcribed texts undergo morphosyntactic annotation, which associates lexical, grammatical and inflectional properties (tense, mood, voice, aspect, person, number, gender and case) with the annotated text. These linguistic properties allow the system to perform complex searches based on abstract representations of a specific word, sentence, paragraph, syntactic structure or occurrence.
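As an illustration only, a feature-based search over annotated tokens could be sketched as follows. The `Token` class, the `search` function, the feature names and the four sample annotations are all hypothetical and are not taken from the project's actual data model; the sketch simply shows how a query can target an abstract representation (lemma plus grammatical features) rather than a surface string.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One annotated token (hypothetical schema, for illustration)."""
    surface: str                      # form as transcribed
    lemma: str                        # lexical entry the form belongs to
    pos: str                          # part of speech
    features: dict = field(default_factory=dict)  # inflectional properties


def search(tokens, lemma=None, pos=None, **features):
    """Return the tokens matching every constraint that was supplied."""
    hits = []
    for tok in tokens:
        if lemma is not None and tok.lemma != lemma:
            continue
        if pos is not None and tok.pos != pos:
            continue
        # Every requested feature must be present with the requested value.
        if all(tok.features.get(k) == v for k, v in features.items()):
            hits.append(tok)
    return hits


# Hand-made sample annotations (assumed values, not project data).
SAMPLE = [
    Token("قال", "قال", "verb",
          {"aspect": "perfective", "voice": "active",
           "person": "3", "gender": "masc", "number": "sing"}),
    Token("يسوع", "يسوع", "noun",
          {"gender": "masc", "number": "sing"}),
    Token("قالت", "قال", "verb",
          {"aspect": "perfective", "voice": "active",
           "person": "3", "gender": "fem", "number": "sing"}),
    Token("يقول", "قال", "verb",
          {"aspect": "imperfective", "voice": "active", "mood": "indicative",
           "person": "3", "gender": "masc", "number": "sing"}),
]

# All masculine forms of the verb, whatever their surface shape.
masculine_forms = search(SAMPLE, lemma="قال", pos="verb", gender="masc")
```

A query of this kind retrieves every inflected form of a lemma that satisfies the grammatical constraints, which is what makes comparison across manuscripts independent of the particular word form attested in each.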
To formalize all possible verbal tokens, we defined a taxonomy of inflectional classes for Arabic verbs. This taxonomy allows the system to encode three kinds of variation simultaneously in the lexical representation: inflectional, morphophonemic and orthographic.
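A minimal sketch of how a single lexical representation might carry these three layers is given below. It is not the project's taxonomy: the two classes (`SOUND`, `HOLLOW`), the three-cell suffix table, the `VerbEntry` structure and the spelling normalization are assumptions chosen for brevity. The inflectional layer is the choice of suffix, the morphophonemic layer is the stem shortening of hollow verbs before consonant-initial suffixes, and the orthographic layer is represented by collapsing hamza-bearing alif spellings, used here only as a familiar stand-in for spelling variation.

```python
from dataclasses import dataclass

# Perfect suffixes in unvocalized script, keyed by (person, gender, number).
# The flag records whether the spoken suffix begins with a consonant,
# which is the condition for the hollow-verb stem alternation.
PERFECT_SUFFIXES = {
    ("3", "masc", "sing"): ("", False),    # -a
    ("3", "fem", "sing"): ("ت", False),    # -at
    ("1", "any", "sing"): ("ت", True),     # -tu
}


@dataclass
class InflectionalClass:
    name: str
    shortens_before_consonant: bool  # morphophonemic behaviour of the class


@dataclass
class VerbEntry:
    lemma: str
    long_stem: str
    short_stem: str
    infl_class: InflectionalClass


SOUND = InflectionalClass("sound", False)
HOLLOW = InflectionalClass("hollow", True)


def inflect_perfect(entry, person, gender, number):
    """Build one perfect form from the entry and its class."""
    suffix, consonant_initial = PERFECT_SUFFIXES[(person, gender, number)]
    use_short = entry.infl_class.shortens_before_consonant and consonant_initial
    stem = entry.short_stem if use_short else entry.long_stem
    return stem + suffix


# Orthographic layer (assumed rule): map alif with hamza above (U+0623),
# hamza below (U+0625) and madda (U+0622) to bare alif (U+0627).
HAMZA_ALIF = str.maketrans({"\u0623": "\u0627",
                            "\u0625": "\u0627",
                            "\u0622": "\u0627"})


def normalize_spelling(form):
    """Collapse spelling variants so that variant forms compare equal."""
    return form.translate(HAMZA_ALIF)


KATABA = VerbEntry("كتب", "كتب", "كتب", SOUND)   # 'to write'
QALA = VerbEntry("قال", "قال", "قل", HOLLOW)     # 'to say'
```

With these assumed entries, `inflect_perfect(QALA, "1", "any", "sing")` selects the short stem and `inflect_perfect(QALA, "3", "fem", "sing")` the long one, while the sound verb keeps a single stem throughout. Keeping the alternation on the class rather than on each verb is the design choice the sketch is meant to show.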