1. Units annotated
There are four basic annotated units in the OGR:
- Segments: A single phonological segment. See Annotation 2: Segments.
- Words: A syntactically independent sequence of Segments.
Clitics are treated as Words. See Annotation 3: Words.
- Syllables: A sequence of Segments containing at minimum a vocalic nucleus.
Since Words are syntactically rather than prosodically defined, a Syllable can
contain more than one Word.
See Annotation 4: Syllables and meter.
- Lines: A sequence of Words and Syllables forming a line of verse.
Units larger than the Word (e.g. laisses or paragraphs) and manuscript pagination
are encoded using TEI markup.
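Because Words and Syllables are defined independently, it can help to picture both as spans over the same sequence of Segments. The sketch below is a hypothetical model, not the OGR's actual data format: the `Span` class, the `words_in` helper and the segmentation of "l'ami" (with the clitic "l'" as its own Word) are invented for illustration. It shows how the Syllable "la" overlaps two Words while the Line covers everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical unit covering segments[start:end] (end exclusive)."""
    label: str
    start: int
    end: int

    def overlaps(self, other: "Span") -> bool:
        # Two half-open intervals overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


# Illustrative segmentation of "l'ami": the clitic "l'" is treated as a Word.
segments = ["l", "a", "m", "i"]
words = [Span("l'", 0, 1), Span("ami", 1, 4)]
syllables = [Span("la", 0, 2), Span("mi", 2, 4)]
line = Span("line 1", 0, 4)


def words_in(syllable: Span, words: list[Span]) -> list[str]:
    """Return the labels of the Words that a Syllable overlaps."""
    return [w.label for w in words if w.overlaps(syllable)]


for syl in syllables:
    print(syl.label, "->", words_in(syl, words))
# la -> ["l'", 'ami']
# mi -> ['ami']
```

In this model the Syllable layer is not nested inside the Word layer; both simply index the same Segments, and the Line spans the full range of each.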
2. ANNIS and TXM versions
The annotation is represented differently in the ANNIS and TXM versions:
- ANNIS supports multiple layers of tokenization and all units are represented as separate spans.
- TXM supports only word-level tokenization and all annotation is (as far as possible) realized as word-level tags.
This annotation guide refers to the ANNIS version unless otherwise specified.
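The phrase "as far as possible" reflects what is lost when span annotation is flattened onto Words. The following is a minimal sketch under stated assumptions, not the actual TXM export: the `to_word_tags` function, the `Span` class and the `|` separator are hypothetical. Each Word receives the labels of every span it overlaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical unit covering segments[start:end] (end exclusive)."""
    label: str
    start: int
    end: int


def to_word_tags(words: list[Span], layer: list[Span], sep: str = "|") -> list[tuple[str, str]]:
    """Flatten a span layer onto Words: each Word is tagged with the
    labels of all spans in the layer that overlap it (hypothetical format)."""
    tagged = []
    for w in words:
        hits = [s.label for s in layer if s.start < w.end and w.start < s.end]
        tagged.append((w.label, sep.join(hits)))
    return tagged


# Same illustrative "l'ami" example: the Syllable "la" crosses a Word boundary.
words = [Span("l'", 0, 1), Span("ami", 1, 4)]
syllables = [Span("la", 0, 2), Span("mi", 2, 4)]
print(to_word_tags(words, syllables))
# [("l'", 'la'), ('ami', 'la|mi')]
```

Here "la" is attached to both Words, so the word-level tags no longer record exactly where that Syllable begins and ends; an ANNIS-style span layer keeps that boundary information.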