LAUDATIO - Description

Documentation

Browse
LAUDATIO Documentation Browse example

Via 'Browse' you can open the Documentation page and inspect the metadata for each corpus in the repository. Each corpus is documented with the LAUDATIO metadata scheme (documentation for the metadata TEI XML scheme) and gets its own corpus 'Browse' page.

Each set of metadata provides information on the following sections:
1. 'Corpus' to answer questions such as
  • Who is building the corpus?
  • Which university, research group or project is involved?
  • What revision history does the corpus have?
  • Which license is assigned to the corpus?
2. 'Documents' to answer questions such as
  • Which historical texts are included?
  • Which editions or manuscripts are used as the base for digitization?
  • What kind of text extraction is chosen, e.g. sentences, excerpts, or the whole text?
  • Which annotation layers are applied to a single document?
3. 'Annotation' to answer questions such as
  • How many and which layers of annotation are there?
  • Which tag sets are applied in which format?
  • What does a specific tag mean in a given annotation layer?
4. 'Preparation Steps' to answer questions such as
  • Who annotated which annotation layer in which manner (semi-automatic, manual etc.)?
  • What kind of annotation tool is used?
  • Which conversion steps are done?

To get answers to the questions above, click on the section of interest and navigate through the information. If a certain kind of information cannot be provided, the value is set to 'NA' (not available) by default. Note that each CorpusCreator is responsible for the information provided in the metadata. Additionally, each corpus gets a persistent identifier (5) and a recommended citation form (6) for referencing purposes. On each corpus 'Browse' page, every format is available for download (7). If available, a corpus has a link to the search and visualization system ANNIS (8).

Search
LAUDATIO Documentation Search example

Via 'Search' you can search through all corpus metadata at once to find your corpus of interest in the hit list (4). The metadata Search is based on the LAUDATIO TEI XML metadata scheme. There are two search options, the ‘free metadata Search’ (1) and the ‘faceted metadata Search’ (2), which you can also combine with each other.

Using the ‘faceted metadata Search’ reduces the hit list of all available corpora in the LAUDATIO-Repository. When you click on a facet (2) and choose a single value (3), only the corpora to which that facet value applies stay in the hit list. For each facet you get all possible values to choose from. Adding more facet values reduces the hit list (4) step by step.

    The first group of facets covers metadata for the corpus itself:
  • name
  • project
  • format
  • date
  • size.
    The second group of facets covers metadata for each document of a corpus:
  • annotation layers
  • document date
  • size.

The hit list box (4) provides an overview of the resulting corpora, including the corpus title, project description, size, and list of documents.
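
The step-by-step narrowing described above can be pictured with a small sketch. This is not the repository's implementation; the corpus names, project names, format names, and the helper functions `facet_values` and `apply_facet` are invented for illustration only.

```python
# Hypothetical sample data: names and values are invented for illustration.
SAMPLE_CORPORA = [
    {"name": "corpus-a", "project": "project-x", "format": ["format-1", "format-2"]},
    {"name": "corpus-b", "project": "project-x", "format": ["format-2"]},
    {"name": "corpus-c", "project": "project-y", "format": ["format-1"]},
]


def values_of(corpus, facet):
    """Return the facet values of one corpus as a list."""
    value = corpus[facet]
    return value if isinstance(value, list) else [value]


def facet_values(hits, facet):
    """Collect every value that occurs for a facet in the current hit list."""
    return sorted({v for corpus in hits for v in values_of(corpus, facet)})


def apply_facet(hits, facet, value):
    """Keep only the corpora to which the chosen facet value applies."""
    return [corpus for corpus in hits if value in values_of(corpus, facet)]


# Each additional facet value narrows the previous hit list.
hits = apply_facet(SAMPLE_CORPORA, "project", "project-x")
remaining_formats = facet_values(hits, "format")
hits = apply_facet(hits, "format", "format-1")
```

After the first facet two corpora remain; after the second only one is left, which mirrors how the hit list (4) shrinks with every added facet value.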

Inserting a search term in the ‘free metadata Search’ (1) reduces the hit list of all available corpora in the LAUDATIO-Repository to only those corpora whose metadata match the term. You can choose between five options to get the best hits (4) for your metadata Search:

Partial match: Supports searching with or without umlauted vowels, e.g. 'Maerchen'. The results for this search string will contain 'Maerchen' or 'Märchen'.
Exact match: Supports exact matches, e.g. with an umlauted vowel: 'Märchen'.
Fuzzy match: Supports fuzzy searching based on an edit distance algorithm.
Match all: Supports searching with the operator 'AND', e.g. 'jiddisch AND version'.
Match any: Supports searching with the operator 'OR', e.g. 'jiddisch OR version'.
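
To make the difference between the first three options concrete, here is a minimal sketch. It is an illustration under assumptions, not the search engine used by the repository: the function names, the umlaut folding table, the word-level comparison, and the `max_distance` threshold are all hypothetical. The edit distance shown is the standard Levenshtein distance.

```python
# Assumed folding table: maps umlauted vowels to their two-letter spelling.
UMLAUT_TABLE = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
)


def fold_umlauts(text):
    """Normalize text so that 'Märchen' and 'Maerchen' compare equal."""
    return text.translate(UMLAUT_TABLE).lower()


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def partial_match(term, text):
    """Match regardless of whether umlauts are written as 'ä' or 'ae'."""
    return fold_umlauts(term) in fold_umlauts(text)


def exact_match(term, text):
    """Match only if the term occurs as a word with identical spelling."""
    return term in text.split()


def fuzzy_match(term, text, max_distance=1):
    """Match if any word is within max_distance edits of the term."""
    return any(edit_distance(term, word) <= max_distance for word in text.split())
```

With a metadata value such as 'Deutsche Märchen', `partial_match("Maerchen", ...)` succeeds, `exact_match("Maerchen", ...)` does not, and `fuzzy_match("Marchen", ...)` succeeds because only one character differs.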

You can use the logical operators ‘AND’ and ‘OR’ within the ‘free metadata Search’. Each operator must be enclosed by spaces; otherwise, the search string will be ignored.
Examples of logical operators:
‘Märchen AND Linguistik’
‘Märchen OR Linguistik’
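
The role of the surrounding spaces can be sketched as follows. This is a simplified illustration, not the repository's query parser: `parse_query` and `matches` are hypothetical names, only one operator type per query is handled, and terms are compared against whole words.

```python
def parse_query(query):
    """Split a query on ' AND ' or ' OR ' (operator enclosed by spaces).

    Returns (operator, terms). The operator is None for a single term.
    """
    for operator in ("AND", "OR"):
        separator = f" {operator} "
        if separator in query:
            terms = [term.strip() for term in query.split(separator)]
            return operator, terms
    return None, [query.strip()]


def matches(query, text):
    """Evaluate the query against the words of one metadata value."""
    operator, terms = parse_query(query)
    words = text.split()
    found = [term in words for term in terms]
    # 'AND' requires every term, 'OR' (or a single term) requires at least one.
    return all(found) if operator == "AND" else any(found)
```

For a metadata value 'Linguistik heute', the query ‘Märchen OR Linguistik’ matches, while ‘Märchen AND Linguistik’ does not. A string such as 'MärchenANDLinguistik' contains no space-enclosed operator, so the sketch treats it as a single term.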

Note that the results of the free metadata Search may apply to different aspects of the metadata.
For example, a search for the word form ‘Märchen’ may match the title of a document, of a corpus, or of an annotation layer.

Import
LAUDATIO Documentation import example

If you would like to upload your own corpus to the LAUDATIO-Repository, you need to be registered (create an account). If you already have an official Humboldt-University account, you can use it instead of creating a new account.

First, fill in the 'Corpus Name' with an ID, which might be an acronym, e.g. 'laudatio:RIDGES' (1), and the 'Corpus Label', which should be the full name of the corpus, e.g. 'Register in German Science' (2). Upload the LAUDATIO metadata TEI XML file, which is validated immediately (3). You will get feedback on whether the TEI XML file is valid. For instructions on creating the LAUDATIO TEI XML metadata for your corpus, switch to the corresponding tab. Then you can upload the corpus itself (4). A corpus may have several formats. You can upload one zipped file per format. Add a display name for each format.
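
Before uploading, it can help to run a quick local check on the metadata file. The sketch below is not the repository's validation, which checks the file against the LAUDATIO scheme; it only tests whether the XML is well-formed and whether the root element is in the TEI namespace. The function name `precheck_tei` is hypothetical, and the expectation that the root element carries the TEI namespace is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def precheck_tei(xml_text):
    """Return (ok, message) for a basic well-formedness and namespace check.

    This does NOT replace validation against the LAUDATIO scheme.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, f"not well-formed: {error}"
    # Assumption: the root element is expected to be in the TEI namespace.
    if not root.tag.startswith("{" + TEI_NS + "}"):
        return False, "root element is not in the TEI namespace"
    return True, "well-formed, root element is in the TEI namespace"


ok, message = precheck_tei('<teiHeader xmlns="http://www.tei-c.org/ns/1.0"/>')
```

A file that passes this check can still be rejected by the repository if it does not conform to the scheme.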

In the next step you can choose whether the corpus, its metadata and its formats will be published and free to use in the Browse (5) and Search (6) functions of the LAUDATIO-Repository. If you would like to check the upload in your private account space first, you can set the open-access checks later with the Modify function. Finally, choose the Creative Commons license for your corpus (7).
Start the upload process!

Modify

To update the LAUDATIO metadata and the corpus data, CorpusCreators can use the Modify function. A CorpusCreator owns the account under which a corpus was uploaded.

    You can change the following corpus properties:
  • Corpus label:
    Changing the label will change the display name in the Browse-function.
  • Corpus name:
    Cannot be modified.
  • Corpus state:
    Changing the state will set the current version active or inactive. An active corpus is available in the Browse and Search functions of the repository.
  • Data streams:
    You can add, change and delete the corpus format files and update or add its metadata.
  • Data labels:
    Changing the label will change the display name in the Browse-function.
  • Data MIME types:
    The MIME (Multipurpose Internet Mail Extensions) type indicates the file type of a data stream.
  • Data state:
    Changing the state will set the current version of the data stream active or inactive. An active data stream is available in the Browse and Search functions of the repository.
  • Set version:
    Defines whether the applied changes will create a new version of the corpus or replace the current version.

LAUDATIO metadata specification and scheme

The metadata for each corpus in the LAUDATIO-Repository is built on a customization of the TEI P5 format. The metadata is uploaded for each corpus as a TEI XML file. This file is validated against the scheme, which is built from the TEI specification file ODD (One Document Does it all). Note that there are several versions of the LAUDATIO ODD and scheme. We customize the ODD according to the needs of the repository and the users' perspectives. If you have a feature request, please contact us (laudatio-user@hu-berlin.de).

Current TEI scheme

Corpus
  • Download ODD
  • Download RelaxNG scheme
  • Download RelaxNC scheme
  • documentation
Document
  • Download ODD
  • Download RelaxNG scheme
  • Download RelaxNC scheme
  • documentation
Preparation
  • Download ODD
  • Download RelaxNG scheme
  • Download RelaxNC scheme
  • documentation

How to create TEI-metadata for your corpus

    The LAUDATIO metadata is composed of three components: information about the corpus itself (CorpusHeader), information about all historical documents in the corpus (DocumentHeader), and information about all annotation layers (PreparationHeader). Here you find the technical documentation of the LAUDATIO metadata.

    First, collect the following information if available about your corpus:

      1. CorpusHeader:
    • Corpus editor(s)
    • Corpus annotator(s), persons in charge of transcription and infrastructure
    • Annotation guideline(s) for all annotation layers in the corpus including a description for each annotation tag
    • Name and version of the corpus formats
    • Version number, publication date and revision steps
    • Project description and homepage URL
    • Licenses
    • Size
      2. DocumentHeader (one header for each historical document):
    • Title of the historical text
    • Author of the historical text
    • Editor of the historical text
    • Publication place, date, edition, collection, history of the historical text
    • Size (tokens or words)
    • Annotation layers in the document
      3. PreparationHeader (one header for each annotation layer):
    • Annotation editor(s)
    • Annotators, persons in charge of transcription and infrastructure
    • Formats and tools
    • Editing/annotating steps
    • Conversion steps
    • Version number, publication date and revision steps
    • Project description and homepage URL
    • Licenses

    Templates:
    • CorpusHeader template (scheme 7)
    • DocumentHeader template (scheme 7)
    • PreparationHeader template (scheme 7)

    Note that all headers need to be merged with the help of the teitool into the AllHeader (XML file), which is then uploaded to the LAUDATIO-Repository.

    For example, the RIDGES Herbology Corpus has
  • 1 CorpusHeader
  • 29 DocumentHeaders
  • 78 PreparationHeaders

    which are merged with the teitool into the AllHeader (XML file), which can then be uploaded. The metadata in the AllHeader provides the information that is available in the Browse and Search functions of the repository.
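
The idea of merging many headers into one file can be pictured with a small sketch. It does not reproduce the teitool, whose actual output layout is not described here. The function `merge_headers`, the `teiCorpus` wrapper element (borrowed from TEI P5), and the flat ordering of the headers are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def merge_headers(corpus_header, document_headers, preparation_headers):
    """Wrap all header XML strings in one element and return it as a string.

    Assumption: a flat 'teiCorpus' wrapper; the real AllHeader may differ.
    """
    ET.register_namespace("", TEI_NS)
    wrapper = ET.Element(f"{{{TEI_NS}}}teiCorpus")
    for header_xml in [corpus_header, *document_headers, *preparation_headers]:
        wrapper.append(ET.fromstring(header_xml))
    return ET.tostring(wrapper, encoding="unicode")


# Placeholder header; real headers carry the metadata listed above.
EMPTY_HEADER = f'<teiHeader xmlns="{TEI_NS}"/>'

# Same header counts as the RIDGES example: 1 + 29 + 78.
all_header = merge_headers(EMPTY_HEADER, [EMPTY_HEADER] * 29, [EMPTY_HEADER] * 78)
header_count = len(list(ET.fromstring(all_header)))
```

With the counts from the RIDGES example, the merged file contains 108 headers in total.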

    All corpora in the repository have such an AllHeader. You can download the TEI XML metadata of an already published corpus for copying purposes.