The Digital Library
TextGrid has acquired the online library from zeno.org, an extensive collection of texts in digital form ranging from the beginnings of printing to the first decades of the 20th century. The collection is of particular interest to German Literature Studies, as it contains virtually all the important texts in the canon and numerous other texts relevant to literary history whose copyright has expired. The same applies to Philosophy and Cultural Studies as a whole. For the most part, the texts are taken from textbooks and can therefore be cited; this is also the case with the remaining texts, which predominantly stem from the digitalisation of first editions.
The texts in the zeno.org online library
The texts in the zeno.org online library are divided into the following categories:
- History (14 texts)
- Cultural history (113 texts)
- Art (12 texts)
- Literature (693 texts)
- Fairytales (58 texts)
- Music (81 texts)
- Natural sciences (20 texts)
- Philosophy (248 texts)
- Sociology (1 text)
- Reference works (27 texts)
TextGrid will make these texts available not only for reading, but particularly for further processing, e.g. in editions and text corpora. For this purpose, the XML files are being converted into valid TEI during the course of the project, which will make precise searches within the texts possible.
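As a rough illustration of what TEI mark-up makes possible, the following minimal sketch reads the title from a TEI header and collects all verse lines from a document. The sample document is invented for this example and is not taken from the TextGrid data stock; the function names `title_of` and `verse_lines` are hypothetical. Only the TEI namespace and the elements `teiHeader`, `lg` and `l` come from TEI P5 itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document (assumption): not part of the TextGrid data stock.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example poem</title><author>Example Author</author></titleStmt>
      <publicationStmt><p>Illustrative sample only</p></publicationStmt>
      <sourceDesc><p>Invented for this sketch</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l>First verse line</l>
        <l>Second verse line</l>
      </lg>
      <p>A prose paragraph that is not verse.</p>
    </body>
  </text>
</TEI>"""


def title_of(tei_xml: str) -> str:
    """Return the title recorded in the TEI header, or an empty string."""
    root = ET.fromstring(tei_xml)
    title = root.find("./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    return title.text if title is not None and title.text else ""


def verse_lines(tei_xml: str) -> list[str]:
    """Return the text of every verse line (<l>), ignoring prose paragraphs."""
    root = ET.fromstring(tei_xml)
    return ["".join(line.itertext()).strip() for line in root.iterfind(".//tei:l", TEI_NS)]


if __name__ == "__main__":
    print(title_of(SAMPLE))
    print(verse_lines(SAMPLE))
```

Because verse and prose carry different element names, a query can target one text type without heuristics on the raw text.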
As of 13 July 2011, the data stock of the literature folder is available for download.
Publication of the literature folder
So far, part of the data stock, the fiction texts, has been processed for scholarly use (conversion into TEI and extensive mark-up for more precise research) and published.
Should you find any errors or defects in the mark-up, please inform us (e-mail: katrin.betz(at)uni-wuerzburg.de). Please always give the URL and the exact context of the error.
In the TextGrid Repository Portal you will currently find a limited number of texts from the literature folder. This list is constantly being extended. You can use the following links to download the entire data stock of the literature folder as well as a schema for the data.
Download the published files: text and pictures (1.9 GB)
Download the published files: text only (384 MB)
Download the file schema (subversion repository)
Licensing
The Editura publishing company (operator of zeno.org) has digitalised texts in the public domain and marked them up in XML.
This means that the publishing company holds the ancillary copyright to these digitalised, compiled and marked-up texts. TextGrid has acquired the licence to use this digitalised and XML-marked collection of texts, provided that Editura is mentioned (Creative Commons licence “by” version 3.0).
In order to relay the annotated data stock including the metadata with as few restrictions as possible, TextGrid will also make this data stock available under the Creative Commons licence “by” version 3.0.
The texts as such, i.e. the texts without annotations and without added metadata, are in the public domain. If the texts are already in the public domain, this is not affected by the licensing.
TextGrid has created a new database by processing and structuring the texts as well as editing the metadata; this database is automatically subject to its own ancillary copyrights in accordance with general copyright regulations. These copyrights are also regulated by the Creative Commons licence “by” version 3.0. This means that the data stock of the Digital Library can be:
- reproduced, distributed and made available to the general public
- used to adapt and edit the content
- used commercially
Refer to: http://creativecommons.org/licenses/by/3.0/de/
In doing so, TextGrid must always be named in the following form: TextGrid
Should you pass on protected data from the data stock, you should add the following information:
The work [title] by [name] is a modification of the data stock of TextGrid’s Digital Library, www.editura.de, and is published under the Creative Commons licence.
Work steps
1. Work steps performed
- Structural analysis of the text data: The data are structured in folders according to encyclopaedias/subject areas (history, cultural history, art, literature, fairytales, music, natural sciences, philosophy, and sociology); each folder contains subfolders (generally one subfolder per author which contains all the author’s works in one file).
- Enriching of the original data (ID, information on the work, structural disambiguation)
- Extraction of the metadata: The metadata on the individual works are located in various files – the information on the digitalisation source is stored in an external catalogue file; the information on the time and place of publication is located at the beginning of the author file as an unstructured free-form text. All metadata belonging to a certain work are assigned to the respective work via a specific transformation routine.
- Manual marking of the work level: The mark-ups do not allow an automatic division of the data into individual works; this is why the information on the works was added manually (initially for the literature folder, over 120,000 individual works). For this purpose, a user interface had to be created for displaying the data and processing them further.
- Filtering of the files by text type: For the literature folder the individual works had to be sorted according to their text type in order to make it possible to develop conversion routines specifically according to the text type. The existing user interface could be enhanced accordingly.
- Specifications for the mapping of the text types poetry, prose and drama
- Development of transformation routines for the mapping of the individual text types in the literature folder to TEI P5
- Structural transformation from <div> to <teiCorpus>
- Encoding of metadata that can be extracted automatically in <teiHeader>
- First adjustment of the data structure to the TextGrid architecture
- Integration of Adelung’s dictionary and “Meyers Konversationslexikon” (conversation dictionary) into the “Trierer Wörterbuchnetz” (Trier Dictionary Network)
- Creation of routines for the mapping of Adelung’s dictionary to TEI P5
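Some of the steps above (extracting publication data from free-form text, moving from nested <div> elements to <teiCorpus>, and encoding metadata in <teiHeader>) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual transformation routine: the layout of the input author file, its attribute names (`name`, `genre`) and all function names are hypothetical, the same publication note is simply applied to every work, and the output is not validated against the TEI schema.

```python
import re
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)

# Hypothetical author file (assumption): the real source layout is not shown here.
AUTHOR_FILE = """<div type="author" name="Example Author">
  <note>First edition: Leipzig 1808.</note>
  <div type="work" genre="poetry"><head>Poem One</head><p>Line a</p><p>Line b</p></div>
  <div type="work" genre="prose"><head>Tale Two</head><p>Once upon a time.</p></div>
</div>"""


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI}}}{tag}"


def parse_publication(note):
    """Pull (place, year) out of free-form text such as 'Leipzig 1808'."""
    match = re.search(r"([A-ZÄÖÜ][\w.-]+)\s+(\d{4})", note)
    return (match.group(1), match.group(2)) if match else (None, None)


def build_header(title, author, place=None, year=None):
    """Build a small teiHeader holding the automatically extracted metadata."""
    header = ET.Element(q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    ET.SubElement(title_stmt, q("author")).text = author
    pub = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub, q("p")).text = "Converted by an illustrative script"
    bibl = ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("bibl"))
    if place:
        ET.SubElement(bibl, q("pubPlace")).text = place
    if year:
        ET.SubElement(bibl, q("date"), when=year).text = year
    return header


def to_tei_corpus(author_xml):
    """Turn one author file with nested <div> works into a <teiCorpus>."""
    src = ET.fromstring(author_xml)
    author = src.get("name", "")
    place, year = parse_publication(src.findtext("note", default=""))
    corpus = ET.Element(q("teiCorpus"))
    corpus.append(build_header(f"Works of {author}", author))
    for work in src.iterfind("div[@type='work']"):
        tei = ET.SubElement(corpus, q("TEI"))
        tei.append(build_header(work.findtext("head", default=""), author, place, year))
        body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
        # The text type is only recorded as an attribute in this sketch.
        div = ET.SubElement(body, q("div"), type=work.get("genre", "unknown"))
        for para in work.iterfind("p"):
            ET.SubElement(div, q("p")).text = para.text
    return corpus


if __name__ == "__main__":
    print(ET.tostring(to_tei_corpus(AUTHOR_FILE), encoding="unicode"))
```

A real routine would map each text type to its own TEI structures (for example verse to <lg> and <l>), which is why the works had to be sorted by text type before conversion routines could be developed.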
2. Planned work steps
- Refining of the metadata, and development of a user interface for manually correcting the metadata
- Error analysis and correction of the TEI mark-up
- Optimisation of the data structure with regard to the TextGrid architecture
- Additional structural analysis of the texts and more in-depth TEI marking
- Allocation of persistent identifiers for each work level
- Application and, if necessary, modification of the transformation routines for the remaining folders and dictionaries in the Digital Library
Note on funding
This collection of texts was acquired as part of the research project TextGrid (www.TextGrid.de, funding code 01UG0901A) with funds provided by the BMBF (“Bundesministerium für Bildung und Forschung” – German Federal Ministry of Education and Research). We therefore kindly ask you to add this note on funding when putting the data stock to further use.