TextGrid Special Issue [December 21, 2009]
TextGrid Acquires Collection of Humanities Texts
TextGrid has acquired the comprehensive collection of texts from the online library of zeno.org with financial support from the Federal Ministry of Education and Research (BMBF) and will make it available to the research community and the general public. This digital collection is the most comprehensive of its kind in the German-speaking world and contains texts from the beginning of printing to the first decades of the 20th century.
A) Preparation for Scholarly Use: As announced in the press release on December 2, TextGrid will make the collection of texts available in a form suitable for scholarly use (for example, the conversion into TEI, with more extensive markup for more precise research). An initial partial version will be functional within a year. In order to include the interests of a variety of disciplines, all interested colleagues are invited to take part in a vote between the two Creative Commons licensing models that will determine the kind of reuse and the subsequent use of research data.
---> To the Vote on Licensing Conditions for Scholarly Use in TextGrid (with detailed information about the voting process) (German only)
B) Preparation for the General Public: As announced in the Cooperation Statement published today, Wikimedia, with support from TextGrid, will soon make the data in its current form available to the general public under the Creative Commons Attribution license (CC-BY).
In addition, an interview with Professor Fotis Jannidis offers more information about the importance of the “Digital Library” for research purposes and how TextGrid will prepare the data.
The entire TextGrid Team wishes you a festive holiday season and a Happy New Year 2010!
Cooperation Statement from TextGrid, Wikimedia, and Creative Commons
2,347,703,384 Characters' Worth of Culturally Valuable Texts Freely Available
TextGrid, Wikimedia, and Creative Commons Germany[1] are cooperating to make an extensive collection of texts freely available.
The research group TextGrid recently obtained the texts of the online library zeno.org with financial support from the Federal Ministry of Education and Research (BMBF)[2]. This digital collection is the most comprehensive of its kind in the German-speaking areas and contains texts from the beginning of printing to the first decades of the 20th century.
TextGrid, Wikimedia Germany and Creative Commons Germany are now cooperating in order to make this collection of texts freely usable for the general public. Wikimedia will soon make the collection available with the assistance of TextGrid. Subsequent use of the texts will be possible without restriction wherever their contents are in the public domain (this applies in particular to the digitized texts themselves). If additional data for providing access is included (bibliographic metadata, for example), it will be covered by the license CC-BY 3.0 de[3]. This license primarily requires attribution of the licensor and is moreover recognized by the Free Software Foundation as a “free license”[4].
“With the selection of the Creative Commons License, legal certainty will result for every subsequent user of the texts, since in addition to the copyright status of the actual work, the question of ancillary copyright is thereby resolved, and the licensor waives all rights resulting from this protection to the fullest extent possible,” explained Michael Weller of the European Academy of Computing in Law (Legal Project Management for Creative Commons Germany).
Every Internet user will receive free access to the data and can further process and reuse the data holdings as long as the attribution requirement is observed. For projects sponsored by the Wikimedia Foundation, new possibilities emerge: “With free access to the data, the projects carried out by Wikimedia Foundation, such as Wikisource and Wikimedia Commons, and their users can provide and link the works of the collection of texts in their accumulation of knowledge,” commented Mathias Schindler from Wikimedia Germany.
The general public as well as specialized research communities will benefit from this cooperation: “The primary task of the Digital Humanities is no longer digitization, as it was in the 90s, but instead the methodically innovative development of structured data sets. With this cooperation we will make access to this information possible not only for research communities but also for the general public,” said Dr. Heike Neuroth, TextGrid Project Manager at the Lower Saxony State and University Library Göttingen.
During the next three years, TextGrid will prepare the collection for scholarly use (for example, the conversion into TEI, with more extensive markup for more precise research) and make it available in a virtual research environment together with appropriate tools for further processing. The scholarly communities are requested to vote on the desired licensing conditions for their research data resulting from this textual foundation[5].
[1] Legal Project Management for Creative Commons Germany, supported by the European Academy of Computing in Law (EEAR) and the Institute of Law and Informatics at Saarland University (IFRI).
[2] Press release from the Georg-August-Universität Göttingen on 2 December 2009 (in German)
[3] Creative Commons License cc-by
[4] Statement of the FSF on the CC-License types BY and BY-SA
[5] Vote on the Licensing Conditions for Scholarly Use in TextGrid (in German)
Interview with Professor Fotis Jannidis on the Acquisition of the Online Library Zeno.org
Background information: Prof. Fotis Jannidis holds the „Lehrstuhl für Computerphilologie und Neuere Deutsche Literaturgeschichte“ (Chair of Computer Philology and Modern German Literary History) at Würzburg University. He has been a partner in TextGrid since 2005 and is mainly responsible for the integration of the online library zeno.org into TextGrid.
What importance does the acquisition of these texts have for your discipline, the study of literature?
Fotis Jannidis: For the study of German literature, this expanded range of data in TextGrid is of special interest since the collection contains nearly all of the important works of the literary canon up to the beginning of the 20th century, including 280 novels by German-speaking authors and numerous other texts of literary relevance.
What are you planning to do with the online library of zeno.org?
Fotis Jannidis: We will gradually convert the texts to TEI P5, a standard which should guarantee their long-term usability. Furthermore, it is especially important to us that the complete collection will be freely accessible and available for further processing, e.g. in editions and corpora.
Why is it necessary to refine the data for research purposes?
Jannidis: In its current state, it is not possible to carry out systematic research with this data. Although you can limit the search to single works or authors, you cannot search by other metadata. For most researchers, however, searching by metadata is definitely an important way of accessing the texts. With it, you could search not only the works of a certain author but also, for example, all poems in the library that were published in the 18th century.
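Editorial illustration: the kind of metadata search described in this answer can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not TextGrid's implementation. The record fields (author, title, genre, year), the `search` function, and the sample entries are all hypothetical and are not taken from the actual collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    author: str
    title: str
    genre: str
    year: int  # assumed year of first publication


def search(records, *, author=None, genre=None, year_from=None, year_to=None):
    """Return the records that match every criterion that was given.

    Criteria left as None are ignored, so a search can be restricted by
    author alone, by genre and period, or by any other combination.
    """
    results = []
    for rec in records:
        if author is not None and rec.author != author:
            continue
        if genre is not None and rec.genre != genre:
            continue
        if year_from is not None and rec.year < year_from:
            continue
        if year_to is not None and rec.year > year_to:
            continue
        results.append(rec)
    return results


# Placeholder entries for illustration only (not real collection data).
LIBRARY = [
    TextRecord("Author A", "Poem 1", "poem", 1774),
    TextRecord("Author A", "Novel 1", "novel", 1795),
    TextRecord("Author B", "Poem 2", "poem", 1827),
    TextRecord("Author C", "Poem 3", "poem", 1751),
]

# All poems published in the 18th century (1701 to 1800), by any author.
poems_18th = search(LIBRARY, genre="poem", year_from=1701, year_to=1800)
print([rec.title for rec in poems_18th])
```

A search limited to a single author, which is what the current state of the data already allows, would be `search(LIBRARY, author="Author A")`. The cross-author query above depends on genre and date metadata being recorded for every text.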
What other kinds of usage do you expect?
Jannidis: Reading, on the one hand, although that is certainly not the most important way of using the texts...
How so?
Jannidis: In the case of research literature, the most important way of making use of these texts is definitely reading, whether on the screen or on paper. However, that is not the case for the canonical works under discussion, since many are readily available in multiple editions for convenient offline reading. The situation with dictionaries and encyclopedias is a little different because in this form they are more easily accessible. Still, the texts will arrive just in time for the new generation of digital text readers.
Are there other possibilities for using the data?
Jannidis: We expect that the usage of this data in individual research contexts will become increasingly natural, e.g. when compiling corpora for quantitative analysis or collating variant sources. Editors will also be able to use the text as a basis for their own editions or to offer important contexts for their edited texts and to link their texts with them.
Many editors refrain from using such sources at the moment because the links are unstable. How will you address this hesitation?
Jannidis: All of the digital library texts will be available via persistent internet addresses so that long-term stable references to all texts in TextGrid will be possible. In this case we are relying on standards which have already been successfully used by other digital libraries.
What other types of users would you like to address with the texts?
Jannidis: User interest in these texts is naturally not limited to the editors of digital editions such as literary scholars, philosophers, historians and musicologists. Other types of users include historical linguists who seek to create diachronic corpora; researchers who aim to examine texts with quantitative methods; scholars who concentrate on a smaller number of texts and therefore work more intensively with them (through extensive markup of names and dates, for example); and, of course, all researchers who need a large amount of text for developing new text technologies. These texts are now freely available for everyone interested in German literature, history, philosophy and culture. The Internet has constantly surprised us in positive ways, as when free data, software and services were used for unexpected, innovative purposes. We would all be very pleased if something similar happened to these texts as well.
TextGrid Newsletter
If you would like to get in contact with the project team, please send an email.
This newsletter is a joint effort of all TextGrid partners. You can subscribe to it or cancel your subscription here. There is also an archive of past newsletters there.

