Introduction to Linked Open Data in Linguistics

Thierry Declerck (DFKI GmbH, Germany) and John P. McCrae (National University of Ireland, Galway, Ireland)

Publishing language resources under open licenses and linking them together has attracted increasing interest in academic circles, including applied linguistics, lexicography, natural language processing and information technology. A further goal is to facilitate the exchange of knowledge and information across disciplinary boundaries, as well as between academia and the IT business.

Until now, this development has been discussed in workshops and datathons, and it has been at the core of the work conducted within the W3C Ontology-Lexica Community Group, whose final report was published in May 2016 (Lexicon Model for Ontologies: Community Report, 10 May 2016)[1]. We see this development as an important step towards making linguistic data (i) easily and uniformly queryable, (ii) interoperable and (iii) sharable over the Web using open standards such as HTTP and the RDF data model.
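To illustrate what "uniformly queryable" means under the RDF data model, the following minimal sketch stores a few statements as (subject, predicate, object) triples and answers a question with a single pattern-matching function. It uses only the Python standard library; the example.org URIs and the match helper are assumptions made for illustration, and a real application would rather use an RDF library and SPARQL.

```python
# The ontolex namespace is the one published by the W3C community group;
# the example.org URIs are hypothetical and only serve this illustration.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
EX = "http://example.org/lexicon/"

# An RDF graph is a set of (subject, predicate, object) triples.
triples = {
    (EX + "cat_en", ONTOLEX + "denotes", EX + "Cat"),
    (EX + "Katze_de", ONTOLEX + "denotes", EX + "Cat"),
    (EX + "dog_en", ONTOLEX + "denotes", EX + "Dog"),
}


def match(graph, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which lexical entries, in any language, denote the concept Cat?
cat_entries = [s for s, _, _ in match(triples, p=ONTOLEX + "denotes", o=EX + "Cat")]
print(cat_entries)
```

Because every statement has the same three-part shape, the same pattern mechanism answers any question about the data, whichever resource the triples originally came from.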

While it has been shown that linked data has significant value for the management of language resources on the Web, the practice is still far from being an accepted standard in the community. It is therefore important that we continue to push the development and adoption of linked data technologies among creators of language resources, and also within curricula at universities and summer schools.

The main goal of this proposed ESSLLI 2019 course is to give people working in computational linguistics practical skills in linked data and semantic technologies as applied to linguistic and lexical data.

After developing a short initial ontology, participants will learn step by step how to represent multilingual data with their ontology and how to ground it linguistically. We will introduce a variety of state-of-the-art multilingual representation formats and application scenarios in which to leverage and exploit multilingual semantic data. Finally, we will detail the connection of lexical and corpus resources using the NIF[2] data format. At the end of the course, participants will be able to use Linguistic Linked Open Data (LLOD) for the semantic representation of linguistic data.
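As a rough idea of what linguistically grounding an ontology can look like, the sketch below emits N-Triples that attach an English and a German lexical entry to the same ontology concept, using class and property names from the OntoLex-Lemon vocabulary (LexicalEntry, canonicalForm, writtenRep, denotes). The example.org namespace, the lexical_entry helper and the Cat concept are hypothetical, and the helper does no escaping of literals, so it is an illustration rather than a course artefact.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def lexical_entry(entry_id, lemma, lang, concept):
    """Return N-Triples lines linking a lemma to an ontology concept."""
    entry = f"<{EX}lexicon/{entry_id}>"
    form = f"<{EX}lexicon/{entry_id}#form>"
    return [
        f"{entry} <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .",
        f"{entry} <{ONTOLEX}canonicalForm> {form} .",
        # The language tag on the literal carries the multilingual information.
        f'{form} <{ONTOLEX}writtenRep> "{lemma}"@{lang} .',
        # ontolex:denotes ties the lexical entry to the ontology entity.
        f"{entry} <{ONTOLEX}denotes> <{EX}ontology/{concept}> .",
    ]


lines = (
    lexical_entry("cat_en", "cat", "en", "Cat")
    + lexical_entry("Katze_de", "Katze", "de", "Cat")
)
print("\n".join(lines))
```

Both entries point to the same concept URI, which is what allows a query on the concept to retrieve its lexicalisations in every language.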

Students will also become familiar with best practices for publishing their own linguistic data in the Linguistic Linked Data cloud. These guidelines result from a past European Supporting Action, LIDER (http://www.lider-project.eu/), and have been further developed in the context of the European infrastructure project ELEXIS (https://elex.is/) and the recently started H2020 project Prêt-à-LLOD (http://www.pret-a-llod.eu/), which also addresses industrial use cases for the LLOD ecosystem.

Both instructors of this proposed course have spent recent years investigating the interface between lexical data and knowledge representation systems, and have published together on various aspects of this intersection of ontologies and natural language resources. John was a driving force behind the development of the Lexicon Model for Ontologies (lemon) and its further development in the context of the W3C Ontology-Lexica Community Group. Thierry has considerable experience in connecting the field of lexicography with the LLOD. John and Thierry have taught these topics at a Linguistic Linked Data and Semantic Technology summer school (Eurolan 2015)[3], at datathons[4] and at ESSLLI 2018[5].


[1] https://www.w3.org/2016/05/ontolex/
[2] NLP Interchange Format (http://aksw.org/Projects/NIF.html)
[3] http://eurolan.info.uaic.ro/2015/
[4] http://datathon.lider-project.eu/ and http://datathon2017.retele.linkeddata.es/
[5] http://esslli2018.folli.info/introduction-to-linked-open-data-in-linguistics/