Core solutions & quality

The basis of language technology development in Icelandic

The purpose of developing speech recognition (speech-to-text) systems for Icelandic is to enable developers of voice-based applications to add Icelandic to their products easily. Speech recognition, which lets users interact with computer-controlled devices by speech rather than keyboard, has many uses, including in-car computer systems, healthcare, companies’ service centres, computer-assisted language learning, and support for people who find it difficult to type because of disabilities.
Speech recognition products can be accessed here.
Speech synthesis for Icelandic, which is being developed to ensure that many different pronunciations of Icelandic can be recognised, will be integrated into software that needs to read text aloud automatically or respond by voice.
Speech synthesis products can be accessed here.
Machine translation reduces cost by accelerating the translation process. It has the added benefit of protecting less-common languages by generating, for example, real-time translations for television programmes.
Machine translation products can be accessed here.
Spelling and grammar-correction software assists in writing and correcting texts. It is the core writing software for a wide range of users, including the general public, companies and organisations, second-language speakers, children, and dyslexic people. It is also important in the development of other types of language technology software, for example for search functions and for making optically read text usable in a digital environment.
Spelling and grammar products can be accessed here.
The term language resources refers to data and support tools. Language technology data are organised into linguistic and text collections, and lexical and speech data. Support tools, which are used to transform raw data into appropriate formats for software development and training of language models, are also used to perform basic language analysis and processing as the first step in more complicated applications. Relevant data and reliable support tools are prerequisites for all language-technology development.
Language resources can be accessed here, and on the following links:
Icelandic Gigaword Corpus 1.
Icelandic Gigaword Corpus 2.
MIM-GOLD, a gold standard for PoS-tagging Icelandic texts.
MIM-GOLD, training and testing sets.
Database of Icelandic Morphology (DIM).
The Icelandic WordWeb.
ParIce: English-Icelandic parallel corpus.
ParIce Dev/Test/Train Splits.
RÚV TV data.
Support tools can be accessed here.

Quality assurance

Almannarómur is responsible for ensuring that the Icelandic Language Technology Programme’s products are of a sufficiently high standard to enable innovators, companies and individuals to base their development of Icelandic solutions on them.

SÍM has two Quality Assurance Teams, one for software solutions and one for data. In addition, Almannarómur has set up a panel of distinguished international experts in the field of language technology.

International expert panel

The Association of Language Technology for Icelandic (SÍM) is contracted by Almannarómur to build the core solutions. SÍM has written the standards and guidelines for data and software, and ensures that the programme’s deliverables adhere to those standards. Because SÍM’s deliverables form the basis for the ongoing development and commercialisation of the core project, the quality demands are high, and Almannarómur has appointed two independent industry experts to review the standards and their execution.

The panel’s role is to work with Almannarómur’s Executive Director on the technical aspects of core projects. It keeps the technical description of the co-operation agreement between Almannarómur and SÍM under review, both in the context of the five-year Language Technology Programme and of the current international development of language technology solutions. It also reviews SÍM’s progress reports before payments are made.

Panel of experts

Bente Maegaard is the former director of the Center for Sprogteknologi (Centre for Language Technology) at the University of Copenhagen’s Department of Nordic Studies and Linguistics. She was the Vice Executive Director of CLARIN ERIC from 2012 to 2018. Her main expertise is in research infrastructure, language resources, machine translation, evaluation methodology and governance issues. Bente has led a number of Danish and European research projects. She served as Chair of the ESFRI Strategy Working Group for Social and Cultural Innovation from 2019 to 2021, and was Vice-chair of the organisation Digital Humanities in the Nordic countries from its founding in 2015 until 2020.
Kadri Vider is a language technology expert at the University of Tartu’s Institute of Computer Science. She is also the CLARIN ERIC National Coordinator in Estonia and Executive manager of the Center of Estonian Language Resources. She has wide-ranging experience in developing and implementing language technology programmes for threatened languages, such as Estonian, and she was the programme coordinator for the National Programme for Estonian Language Technology, which is one of the models for the Icelandic language technology programme.
Steven Krauwer is an emeritus at Utrecht University in the Netherlands. He studied mathematics and general linguistics in Utrecht and Copenhagen, and was a teacher and expert in mathematical and computational linguistics at the Utrecht Institute of Linguistics at Utrecht University until his retirement in 2011. Steven has participated in and led numerous EU projects, primarily involving machine translation and other subjects within the field of speech technology. He has been a committee member of the Foundation for Endangered Languages since 2005. After retiring from Utrecht University, he became the first Executive Director of CLARIN ERIC, the central committee of CLARIN (Common Language Resources and Technology Infrastructure). After stepping down in 2015, Steven became senior adviser to CLARIN’s Board of Directors.

Quality Assurance Teams of SÍM and CLARIN

Software solutions

The Language Programme's software deliverables are packaged in such a way as to encourage further development and to facilitate their use in larger language technology systems. The members of the Quality Assurance (QA) Team have, between them, decades of international experience in software development for industry and academia. They are:

Dipl.Inf. Daniel Schnell, from Grammatek ehf
Dr Hrafn Loftsson, from Reykjavik University
Vilhjálmur Þorsteinsson, from Miðeind ehf

Data

The data deliverables adhere to the FAIR principles to ensure that the data is findable, accessible, interoperable, and reusable. The Data QA Team consists of representatives from the CLARIN Centre in Iceland together with research specialists in text corpora and speech. They are:

Prof.emer. Eiríkur Rögnvaldsson and Samúel Þorsteinsson, from the CLARIN Centre
Dr Eydís Huld Magnúsdóttir, from Reykjavik University
Steinþór Steingrímsson, from The Árni Magnússon Institute for Icelandic Studies
