Language Grid

Overview

Recently, the number of languages used on the Web has increased rapidly as more people communicate via the Internet. Although a huge volume of language resources, such as language data and language processing software, is available on the Internet, it is difficult to use these resources to support intercultural collaboration without the involvement of linguistic professionals. The causes are complicated contracts and intellectual property rights, and the variety of data structures and program interfaces.

We developed the Language Grid, which allows users to share language resources as services on the Internet, based on a collective intelligence approach. By accessing the Language Grid, users can employ language services provided by universities, research institutes, and companies. Moreover, users can publish new language services by freely composing existing ones according to their needs.
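
To illustrate what composing language services can mean, the following Python sketch models each service as a plain text-to-text function and chains two of them into a new one. The names `compose`, `ja_to_en`, and `en_to_id`, and the toy word tables, are assumptions made for this example only. They are not part of the Language Grid interface, where services are invoked over the network.

```python
from typing import Callable

# A language service is modeled here as a function from text to text.
# This is a simplification for illustration, not the Language Grid API.
Service = Callable[[str], str]


def compose(*services: Service) -> Service:
    """Chain services so the output of one becomes the input of the next."""
    def composite(text: str) -> str:
        for service in services:
            text = service(text)
        return text
    return composite


# Toy stand-ins for two word-level translation services (hypothetical data).
JA_EN = {"猫": "cat", "犬": "dog"}
EN_ID = {"cat": "kucing", "dog": "anjing"}


def ja_to_en(text: str) -> str:
    # Unknown words are passed through unchanged.
    return JA_EN.get(text, text)


def en_to_id(text: str) -> str:
    return EN_ID.get(text, text)


# A new Japanese-to-Indonesian service built from the two existing ones.
ja_to_id = compose(ja_to_en, en_to_id)
```

Calling `ja_to_id("猫")` passes the word through both tables in turn. A composed service has the same shape as its parts, so it can itself be composed further.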

In this project, we focus on the global expansion of the Language Grid, system development based on the Language Grid, and the generation of language services, such as bilingual dictionary induction. In particular, our research results on bilingual dictionary induction have been published in leading international artificial intelligence venues, including IJCAI and ACM Transactions on Asian and Low-Resource Language Information Processing.

Research

Global Expansion of the Language Grid

We have developed and operated several systems based on the Language Grid. The Language Grid Playground showcases various language services provided by the Language Grid as well as the latest research and development efforts of this group. The Language Toolbox integrates several customizable multilingual support tools; it is released as open source software and is available on SourceForge and GitHub.

We conduct global research activities with researchers in the US, Europe, and Asia to promote the Language Grid. We have operated the Language Grid at the Department of Social Informatics, Kyoto University, since 2007, and have built the Asia-wide Language Grid by establishing Language Grid Operation Centers in Thailand (2010) to collect Southeast Asian language services, Indonesia (2012) to share Indonesian language services, and Xinjiang, China (2014) to accumulate Central Asian language services. These operation centers are federated with each other to share 225 language services with users. Moreover, we are connecting the language service infrastructures of ELDA/ELRA and US NSF projects, which are built on the Service Grid Server Software.

(Figure: Design concept of the Language Grid)

Bilingual Dictionary Induction based on Constraint Optimization

The design concept of the Language Grid is a shift from language resources to language services, which makes it easy to share and customize language resources for multilingual communities. One of the most important issues is therefore how to increase the registration of language resources in various languages, especially low-resource languages. This research focuses on automatic language service creation for low-resource languages.

For example, high-quality bilingual dictionaries are very useful in a variety of tasks in natural language processing and cross-lingual information retrieval, but such resources are rarely available for low-resource languages, especially for closely related ones such as Uyghur and Kazakh in the Turkic language family. This has been an obstacle to creating advanced systems such as machine translators, which are becoming increasingly important for overcoming the language barrier. Automatic extraction of bilingual dictionaries from large parallel corpora has long been studied and yields relatively high-quality output. However, parallel corpora are also an expensive resource, available at large scale only for popular languages; for less popular languages they remain sparse, dated, or simply unavailable. This makes such studies less applicable to poorly resourced languages. A well-known alternative is the pivot-based approach, which uses a third language to link two other languages. To apply the pivot-based approach to low-resource languages, we propose bilingual dictionary induction based on constraint optimization and apply it to low-resource Indonesian tribal languages and other languages.
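
The basic pivot idea can be sketched in a few lines of Python. The sketch below is the plain pivot baseline with a simple overlap score, not the constraint optimization formulation proposed here. The function name `induce_pairs`, the Jaccard score, the threshold, and the placeholder words (`a1`, `b1`, `p1`, ...) are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def induce_pairs(
    a_to_pivot: Dict[str, Set[str]],
    b_to_pivot: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Dict[Tuple[str, str], float]:
    """Pair words of languages A and B that share translations in a pivot language."""
    # Invert the B-to-pivot dictionary so each pivot word lists its B words.
    pivot_to_b: Dict[str, Set[str]] = {}
    for b_word, pivots in b_to_pivot.items():
        for pivot in pivots:
            pivot_to_b.setdefault(pivot, set()).add(b_word)

    pairs: Dict[Tuple[str, str], float] = {}
    for a_word, a_pivots in a_to_pivot.items():
        # Candidates are all B words reachable through at least one pivot word.
        candidates: Set[str] = set()
        for pivot in a_pivots:
            candidates |= pivot_to_b.get(pivot, set())
        for b_word in candidates:
            b_pivots = b_to_pivot[b_word]
            # Jaccard overlap of the two pivot translation sets.
            score = len(a_pivots & b_pivots) / len(a_pivots | b_pivots)
            if score >= threshold:
                pairs[(a_word, b_word)] = score
    return pairs


# Placeholder dictionaries: A-to-pivot and B-to-pivot (not real vocabulary).
a_to_pivot = {"a1": {"p1", "p2"}, "a2": {"p3"}}
b_to_pivot = {"b1": {"p1", "p2"}, "b2": {"p2", "p3"}}

induced = induce_pairs(a_to_pivot, b_to_pivot)
# induced == {("a1", "b1"): 1.0, ("a2", "b2"): 0.5}
```

In this toy data, `a1` and `b2` share only one of three pivot words and fall below the threshold. Polysemous pivot words produce exactly this kind of spurious candidate, which is why a plain overlap score is only a starting point.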

(Figure: Bilingual dictionary induction based on constraint optimization)

Multi-Language Support System for Symposiums using Language Grid

Simultaneous translation is commonly used at international symposiums to support participants from different countries. When multiple languages are involved, the cost becomes very high.

To reduce the cost of multilingual support at symposiums, we aim to develop the Multi-Language Support System for international symposiums using the Language Grid and real-time summarization by humans. Moreover, we evaluate and improve the system by applying it in laboratory seminars and other multi-language activities.
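
One way to picture the data flow is the Python sketch below: a human writes a summary once in the speaker's language, and a translation service distributes it to each audience language. The function `distribute_summary` and the stand-in `fake_translate` are hypothetical names for this illustration and do not describe the actual system's interface.

```python
from typing import Callable, Dict, Iterable

# A translator takes (text, source language, target language) and returns text.
# This signature is an assumption for the sketch.
Translator = Callable[[str, str, str], str]


def distribute_summary(
    summary: str,
    source_lang: str,
    target_langs: Iterable[str],
    translate: Translator,
) -> Dict[str, str]:
    """Return the human-written summary in every requested language."""
    # The source-language summary is kept as written by the human summarizer.
    results = {source_lang: summary}
    for lang in target_langs:
        if lang != source_lang:
            results[lang] = translate(summary, source_lang, lang)
    return results


def fake_translate(text: str, source: str, target: str) -> str:
    # Stand-in for a real translation service; it only tags the text.
    return f"[{source}->{target}] {text}"


captions = distribute_summary(
    "Hello", "en", ["ja", "en", "zh"], fake_translate
)
```

Under this design the human effort is spent once per talk, and adding an audience language adds only another machine translation call.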

(Figure: Multilingual support system for international symposiums)

Selected Publications

[Book]
  1. Toru Ishida (Ed.). The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability. Springer, ISBN 978-3-642-21177-5. 2011.
  2. Yohei Murakami and Donghui Lin (Eds.). Worldwide Language Service Infrastructure. Springer, ISBN 978-3-319-31467-9. 2016.
[Chapter in Book]
  1. Yohei Murakami, Donghui Lin, and Toru Ishida. Service-Oriented Architecture for Interoperability of Multi-Language Services. Paul Buitelaar and Philipp Cimiano (Eds.), Towards the Multilingual Semantic Web. Springer, pp. 313-328, 2014.
[Journal]
  1. Mairidan Wushouer, Donghui Lin, Toru Ishida, and Katsutoshi Hirayama. A Constraint Approach to Pivot-based Bilingual Dictionary Induction. ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 15, No. 1, Article 4, November 2015.
[Conference]
  1. Toru Ishida, Yohei Murakami, Donghui Lin, Takao Nakaguchi, Masayuki Otani. Open Language Grid – Towards a Global Language Service Infrastructure. The Third ASE International Conference on Social Informatics (SocialInformatics 2014), Cambridge, USA, 2014. (Invited talk)
  2. Toru Ishida, Yohei Murakami, Donghui Lin, Masahiro Tanaka, and Rieko Inaba. Language Grid Revisited: An Infrastructure for Intercultural Collaboration. 10th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS 2012), 2012. (Invited talk)
  3. Jun Matsuno and Toru Ishida. Constraint Optimization Approach to Context Based Word Selection. International Joint Conference on Artificial Intelligence (IJCAI-11), pp. 1846-1851, Barcelona, Spain, 2011.
  4. Rie Tanaka, Yohei Murakami and Toru Ishida. Context-Based Approach for Pivot Translation Services. International Joint Conference on Artificial Intelligence (IJCAI-09), pp. 1555-1561, 2009.
  5. Toru Ishida. Language Grid: An Infrastructure for Intercultural Collaboration. IEEE/IPSJ Symposium on Applications and the Internet (SAINT 2006), pp. 96-100, 2006. (Keynote address)

Related Web Sites