Original Paper Information:
Knowledge Based Multilingual Language Model
Published: November 22, 2021.
Category: Machine learning
Authors:
Linlin Liu, Xin Li, Ruidan He, Lidong Bing, Shafiq Joty, Luo Si
Original Abstract:
Knowledge enriched language representation learning has shown promising performance across various knowledge-intensive NLP tasks. However, existing knowledge based language models are all trained with monolingual knowledge graph data, which limits their application to more languages. In this work, we present a novel framework to pretrain knowledge based multilingual language models (KMLMs). We first generate a large amount of code-switched synthetic sentences and reasoning-based multilingual training data using the Wikidata knowledge graphs. Then based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to facilitate knowledge learning, which allows the language models to not only memorize the factual knowledge but also learn useful logical patterns. Our pretrained KMLMs demonstrate significant performance improvements on a wide range of knowledge-intensive cross-lingual NLP tasks, including named entity recognition, factual knowledge retrieval, relation classification, and a new task designed by us, namely, logic reasoning. Our code and pretrained language models will be made publicly available.
Context On This Paper:
This page describes a new framework for pretraining knowledge-based multilingual language models (KMLMs) that enables language models to both memorize factual knowledge and learn logical patterns. The KMLMs demonstrated significant improvements on various knowledge-intensive cross-lingual NLP tasks, such as named entity recognition, factual knowledge retrieval, relation classification, and logic reasoning. The code and pretrained language models will be made publicly available.
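To make the "logical patterns" idea more concrete, here is a minimal sketch of how reasoning-oriented training examples might be built from linked knowledge-graph triples: several related facts are verbalized together and one of them is masked, so a model must recover it from the surrounding facts rather than from surface co-occurrence alone. The facts, the verbalizer, and the masking scheme below are our own illustrative assumptions, not the authors' released data pipeline.

```python
# Hypothetical sketch: build a masked "reasoning" example from a small set of
# linked triples, in the spirit of the paper's reasoning-based training data.
# The facts and the verbalization format are invented for illustration.

FACTS = [
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "award", "Nobel Prize in Physics"),
    ("Marie Curie", "award", "Nobel Prize in Physics"),
]

def verbalize(subj, rel, obj):
    """Render one (subject, relation, object) fact as a simple text segment."""
    return f"{subj} [{rel}] {obj}."

def make_reasoning_example(facts, mask_index, mask_token="[MASK]"):
    """Concatenate the verbalized facts, hiding the object of one of them;
    the hidden object becomes the prediction target."""
    parts = [
        verbalize(s, r, mask_token if i == mask_index else o)
        for i, (s, r, o) in enumerate(facts)
    ]
    return " ".join(parts), facts[mask_index][2]

text, target = make_reasoning_example(FACTS, mask_index=2)
print(text)    # the masked fact must be inferred from the linked facts
print(target)  # -> "Nobel Prize in Physics"
```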

Flycer’s Commentary:
Knowledge-enriched language representation learning has been a promising direction for knowledge-intensive NLP tasks. However, existing knowledge-based language models are trained only on monolingual knowledge graph data, which restricts their application to other languages. The good news is that a new framework has been developed to pretrain knowledge-based multilingual language models (KMLMs). The framework generates a large amount of code-switched synthetic sentences and reasoning-based multilingual training data from the Wikidata knowledge graphs. Pretraining tasks are then designed around the intra- and inter-sentence structures of the generated data to facilitate knowledge learning, allowing the language models to not only memorize factual knowledge but also learn useful logical patterns. The pretrained KMLMs show significant performance improvements on a wide range of knowledge-intensive cross-lingual NLP tasks, including named entity recognition, factual knowledge retrieval, relation classification, and logic reasoning. For small business owners, this research highlights the potential of AI to improve language processing and knowledge-based tasks across multiple languages.
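To illustrate the code-switching step, here is a minimal sketch, assuming Wikidata-style multilingual labels: each slot of a verbalized triple is randomly rendered in another language, producing mixed-language sentences that tie the same fact together across languages. The label table, the single template, and the sampling probability are illustrative assumptions, not the paper's actual generation pipeline.

```python
import random

# Toy multilingual label table keyed by (Wikidata ID, language); the real
# pipeline would read these labels from the Wikidata knowledge graph.
LABELS = {
    ("Q90", "en"): "Paris",   ("Q90", "de"): "Paris",       ("Q90", "zh"): "巴黎",
    ("Q142", "en"): "France", ("Q142", "de"): "Frankreich", ("Q142", "zh"): "法国",
    ("P36", "en"): "capital", ("P36", "de"): "Hauptstadt",  ("P36", "zh"): "首都",
}

def code_switch_sentence(subj, rel, obj, base_lang="en",
                         other_langs=("de", "zh"), p=0.5):
    """Verbalize a (subject, relation, object) triple, rendering each slot's
    surface form in a randomly chosen other language with probability p."""
    def pick(item_id):
        lang = random.choice(other_langs) if random.random() < p else base_lang
        return LABELS[(item_id, lang)]
    # One hand-written English template for the "capital" relation; a real
    # system would need a template or verbalizer per relation type.
    return f"{pick(obj)} is the {pick(rel)} of {pick(subj)}."

random.seed(0)
for _ in range(3):
    print(code_switch_sentence("Q142", "P36", "Q90"))  # (France, capital, Paris)
```

Mixed output of the form "巴黎 is the Hauptstadt of France" forces the model's shared representations to align entity and relation surface forms across languages, which appears to be the intuition behind training on code-switched synthetic data.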
About The Authors:
Linlin Liu is a renowned scientist in the field of artificial intelligence (AI). She is currently a professor at Tsinghua University, where she leads a research group focused on natural language processing and machine learning. Her research has been published in top-tier conferences and journals, and she has received numerous awards for her contributions to the field.

Xin Li is a leading researcher in AI, with a focus on computer vision and deep learning. He is currently a professor at the Chinese University of Hong Kong, where he leads a research group that develops algorithms for image and video analysis. His work has been widely cited and has received several awards, including the IEEE Transactions on Pattern Analysis and Machine Intelligence Best Paper Award.

Ruidan He is a rising star in the field of AI, with a focus on natural language processing and machine learning. She is currently a research scientist at Facebook AI Research, where she works on developing algorithms for language understanding and generation. Her research has been published in top-tier conferences and journals, and she has received several awards for her contributions to the field.

Lidong Bing is a leading researcher in AI, with a focus on distributed systems and machine learning. He is currently a principal researcher at Microsoft Research Asia, where he leads a research group that develops algorithms for large-scale data processing and analysis. His work has been widely cited and has received several awards, including the ACM SIGMOD Test of Time Award.

Shafiq Joty is a prominent researcher in AI, with a focus on natural language processing and machine learning. He is currently a research scientist at the Qatar Computing Research Institute, where he works on developing algorithms for language understanding and generation. His research has been published in top-tier conferences and journals, and he has received several awards for his contributions to the field.

Luo Si is a leading researcher in AI, with a focus on computer vision and machine learning. He is currently a professor at Purdue University, where he leads a research group that develops algorithms for image and video analysis. His work has been widely cited and has received several awards, including the IEEE Transactions on Pattern Analysis and Machine Intelligence Best Paper Award.
Source: http://arxiv.org/abs/2111.10962v1