Automatic acquisition of “noun+verb” idiomatic compounds in Korean
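- Institute for the Study of Language and Information, Kyung Hee University (경희대학교 언어정보연구소)
- Linguistic Research (언어연구)
- Vol. 32, No. 1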
- 2015.04, pp. 253 - 280 (27 pages)
Current work in computational linguistics pays close attention to lexical semantics, because it has the potential to improve both the coverage and the accuracy of language processing systems. In particular, multiword expressions are regarded as an important component for improving the performance of language applications, and handling them is especially crucial in multilingual processing such as machine translation. Among the variety of multiword expressions, the present study investigates “noun+verb” idiomatic compounds in Korean. These compounds consist of a verb plus its syntactic object, and the meaning of the combination is not equivalent to the sum of the meanings of its parts. To acquire these compounds fully automatically, the current work exploits a syntax-annotated corpus (i.e., a treebank) and three Korean lexical hierarchies. It extracts syntactic patterns from the development corpus (the Sejong Korean Treebank), calculates the selectional preferences each verb has for its objects, and identifies idiosyncratic items with reference to the three lexical hierarchies (CoreNet, KorLex, and U-WIN). The result comprises 548 idiomatic compounds, 70% of which are evaluated as satisfactory. (Nanyang Technological University)
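The selectional-preference step can be illustrated with a minimal sketch. The abstract does not state which measure the study uses, so the code below assumes a Resnik-style selectional association computed from verb-object pairs. The function names, the toy English glosses, and the flat noun-to-class mapping (a stand-in for a real lexical hierarchy such as CoreNet, KorLex, or U-WIN) are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def selectional_association(pairs, noun_class):
    """Resnik-style association A(v, c) for each verb v and object class c.

    pairs: iterable of (verb, object_noun) tuples (hypothetical toy input).
    noun_class: dict mapping each noun to one semantic class (assumption:
    a flat mapping instead of a full lexical hierarchy).
    """
    class_counts = Counter()
    verb_class_counts = defaultdict(Counter)
    for verb, noun in pairs:
        c = noun_class[noun]
        class_counts[c] += 1
        verb_class_counts[verb][c] += 1

    total = sum(class_counts.values())
    assoc = {}
    for verb, counts in verb_class_counts.items():
        n_verb = sum(counts.values())
        # Per-class contribution to KL(P(c | v) || P(c)).
        terms = {
            c: (k / n_verb) * math.log((k / n_verb) / (class_counts[c] / total))
            for c, k in counts.items()
        }
        strength = sum(terms.values())  # selectional preference strength S(v)
        for c, t in terms.items():
            assoc[(verb, c)] = t / strength if strength > 0 else 0.0
    return assoc


def idiom_candidates(pairs, noun_class, threshold=0.0):
    """Return verb+object pairs whose object class the verb disprefers."""
    pairs = list(pairs)
    assoc = selectional_association(pairs, noun_class)
    return sorted({
        (verb, noun) for verb, noun in pairs
        if assoc[(verb, noun_class[noun])] < threshold
    })


# Toy data with English glosses; counts are invented for illustration only.
toy_classes = {"rice": "food", "bread": "food", "heart": "body", "hand": "body"}
toy_pairs = (
    [("eat", "rice")] * 4 + [("eat", "bread")] * 3 + [("eat", "heart")]
    + [("hold", "hand")] * 4 + [("hold", "heart")] * 2
)
print(idiom_candidates(toy_pairs, toy_classes))
```

In this sketch a pair is only a candidate: an object whose class has negative association with the verb violates the verb's usual preference, which is one signal of non-compositional use. A real pipeline would take the pairs from treebank object relations and would still need the evaluation step the abstract mentions.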
Abstract
1. Introduction
2. Background
3. Methodology
4. Acquisition
5. Result
6. Conclusion
References