A Study of a Chinese Character Search System Using Character Structure Information (漢字結構情報를 이용한 漢字檢索 시스템 연구)

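This paper studies a Chinese character search system based on the theory of character structure (漢字結構) and of the decomposition and composition of characters, as put forward by the theory of character construction (構形學) in modern Chinese character studies (漢字學). A character can be decomposed, according to a definite structure, into a definite number of components (部件). A decomposed component may be a specific character that is already known [including radical characters], or it may be a single stroke that carries no meaning. The components obtained in this way are stored in a structured database, and with this database a character whose pronunciation and meaning are unknown can be searched conveniently using only a combination of its components.

On this theoretical basis, all characters are recorded structurally in a database using their components and the structure form (結構形態), which is a coded composition rule. This is defined as the 'character structure information database' (漢字結構情報 데이터베이스). Several principles were applied in building the database: ① the scope is the 74,474 Chinese characters registered in Unicode; ② every character is decomposed into one or more (N) components that have a pronunciation and a meaning, including radical characters [and variant forms of radical characters]; ③ components are limited to characters, up to the Unicode Ext.A block, that can be entered by their Hangul reading in a basic operating system; and ④ only variant forms of radical characters, such as '㣺[心], 巛[川], 乚[乙], 牜[牛]', are entered through a separate input window.

A character search system can be built on the character structure information database constructed in this way. It consists of ① a part for entering at least one component of the target character together with a structure form, ② a part that analyzes the entered structure information, ③ the character structure information database, in which the components of each character are structured together with its structure form, and ④ a part that outputs search results from the database.

The search system proceeds in four stages: ① stage 1, entering at least one component of the target character together with a structure form; ② stage 2, analyzing the entered structure information and either converting it into structure-form information for searching or separating out the structure-form information; ③ stage 3, generating an SQL statement from the analyzed structure-form information and querying the database; and ④ stage 4, receiving the query result from the database, sorting it by radical and stroke count, and displaying the matching characters on screen.

The distinguishing feature of this search model is that a user who does not know a character's pronunciation and meaning exactly can still find it easily, provided the user understands the basic concepts of character decomposition and composition. Because of this feature, the model should be useful for finding characters in resources that store, as images, nonstandard characters not registered in standard code systems, such as newly encountered characters, variant characters (異體字), simplified characters (簡體字), abbreviated characters (略字), cursive script (草書), and ancient characters (古漢字). It could also be applied to software such as font production systems based on character composition, electronic dictionaries, and Chinese character input methods.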
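The decomposition into components plus a coded structure form can be illustrated with Unicode Ideographic Description Sequences. The following is only a minimal sketch, assuming each database entry is an IDS string built from the 12 IDCs (U+2FF0 to U+2FFB); the function names `parse_ids` and `components` and the sample decompositions are hypothetical and are not taken from the paper.

```python
# The 12 Ideographic Description Characters occupy U+2FF0..U+2FFB.
IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}
# U+2FF2 and U+2FF3 take three operands; the other ten take two.
TERNARY = {"\u2FF2", "\u2FF3"}


def parse_ids(ids):
    """Parse an IDS string into a nested tuple (idc, operand, ...).

    A plain component is returned as a one-character string.
    """
    def parse(pos):
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were read")
        ch = ids[pos]
        if ch not in IDCS:
            return ch, pos + 1          # a component, not a structure form
        arity = 3 if ch in TERNARY else 2
        pos += 1
        operands = []
        for _ in range(arity):
            node, pos = parse(pos)      # operands may themselves be nested
            operands.append(node)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def components(tree):
    """Return the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for operand in tree[1:]:
        leaves.extend(components(operand))
    return leaves


# Illustrative entry: 想 written as 相 over 心, with 相 expanded to 木 beside 目.
tree = parse_ids("\u2FF1\u2FF0木目心")
print(tree)
print(components(tree))
```

Keeping the structure form as the first character of each sequence means that both the top-level layout and the set of components can be recovered from a single stored string.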

This article studies an ideographic character search system that uses IDS (Ideographic Description Sequences), based on the theory of character structure in Han ideographic characters. An ideographic character can be decomposed into specific components, which are either well-known characters (including radicals) or strokes that carry no meaning. By combining these components with the 12 IDCs (Ideographic Description Characters), we can build a structured database. With this database, a character can be searched easily even when it is encountered for the first time. Based on this theory of character structure, we construct a new type of database, called the IDS Database. Four rules were applied in building it: ① The scope is the 74,474 ideographic characters in Unicode. ② Every ideographic character is decomposed into N components that have pronunciations and meanings. ③ Components are limited to CJK Unified Ideographs Ext.A and must be characters that can be entered by conversion from Hangul. ④ Radicals whose original shape has changed, for example 㣺[心], 巛[川], 乚[乙], 牜[牛], are entered through a special input window. Based on this IDS Database, we can construct an ideographic character search system. It consists of four parts: ① a part for entering at least one component together with an IDC, ② a part that analyzes the components and the IDC, ③ the IDS Database, and ④ a part that displays the search results. A search proceeds in four steps: ① entering at least one component together with an IDC; ② analyzing the components and the IDC, or converting them into an IDS suitable for searching; ③ generating an SQL query against the IDS Database; and ④ displaying the search results returned by the IDS Database.
With this search system, a user who understands that ideographic characters can be decomposed into specific components, and recombined from them, can easily find a character even when seeing it for the first time and knowing nothing of its pronunciation or meaning. Because of this feature, the system should be useful for searching collections of nonstandardized ideographic characters, such as a database of variant ideographic characters or a database of old ideographic characters.
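The four search steps can be sketched with an in-memory SQLite database. This is a hedged illustration only: the table name `hanzi_ids`, its columns, the five sample rows, and the helper names `build_db` and `search` are assumptions for the sketch, not the schema or SQL used in the study. Matching here uses the top-level IDC and simple substring tests on the stored IDS string.

```python
import sqlite3


def build_db():
    """Create a tiny illustrative IDS table (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE hanzi_ids ("
        "ch TEXT PRIMARY KEY, ids TEXT, radical_no INTEGER, strokes INTEGER)"
    )
    con.executemany(
        "INSERT INTO hanzi_ids VALUES (?, ?, ?, ?)",
        [
            ("明", "\u2FF0日月", 72, 8),
            ("晴", "\u2FF0日青", 72, 12),
            ("昌", "\u2FF1日日", 72, 8),
            ("間", "\u2FF5門日", 169, 12),
            ("林", "\u2FF0木木", 75, 8),
        ],
    )
    return con


def search(con, components, idc=None):
    """Find characters containing every given component.

    If idc is given, the top-level structure form must match it as well.
    """
    # Step 1: the input is at least one component plus an optional IDC.
    if not components:
        raise ValueError("at least one component is required")

    # Step 2: analyze the input into separate search conditions.
    clauses = ["instr(ids, ?) > 0" for _ in components]
    params = list(components)
    if idc is not None:
        clauses.append("substr(ids, 1, 1) = ?")
        params.append(idc)

    # Step 3: generate a parameterized SQL query and run it.
    sql = (
        "SELECT ch FROM hanzi_ids WHERE "
        + " AND ".join(clauses)
        + " ORDER BY radical_no, strokes, ch"
    )
    rows = con.execute(sql, params).fetchall()

    # Step 4: return the hits, already sorted by radical and stroke count.
    return [row[0] for row in rows]


con = build_db()
print(search(con, ["日"], idc="\u2FF0"))  # left-right characters containing 日
print(search(con, ["日", "月"]))          # characters containing both 日 and 月
```

Passing the components as bound parameters rather than splicing them into the SQL text keeps the query safe for arbitrary input; a production system would also need an index strategy, since substring tests scan every row.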

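Ⅰ. Statement of the Problem

Ⅱ. Character Structure (漢字結構) and the Decomposition and Composition of Characters

Ⅲ. The Character Structure Information Database

Ⅳ. Composition of the Character Search System

Ⅴ. Remaining Tasks and Outlook

References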

The Study of Ideographic Characters Searching System, Using IDS(Ideographic Description Sequence)
