Development of Analysis Rules for UNL Based Machine Translation


Chapter I

Introduction

Universal Networking Language (UNL) is a project under the auspices of the United Nations University (UNU), Tokyo, Japan. The mission of the UNL project is to allow people across nations to access information on the Internet in their own languages [1]. Hundreds of millions of people, at almost all levels of education and in all kinds of occupations, use the Internet for different purposes [2]. The last decade of the 20th century witnessed an unimaginable acceleration in the development of information technology in all fields of life, along with a great increase in the spread and popularity of the Internet. Although information resources are at our fingertips through the Internet, many of them remain inaccessible because the language barrier persists. English is the main language of the Internet, and not all people know English. Millions of people are therefore unable to access information repositories directly in their native language. Conversely, vast information resources in other languages cannot be shared. Knowledge and information are scattered all over the world and remain mostly inaccessible because they lack a machine-readable representation and because of the language barrier [3]. Translation is the only means of disseminating information, but it takes considerable effort and involves direct and indirect costs. The language barrier hinders progress at the individual, institutional and national levels, even as nations become more interdependent and need to exchange information. Knowledge sources should be shared globally as much as possible to advance civilization.

Among those who did their best to tackle this problem was the United Nations University/Institute of Advanced Studies (UNU/IAS). The institute conducted a review of all internationally available machine translation programs and finally decided to devise a better, more efficient and more workable technique: a human-language-neutral meta-language for the Internet. The result of the project is the Universal Networking Language (UNL) [4]. The UNL project is a large-scale international cooperation with the goal of providing information on the Internet in all national languages of the members of the United Nations. The goal is to eliminate the massive task of translating between every pair of languages and to reduce language-to-language translation to a one-time conversion to UNL. Once information written in one language is “enconverted” into UNL, it can be shared by anyone in the world [3]. In other words, UNL is an intermediary language system through which any written text can be converted to many languages (all languages involved in the UNL program) and, conversely, texts written in all of those languages can be converted to any particular language. For example, Bangla corpora, once converted to UNL, can be translated to any other language for which a UNL system has been built.

Fig 1.1 UNL Systems

1.1 Background of Study

The UNL system was invented by United Nations University/Institute of Advanced Studies (UNU/IAS).

The United Nations University is an autonomous academic institute established by the United Nations General Assembly in 1975 [5]. The University works on pressing global problems of human survival, development and welfare through a network of research and postgraduate training centers and cooperating institutions in both industrialized and developing countries. The UNU/IAS is an advanced research and education institution with a flexible and multi-thematic programme orientation concerned with the interactions of social and natural systems. The UNU/IAS is currently active in the following relevant areas of research and education: Eco-restructuring for Sustainable Development; Mega cities and Urban Development; Multilateralism and Governance; and Science, Technology and Society. In January 2001, the United Nations University set up an autonomous organization, the UNDL Foundation, to be responsible for the development and management of the UNL Program.

The purpose of the UNL movement is to provide an infrastructure of knowledge for people to have equal opportunities to use without any language barrier and for computers to do intelligent processing using the knowledge. The UNL Program was launched in 1996 in the Institute of Advanced Studies of UNU. A vast amount of linguistic resources of the UNL as well as of the various native languages has been accumulated in the last few years. Moreover, the technical infrastructure for expanding these resources is already in place, thus facilitating the participation of many more languages in the UNL system from now on.

1.2 Objectives

A framework has been developed for converting Bangla texts to UNL expressions and vice versa. However, the existing word dictionary and analysis rules are limited, which motivates the following tasks for this project.

1. Analyze Bangla words and form dictionary templates for the names of continents, countries, districts, cities, divisions and capitals.

2. Analyze the rules in order to develop templates of morphological and semantic rules.

1.3 Contribution

In this project, we have developed:

a) Templates of Universal Words for Bangla headwords in the word dictionary.

b) Analysis rules to convert Bangla text to UNL Expressions and UNL expressions to Bangla texts.

1.4 Project Organization

This report is organized as follows.

Chapter II presents the literature review for this project. There we summarize the background on UNL, the format of the word dictionary and the format of the analysis rules.

Chapter III discusses the structure of the Universal Networking Language and the process of translation in detail. Under the structure of UNL we cover Universal Words, UNL attributes, relation labels, UNL expressions, the UNL knowledge base and the UNL system. The translation processes are also discussed in this chapter.

Chapter IV describes our proposed work, which includes Bangla grammatical attributes and dictionary templates for continents, countries, capitals, districts and cities.

Chapter V presents the experimental analysis. It mainly discusses the morphological, syntactic and semantic analysis of Bangla words and sentences, and the analysis and generation rules for converting Bangla sentences to UNL expressions.

Finally, Chapter VI draws the conclusions of this thesis and suggests directions for future work.

Chapter II

Literature Review


For this project we studied the Universal Networking Language [2], from which we learnt about UNL expressions, UNL relations, attributes, Universal Words and the UNL knowledge base. All of these are key to preparing the templates of Bangla headwords for the word dictionary and the analysis rules (morphological and semantic rules) needed to convert Bangla text into UNL expressions. We have also gone through research papers that describe the templates of dictionary entries for Bangla roots and primary suffixes [29], morphological analysis of Bangla words for UNL [2], conversion of Bangla sentences for Universal Networking Language [31] and bridging Bangla to UNL [4].

For Bangla language processing, research has been done on morphological analysis of Bangla words [20], parsing methodology for Bangla sentences [22, 23] and dictionary development for Bangla words [24, 25]. Suffixes, prefixes and inflexions are detailed in [26]-[28]. However, no attempt has been made in the literature to integrate these works into a concrete computational approach for converting Bangla texts into UNL expressions and UNL expressions into the respective Bangla texts. To address this issue, this thesis proposes a framework that interfaces between UNL and Bangla, considering case structure, parts of speech, different forms of verbs, nouns, adjectives and pronouns along with their prefixes, suffixes and inflexions, a UNL-compatible Bangla Word Dictionary, analysis rules, and generation rules.

Chapter III

Universal Networking Language

The UNL [5] has been introduced as a digital meta-language for describing, summarizing, refining, storing and disseminating information in a machine-independent and human-language-neutral form. This language intends to express meanings in a standardized way. We think that a comprehensive description of the UNL specification is necessary here, although it is available on the UNL website. The meaning of a native language sentence is expressed in the UNL system as a hypergraph composed of nodes connected by semantic relations. Nodes, or Universal Words (UWs), are words borrowed from English and disambiguated by their position in a knowledge base (KB) of conceptual hierarchies. Function words, such as determiners and auxiliaries, are represented as attributes of UWs (nodes) to provide additional information. The core structure of UNL is based on the following elements:

• Universal Words: Nodes that represent word meaning

• Attribute Labels: Additional information about the universal words

• Relation Labels: Tags that represent the relationship between Universal Words, i.e. between two nodes; these tags are the arcs of the UNL hypergraph.

3.1.Universal Words

Universal Words constitute the vocabulary of UNL and are the basic elements for constructing a UNL expression of a sentence or a compound concept [20]. Such a UW is represented as a node in a hypergraph. From the viewpoint of composition, there are two classes of UWs:

• Labels defined to express unit concepts, called “UWs” (Universal Words)

• Compound structures made of a set of binary relations grouped together, called “Compound UWs”

A UW is an English-language word followed by a list of constraints. The following is the syntax of description of UWs in context-free grammar (CFG):

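<UW> ::= <headword> [<constraint list>]

<headword> ::= <character>…

<constraint list> ::= “(“ <constraint> [ “,” <constraint> ]… “)”

<constraint> ::= <relation label> { “>” | “<” } <UW> [<constraint list>] |

<relation label> { “>” | “<” } <UW> [<constraint list>]

[ { “>” | “<” } <UW> [<constraint list>] ] …

<relation label> ::= “agt” | “and” | “aoj” | “obj” | “icl” | …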

3.1.1 Headword

The headword is an English word, compound word, phrase or sentence that is interpreted as a label for a set of concepts: the set is made up of all the concepts that may correspond to that expression in English. A basic UW (with no restrictions or constraint list) denotes this set. Restricted UWs are defined by a constraint list. Extra UWs denote new sets of concepts that do not have English-language labels.

3.1.2 Types of Universal Words

A UW is an English-language word with restrictions. As a first principle, UWs do not allow semantic ambiguity. English words are employed in UW construction because (i) English is known by all UNL developers, and (ii) there are many good bilingual dictionaries between local languages and English. A UW can express various levels of concepts depending on its restrictions, and it can be used to express a more specific or particular concept, or an instance, by giving it attributes. There are five types of UWs:

3.1.2.1 Basic UWs

These are bare headwords with no constraint list.

3.1.2.2 Restricted UWs

Restricted UWs are headwords with a constraint list. Examples are given below:

state(icl>express(agt>thing,gol>person,obj>thing))

state(icl>country)

state(icl>region)

state(icl>abstract thing)

state(icl>government)
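
The structure of the restricted UWs above (a headword followed by a constraint list) can be illustrated with a small parser. This is a minimal sketch rather than part of any UNL tool: the function names `split_top_level` and `parse_uw` are our own, and only the “>” direction of a constraint is handled.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_uw(uw):
    """Return (headword, constraints) for a UW such as 'state(icl>country)'.

    Each constraint is a (relation label, restricting UW) pair. A basic UW
    (a bare headword) yields an empty constraint list.
    """
    uw = uw.strip()
    if "(" not in uw:
        return uw, []
    head, rest = uw.split("(", 1)
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in {uw!r}")
    constraints = []
    for item in split_top_level(rest[:-1]):
        label, target = item.split(">", 1)  # only the '>' form is handled here
        constraints.append((label.strip(), target.strip()))
    return head.strip(), constraints
```

For instance, `parse_uw("state(icl>country)")` separates the headword `state` from the single constraint `icl>country`, while a nested restriction such as `express(agt>thing,gol>person,obj>thing)` is kept intact as the restricting UW and can be parsed again by the same function.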

3.1.2.3 Extra UWs

These are a special type of restricted UW; for example:

rose (icl>flower)

3.1.2.4 Temporary UWs

Such UWs do not need to be defined. For example: http://www.undl.org/

3.1.2.5 Compound UWs

A compound UW is a set of binary relations that are grouped together to express a compound concept. A sentence itself is considered a compound UW. Compound UWs denote compound concepts that are to be interpreted as a whole, so that one can refer to all of their parts at the same time. A compound UW is expressed by a scope in UNL expressions. In the example below, “:01” marks all of the elements that are grouped together to define compound UW number 01. An example sentence and its translation to UNL are given below: Women who wear big hats in movie theaters should be asked [to leave].

The UNL translation is as follows:

agt:01(wear(aoj>thing,obj>hat), woman(icl>person).@pl)

obj:01(wear(aoj>thing,obj>hat), hat(icl>wear))

aoj:01(big(aoj>thing), hat(icl>wear))

plc:01(wear(aoj>thing,obj>hat), theater(icl>facilities))

mod:01(theater(icl>facilities), movie(icl>art))

agt:01(leave(agt>thing,obj>place).@entry, woman(icl>person).@pl)
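
The role of the scope identifier can be sketched in code by parsing each relation line and grouping the relations that share the same compound UW-ID. This is a simplified illustration under our own assumptions: the names `parse_relation` and `group_by_scope` are hypothetical, and relation labels are assumed to be one to three lowercase letters.

```python
import re
from collections import defaultdict

# label, optional ":scope", then the two UWs inside the outer parentheses
RELATION = re.compile(r"^(?P<label>[a-z]{1,3})(?::(?P<scope>\w+))?\((?P<args>.*)\)$")


def split_args(args):
    """Split 'uw1, uw2' at the first comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two UWs in {args!r}")


def parse_relation(line):
    """Return (label, scope, uw1, uw2); scope is None when no ':ID' is given."""
    match = RELATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a binary relation: {line!r}")
    uw1, uw2 = split_args(match.group("args"))
    return match.group("label"), match.group("scope"), uw1, uw2


def group_by_scope(lines):
    """Collect the relations of each compound UW under its scope ID."""
    scopes = defaultdict(list)
    for line in lines:
        label, scope, uw1, uw2 = parse_relation(line)
        scopes[scope].append((label, uw1, uw2))
    return dict(scopes)
```

Applied to the relations listed above, `group_by_scope` returns a single key `"01"` whose value holds all six relations, which is exactly the set that makes up compound UW number 01.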

3.2 Attributes

The attributes represent the grammatical properties of the words. Attributes of UWs are used to describe the subjectivity of sentences [11, 12]. They show what is said from the speaker’s point of view: how the speaker views what is said. This includes phenomena technically called speech acts, propositional attitudes, truth values, etc. [4, 5]. Conceptual relations and UWs are used to describe the objectivity of sentences. Attributes of UWs enrich this description with more information about how the speaker views these states of affairs and his or her attitudes toward them. For example, the UW corresponding to play is “play(icl>do)”. If the word “play” is in the past form in the sentence, the attribute @past is attached to “play(icl>do)”. If it is the main word of the sentence, @entry is attached as well, giving “play(icl>do).@entry.@past”.
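
The attribute notation can be sketched with two small helpers, one that attaches attributes to a UW and one that separates them again. The helper names are our own and the code only mirrors the “.@” notation used in the UNL expressions of this chapter; it is an illustration, not part of the UNL software.

```python
def attach_attributes(uw, *attributes):
    """Append attributes to a UW using the '.@name' notation."""
    suffix = "".join(
        "." + (attr if attr.startswith("@") else "@" + attr) for attr in attributes
    )
    return uw + suffix


def split_attributes(node):
    """Split a node such as 'play(icl>do).@entry.@past' into UW and attributes."""
    uw, *attrs = node.split(".@")
    return uw, ["@" + attr for attr in attrs]


node = attach_attributes("play(icl>do)", "@entry", "@past")
uw, attrs = split_attributes(node)
```

Here `node` holds the main verb of a past-tense sentence, and `split_attributes` recovers the bare UW together with its list of attributes.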

3.3 Relational Labels

Relations between UWs are binary and have different labels according to the different roles they play. A relation label is represented as a string of three characters or fewer. There are many factors to be considered in choosing an inventory of relations [13]. The following is an example of a relation defined according to the above principles.

Relation: agt (agent)

agt defines a thing that initiates an action.

agt (do, thing)

agt (action, thing )

Syntax:

agt [“:”<Compound UW-ID>] “(“ {<UW1>|“:”<Compound UW-ID>} “,” {<UW2>|“:”<Compound UW-ID>} “)”

An agent is defined as the relation between

UW1 – do, and

UW2 – a thing

Here UW2 initiates UW1, or UW2 is thought of as having a direct role in making UW1 happen.

Examples of “agt” relation:

John breaks … agt(break(agt>thing,obj>thing), John(icl>person))

Mary broke the window agt(break(icl>do).@entry.@past, Mary)

3.4 UNL Expression

The UNL expresses information or knowledge in the form of a semantic network. The UNL semantic network is made up of a set of binary relations, where each binary relation is composed of a relation and two UWs that hold the relation. A binary relation of UNL is expressed in the following format: <relation>(<uw1>, <uw2>)

In <relation>, one of the relations defined in the UNL specifications is described. In <uw1> and <uw2>, the two UWs that hold the relation given at <relation> are described.
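
To make the idea of a semantic network concrete, the sketch below stores binary relations as (relation, UW1, UW2) triples and builds a simple directed graph from them. The function names are our own, and the second relation in the example (`obj`) is an assumption added for illustration; only the `agt` relation for “Mary broke the window” appears in this chapter.

```python
def format_relation(label, uw1, uw2):
    """Render one binary relation in the textual form relation(uw1, uw2)."""
    return f"{label}({uw1}, {uw2})"


def build_network(relations):
    """Map every UW to the list of (relation, target UW) arcs leaving it."""
    network = {}
    for label, uw1, uw2 in relations:
        network.setdefault(uw1, []).append((label, uw2))
        network.setdefault(uw2, [])  # make sure arc targets are nodes too
    return network


relations = [
    ("agt", "break(icl>do).@entry.@past", "Mary"),
    ("obj", "break(icl>do).@entry.@past", "window"),  # hypothetical second relation
]
network = build_network(relations)
expression = [format_relation(*relation) for relation in relations]
```

The node carrying `@entry` has two outgoing arcs, while `Mary` and `window` are leaves; `expression` holds the same network written out line by line as a UNL expression.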

3.5 Knowledge Base

UNL Knowledge Base (UNLKB) defines every possible relation between concepts. The possible relations are defined based on a hierarchy of UWs (UW System). The UW System is built up by inclusive relations between concepts according to property inference mechanism of concepts. The architecture of the UW System allows introducing and defining any concept no matter how particular or specific it is.

The UNLKB is a semantic network comprising every directed binary relation between UWs. It plays two roles: 1) it defines the semantics (concepts) of UWs, and 2) it provides linguistic knowledge of concepts. Concepts of UWs and linguistic knowledge of the concepts are defined by the possible relations each concept can have with others. The UNLKB not only provides linguistic knowledge in a form that computers can understand but also provides the semantic background of UNL expressions; that is, the UNLKB ensures the meaning of UNL expressions.

3.5.1 Roles of the UNLKB

The UNLKB Defines Semantics of UWs: A UW is a label for a concept. Concepts labeled by UWs are defined by describing the set of possible relations that each concept can have with other concepts in UNLKB. Definitions of possible relations of a concept with other concepts describe the behavior of the concept. This behavior is the property of a concept in the sense that the descriptions of behavior characterize the concept and provide enough information for understanding the semantic structure of a sentence, which includes the concept.

The UNLKB Provides Linguistic Knowledge of Concepts: The behavior of a concept is considered as linguistic knowledge on the concept. This knowledge is used to provide semantic structure of sentences of natural languages. For example, an “author” is a “person”, who can take various actions that a person can take, such as writing something and something might be a book, and so forth. This level of knowledge is necessary to provide the semantic background of natural language sentences. Further knowledge, for example real world knowledge, will be established based on this linguistic knowledge, using the UWs. In the UNLKB, the semantics of UWs are defined using the UW system and linguistic knowledge of concepts is provided also based on the UW System.

In the UNL KB, all UWs are linked with each other through ‘icl’ (subclass), ‘iof’ (element/instance), or ‘equ’ (equivalent) relations. ‘icl’ links a UW of a subclass concept to the class concept UW; ‘iof’ links a UW expressing an instance to a UW of a class concept; and ‘equ’ links a UW to an equivalent UW. The UWs related to each other through ‘icl’, ‘iof’ and ‘equ’ relations make up a hierarchy of UWs. This hierarchy of UWs is the UW system. This UW system allows having multiple super-class concepts. Accordingly, the UW system is a lattice type of network.
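
The UW system described above can be pictured as a graph in which a UW may have more than one parent. The following sketch is a toy model under our own assumptions: the class `UWSystem` and its methods are hypothetical, and apart from “author is a person” and “John (iof>person)”, which come from this chapter, the sample links are invented for illustration.

```python
class UWSystem:
    """Toy UW hierarchy: each UW may have several parent concepts (a lattice)."""

    LINKS = {"icl", "iof", "equ"}

    def __init__(self):
        self.parents = {}

    def link(self, relation, uw, target):
        """Record that uw is a subclass (icl), instance (iof) or equivalent (equ) of target."""
        if relation not in self.LINKS:
            raise ValueError(f"unsupported relation: {relation}")
        self.parents.setdefault(uw, set()).add(target)
        if relation == "equ":  # equivalence holds in both directions
            self.parents.setdefault(target, set()).add(uw)

    def ancestors(self, uw):
        """Return every concept reachable upward from uw."""
        seen, stack = set(), [uw]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        seen.discard(uw)
        return seen


kb = UWSystem()
kb.link("icl", "author", "person")
kb.link("icl", "author", "occupation")    # hypothetical second super-class
kb.link("icl", "person", "living thing")  # hypothetical link
kb.link("iof", "John", "person")
```

Because `author` has two super-classes in this toy data, `kb.ancestors("author")` collects concepts from both branches, which is what makes the structure a lattice rather than a tree.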

3.5.2 Uses of the UNLKB

The UNLKB defines the syntax and semantics of the UNL. The UNLKB is used in sentence analysis for disambiguation and in sentence generation for finding more general concepts when a concept is unknown to the target language. The UNLKB is also used to verify UNL expressions, since it provides the syntax and semantics of the UNL.

To fully utilize the functions of the UNLKB, all UWs (concepts) must be defined in the UNLKB. For convenience, the following templates are provided for defining UWs that express instances. With these templates, a UW that has the same restriction as one of the templates does not need to be defined in the UNLKB; the corresponding template is used instead when referring to the UNLKB. For example, ‘UW (iof>person)’ is the template for ‘John (iof>person)’.

Template UWs:

uw(iof>brand{>mark})
uw(iof>city{>region})
uw(iof>club{>organization})
uw(iof>committee{>organization})
uw(iof>company{>organization})

3.6 UNL System

The UNL system allows people to communicate with people who speak different languages, each in their own mother tongue. The UNL is a common language for exchanging information through computers that can deal with natural languages [20]. The UNL system basically consists of language servers, UNL editors and UNL viewers. A system that converts from native languages into UNL is called an “enconverter”, and one that deconverts from UNL into native languages is called a “deconverter”.

3.7 Language Server

A Language Server consists of a deconverter and an enconverter [20]. The processes of “enconversion” and “deconversion” are provided by a Language Server which resides in the network of the Internet (Figure 3.1). The “enconverter” and “deconverter” are responsible for converting a particular language into UNL, and vice versa. The “enconverter” enconverts a language into UNL, while the “deconverter” deconverts UNL into a native language.

The EnConverter and DeConverter are the core software in the UNL System. The EnConverter converts natural language sentences into UNL Expressions. The Universal Parser (UP) is a specialized version of the EnConverter. It generates UNL Expressions from annotated sentences by referring to the UW dictionary, without using grammatical features. All UNL Expressions are verified by the UNL Verifier and then stored in the format of a UNL Document. The DeConverter converts UNL Expressions to natural language sentences. Both the EnConverter and DeConverter perform their functions based on a set of grammar rules and a word dictionary of a target language. Consulting the UNL Ontology and/or a co-occurrence dictionary is optional in both the EnConverter and the DeConverter.

Figure 3.1 UNL language servers

3.8 Enconverter

EnConverter is a language independent parser that provides synchronously a framework for morphological, syntactic and semantic analysis [13]. It would be impossible to solve an ambiguity in morphological analysis without the use of syntactic or semantic information. Also, it would be impossible to solve an ambiguity in syntactic analysis without the use of semantic information.

An “enconverter” is software that automatically or interactively enconverts natural language text into UNL. UNU/IAS developed enconversion software called “EnCo”, which constitutes an enconverter together with a word dictionary, a co-occurrence dictionary and conversion rules for a language.

Figure 3.2 shows the structure of EnConverter. EnConverter operates on the nodes of the Node-list through its windows. There are two types of windows, namely the Analysis Window and the Condition Window. The two currently focused windows are called “Analysis Windows (AW)” and are surrounded by the windows called “Condition Windows (CW)”.

EnConverter analyses a sentence using the Word Dictionary, Knowledge Base, and Enconversion Rules. It retrieves relevant dictionary entries from the Word Dictionary, operates on nodes in the Node-list by applying Enconversion Rules, and generates semantic networks of UNL by consulting the Knowledge Base.

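[Diagram: the EnConverter, driven by the Analysis Rules (Bangla to UNL) and the Bangla-UNL Dictionary, places Condition Windows (C) and Analysis Windows (A) over the nodes ni-1 … ni+3 of the Node List, builds the Node-net (nodes A to E) and consults the Knowledge Base.]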

Figure 3.2 UNL Enconverter

3.8.1 EnConverter works as follows:

3.8.2 Convert or load the rules

First, EnConverter converts enconversion rules from text format into binary format, or loads the binary format enconversion rules. The rule checker works while converting rules. Once the binary format rules are made, they are stored automatically and can be used directly the next time without rule conversion. It is possible to choose to convert new text format rules or to use existing binary format rules.

3.8.3 Input a sentence

Secondly, EnConverter inputs a string or a list of morphemes/words of a sentence of a native language.

Apply the rules and retrieve the Word Dictionary

Then, it starts to apply rules to the Node-list from the initial state (see Figure 5). EnConverter applies enconversion rules to the Node-list through its windows. The process of rule application is to find a suitable rule and to take actions or operate on the Node-list in order to create a syntactic tree and UNL network using the nodes in the Analysis Windows. If a string appears in a window, the system will retrieve the Word Dictionary and apply the rule to the candidates of word entries. In this case, if a word satisfies the conditions required for the window of a rule, this word is selected and the rule application succeeds. This process will be continued until the syntactic tree and UNL network are completed and only the entry node remains in the Node-list.

Figure 3.3 shows the flowchart of the enconversion process.

Figure 3.3 Flowchart of the EnConversion Process

3.8.4 Output the UNL expressions

Finally, the UNL network (Node-net) is written to the output file in the binary relation format of UNL expression.

With the exception of the first process of rule conversion and loading, once EnConverter starts to work, it will repeat the other processes for all input sentences. It is possible to choose which and how many sentences are to be enconverted.
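
The rule-application loop can be illustrated with a heavily simplified toy analyzer. It is not the EnCo rule format: the `Rule` class, the attribute codes N and V, the sample dictionary entries and the two rules are all assumptions made for this sketch. Each rule looks at two adjacent nodes (standing in for the Analysis Windows), emits one relation and removes the dependent node, until only the entry node remains in the node list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    left: str      # attribute required on the left analysis window
    right: str     # attribute required on the right analysis window
    relation: str  # relation label to emit
    head: str      # "left" or "right": which node stays in the node list


def enconvert(tokens, dictionary, rules):
    """Reduce the node list pair by pair and return UNL binary relations."""
    nodes = [dictionary[token] for token in tokens]  # (UW, attributes) pairs
    relations = []
    while len(nodes) > 1:
        for i in range(len(nodes) - 1):
            left, right = nodes[i], nodes[i + 1]
            rule = next(
                (r for r in rules if r.left in left[1] and r.right in right[1]), None
            )
            if rule is None:
                continue
            head, dep = (left, right) if rule.head == "left" else (right, left)
            relations.append((rule.relation, head[0], dep[0]))
            del nodes[i + 1 if rule.head == "left" else i]  # drop the dependent
            break
        else:
            raise ValueError("no rule applies to the remaining nodes")
    entry = nodes[0][0]  # the last remaining node is the entry node

    def mark(uw):
        return uw + ".@entry" if uw == entry else uw

    return [f"{label}({mark(head)}, {mark(dep)})" for label, head, dep in relations]


# Hypothetical dictionary entries and rules for a three-word sentence.
dictionary = {
    "Mary": ("Mary(iof>person)", {"N"}),
    "broke": ("break(icl>do)", {"V"}),
    "window": ("window(icl>thing)", {"N"}),
}
rules = [
    Rule(left="N", right="V", relation="agt", head="right"),
    Rule(left="V", right="N", relation="obj", head="left"),
]
result = enconvert(["Mary", "broke", "window"], dictionary, rules)
```

In this toy run the verb survives both rule applications and is therefore marked as the entry node. A real enconverter also uses Condition Windows, morphological rules and the Knowledge Base, none of which are modeled here.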

3.9 Deconverter

A “deconverter” is software that automatically deconverts UNL into native languages [14]. It is important to achieve a high quality and correct results. It is also important that the basic architecture of the “deconverter” is widely shared throughout the world, in order to treat all languages with the same quality and precision standards [3, 12]. Technology developed for a language can be applied to other languages as long as the architecture is shared.

DeConverter transforms the sentence represented by a UNL expression (that is, a set of binary relations) into the directed hypergraph structure called Node-net. The root node of a Node-net is called the Entry Node and represents the main predicate of the sentence. It then applies generation rules to every node in the Node-net and generates the word list in the target language. In this process, the syntactic structure is determined by applying Syntactic Rules, while morphemes are generated by applying Morphological Rules.

The DeConverter works in the following way. It first transforms the input of a UNL expression – a set of binary relations – into a directed graph structure with hyper-nodes called node-net. The root node of a node-net is called entry node and represents the head (e.g. the main verb) of a sentence. Deconversion of a UNL Expression is carried out by applying Deconversion Rules to the nodes of node-net. It starts from the entry node, to find an appropriate word for each node and generate a word sequence (a list of words in grammatical order) of a target language. In this process, the syntactic structure is determined by applying syntactic rules, and morphemes are similarly generated by applying morphological rules. The deconversion process ends when all words for all nodes are found and a word sequence of target sentence is completed.
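
The generation step can be sketched as a walk over the node-net that starts at the entry node and places each dependent before or after its head. This is a toy model only: the function `deconvert`, the `order` table and the lexicon are our own assumptions, and morphological generation is left out.

```python
def deconvert(relations, entry, lexicon, order):
    """Generate a word sequence from binary relations, starting at the entry node.

    relations: (label, head UW, dependent UW) triples
    lexicon:   maps each UW to a target-language word
    order:     maps each relation label to "before" or "after" the head
    """
    children = {}
    for label, head, dep in relations:
        children.setdefault(head, []).append((label, dep))

    def generate(node):
        before, after = [], []
        for label, dep in children.get(node, []):
            target = before if order[label] == "before" else after
            target.extend(generate(dep))
        return before + [lexicon[node]] + after

    return generate(entry)


# Hypothetical node-net and lexicon.
relations = [
    ("agt", "break(icl>do)", "Mary(iof>person)"),
    ("obj", "break(icl>do)", "window(icl>thing)"),
]
lexicon = {
    "break(icl>do)": "break",
    "Mary(iof>person)": "Mary",
    "window(icl>thing)": "window",
}
svo = deconvert(relations, "break(icl>do)", lexicon, {"agt": "before", "obj": "after"})
sov = deconvert(relations, "break(icl>do)", lexicon, {"agt": "before", "obj": "before"})
```

The same node-net yields a verb-medial sequence with the first ordering table and a verb-final sequence, the order used in Bangla, with the second, which illustrates why the syntactic rules belong to the target language rather than to the UNL expression.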

Figure 3.4 UNL deconverter

Figure 3.4 shows the structure of DeConverter. DeConverter operates on the nodes of the Node-list, and inserts nodes from the Node-net into the Node-list through its windows. There are two types of window, namely Generation Window and Condition Window. Two current focused windows are called “Generation Windows”(GW), circumscribed by the windows called “Condition Windows”(CW).

DeConverter generates a sentence using the Word Dictionary, Deconversion Rules, and Co-occurrence Dictionary. It retrieves relevant dictionary entries from the Word Dictionary, operates on or inserts nodes by applying Deconversion Rules, and makes word selection for natural wording by referring to the Co-occurrence Dictionary. The use of the Co-occurrence Dictionary is optional.

DeConverter uses the Condition Windows (CW) to check the neighbouring nodes on both sides of the Generation Windows (GW), in order to determine whether the neighbouring nodes satisfy the conditions for applying a deconversion rule. The Generation Windows (GW) are used to check two adjacent nodes in order to apply one of the deconversion rules. If there is an applicable rule, DeConverter will modify the grammatical attributes of these two nodes, and/or insert a node from the Node-net into the Node-list. This process continues until all nodes of the Node-net are inserted into the Node-list, so that the nodes of the Node-list compose the generated sentence.

3.9.1 DeConverter works as follows:

3.9.2 Rule Conversion or Loading

First, DeConverter converts deconversion rules from text format into binary format, or loads the binary format deconversion rules. The rule checker works while converting rules. Once the binary format rules are made, they are stored automatically and can be used directly the next time without rule conversion. It is possible to choose to convert new text format rules or to use existing binary format rules.

3.9.3 Input of UNL Expressions and Word Dictionary Retrieval

Secondly, it inputs a sentence of UNL expressions and converts it into the Node-net; at this point the word entries are also retrieved from the Word Dictionary using the UW of each node.

3.9.4 Rule Application

Then, it starts to apply rules to the Node-list from the initial state (see Figure 5). DeConverter applies deconversion rules to the Node-list and inserts nodes from the Node-net. This process will end when either the Sentence Tail node of the Node-list appears in the left Generation Window or the Sentence Head node appears in the right Generation Window.

Except for the first process of rule conversion and loading, once DeConverter starts to work, it will repeat the second and third processes for all input sentences of UNL expressions. It is possible to choose which and how many sentences are to be deconverted.

A rule is designed to describe under what conditions which operation is performed, using grammatical features that are defined by the rules, given in a Word Dictionary, or both. With this, a set of enconversion or deconversion rules can be prepared for a desired language, allowing the EnConverter or the DeConverter to deal with that language.

3.9.5 UNL Editor and Viewer

UNL editor is used to make UNL documents. UNL editor is linked to a language server equipped with an “enconverter” and a “deconverter” for a natural language. As the author writes a document, e-mail or any other text, in his/her language, UNL editor “enconverts” it into UNL documents. In this process, UNL expressions are produced automatically or interactively with the author.

Figure 3.5 UNL users

UNL editor also shows the input in a UNL document in the author’s native language, showing how the UNL editor understands the original document; in this way, the author can check the correctness of the “enconversion”. In this verification, the high accuracy of “deconversion” counts a lot. When the result is found not to be correct enough, the author can either rewrite the original document or modify the UNL interactively according to the guidance provided by the editor. The author can then produce a UNL document that is as correct as desired.

UNL viewer is used to see UNL documents in a user’s native language; UNL viewer uses a language server when it deconverts the UNL documents into the user’s native language.

3.9.6 How UNL System works

Any person with access to the Internet will be able to “enconvert” text written in their own language into UNL expressions using the UNL editor. Likewise, any UNL expressions can be “deconverted” into a variety of native languages using the UNL viewer. The processes of “enconversion” and “deconversion” are provided by a Language Server which resides in the network of the Internet [21]. The “enconverter” and “deconverter” are responsible for converting a particular language into UNL, and vice versa. The “enconverter” “enconverts” a language into UNL, while the “deconverter” “deconverts” UNL into a native language.

Figure 3.6 shows the case in which a home page is developed in Bangla and, through UNL, is read in Spanish. The Bangla Language Server and the Spanish Language Server provide the conversion service.

When home pages are developed in Bangla, the UNL Editor recognizes the contents as Bangla and sends a request to the Bangla Language Server to “enconvert” the text. Once the Bangla text is “enconverted” to UNL, the Bangla Language Server sends the results back to the UNL Editor. Home page designers can now embed UNL into their pages.

Figure 3.6 UNL architecture

When we read this page in Spanish, the UNL Viewer recognizes the contents as UNL and sends a request to the Spanish Language Server to “deconvert” the text. Once UNL is “deconverted” to Spanish, the Spanish Language Server sends the results back to the UNL Viewer.

The text, once converted to UNL, may be converted to many different languages. For example, home pages can be designed in one’s native language and then “enconverted” to UNL before being uploaded. Once a home page is expressed in UNL, it can be read in a variety of languages.

Chapter IV

Proposed Work

In this chapter, we discuss the development of templates of Bangla headwords for the word dictionary and the development of analysis rules for converting Bangla text to UNL expressions, as follows:

4.1 Template of Bangla Word Dictionary

The UNDL Foundation provides a dictionary format. The Word Dictionary is a collection of word dictionary entries. Each entry is composed of three kinds of elements: the Headword (HW), the Universal Word (UW) and the Grammatical Attributes. A headword is the notation (surface form) of a natural-language word that occurs in the input sentence; it is used as a trigger for obtaining the equivalent UWs from the Word Dictionary during enconversion. A UW expresses the meaning of the word and is used in creating the UNL networks (UNL expressions) of the output. Grammatical Attributes carry information on how the word behaves in a sentence and are used in enconversion rules.

Each dictionary entry for a word of any native language has the following format [5]:

Data Format: [HW] {ID} “UW” (Attribute1, Attribute2,…) <FLG, FRE, PRI>

Here,

HW ← Head Word (Bangla word)

ID ← Identification of Head Word (may be omitted)

UW ← Universal Word

ATTRIBUTE ← Attribute of the HW

FLG ← Language Flag (we use B for Bangla)

FRE ← Frequency of Head Word

PRI ← Priority of Head Word

The format of an entry of the Bangla-UNL Dictionary would be:

[HW]{} “UW (icl>restriction)” (Attributes) <B, 0, 0>

Figure 4.1 Format of Word Dictionary (the figure labels the Bangla Head Word, the Universal Word, and the Grammatical, Semantic and Morphological attributes of an entry)
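The entry format above is regular enough to be read mechanically. The following Python sketch parses one entry into its parts. It is an illustration only: `DictEntry`, `parse_entry` and the regular expression are assumptions made here, not part of the UNDL tools, and the sketch treats the `<FLG, FRE, PRI>` part as optional because some examples in this chapter omit it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Pattern for: [HW] {ID} "UW" (Attribute1, Attribute2, ...) <FLG, FRE, PRI>
ENTRY_RE = re.compile(
    r"""
    \[(?P<hw>[^\]]*)\]\s*          # [HW]
    \{(?P<id>[^}]*)\}\s*           # {ID}, may be empty
    ["“](?P<uw>[^"”]*)["”]\s*      # UW in straight or curly quotes
    \((?P<attrs>[^)]*)\)\s*        # (Attribute1, Attribute2, ...)
    (?:<(?P<flags>[^>]*)>)?        # optional <FLG, FRE, PRI>
    \s*$
    """,
    re.VERBOSE,
)


@dataclass
class DictEntry:
    headword: str
    entry_id: str
    uw: str
    attributes: List[str]
    flag: Optional[str] = None
    frequency: Optional[int] = None
    priority: Optional[int] = None


def parse_entry(line: str) -> DictEntry:
    """Split one Word Dictionary entry into its elements."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    attributes = [a.strip() for a in match.group("attrs").split(",") if a.strip()]
    entry = DictEntry(
        headword=match.group("hw").strip(),
        entry_id=match.group("id").strip(),
        uw=match.group("uw").strip(),
        attributes=attributes,
    )
    if match.group("flags") is not None:
        flag, frequency, priority = (p.strip() for p in match.group("flags").split(","))
        entry.flag = flag
        entry.frequency = int(frequency)
        entry.priority = int(priority)
    return entry


example = parse_entry(
    '[পাখি] {} "bird(icl>animal>animate thing)" (N, ANI, SG, CONCRETE) <B,0,0>'
)
print(example.headword, example.uw, example.attributes, example.flag)
```

Keeping the attributes as a list preserves their order, which matters when entries are written back out in the same format.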

4.1.1 Development of Grammatical attributes:

To represent Universal Words (UWs) for each Bangla head word, we need to develop grammatical attributes (GAs) that describe how the word behaves in a sentence. They play a very important role in writing enconversion and deconversion rules, because a rule uses GAs in morphological and syntactic analysis to connect one morpheme with another to build a meaningful (complete) word, and to examine or define the position of a word in a sentence. When we analyze the HWs to represent them in the word dictionary as UWs, we record all the possible specifications of each HW as grammatical attributes, so that they can be used in the dictionary for writing rules (EnCo and DeCo). For example, if we consider “পাখি” (pakhi), meaning bird, as a head word, then we can use the attributes N (as it is a noun), ANI (as a bird is an animal), SG (for singular number) and CONCRETE (as it is a concrete, touchable thing).

So, this word can be represented in the dictionary as follows:

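[পাখি] {} “bird(icl>animal>animate thing)”(N, ANI, SG, CONCRETE)

Here [পাখি] is the Head Word, “bird(icl>animal>animate thing)” is the Universal Word and (N, ANI, SG, CONCRETE) are the Grammatical Attributes.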

Similarly, we can represent the words ধান (paddy) and নাচ্ (dance) in the Word Dictionary as follows:

[ধান] {} “paddy(icl>plant>thing)”(N, PLANT, CONCRETE)

[নাচ্]{} “dance(icl>do)”(ROOT, BANJANT)

Some proposed grammatical attributes for developing the Word Dictionary of Bangla words and morphemes, and the analysis and generation rules for enconversion and deconversion, are shown in the table below.

Grammatical Attribute | Description of attribute | Examples (Bangla/English words)
ADJ | adjective | ভাল (good), সুন্দর (beautiful) etc.
ALT | alternative root | ?? (gi), ?? (je) etc.
ABY | indeclinable | ??? (and), ???? (for) etc.
BOCH | articles | ?? (ti), ?? (ta), ???? (gula), ???? (guli) etc.
BIV | normal inflexions | ???? (onto), ?? (oi) etc.
7TH | seventh Bivokti (inflexion) | ? (e), ? (oy), ?? (te) etc.
5TH | fifth Bivokti (inflexion) | ???? (hoite), ???? (theke) etc.
3RD | third Bivokti (inflexion) | ?????? (dara), ???? (die) etc.
2ND | second Bivokti (inflexion) | ?? (ke) etc.
CEND | verb roots or nouns that end with a consonant | পড়্ (read), ধর্ (catch), ????? (London) etc.
CEG | verb roots of consonant-ended groups | পড়্ (read), ???? (write), ????? (Paris) etc.
CMPL | verbal inflexions that combine with roots to make verbs in the present and past perfect tenses | ???? (echhi), ??????? (echhilam) etc.
CHL | inflexions that are used in the cholti language | ??? (tam), ??? (lam) etc.
CONCRETE | solid thing | ??? (land), ?? (house) etc.
FUT | verbal inflexions that combine with roots to make the future tense | ?? (be) etc.
FEM | female person | ?? (?????), she (female) etc.
HON | respected pronouns | ???? (you), ???? (he) etc.
HPRON | human pronoun | ??? (ami), ?? (she) etc.
IMPR | verbal inflexions that combine with roots to make verbs in the present and past imperative | ? (o), ? (n) etc.
KPROT | suffixes that are used after roots to create nouns, adjectives etc. | ইক (ik), অন (on) etc.
KBIV | verbal inflexions | ই (i), ইতেছি (itechhi), বে (be) etc.
MNOUN | suffixes that are added to roots to make nouns | আ etc.
MADJ | suffixes that are added to roots to make adjectives | অন্ত etc.
MAL | male person | ?? (?????), he (male) etc.
N | any noun | কলম (pen), আম (mango) etc.
NPRO | proper noun | ????? (Dulal), name of a person; ????? (Padma), name of a river etc.
NCOM | common noun | মানুষ (man), ??? (cow), গাছ (tree), মাছ (fish) etc.
NMAT | material noun | জল (water), বাতাস (air), আকাশ (sky), লোহা (iron) etc.
NABS | abstract noun | সুখ (happiness), দুঃখ (sadness) etc.
NP | noun phrase | ??? ???? (by pen) etc.
NUM | number | ? (5), ? (7), ? (9) etc.
NANI | not animate | ?? (book), ??? (pen) etc.
NGL | neglected pronouns | ??? (you), ???? (you [pl]) etc.
PL | plural number | ???? (amra), ?????? (tahara) etc.
PRON | pronoun | আমি (I), আমরা (we) etc.
PSTEM | pronoun stem | ??? (ama), ???? (toma) etc.
PROT | all suffixes | আ (a), অন (on), আই (ai) etc.
1P | first person pronouns | আমি (I), আমরা (we) etc.
2P | second person pronouns | তুমি (you) etc.
3P | third person pronouns | সে পুরুষ (he), সে মহিলা (she) etc.
PRS | suffixes that are added to roots to create the present indefinite form of the sentence | ই (i) etc.
PRS | verbal inflexions that combine with roots to make the present tense | ই (i), ইতেছি (itechhi) etc.
PST | verbal inflexions that combine with roots to make the past tense | ??? (lam) etc.
PRGR | verbal inflexions that combine with roots to make the present and past continuous tenses | ????? (itechi), ???????? (itechhilam)
PSUFF | primary suffixes | ???? (onto), ? (a) etc.
ROOT | verb root | চা (want), যা (go), পড়্ (read), ধর্ (catch) etc.
SHD | inflexions that are used in the shadhu language | ই (i), ইতেছি (itechhi) etc.
SG | singular number | ??? (ami), ???? (tumi) etc.
UROOT | consonant ? | ????? (dul), ??? (khul) etc.
VEND | verb roots or nouns that end with a vowel | চা (want), যা (go) etc.
VEG | verb roots of vowel-ended groups | চা (want), যা (go) etc.
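Since the table lists only some of the proposed attributes, it can be useful to check which attributes of a dictionary entry it does not yet describe. The sketch below does this in Python; `TABLE_ATTRIBUTES` and `attributes_missing_from_table` are hypothetical names introduced here, and the set simply copies the attribute names from the table above.

```python
from typing import Iterable, List

# Attribute names copied from the table of proposed grammatical attributes.
TABLE_ATTRIBUTES = {
    "ADJ", "ALT", "ABY", "BOCH", "BIV", "7TH", "5TH", "3RD", "2ND",
    "CEND", "CEG", "CMPL", "CHL", "CONCRETE", "FUT", "FEM", "HON",
    "HPRON", "IMPR", "KPROT", "KBIV", "MNOUN", "MADJ", "MAL", "N",
    "NPRO", "NCOM", "NMAT", "NABS", "NP", "NUM", "NANI", "NGL", "PL",
    "PRON", "PSTEM", "PROT", "1P", "2P", "3P", "PRS", "PST", "PRGR",
    "PSUFF", "ROOT", "SHD", "SG", "UROOT", "VEND", "VEG",
}


def attributes_missing_from_table(attributes: Iterable[str]) -> List[str]:
    """Return, in order, the attributes that the table does not list."""
    return [a for a in attributes if a not in TABLE_ATTRIBUTES]


# Attributes of the bird and dance entries given earlier in this section.
print(attributes_missing_from_table(["N", "ANI", "SG", "CONCRETE"]))
print(attributes_missing_from_table(["ROOT", "BANJANT"]))
```

Run on the example entries, the check reports ANI and BANJANT, which appear in the examples but not in the table; a complete attribute list would need to describe them as well.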

Some example entries of dictionary for Bangla are given below:

[?????] {} “Asia (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT) <B,0,0>

[????]{} “Dhaka(iof>capital>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT) <B,0,0>

[????????] {} “Bangladesh (iof>country>thing)” (N, NPRO, CEND, ASIA, #PLC, #PLF, #PLT) <B,0,0>

[?????] {} “Savar (iof>place>thing)” (N, NPRO, CEND, UPZL, ASIA, #PLC, #PLF, #PLT) <B,0,0>

Here the attribute N stands for noun, NPRO for proper noun, PLACE for place, VEND for vowel-ended and CEND for consonant-ended. The attributes fall into the following parts:

Grammatical Part: N, NPRO

Morphological Part: VEND

Semantic Part: CITY, CAPT
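The three-part grouping can be applied to a whole entry with a small lookup. In the Python sketch below, `PART_OF` and `group_attributes` are illustrative names; only N, NPRO, VEND, CITY and CAPT are classified by the text above, CEND is placed with VEND as an assumption, and every other attribute is reported as unclassified rather than guessed.

```python
from typing import Dict, Iterable, List

# Classification stated in the text, plus CEND (assumed to be morphological
# because it is the consonant-ended counterpart of VEND).
PART_OF: Dict[str, str] = {
    "N": "grammatical",
    "NPRO": "grammatical",
    "VEND": "morphological",
    "CEND": "morphological",  # assumption, not stated in the text
    "CITY": "semantic",
    "CAPT": "semantic",
}


def group_attributes(attributes: Iterable[str]) -> Dict[str, List[str]]:
    """Sort attributes into grammatical, morphological and semantic parts."""
    groups: Dict[str, List[str]] = {
        "grammatical": [], "morphological": [], "semantic": [], "unclassified": [],
    }
    for attribute in attributes:
        groups[PART_OF.get(attribute, "unclassified")].append(attribute)
    return groups


# Attributes of the Dhaka entry shown above.
dhaka = ["N", "NPRO", "VEND", "CAPT", "DIVS", "SASI", "ASIA", "DIST",
         "CITY", "#PLC", "#PLF", "#PLT"]
for part, members in group_attributes(dhaka).items():
    print(part, members)
```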

4.1.2 Procedure for retrieving universal words provided by the UNDL Foundation

To use the UNL platform, we first go to http://www.undl.org/unlpf/index.html. If already registered, enter the registered e-mail address and click “OK” [20].

Fig 4.2 Login screen of UNL Platform

After a successful login, the workspace of the UNL platform is displayed. This workspace contains menu items such as File and Conversion (NL > UNL).

Fig 4.3 UNL Platform Space

To check the UNL word dictionary entry for any word, go to File>New>Direct Input.

Fig 4.4 Input Word in UNL Platform

Then type the desired word and select Conversion > Word Selection.

Fig 4.5 Enconversion Process

A pop-up message box is then displayed on the screen. In this message box, select English as the language and click OK.

Fig 4.6 Language selection for Enconversion Process

At this stage, a new tab named “Word Selection” is displayed in the workspace, and the word is shown in bold, brown text. Move the mouse pointer over the highlighted word, and the existing word dictionary entries for that word are displayed in a pop-up message.

Fig 4.7 Existing UNL Word Dictionary entry for input word

4.1.3 Development of Templates for Dictionary For Continent, Country, Capital, District & City

Based on the process of developing Universal Words and grammatical attributes discussed above, the template of the dictionary is:

[HW]{}“Universal Word”(N, NPRO/ NCOM/ NCON/ NABS, VEND/ CEND/ DIST/ CAPT/ CITY/ CONT, #PLC, #PLF, #PLT)
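Entries that follow this template can be generated rather than typed by hand. The Python sketch below fills the template for a place name and chooses VEND or CEND from the last character of the head word. The function names and the ending heuristic are assumptions made for illustration: a word is treated as vowel-ended when its final character is a Bangla independent vowel or vowel sign, which matches the Dhaka (VEND) and Bangladesh (CEND) examples but ignores pronunciation. The Bangla spellings are supplied here and the output uses straight quotes.

```python
from typing import Iterable

# Bangla independent vowels (U+0985..U+0994) and dependent vowel signs
# (U+09BE..U+09CC). The hasanta (U+09CD) is deliberately excluded.
INDEPENDENT_VOWELS = {chr(c) for c in range(0x0985, 0x0995)}
VOWEL_SIGNS = {chr(c) for c in range(0x09BE, 0x09CD)}


def ending_attribute(headword: str) -> str:
    """Heuristic: VEND if the head word ends in a written vowel, else CEND."""
    last = headword.strip()[-1]
    if last in INDEPENDENT_VOWELS or last in VOWEL_SIGNS:
        return "VEND"
    return "CEND"


def make_place_entry(headword: str, name: str, kind: str,
                     extra: Iterable[str] = ()) -> str:
    """Fill the template for a proper noun naming a place.

    kind is the iof> class (for example continent, country or city);
    extra holds semantic attributes such as CAPT, DIST or CITY.
    """
    attributes = ["N", "NPRO", ending_attribute(headword), *extra,
                  "#PLC", "#PLF", "#PLT"]
    attribute_text = ", ".join(attributes)
    return f'[{headword}] {{}} "{name} (iof>{kind}>thing)" ({attribute_text}) <B,0,0>'


print(make_place_entry("ঢাকা", "Dhaka", "city",
                       ["CAPT", "DIVS", "SASI", "ASIA", "DIST", "CITY"]))
print(make_place_entry("বাংলাদেশ", "Bangladesh", "country", ["ASIA"]))
```

Generated entries should still be reviewed by hand, since a written final consonant can carry an inherent vowel in speech and the heuristic cannot detect that.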

The formats of the word dictionary entries for Bangla head words are as follows.

4.1.3.1 Formats of Bangla Head Word for Continents

;; Continent name ;;

[?????] {} “Asia (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[?????] {} “Europe (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[???????] {} “Africa (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[???? ???????] {} “North America (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[???? ???????] {} “South America (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[???????????] {} “Australia (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[????????????] {} “Antarctica (iof>continent>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

[?????] {} “Asia (iof>place>thing)”(N, NPRO, VEND, #PLC, #PLF, #PLT)

4.1.3.2 Formats of Bangla Head Word for Capital & City

;; Asian Region Capital ;;

[?????]{} “Kabul (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Dhaka (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Thimpu(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Kathmandu (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Islamabad (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “New Delhi (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Male (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Colombo (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[???????]{} “Ashgabat (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SASI, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????? ???? ????????]{} “Bandar Seri Begawan (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????? ???????]{} “Port Moresby (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????????]{} “Yangon (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Phnom Penh (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Dili(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????????]{} “Jakarta (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????????]{} “Vientiane (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????????]{} “Kuala Lumpur (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????????]{} “Manila (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Singapore (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????]{} “Bangkok (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Hanoi (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, SEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Manama (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, WAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Nicosia (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, WAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Tehran (iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Baghdad(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Jerusalem(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????????]{} “Dushanbe(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Amman(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????? ????]{} “Kuwait City(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????]{} “Beirut(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[???????]{} “Muscat(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Doha(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Riyadh(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[???????]{} “Damascus(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[???????]{} “Ankara(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[???????]{} “Tashkent (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Bishkek (iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[???????]{} “Abu Dhabi(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Sanaa(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, MEAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Seoul(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????]{} “Tokyo(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Ulan Bator(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[??????]{} “Beijing(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Victoria(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????? ?????]{} “Pyongyang(iof>city>thing)” (N, NPRO, CEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????]{} “Taipei(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Macau(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, NAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[?????????]{} “Armenia(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, WAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)

[????]{} “Baku(iof>city>thing)” (N, NPRO, VEND, CAPT, DIVS, WAS, ASIA, DIST, CITY, #PLC, #PLF, #PLT)