Arabic Nominals in HPSG: A Verbal Noun Perspective

Abstract


Semitic languages exhibit rich nonconcatenative morphological operations, which can generate a myriad of derived lexemes. In particular, the feature-rich, root-driven morphology of the Arabic language derives several kinds of verbal nouns such as gerunds, active participles, passive participles, locative nouns, etc. To capture this rich morphology in natural language processing, Head-driven Phrase Structure Grammar (HPSG) is an excellent choice: it combines the best ideas from its predecessors and integrates all linguistic layers (phonology, morphology, syntax, semantics, context, etc.) of natural language processing. Although HPSG is a successful syntactic theory, it lacks a representation for complex nonconcatenative morphology. In this work, we propose a novel HPSG representation that includes the morphological, syntactic and semantic features of Arabic nominals and various verbal nouns. We also present the lexical type hierarchy and derivational rules for generating these verbal nouns within the HPSG framework. Finally, we have implemented the lexical type hierarchy, Attribute Value Matrix (AVM) and construction rules on the TRALE (an extension of the Attribute Logic Engine) platform to validate the proposed HPSG formalism.

Chapter 1

Introduction

Head-driven Phrase Structure Grammar (HPSG) is an attractive tool for capturing complex linguistic constructs. It combines the best ideas from its predecessors – Generalized Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government and Binding theory (GB) [8]. It is very suitable for natural language processing, as it integrates the essential linguistic layers (phonology, morphology, syntax, semantics, context, etc.) of natural language processing. It is also flexible enough to be adapted to a specific language.

1.1 Motivation

Semitic languages like Arabic, Amharic and Hebrew exhibit rich nonconcatenative morphological operations for the construction of lexical items. Computational linguistic modeling of their morphology can give us large vocabulary coverage in these languages. Among these Semitic languages, we have chosen Arabic for nonconcatenative morphological analysis: it is the best instance of nonconcatenative morphology among living languages. More than two hundred and eighty million people speak Arabic as a first language, it is the official language of twenty-two countries, and it ranks fifth among languages by number of native speakers. It is also the intellectual and liturgical language of the Islamic world.


Despite these facts, the morphological analysis of Arabic is a relatively new area of research.

1.2 Scope of the Work

The HPSG analysis of nonconcatenative morphology in general, and of Semitic languages in particular, is relatively new. However, the intricate nature of Arabic morphology has motivated several research projects addressing these issues [1, 7, 40]. HPSG representations of Arabic verbs and morphologically complex predicates are discussed in [2–4]. An in-depth analysis of declensions in Arabic nouns has been presented in [18]. The diversity and importance of Arabic nominals is broader than that of their counterparts in other languages: modifiers, such as adjectives and adverbs, are treated as nominals in Arabic. Moreover, Arabic nouns can be derived from verbs or other nouns. Derivation from verbs is one of the primary means of forming Arabic nouns, and no HPSG analysis of it has been conducted yet.

Arabic nouns can be categorized along several dimensions such as derivation (derived from a verb or a noun), ending type (sound ending or weak ending), declension (declinable or indeclinable), etc. Based on derivation, Arabic nouns can be divided into two categories as follows:

1. Non-derived nouns: These are not derived from any other noun or verb.

2. Derived nouns: These are derived from other nouns or verbs.

An example of a non-derived (static) noun is ḥiṣānun, which means "horse": it is not derived from any noun or verb, and no verb is generated from this word. On the other hand, kātibun, which means "writer", is an example of a derived noun. This word is generated from the verb kataba, which means "he wrote" in English. This simple example provides a glimpse of the complexity of the derivational, nonconcatenative morphology for constructing a noun from a verb in Arabic. In this work, we analyze and propose the HPSG constructs required for capturing the syntactic and semantic effects of this rich morphology.

An HPSG formalization of Arabic nominal sentences has been presented in [29]. The formalization covers seven types of simple Arabic nominal sentences while taking care of the agreement aspect. In [24], an HPSG analysis of the broken plural and the gerund has been presented. The main assumption in that work revolves around Concrete Lexical Representations (CLRs), located between an HPSG type lexicon and phonological realization. However, the authors did not address other forms of verbal nouns, including participles.

In this work, we analyze all types of verbal nouns generated from strong (i.e., sound) triliteral root verbs. We analyze their derivation from the verb and their syntactic and semantic information. We do not analyze the derivation of verbal nouns generated from strong quadriliteral or weak verbs. This is because all eight types of verbal nouns are derived from strong triliteral root verbs and these derivations follow regular patterns, whereas the patterns of derivation from quadriliteral or weak verbs are not so regular, so analyzing their derivations needs more effort. Moreover, most of the time at most three types of verbal nouns are derived from these kinds of root verbs.

1.3 Contribution

Our contributions towards the HPSG analysis of Arabic nouns presented in this dissertation are as follows:

• We formulate the structure of the Attribute Value Matrix (AVM) for Arabic nouns and extend the AVM for the Arabic verb proposed in [2]. We make this design robust so that it can handle not only lexeme and word construction but also phrase and sentence construction.

• We capture the syntactic and semantic effects of Arabic morphology.

• We determine the placement of verbal nouns and their subtypes in the lexical type hierarchy, with proper justification.

• Arabic morphology is, in general, root-pattern morphology: different lexemes can be generated from the same root using different patterns. We utilize this root-pattern morphology to design lexical rules that avoid the need for exhaustive lexical entries for the four types of verbal nouns derived from strong triliteral root verbs. As a result, hundreds of verbal nouns can be recognized merely by associating the root verbs with the set of lexical rules applicable to them. Thus, the number of lexical entries in the dictionary is greatly reduced.

• We implement the designed AVM, type hierarchy and lexical rules in TRALE (an extension of the Attribute Logic Engine) [34], a freeware system developed in Prolog that integrates phrase structure parsing, semantic-head-driven generation and constraint logic programming with typed feature structures as terms.

1.4 Organization of the Rest of the Dissertation

Chapter 2 gives the background by explaining the linguistic concepts and necessary tools. It discusses several linguistic topics ranging from morphology and syntax to semantics. Then it provides a sketch of Arabic grammar, mainly the morphology associated with its word construction. Next, it gives a brief introduction to HPSG, the mathematical theory of language used in our thesis. In the last part of this chapter, a detailed discussion of related work is presented.

Chapter 3 presents our contribution to the development of a generic structure for the Attribute Value Matrix of the Arabic noun. It also describes the type hierarchy of the Arabic noun and its subtypes based on the derivation dimension. Next, it discusses the construction rules for the four types of Arabic verbal nouns derived from strong triliteral root verbs. It also designs the lexical entries for the other four types of verbal nouns, which do not follow rigorous regular patterns.

Chapter 4 gives a brief description of the TRALE lexical compiler. Then it shows the necessary components of TRALE and how we implement our HPSG formalism using TRALE.

Finally, Chapter 5 gives the conclusion. In this chapter, we state the concrete contributions of our work from a technical point of view. We finish this chapter by giving directions for further research on this topic.

Chapter 2

Background and Related Works

The topics discussed in this chapter serve as background for the rest of the thesis. In Section 2.1 we explain some theoretical linguistics necessary to develop linguistic models. Section 2.2 gives an introduction to morphology, and more specifically morphology in the Arabic language and its effect on other linguistic layers. Section 2.3 gives an overview of Head-driven Phrase Structure Grammar (HPSG). Finally, in Section 2.4, we present the state of research on HPSG modeling, with emphasis on the Arabic language. In this chapter, we frequently use the Arabic alphabet; we present the transliteration of the Arabic alphabet in Table 5.1.

2.1 Theoretical Linguistics

The scientific study of human language is called linguistics. Among all branches of linguistics, theoretical linguistics is the most important for developing models of linguistic knowledge. The core subjects of theoretical linguistics are phonology, morphology, syntax and semantics. The parts of theoretical linguistics can be summarized as follows:

• Phonology: is the systematic use of sound to encode meaning in any spoken human language, or the field of linguistics studying this use. In other words, it is concerned with the function, behaviour and organization of sounds as linguistic items.

• Morphology: is the study of word formation. It is the study of the internal structure of words; in other words, it is the study of the patterns of word formation in a particular language, the description of such patterns, and the behavior and combination of morphemes.

• Syntax: is the study of the principles and rules for constructing phrases or sentences in natural languages.

• Semantics: is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols.

• Pragmatics: is the study of the ways in which context contributes to meaning. It studies how the transmission of meaning depends not only on the linguistic knowledge (e.g. grammar, lexicon, etc.) of the speaker and listener, but also on the context of the utterance, knowledge about the status of those involved, the inferred intent of the speaker, and so on.

• Discourse: is the study of connected speech. A discourse constitutes sequences of relations to objects, subjects or predicates. Discourse can be observed in multimodal/multimedia forms of communication, including the use of spoken, written and signed language in contexts spanning from oral history to instant message conversations to textbooks.

Although phonology is a significant part of theoretical linguistics, it is beyond the scope of this thesis, because it deals with language sounds whereas our work begins from word formation, i.e., morphology. For background purposes, we discuss the concepts related to the morphology, syntax and semantics layers. We have taken the linguistic definitions from [25].

2.1.1 Morphology

Morphology is the study of the internal structure of words or in other words it is the study of the patterns of word formation in a particular language, description of such patterns and the behavior and combination of morphemes. It can be thought of as a system of adjustments in the shapes of words that contribute to adjustments in the way speakers intend their utterances to be interpreted. A word is sometimes placed, in a hierarchy of grammatical constituents, above the morpheme level and below the phrase level. We will discuss more on the concept of constituents in Section 2.1.2.

A morpheme is the smallest meaningful unit in the grammar of a language. The word ‘dogs’ consists of two morphemes: ‘dog’, and ‘-s’, a plural marker on nouns.

A morpheme can be categorized based upon how it combines with other morphemes to form a word. Here are some kinds of morpheme types:

• Bound morpheme: A bound morpheme is a grammatical unit that never occurs by itself, but is always attached to some other morpheme. In the above example, ‘-s’ is a bound morpheme.

• Free morpheme: A free morpheme is a grammatical unit that can occur by itself. However, other morphemes such as affixes can be attached to it. In the above example, ‘dog’ is a free morpheme.

• Affix: An affix is a bound morpheme that is joined before, after, or within a root or stem. In the above example, ‘-s’ is an affix.

• Root: A root is the portion of a word that carries the principal portion of the meaning of the words in which it functions. It is common to a set of derived or inflected forms, if any, when all affixes are removed. A root is also a stem. In the above example, ‘dog’ is a root. Another example of a root is ‘speak’: it carries the principal portion of the meaning of the word. ‘speaker’ is not a root; rather, it is derived from a root.

• Stem: A stem is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added. In the above two examples, ‘dog’ is a root and a stem. But, ‘speaker’ is a stem and ‘speak’ is its root.

• Clitic: A clitic is a morpheme that has the syntactic characteristics of a word, but shows evidence of being phonologically bound to another word. Examples of clitics can be ‘within’, ‘into’, etc.

Among these morpheme types, clitics are beyond the scope of this thesis. Roots, stems and affixes will be discussed further after the discussion of morphosyntactic operations.

A morphosyntactic operation is an ordered, dynamic relation between one linguistic form and another. There are two kinds of morphosyntactic operations:

• Derivation – is the formation of a new word or inflectable stem from another word or stem. It typically occurs by the addition of an affix. The derived word is often of a different word class (or category) from the original; it may thus take the inflectional affixes of the new word class. Example – ‘speaker’ is derived from ‘speak’. ‘Speak’ is both a root and a stem; ‘speaker’ is a new stem derived from ‘speak’ by a derivational operation. Here the derivational affix (suffix) ‘-er’ is used for this operation. The derived word ‘speaker’ is a stem but not a root, because it can be further analyzed into the meaningful unit ‘speak’, which is the root of ‘speaker’. Another notable point in this example is that ‘speak’ is a verb whereas the derived word ‘speaker’ is a noun; thus the word class of the derived word has changed from that of its root.

• Inflection – is variation in the form of a word, typically by means of an affix, that expresses a grammatical contrast which is obligatory for the stem's word class in some given grammatical context. As an example, ‘speakers’ is inflected from the stem ‘speaker’. This inflection is necessary if ‘speaker’ is used in the plural. Here the suffix ‘-s’ is used for inflection. The word ‘speakers’ is not a stem; its category is the same as the category of ‘speaker’. Thus, inflection differs from derivation in that the syntactic category does not change.

Morphology deals with two kinds of information.

• Firstly, what information is encoded by the morpheme. For example, consider the Arabic word kataba – “he wrote”. A variety of information is encoded in this word and its other inflected or derived forms. Some are listed below:

– Agreement: kataba – “he wrote”. Person – 3rd, Number – Singular, Gender – Masculine, Mood – Indicative.

– Event structure: kataba – “he wrote”. Tense – Past, Aspect – Perfect.

– Agency: kutiba – “it was written”. Voice – Passive.

– Illocutionary force: uktub – “Write!”. Mode – Command.

– Part-of-speech: kitābun – “a book” is a noun; kataba – “he wrote” is a verb.

– Definiteness: al-kitābu – “the book”. Determiner – Definite.

– Complex predicate: kattaba – “he made (someone) write”. Semantic relation – Causation.

There are many more syntactic and semantic phenomena that can be expressed using morphology.

• Secondly, the morphological process, which is a means of changing a stem to adjust its meaning to fit its syntactic and communicational context. It encodes morphosyntactic operations. As an example, plural formation is a morphosyntactic operation, whereas suffixation is a kind of morphological process that English uses to encode plural formation. The morphological processes for concatenative and nonconcatenative morphosyntactic operations are shown below:

– Concatenative operations are those where morphemes are linearly concatenated. This process is also called agglutination, and a language that uses it extensively is called an agglutinative language. For example:

∗ Prefixation: a morpheme concatenated at the front, e.g., clear – unclear

∗ Suffixation: a morpheme concatenated at the back, e.g., walk – walked

∗ Circumfixation: morphemes concatenated both at the front and the back, e.g., mind – unmindful

– Nonconcatenative operations are those where morphemes are nonlinearly embedded. A language that uses this process frequently is called a fusional language. For example:

∗ Infixation: root letter morphemes embedded in the middle, e.g., kataba – kattaba

∗ Simulfixation: front morpheme shifted to the back, e.g., eat – ate

∗ Modification: middle vowel changed, e.g., man – men

∗ Suppletion: whole stem changed, e.g., go – went

In this thesis, we mainly focus on nonconcatenative operations, as well as concatenative operations, and give a mathematical formalism to capture their rich diversity in Arabic.

2.1.2 Syntax

Syntax is the study of the principles and rules for constructing phrases or sentences in natural languages. In addition, the term syntax is also used to refer directly to the rules and principles that govern the sentence structure of any individual language. There are a number of theoretical approaches to the discipline of syntax. Some popular approaches among these are –

• Generative grammar,

• Categorial grammar,

• Dependency grammar,

• Stochastic/probabilistic grammars/network theories,

• Functionalist grammars.

Modern research in syntax attempts to describe languages in terms of such rules, which are often referred to as construction rules. These rules are the basis of generative grammar. Our current research is also about forming these rules, so in the discussion of syntax we put much emphasis on constructions.

A construction is an ordered arrangement of grammatical units forming a larger unit. Different usages of the term construction include or exclude stems and words. There are several kinds of construction. Some of these are –

• Apposition – is a construction consisting of two or more adjacent units that have identical referents. Example – My friend John.

• Clause – is a grammatical unit that includes, at minimum, a predicate and an explicit or implied subject, and expresses a proposition. Example – It is cold, although the sun is shining. This sentence contains two clauses: It is cold is the main clause and although the sun is shining is the subordinate clause.

• Direct speech – is quoted speech that is presented without modification, as it might have been uttered by the original speaker. Example – Patrick Henry said, “Give me liberty or give me death”.

• Indirect speech – is reported speech that is presented with grammatical modifica- tions, rather than as it might have been uttered by the original speaker. Example- Patrick Henry said to give him liberty or give him death.

• Phrase – is a syntactic structure that consists of more than one word but lacks the subject-predicate organization of a clause. For example, the house at the end of the street is a phrase. It acts like a noun. Unlike a clause, a phrase lacks subject-predicate organization.

• Sentence – is a grammatical unit that is composed of one or more words or phrases that generally bear minimal syntactic relation to the words or phrases that precede or follow it. Example – I am reading a book. This sentence is a composition of three phrases.

• Stem – is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added. It has been discussed in detail in Section 2.1.1.

• Word – is a unit which is a constituent at the phrase level and above. It is sometimes identifiable according to such criteria as being the minimal possible unit in a reply.

All these constructions can be classified into two categories: lexical constructions and phrasal (or combinatoric) constructions. Lexical construction deals with the formation of lexical items, that is, the formation of words and stems. As an example, the formation of speaker from speak is a lexical construction. On the other hand, phrasal construction deals with the formation of units larger than words and stems; this type of construction forms phrases, clauses and sentences.

A constituent is an important concept in the discussion of constructions. A constituent is one of two or more grammatical units that enter syntactically or morphologically into a construction at any level. For example, the sentence I eat bananas every day contains the following constituents:

1. Immediate constituents: I, eat bananas every day

2. Ultimate constituents: I, eat, banana, -s, every day

There are several related, cross-cutting and sometimes confusing concepts related to constituents. We explain these concepts at the syntactic level. Syntactic constituents can be classified under syntactic categories. A syntactic category is a set of words and/or phrases in a language which share a significant number of common characteristics. The classification is based on similar structure and sameness of distribution (the structural relationships between these elements and other items in a larger grammatical structure), and not on meaning. It is also known as a syntactic class. Among the major syntactic categories there are phrasal syntactic categories like NP (noun phrase), VP (verb phrase) and PP (prepositional phrase), and lexical categories that serve as heads of phrasal syntactic categories, like noun, verb and others. For example, a prepositional phrase (PP) is a phrase that has a preposition as its head; the definitions of the noun phrase (NP) and verb phrase (VP) are similar.

Constituents can perform syntactic functions in the construction. A syntactic function is the grammatical relationship of one constituent to another within a syntactic construction. There are various kinds of syntactic functions such as subject, predicate, object, complement, adjunct, modifier and others.

Syntactic functions are significant in categorial grammar. As HPSG is based on the generative paradigm, syntactic functions are not used here for syntax modeling; instead, we model syntax by construction rules.

2.1.3 Semantics

Semantics is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols. In linguistics, it is the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts. The formal study of semantics intersects with many other fields of inquiry, including lexicology, syntax, pragmatics, etymology and others. The formal study of semantics is therefore complex.

Semantics is closely related to reference. References are used for agreement. There are several types of agreement, as mentioned in HPSG 94 [33]. Some of these are –

1. Index agreement: It arises when indices are required to be token identical. That is, the value of the semantic index of one lexical item needs to agree with the semantic index of another lexical item.

2. Syntactic agreement: It arises when strictly syntactic objects (e.g. CASE values) are identified. That is, a lexical item has a syntactic requirement, and this requirement can be fulfilled by another lexical item that has a certain syntactic object value.

3. Pragmatic agreement: It arises when contextual background assumptions are required to be consistent.

Agreement is not syntactic in most languages. To show this, consider the sentence the beef sandwich at table six is getting restless. The referent of the subject in this sentence is not “the beef sandwich” but rather the customer who ordered it. Like English, agreement in Arabic is not syntactic; rather, it is semantic. Which properties of referents are encoded by agreement features is subject to cross-linguistic variation, but common choices include person, number and gender. In some languages, gender distinctions correspond to semantic sortal distinctions such as sex, human/nonhuman, animate/inanimate or shape. Arabic is an example of this type of language. So here, along with person, number and gender, the human/nonhuman distinction must be preserved for agreement. We will discuss this with an example in Section 3.1.3.

2.2 Arabic Morphology

Arabic is rich in nonconcatenative morphology. This nonconcatenative morphology is mainly root-pattern morphology. In this section, we introduce root-pattern morphology and its effect on Arabic verbs and verbal nouns. Then, we discuss the different types of Arabic verbal nouns.

2.2.1 Root-Pattern Morphology

The Arabic verb is an excellent example of nonconcatenative, root-pattern based morphology. A combination of root letters is plugged into a variety of morphological patterns, with a priori fixed letters and a particular vowel melody, generating verbs of a particular type that carry certain syntactic and semantic information [3]. The root of any stem denotes a semantic core, and the vowel pattern bears the syntactic information. Derivations from a common root but different patterns share a common meaning, and derivations from the same pattern but different roots share common syntactic information. A particular root-pattern combination yields a fixed syntactic and semantic meaning: root and pattern must co-exist, and their combination specifies the semantic meaning.
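To make the interdigitation of root and pattern concrete, the following illustrative Python sketch (our own, not part of the TRALE grammar; the function name and the 'C' slot convention are assumptions for this example) plugs a root into a pattern:

```python
def interdigitate(root, pattern):
    """Plug the consonants of `root` into the 'C' slots of `pattern`.

    root    -- list of radical letters, e.g. ['k', 't', 'b']
    pattern -- list of pattern elements, where 'C' marks a radical slot,
               e.g. ['C', 'a', 'C', 'a', 'C', 'a'] for the Form I perfect.
    """
    radicals = iter(root)
    return ''.join(next(radicals) if slot == 'C' else slot for slot in pattern)

# Same pattern, different roots: shared syntactic information (cf. Figure 2.1).
print(interdigitate(['k', 't', 'b'], ['C', 'a', 'C', 'a', 'C', 'a']))    # kataba  "he wrote"
print(interdigitate(['n', 's', 'r'], ['C', 'a', 'C', 'a', 'C', 'a']))    # nasara  "he helped"

# Same root, different pattern: shared semantic core (cf. Figure 2.2).
print(interdigitate(['k', 't', 'b'], ['C', 'aa', 'C', 'i', 'C', 'un']))  # kaatibun "writer"
```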

This can be seen in the following figures. Figure 2.1 shows how different sets of root letters plugged into the same vowel pattern generate different verbs with the same syntactic information. Similarly, Figure 2.2 shows how the same set of root letters plugged into different vowel patterns generates two lexemes with completely different syntactic information; at the same time, these two lexemes share a related semantic meaning.

Besides the vowel pattern, a particular verb type depends on the root class. The root class is determined on the basis of the phonological characteristics of the root letters. Root classes can be categorized on the basis of the number of root letters, the position or existence of vowels among these root letters, and the existence of gemination (tashdeed). Most Arabic verbs are generated from triliteral and quadriliteral roots; in Modern Standard Arabic, five-letter roots are obsolete. Phonological and morphophonemic rules can be applied to various kinds of sound and irregular roots. Among these root classes, the sound root class is the simplest and its morphological information is easy to categorize. A sound root consists of three consonants, all of which are different [37].

Figure 2.1: Root-pattern morphology 1: 3rd person singular masculine sound perfect active Form I verb formation from the same pattern (_a_a_a); the roots (k,t,b) and (n,s,r) yield the stems kataba (“he wrote”) and nasara (“he helped”), respectively.

On the other hand, non-sound root classes are categorized into several subtypes depending on the position of weak letters (i.e., vowels) and the presence of gemination or hamza. All these subtypes carry morphological information.

2.2.2 Morphology in Arabic Verb and Verbal Noun

From any particular sequence of root letters (triliteral or quadriliteral, weak or sound), up to fifteen different verb stems may be derived, each with its own template or vowel pattern. These stems have different semantic information. Western scholars usually refer to these forms as Form I, II, . . . , XV. Form XI to Form XV are rare in Classical Arabic and even rarer in Modern Standard Arabic. These forms are discussed in detail in [37]. Table 2.1 shows the semantic effect and examples of the most commonly used verb forms (Form I to Form X).

Figure 2.2: Root-pattern morphology 2: the same root (k,t,b) plugged into the patterns (_a_a_a) and (_aa_i_un) yields kataba (“he wrote”) and kātibun (“writer”), which share a related semantic meaning.

Not every sequence of root letters has a meaningful word for every verb form. As an example, the root sequence k, t, b does not have a meaningful word for Form IX.

These morphological verb forms have no relation to the verb forms based on event structure. There are three types of verb forms based on event structure – perfect, imperfect and imperative. Perfect indicates that the event has been completed, imperfect indicates that the event has not yet been completed, and imperative indicates that the event is a command. It is worth mentioning that Form I has eight subtypes depending on the vowel following the middle letter in the perfect and imperfect forms. Some types of verbal noun formation depend on these subtypes. Any combination of root letters for a Form I verb will follow one of these eight patterns. We refer to these patterns as Form IA, IB, IC, . . . , IH. These subtypes are shown in Table 2.2 with corresponding examples. For example, the vowels on the middle letter for Form IA (naṣara yanṣuru) are a and u for the perfect and imperfect forms, respectively. Similarly, the other forms depend on the combination of vowels in these two positions. Not all combinations exist. In Form IH, the middle letter is a long vowel and there is no short vowel on this letter. In summary, we can generate different types of verbal nouns based on these verb forms, root types (position of weak letter or gemination) and number of root letters.

Table 2.1: Arabic Verb Forms

Form I:    kataba
Form II:   kattaba
Form III:  kātaba
Form IV:   aktaba
Form V:    takattaba
Form VI:   takātaba
Form VII:  inkataba
Form VIII: iktataba
Form IX:   iḥmarra (from a different root, since k,t,b has no meaningful Form IX word)
Form X:    istaktaba

Table 2.2: Subtypes of Form I Root Verbs (perfect – imperfect)

naṣara yanṣuru
ḍaraba yaḍribu
fataḥa yaftaḥu
samiʿa yasmaʿu
karuma yakrumu
ḥasiba yaḥsibu
faḍula yafḍilu

All these verb stems, derived from a single root verb, have different verbal nouns. Table 2.3 shows the active participle and passive participle for all the verb stems derived from the root verb kataba. Not every type of verbal noun exists for a particular form; in Table 2.3, the passive participle does not exist for Form IX.

2.2.3 Classification of Arabic Verbal Nouns

In this part, we discuss the eight types of nouns derived from verbs [22]:

Table 2.3: Verbal Nouns Derived from Different Forms

Verb form              Active participle    Passive participle
Form I:    kataba      kātibun              maktūbun
Form II:   kattaba     mukattibun           mukattabun
Form III:  kātaba      mukātibun            mukātabun
Form IV:   aktaba      muktibun             muktabun
Form V:    takattaba   mutakattibun         mutakattabun
Form VI:   takātaba    mutakātibun          mutakātabun
Form VII:  inkataba    munkatibun           munkatabun
Form VIII: iktataba    muktatibun           muktatabun
Form IX:   iktabba     muktabbun            (none)
Form X:    istaktaba   mustaktibun          mustaktabun

1. Gerund (ism maṣdar) – names the action denoted by its corresponding verb.

2. Active participle (ism al-fāʿil) – entity that enacts the base meaning, i.e., the general actor.

3. Hyperbolic participle (ism al-mubālaghah) – entity that enacts the base meaning exaggeratedly; it modifies the actor with the meaning that the actor does the action excessively.

4. Passive participle (ism al-mafʿūl) – entity upon which the base meaning is enacted; corresponds to the object of the verb.

5. Resembling participle (al-ṣifatu ’l-mushabbahah) – entity enacting (or upon which is enacted) the base meaning intrinsically or inherently; it modifies the actor with the meaning that the actor does the action inherently.

6. Utilitarian noun (ism al-ālah) – entity used to enact the base meaning, i.e., the instrument used to conduct the action.

7. Locative noun (ism al-ẓarf) – time or place at which the base meaning is enacted.

8. Comparative and superlative (ism al-tafḍīl) – entity that enacts (or upon which is enacted) the base meaning the most. In Arabic, this type of word is categorized as a noun, but it is similar to an English adjective.

Examples of these eight types of verbal nouns are presented in Table 2.4. Each of these types can be subcategorized on the basis of the type of verb. To understand the complete variation of the verb and its morphology, we need some preliminary knowledge of the Arabic verb [20].

Table 2.4: Different Types of Verbal Nouns

2.3 An HPSG Primer

HPSG is a highly lexicalized, non-derivational, constraint-based, surface-oriented grammatical architecture developed by Carl Pollard and Ivan Sag [32, 33]. It combines the best ideas from its predecessors – Generalized Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government and Binding theory (GB) [8]. It combines linguistic layers (phonology, morphology, syntax, semantics, context, etc.), and for this reason it is very attractive for natural language processing. Its highly lexicalized nature gives the flexibility to modify the lexicon depending on the language in order to capture different features. A lexical entry, represented as an AVM (Attribute Value Matrix), may describe a sign partially. Each lexical entry must have a type, and the types and their subtypes are part of a larger structure that forms the type hierarchy. Thus, HPSG can be seen as an inheritance hierarchy of sorts with constraints of various kinds on the sorts of linguistic objects in the hierarchy [16]. There is no distinction between terminal and non-terminal nodes in HPSG. This is related to the fact that HPSG is a “fractal” (a fractal is a rough or fragmented geometric shape that can be split into parts, each of which is, at least approximately, a reduced-size copy of the whole): every sign down to the word level has syntactic, semantic and phonological features encoded in a similar manner [31]. Thus we can work at a specific level or surface of this hierarchy and use unification to reuse and extend the structure.

HPSG includes grammar rules and lexical entries. Normally, the latter are not considered to belong to a grammar. The formalism is centered around the lexicon. This means that the lexicon is more than just a list of entries; it is in itself richly structured.

In HPSG terminology, the basic grammatical type is the sign, which is a formal representation of words, phrases and sentences. All human utterances are captured by signs. A rule that licenses a sign is captured by another object called a construct. Signs and constructs are formalized as typed feature structures, which are sets of attribute-value pairs. Attributes are linguistic objects. The value of an attribute may be either atomic or complex, i.e., a function. Functions are those feature structures which are described using an attribute value matrix (AVM).

The generic structure of a sign is presented in Figure 2.3. The AVM basically maps features to feature structures. A feature in an AVM can be of two types: (a) a category name, i.e., a sort description, and (b) agreement (or constraints), which is a list of attributes and their values.

Feature    Value
PHON       phon-obj     (Phonology)
MORPH      morph-obj    (Morphology)
SYN        syn-obj      (Syntax)
SEM        sem-obj      (Semantics)

Figure 2.3: An HPSG Sign.
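For illustration only (a simplified Python sketch, not the TRALE encoding; the nesting and example values are our own assumptions), a sign can be represented as a typed feature structure whose attribute values are atoms or embedded feature structures:

```python
# A sign as a typed feature structure: a type name plus attribute-value pairs.
# Values are either atoms (strings/lists) or embedded feature structures (dicts),
# mirroring the PHON / MORPH / SYN / SEM geometry of Figure 2.3.
sign = {
    "TYPE": "sign",
    "PHON": "kataba",                           # phon-obj (simplified to a string)
    "MORPH": {"ROOT": ["k", "t", "b"],          # morph-obj
              "STEM": ["k", "a", "t", "a", "b", "a"]},
    "SYN": {"CAT": "verb"},                     # syn-obj (simplified)
    "SEM": {"FRAMES": []},                      # sem-obj (simplified)
}

def value_at(fs, path):
    """Follow a feature path, e.g. ('MORPH', 'ROOT'), through a feature structure."""
    for feature in path:
        fs = fs[feature]
    return fs

print(value_at(sign, ("MORPH", "ROOT")))        # ['k', 't', 'b']
```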

A construct is represented using a feature structure with a MOTHER (MTR) feature and a DAUGHTERS (DTRS) feature. The value of the MTR feature is a sign and the value of DTRS is a nonempty list of signs. A typical description of a construct is shown in Figure 2.4. The licensing of signs follows the Sign Principle, which states that “Every sign must be lexically or constructionally licensed. A sign is lexically licensed only if it satisfies some lexical entry, and constructionally licensed only if it is the mother of some construct” [39].
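Continuing the same illustrative representation (again a Python sketch of our own, not the actual formalism), a construct pairs a mother sign with a nonempty list of daughter signs, and the constructional half of the Sign Principle becomes a simple check:

```python
def make_construct(mother, daughters):
    """Build a construct: MTR is a sign, DTRS is a nonempty list of signs."""
    assert isinstance(daughters, list) and daughters, "DTRS must be a nonempty list"
    return {"MTR": mother, "DTRS": daughters}

def constructionally_licensed(sign, constructs):
    """A sign is constructionally licensed if it is the mother of some construct."""
    return any(cxt["MTR"] is sign for cxt in constructs)

noun = {"TYPE": "word", "PHON": "kitaabun"}
np   = {"TYPE": "phrase", "PHON": "al-kitaabu"}
cxt  = make_construct(np, [noun])
print(constructionally_licensed(np, [cxt]))     # True
```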

HPSG modeling of any language starts from building a very detailed type hierarchy which is both linguistically motivated and captures language-independent constraints. From this type hierarchy, the attribute value matrices for linguistic signs can be constructed. In this thesis, we use the Sign-Based Construction Grammar (SBCG) [38] version of HPSG. Unlike standard presentations of HPSG, where the type constraints form part of the signature of a grammar, the type constraints of SBCG are an essential part of the body of the grammar.

Feature    Value
MTR        sign          (Mother)
DTRS       list(sign)    (List of Daughters)

Figure 2.4: An HPSG Construction.

A standard SBCG type hierarchy is shown in Figure 2.5.

From the type hierarchy, we know that every linguistic object can be modeled using a feature structure. There are two kinds of feature structures. Atoms are simple feature structures, which indicate the terminal values of various linguistic attributes. Functions are complex feature structures, which are expressed using attribute value matrices and can contain other feature structures as their feature values. Both sign and cxt (construct) are feature structures. The attributes of signs are also feature structures: phon-obj, syn-obj, sem-obj, etc. Frames are semantic representations of events. There are two types of constructions: phr-cxt (phrasal) and lex-cxt (lexical). There are also two types of signs: lex-sign and expression. For the details of this type hierarchy, see [38].
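The following sketch (an illustrative Python encoding under our own single-parent simplification; SBCG itself allows multiple inheritance) captures a fragment of this hierarchy and a subtype test:

```python
# A fragment of the SBCG type hierarchy of Figure 2.5, as child -> parent links.
PARENT = {
    "function": "feature-structure", "atom": "feature-structure",
    "sign": "function", "cxt": "function", "frame": "function",
    "lex-sign": "sign", "expression": "sign",
    "lexeme": "lex-sign", "word": "expression", "phrase": "expression",
    "phr-cxt": "cxt", "lex-cxt": "cxt",
    "infl-cxt": "lex-cxt", "deriv-cxt": "lex-cxt",
    "event-fr": "frame",
}

def is_subtype(t, super_t):
    """True if type t is identical to or inherits (transitively) from super_t."""
    while t is not None:
        if t == super_t:
            return True
        t = PARENT.get(t)
    return False

print(is_subtype("deriv-cxt", "cxt"))    # True: derivational constructs are constructs
print(is_subtype("word", "cxt"))         # False: a word is a sign, not a construct
```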

In HPSG, the semantic information is expressed in Minimal Recursion Semantics (MRS), as developed in CSLI's Linguistic Grammars Online (LinGO) project [10, 11]. Most semantic information in MRS is contained under the feature FRAMES. In this list, for a verb there is a frame event-fr which contains a Davidsonian event variable and index-valued features such as act(or) and und(ergoer) [12, 13]. These variables contain information which is also used for agreement purposes; we discussed these semantic agreements in Section 2.1.3.
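As a rough illustration (a simplified Python sketch with names of our own choosing, not the LinGO MRS encoding), the SEM value of a verb like kataba can be pictured as a situation index plus a frame whose actor role carries a referential index:

```python
# Simplified semantic frames for "kataba" ("he wrote"): an event frame with a
# situation variable and an actor role, plus a referential index for the actor.
actor_index = {"PERSON": "3rd", "NUMBER": "sg", "GENDER": "masc", "HUM": "yes"}

sem = {
    "SIT-INDEX": "s1",                        # Davidsonian event/situation variable
    "FRAMES": [
        {"FRAME": "write-fr",                 # lexical frame for the writing event
         "SIT": "s1",
         "ACT": actor_index},                 # act(or) role carries the actor's index
    ],
}

# Index sharing is what drives agreement: a subject that combines with this verb
# must carry a token-identical index.
subject_index = sem["FRAMES"][0]["ACT"]
print(subject_index is actor_index)           # True: token identity, not mere equality
```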

Figure 2.5: A standard SBCG type hierarchy. At the top is feature-structure, which branches into functions and atoms. Functions include phon-obj, syn-obj, sem-obj, sign, cxt and frame; signs divide into lex-sign and expression (word, phrase), with lexeme under lex-sign; constructs (cxt) divide into phr-cxt and lex-cxt (infl-cxt, deriv-cxt); frames include event frames such as act-fr, und-fr, soa-fr and their combinations (e.g., act-und-fr, write-fr, cause-fr).

2.4 Related Works

This section is dedicated to the discussion of work on linguistic modeling of morphology. At the beginning of this section, we give an overview of the work related to computational modeling of morphology. Then we put emphasis on HPSG modeling of morphology. As Semitic languages like Arabic, Amharic and Hebrew are rich in morphology, we give a glimpse of the HPSG modeling of Hebrew, since a notable amount of work has been done in this area. At the end of this section, we discuss HPSG modeling of the Arabic language and its morphology.

2.4.1 HPSG Modeling of Morphology

HPSG is one of the most successful grammars for processing natural languages, especially their syntactic and semantic aspects, but it has inadequate coverage of morphological construction, especially nonconcatenative morphology. Nonconcatenative morphology is not plentiful in the most widely used languages, but it is abundant in Semitic languages such as Arabic, Amharic, Hebrew, etc. Among these Semitic languages, Arabic is the most widely used and is very rich in nonconcatenative morphology. Its rich morphology has attracted several series of research projects [1, 7, 40]. These research projects are mainly based on the development of toolkits for Arabic morphological analysis. Rather than compiler development, they are dedicated to morphological analyzers that design and implement finite-state morphological models. From a linguistic perspective, these models describe rules of lexicon development and derive lexical items.

The morphology of Sierra Miwok and French was modeled in HPSG via phonological realization [5]. The author also showed how nonconcatenative morphology can be captured by his framework. He further outlined how consonant and vowel melodies form words in Arabic, but he did not show any construction rule for any language.

Susanne modeled concatenative morphology in German and English in the HPSG formalism in 1998 [35, 36]. In that work, she captured morphological derivation with a special feature called MORPH-B, meaning morphological base. This MORPH-B feature serves the purpose of derivation and can also be used to capture nonconcatenative morphology. The alternative to this mechanism is the lexical construction rule [38], which is also widely used in HPSG modeling.

An HPSG formalism of morphologically complex predicates is outlined in [9]. There, the author mostly focused on the syntax and semantics of causative constructions. He used lexical rules with semantic frames to capture morphological effects. As Japanese is an agglutinative language, the morphology involved is concatenative, so HPSG modeling of nonconcatenative morphology remained untouched.

As mentioned earlier, HPSG modeling of nonconcatenative morphology is a relatively new area of research. There are few notable works on the nonconcatenative morphology of Semitic languages. We discuss these in detail in Sections 2.4.2 and 2.4.3.

2.4.2 HPSG Modeling of Hebrew

Semitic languages exhibit rich morphological operations. Both concatenative and nonconcatenative morphology are abundant in these languages. Among these languages, HPSG modeling of Hebrew is not new, but it lacks coverage of morphology. In 2000, Nathan Vaillette presented a paper on Hebrew relative clauses [41]. In this paper, he nicely modeled the phrasal construction rules to capture Hebrew relative clauses, but he did not put emphasis on morphological operations.

Susanne extended her work on German and English concatenative morphology in 2001 and added the nonconcatenative morphology of Hebrew verbal nouns [36]. She proposed an AVM for the Hebrew verbal noun. This AVM has similarities with the AVM we propose for verbal nouns with respect to the morphological features, but she did not show any syntactic effect of this morphology. She articulated the AVM with placeholders for consonants: by placing the list of root consonants into this AVM, a verbal noun AVM is generated. She did not ensure that only valid verbal nouns would be generated from this AVM. Her solution can be used to automate lexical entries in a dictionary or corpus but will not reduce the number of entries. In fact, she only gave a glimpse of the morphology of the Hebrew verbal noun in her extensive work.

A detailed work on verb-initial constructions (also called verbal sentences, as opposed to nominal sentences; in this type of sentence the verb precedes the subject) was presented in [26]. In that work, the author put emphasis on Modern Hebrew verb-related phrasal constructions. She discussed the agreement of the verb with its subject and complement. She also showed the concatenative and nonconcatenative morphology of the Hebrew verb, but did not give any formalism for this morphology like those modeled for German or Japanese [9, 36]. She mainly discussed the syntactic effect of these inflected verb forms. She also presented an implementation framework for HPSG grammars.

In 2007, Nurit presented a comparison of implementation platforms for HPSG [27]. She discussed the advantages and disadvantages of TRALE (an extension of the Attribute Logic Engine) and the Linguistic Knowledge Builder (LKB). This paper is very useful when choosing an implementation platform for HPSG.

2.4.3 HPSG Modeling of Arabic

In 2006, an HPSG analysis of the broken plural and the gerund was presented [24]. The main assumption in that work revolves around Concrete Lexical Representations (CLRs), located between an HPSG type lexicon and phonological realization. Here, the HPSG sign was represented using a CLR function rather than an AVM, and this function puts more emphasis on phonology than on morpho-syntactic operations. The main drawbacks of this work are that it does not deal with other types of verbal nouns and that it does not dictate any implementation of CLRs.

HPSG modeling of the Arabic triliteral strong verb was proposed in 2008 [2–4]. The authors of these papers show the regular morphology of the Arabic verb. They designed the SBCG AVM of the Arabic verb, as well as several verb lexeme constructions and morphologically complex predicates (MCPs). However, they did not touch the morphological derivation of verbal nouns, and they did not give any concrete way to implement the constructs proposed in their work. During our work on the verbal noun construct, we have to work with the SBCG verb lexeme too. We adopt the verb lexeme proposed in these papers and modify it to cope with all the cases that we have found. The authors did not propose any idea about SIT-INDEX and INDEX; they actually duplicated the INDEX feature with a ref-fr semantic frame, which is never used in the HPSG or SBCG literature. The atomic features (person, number and gender), which are used under the INDEX function feature by Pollard and Sag [33], are used under ref-fr in these papers, while at the same time they still keep the INDEX feature and do not show its components. We correct this INDEX and SIT-INDEX related problem; this will be discussed in Section 3.2.

A nice HPSG formalism of the Arabic nominal sentence is presented in [29]. The paper introduces a grammar for the Arabic nominal sentence, and the authors implemented their formalization using the LKB system. The main limitation of this work is that it deals only with the agreement of nominal sentences and does not discuss morphology at all. Another big limitation of this work is its assumption that agreement information in Arabic arises from syntactic rules and obeys grammar rules. However, in Sections 2.1.3 and 3.1, we establish that agreement in Arabic is not always syntactic and that the agreement features need an additional feature, humanness (HUM), which is not mentioned in the discussed work.

A parser for Arabic relative clauses is designed in [17]. It is not deep research, but rather a study of the different forms of relative clauses used to process relative sentences. Thus, we can conclude that the rich nonconcatenative morphology of Arabic verbal nouns has not yet been explored, and we have the opportunity to do so. In 2010, part of this work was published [19]. In that paper, we proposed the construction rules but did not articulate any implementation.

Chapter 3

HPSG Formalism for Verbal Noun

In this chapter, we model the HPSG categories of verbal nouns and their derivation from different types of verbs through the HPSG formalism. As mentioned in Section 2.3, we adopt SBCG [38] for this analysis. Here, we give an AVM for nouns and extend it for verbal nouns, and we extend the verb AVM proposed by Bhuyan et al. [2–4]. We propose a multiple-inheritance hierarchical model for Arabic verbal nouns and show how to obtain a sort description from the type hierarchy. Finally, we propose construction rules for verbal nouns derived from strong triliteral (i.e., Form I) root verbs.

3.1 AVM of Arabic Nouns

We modify the SBCG feature geometry for English and adapt it to Arabic. The SBCG AVMs for nouns in English and in Arabic are shown in Figure 3.1 and Figure 3.2, respectively.

The PHON feature is outside the scope of this thesis. The three main function features – MORPH, SYN and SEM – are discussed in the following subsections.

Figure 3.1: AVM for the English noun. The noun-lex type carries the features PHON, FORM, ARG-ST (a list of signs), SYN (CAT of type noun with case, select and xarg; VAL, a list of signs; MRKG of type mrk) and SEM (INDEX and FRAMES, a list of frames).

3.1.1 MORPH

The MORPH feature captures the morphological information of signs and replaces the FORM feature of the English AVM. This feature is similar to the MORPH feature used for the Hebrew verbal noun [36]. The value of the feature FORM is a sequence of morphological objects (formatives); these are the elements that will be phonologically realized within the sign's PHON value [38]. MORPH, on the other hand, is a function feature: it not only contains these phonologically realized elements but also their origins. MORPH contains three features – ROOT, STEM and DEC. The ROOT feature contains root letters in the following cases:

1. The root is characterized as a part of a lexeme, and is common to a set of derived or inflected forms

2. The root cannot be further analyzed into meaningful units when all affixes are removed

3. The root carries the principal portion of meaning of the lexemeCHAPTER 3. HPSG FORMALISM FOR VERBAL NOUN

Figure 3.2: AVM for the Arabic noun. The noun-lex type carries the features PHON, MORPH (root: list(letter), stem: list(letter), dec), ARG-ST (a list of signs), SYN (CAT of type noun with case, def, select, xarg and lid; VAL, a list of signs; MRKG of type mrk) and SEM (INDEX with person, number, gender and hum; FRAMES, a list of frames).

In the rest of the cases, the content of this feature is empty.

The STEM feature contains a list of letters, which comprises the word, phrase or lexeme. We can identify the pattern in a lexeme by substituting placeholders for the root letters, if any root exists in STEM. As an example, the ROOT of the lexeme kataba contains ‘k’, ‘t’ and ‘b’, and the pattern of the STEM is (_a_a_a). Without the existence of this pattern, the ROOT is irrelevant. Thus a pattern bears the syntactic information and a ROOT bears the semantic information. Lexemes which share a common pattern must also share some common syntactic information; similarly, lexemes which share a common root must also share some common semantic information. STEM is derived from the root letters by the morphology if a root exists.
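The following Python sketch (illustrative only; the function name is ours) performs exactly this substitution, recovering the pattern from a STEM and ROOT pair:

```python
def extract_pattern(stem, root, placeholder='_'):
    """Replace the root letters in `stem` (in order) by a placeholder,
    exposing the vowel pattern that carries the syntactic information."""
    pattern = []
    remaining = list(root)
    for letter in stem:
        if remaining and letter == remaining[0]:
            pattern.append(placeholder)
            remaining.pop(0)
        else:
            pattern.append(letter)
    return ''.join(pattern)

print(extract_pattern(list("kataba"), ['k', 't', 'b']))     # _a_a_a
print(extract_pattern(list("kaatibun"), ['k', 't', 'b']))   # _aa_i_un
```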

The DEC (declension type) feature under the MORPH feature maps to the declension type of the noun. It determines how the end vowel of noun lexemes changes to reflect its case. The change of end vowel changes the form of a lexical item. There exist nine possible ways in which grammatical cases can be represented on an Arabic noun. So, for a declinable noun, the value of the DEC feature can be T1, T2, T3, . . . , T9, corresponding to the nine declension types. The value of this DEC feature can be determined from the type hierarchy of noun lexemes; this needs further research and is beyond the scope of this thesis. In our current research, we will not mention this feature in the following AVMs, but we keep it in our basic design to make the design robust for inflection as well.

3.1.2 SYNTAX

The SYN feature contains the CAT, VAL and MRKG features. We modify the CAT feature of SBCG to adapt it to the Arabic language. Note that for all kinds of verbal nouns, the sort description of the CAT feature is noun. In Arabic there are only three parts of speech (POS) for lexemes or words: noun (in Arabic, pronouns are also considered nouns), verb and particle. Any verbal noun serving as a modifier is also treated as a noun. In the case of the Arabic noun, the CAT feature consists of the CASE, DEF, SELECT, XARG and LID features. Among these features, we introduce the DEF feature, which is used for syntactic agreement in phrasal construction; this feature also strengthens our design. As Arabic has three cases for nouns, the value of CASE will be nominative, accusative or genitive.

The DEF feature denotes the definiteness value of an Arabic noun. There are eight ways by which a noun word or lexeme becomes definite [21]. Personal pronouns such as “he”, “I” and “you” are inherently definite. Proper nouns are also definite; allāhu is another instance of a definite lexeme. These examples confirm that definiteness has to be specified at the lexeme level. The article ‘al’ also expresses the definite state of a noun of any gender and number. Thus, if the state of a noun is definite, the noun lexeme contains yes as the value of DEF; otherwise its value is no.

In Arabic, this definiteness (DEF) feature plays a significant role in syntactic agreement: a noun and its modifier must agree on the DEF feature value. For example, alkitābu ’l-aḥmaru means “the red book”: alkitābu means “the book” and aḥmaru means “red”. As “red” is used as a modifier for “the book”, the definiteness prefix ‘al’ has been added to aḥmaru, yielding al-aḥmaru.
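A minimal sketch of this agreement check (illustrative Python outside our TRALE grammar; the feature paths follow the AVM of Figure 3.2, but the flat encoding and example values are simplifications of our own):

```python
def def_agrees(noun, modifier):
    """A noun and its attributive modifier must carry the same DEF value."""
    return noun["SYN"]["CAT"]["DEF"] == modifier["SYN"]["CAT"]["DEF"]

al_kitaabu = {"PHON": "alkitaabu", "SYN": {"CAT": {"POS": "noun", "DEF": "yes"}}}
al_ahmaru  = {"PHON": "al-ahmaru", "SYN": {"CAT": {"POS": "noun", "DEF": "yes"}}}
ahmaru     = {"PHON": "ahmaru",    "SYN": {"CAT": {"POS": "noun", "DEF": "no"}}}

print(def_agrees(al_kitaabu, al_ahmaru))   # True:  "alkitaabu 'l-ahmaru" (the red book)
print(def_agrees(al_kitaabu, ahmaru))      # False: definiteness clash, construction rejected
```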

3.1.3 SEMANTICS

As in SBCG for English, the SEM feature in Arabic contains two function features – INDEX and FRAMES. INDEX is used for index-based semantic agreement, as mentioned in Section 2.1.3, and FRAMES contains the list of frames that carry the semantic information in Minimal Recursion Semantics (MRS).

As mentioned earlier in Section 2.1.3, person, number, gender and human/nonhuman information must be kept for semantic agreement. So, the INDEX feature is composed of PERSON, NUMBER, GENDER and HUM, and it is contained under SEM. We use index-based agreement [33] rather than putting the agreement features under an AGR feature [23], because index-based agreement is more customary in HPSG and is used by most scholars.

The HUM feature is introduced by us for Arabic; the other three features are also used for semantic agreement in English [33]. The HUM feature denotes humanness. Depending on the language, agreement may involve gender, human/nonhuman, animate/inanimate or shape features [33]. In Arabic, humanness is a crucial grammatical factor for predicting certain kinds of plural formation and for agreement with other components of a phrase or clause within a sentence. The grammatical criterion of humanness only applies to nouns in the plural form. As an example, consider “these boys are intelligent” (hā’ulā’i ’l-awlādu adhkiyā’) and “these birds are intelligent” (hādhihi ’l-ṭuyūru dhakiyyatun). Both of these sentences are plural, but the former refers to human beings whereas the latter refers to non-humans. So the same word “intelligent” (dhakiyyun) has taken two different plural forms in the two sentences: adhkiyā’ and dhakiyyatun. In the case of the boys, it is in the third person masculine plural form (adhkiyā’), whereas in the case of the birds, it is in the third person feminine singular form (dhakiyyatun). Also, from the third person feminine singular form (dhakiyyatun), we cannot readily say that it refers to a feminine referent; in fact, it may also refer to the plural of nonhuman beings. This is why, along with PERSON, NUMBER and GENDER, we keep HUM as a semantic agreement feature.

If the noun refers to a human being then the value of HUM is yes; otherwise it is no. The value of PERSON for an Arabic noun can be 1st, 2nd or 3rd. There are three number values in Arabic, so the value of NUMBER can be sg, dual or pl, denoting singular, dual or plural, respectively. The GENDER feature contains either masc or fem, denoting masculine and feminine, respectively.
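To make the role of HUM concrete (an illustrative Python sketch with our own simplified rule, not the full set of Arabic agreement constraints), the deflected agreement pattern for nonhuman plurals can be stated as follows:

```python
def agreeing_index(subject_index):
    """Return the INDEX a predicate/modifier must bear to agree with `subject_index`.

    Simplified Arabic rule used here: a nonhuman plural subject takes
    third person feminine singular agreement; otherwise the predicate
    matches the subject's own person, number and gender.
    """
    if subject_index["NUMBER"] == "pl" and subject_index["HUM"] == "no":
        return {"PERSON": "3rd", "NUMBER": "sg", "GENDER": "fem", "HUM": "no"}
    return dict(subject_index)

boys  = {"PERSON": "3rd", "NUMBER": "pl", "GENDER": "masc", "HUM": "yes"}
birds = {"PERSON": "3rd", "NUMBER": "pl", "GENDER": "fem",  "HUM": "no"}

print(agreeing_index(boys))   # masculine plural:  adhkiyaa'   (intelligent, human plural)
print(agreeing_index(birds))  # feminine singular: dhakiyyatun (intelligent, nonhuman plural)
```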

3.2 AVM of Arabic Verbs

As we will formulate construction rules which capture the linguistic derivation of nouns from verbs, we need to model the AVM of the verb. We modify the verb AVM proposed by Bhuyan et al. [2] and correct the index-related problem found in that work, which we discussed in detail in Section 2.4.3. We try to align the design of the verb AVM with that of the noun AVM. Figure 3.3 shows the SBCG AVM of the Arabic verb.

Figure 3.3: AVM for the Arabic verb. The verb-lex type carries the features PHON, MORPH (root: list(letter), stem: list(letter), vdec), ARG-ST (a list of signs), SYN (CAT of type verb with vform, voice, mood, select and xarg; VAL, a list of signs; MRKG of type mrk) and SEM (SIT-INDEX with situation; FRAMES, a list of frames).

The MORPH feature in the verb AVM is similar to MORPH in the noun AVM, except for the VDEC feature. It captures the declension type of the verb and replaces the DEC feature, which captures the declension type of the noun. Like DEC, it determines how the end vowel of verb lexemes changes to reflect the mood of the verb. The change of end vowel changes the form of a verb lexeme. There exist five possible ways in which mood can be represented on an Arabic verb. So, for a declinable verb, the value of the VDEC feature can be VT1, VT2, VT3, . . . , VT5, corresponding to the five declension types. The value of this VDEC feature can be determined from the type hierarchy of verb lexemes; this needs further research and is beyond the scope of this thesis. In our current research, we will not mention this feature in the following AVMs of verbs, but we keep it in our basic design to make the design robust for inflection as well.
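For illustration (a Python sketch following the feature geometry of Figure 3.3; the concrete values, such as the VDEC value VT1, are example assumptions rather than entries from our lexicon), a verb lexeme can be pictured as:

```python
# Example verb lexeme for "kataba", aligned with the noun AVM except for
# VDEC and the verb-specific CAT and SEM features.
verb_lex = {
    "TYPE": "verb-lex",
    "PHON": "kataba",
    "MORPH": {"ROOT": ["k", "t", "b"],
              "STEM": ["k", "a", "t", "a", "b", "a"],
              "VDEC": "VT1"},                    # verb declension type (example value)
    "ARG-ST": [],                                # argument structure: list of signs
    "SYN": {"CAT": {"POS": "verb", "VFORM": "perfect",
                    "VOICE": "active", "MOOD": "indicative"},
            "VAL": [], "MRKG": "mrk"},
    "SEM": {"SIT-INDEX": {"SITUATION": "s1"},
            "FRAMES": []},
}
print(verb_lex["MORPH"]["VDEC"])                 # VT1
```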

SYN