

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 3, March 2011

Automatic Parsing for Arabic Sentences

Zainab Ali Khalaf*
School of Computer Science
Universiti Sains Malaysia (USM)
Penang, Malaysia
E-mail:
*(Assistant Professor, Computer Science Dept., Basra University, Iraq)

Dr. Tan Tien Ping
School of Computer Science
Universiti Sains Malaysia (USM)
Penang, Malaysia
E-mail:

Abstract: The designed system is a parser for Arabic sentences that uses the syntactic and semantic relations between deep and surface structures. The system is based on an implementation of Fillmore's Case theory. The parsing algorithm begins by analyzing the input sentence to check its syntax, semantics, and spelling, using the Arabic transformation rules proposed by Al-Khouly to gain semantic strength. The proposed system relies on the effective element of the sentence, represented by its verb; this element controls the parsing operation. The system accepts different surface structures of Arabic sentences as input and produces automatic parsing forms for them.

Keywords: Artificial intelligence; natural language processing; transformation rules; deep structure and surface structure; parsing Arabic sentences.

I. INTRODUCTION

Arabic is a parsing language, in the sense that parsing expresses the relations among the words of a sentence. The most important component is the verb, which acts as the basic unit that controls the rules for choosing the other elements. Although Arabic sentences have different structures, Arabic is recognized as a verb-subject-object (VSO) language. The subject or the object may precede the verb, according to pragmatic necessity [1,3,4].

Arabic syntax is flexible enough that the deep structure and the surface structure of a sentence remain strongly connected. This property makes Arabic well suited to automatic processing. The proposed system uses these properties to parse Arabic sentences according to the position of the words in the sentence and their functional meaning.

II. SYSTEM COMPONENTS

The syntactic properties of any natural language are formally described using what Chomsky calls production systems. A formal system generally depends on three types of data [2,3,6]:

A. Data of vocabulary lexicon

The lexicon plays an important role in any NLP system. It is a large database of entries that describe the meanings of words in a contextual fashion, through synonymy (and antonymy) [3,6]. The implemented lexicon consists of entries saved as a rule (Entrance [Word, Features]).

• The Entrance is one of the following indicators: Verb, Noun, Preposition, Determinate, Assistant, and Negation.

• The Word is a string index for the lexicon entry.

• The Features is a list of structured integers coded to hold the syntactic and semantic information of the word. Each coded integer, written as [Fp], consists of two parts, F and p. The [F] part is the feature code, and the [p] part is either 1 or 0 depending on whether feature [F] exists or not.
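The entry format Entrance [Word, Features] can be sketched in Python as follows. The paper does not say how F and p are packed into one integer, so this sketch assumes that p is appended as the last decimal digit (feature code 3, present, becomes 31). The names `LexiconEntry`, `encode_feature`, and `decode_feature`, and the feature codes used in the usage note, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The six Entrance indicators listed in the paper.
ENTRANCES = {"Verb", "Noun", "Preposition", "Determinate", "Assistant", "Negation"}


def encode_feature(code: int, present: bool) -> int:
    """Pack feature code F and presence flag p into one [Fp] integer.

    Assumption (not from the paper): p is the last decimal digit, so F=3, p=1 -> 31.
    """
    return code * 10 + (1 if present else 0)


def decode_feature(value: int) -> Tuple[int, bool]:
    """Split an [Fp] integer back into (F, present)."""
    code, p = divmod(value, 10)
    if p not in (0, 1):
        raise ValueError(f"presence digit must be 0 or 1, got {p}")
    return code, p == 1


@dataclass
class LexiconEntry:
    entrance: str                                       # one of ENTRANCES
    word: str                                           # string index of the entry
    features: List[int] = field(default_factory=list)   # coded [Fp] integers

    def __post_init__(self) -> None:
        if self.entrance not in ENTRANCES:
            raise ValueError(f"unknown Entrance: {self.entrance}")

    def has_feature(self, code: int) -> bool:
        """True if feature F is recorded as present (p == 1)."""
        return any(c == code and present
                   for c, present in map(decode_feature, self.features))
```

With this packing, `LexiconEntry("Noun", "ولد", [31, 70])` (ولد, "boy") records invented feature 3 as present and feature 7 as absent, so `has_feature(3)` is true and `has_feature(7)` is false. Keeping the presence bit inside each integer means an absent feature is stated explicitly rather than inferred from its omission.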

                                                                                              ISSN 1947-5500

B. Data of syntactical rules

These rules are formalized to describe the language so that each deep structure is related to the many corresponding surface structures that share its meaning. The rules are inductive and sequential; some are obligatory and others are optional. The optional rules yield various surface structures that reflect different linguistic contexts. The transformations are mainly the operations of addition, deletion, moving forward, and moving backward, plus some secondary operations. These operations are not performed at random; they are governed and selected according to a set of conditions and structure-description rules. Together they generate all the surface structures that emerge from a single deep structure.

C. Data of syntactic structure

These data are rules, written in BNF, that describe the Arabic language and act as constraints and controls on how Arabic sentences are formed. The most important component, as Fillmore and Schank recognized, is the verb element, which acts as the basic unit that controls the rules for choosing the other elements. The phrase structure rules used are the following:

<Sentence> ::= <Modality> + <Auxiliary> + <Proposition>
<Sentence> ::= <Auxiliary> + <Proposition>
<Modality> ::= <External Condition> + <External Adverb>
<Proposition> ::= <Verb> + <Theme> + <Indirect Object> + <Place> + <Tool> + <Agent>
<Theme> ::= <Noun Phrase>
<Agent> ::= <Noun Phrase>
<Tool> ::= <Noun Phrase>
<Place> ::= <Noun Phrase>
<Indirect Object> ::= <Noun Phrase>
<Noun Phrase> ::= <Preposition> + <Determinate> + <Noun>
<Noun Phrase> ::= <Preposition> + <Noun>
<External Condition> ::= semi-statements used to combine two sentences, such as (in spite of) or (moreover), etc.
<External Adverb> ::= <Time Adverb> + <Interrogative Words> + <Negation Words>
<Auxiliary> ::= lexical words such as ( ) or ( ), etc.
<Verb> ::= a dictionary verb, such as (write), etc.
<Noun> ::= a dictionary noun, such as (boy), etc.

The presence of the verb is necessary and obligatory, whereas the presence of the other elements is optional and depends on the verb's rules [1,4].

III. DESIGNED SYSTEM STRUCTURE

The designed system consists of several stages. Figure (1) shows a flowchart of these stages, which are described below:

A. Input sentence stage

This stage reads an Arabic sentence typed at the keyboard. The sentence ends with a period, a semicolon, or a space character.

B. Segmentation stage

This stage segments the input sentence into words at space characters; any number of space characters may separate two words.

C. Lexicon search stage

This stage searches the lexicon for every word of the sentence. If a word is not found in the lexicon, the program reports a spelling error and stops.

D. Syntactical analysis stage

This stage checks and enforces the syntactic correctness of the input sentence. If errors are found, the program reports a syntax error.

E. Semantic analysis stage

This stage checks and enforces the correctness of the input sentence in terms of its harmony, its vocabulary, and its meaning. If the sentence is not correct in its meaning, the program reports a semantic error.
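The <Proposition> rule, read together with the statement that only the verb is obligatory, can be checked by a small recursive-descent recognizer. This Python sketch works on the sequence of lexicon Entrance tags rather than on Arabic words. It assumes that the preposition and the determinate at the start of a noun phrase are both optional, and it accepts at most five case noun phrases (Theme, Indirect Object, Place, Tool, Agent) after the verb; it does not decide which case each noun phrase fills. The function names are hypothetical.

```python
from typing import List, Optional

MAX_CASES = 5  # Theme, Indirect Object, Place, Tool, Agent


def parse_noun_phrase(tags: List[str], i: int) -> Optional[int]:
    """Match <Noun Phrase> at position i: [Preposition] [Determinate] Noun.

    Returns the index just after the phrase, or None if no noun phrase starts at i.
    Treating Preposition and Determinate as optional is an assumption of this sketch.
    """
    j = i
    if j < len(tags) and tags[j] == "Preposition":
        j += 1
    if j < len(tags) and tags[j] == "Determinate":
        j += 1
    if j < len(tags) and tags[j] == "Noun":
        return j + 1
    return None


def is_proposition(tags: List[str]) -> bool:
    """Check <Proposition> ::= <Verb> followed by up to five optional noun phrases."""
    if not tags or tags[0] != "Verb":
        return False  # the verb is obligatory and, in this rule, comes first
    i, cases = 1, 0
    while i < len(tags):
        nxt = parse_noun_phrase(tags, i)
        if nxt is None or cases == MAX_CASES:
            return False  # an unexpected tag, or more case phrases than the rule allows
        i, cases = nxt, cases + 1
    return True
```

Under these assumptions, `is_proposition(["Verb", "Noun", "Preposition", "Noun"])` accepts a verb with two case phrases, while a tag list that starts with a noun is rejected; a surface order with a fronted subject would first need a transformation that moves the verb to the front.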


F. Generative deep structure stage

Transformational operations are carried out: the system applies addition, deletion, replacement, and other operations to obtain the sentence structure that acts as the deep structure.

G. Parsing stage

This stage parses the sentence according to its effective element and its position in the phrase structure. It contains many Arabic language rules that control the parsing operations.

Here are examples of sentences that the system can parse:

[Arabic example sentences]

IV. EXAMPLES

For example, suppose we want to know the parsing of the following sentences. Figure (2) depicts this mechanism.

A. Example 1

The system prints the following parsing:
[Arabic parse output]

B. Example 2

The system prints the following parsing:
[Arabic parse output]

C. Example 3

The system prints the following parsing:
[Arabic parse output]

D. Example 4

The system prints the following parsing:
[Arabic parse output]

E. Example 5

The system prints the following parsing:
[Arabic parse output]


V. CONCLUSIONS

The present research leads to the following conclusions:

1. The verb is the main component; it controls all the other components that appear with it. From this point of view, we consider every deep structure to contain the verb.

2. The meaning of a word depends on the essential effective element (the deep element).

3. The lexicon is the essential element that provides any system with its vocabulary and the features of that vocabulary. Through these features, we can control the different processing levels of syntax and semantics.

4. The absence of vowelization may introduce ambiguities into sentence understanding. However, the transformation rules remedy these ambiguities in an explicit and simple way, as in the following sentences, in all of which the man is the subject and the lion is the object:
[Arabic example sentences]

ACKNOWLEDGMENT

I would like to express my sincere appreciation to the TWAS organization and Universiti Sains Malaysia (USM) for their encouragement and continuous financial support through the provision of a PhD fellowship. In addition, we would like to thank the School of Computer Science for its encouragement and motivation of the international students in the faculty.

REFERENCES

[1] A. Abo-Arafah, "A grammar for the Arabic language suitable for machine parsing and automatic text generation," Ph.D. thesis, Illinois Institute of Technology, Chicago, USA, 1995.

[2] N. Ali, "Arabic language and computer," Al-Tareeb Publishing House, Cairo, Egypt, 1988.

[3] M. Al-Khouly, "Transformation rules for Arabic language," Al-Riyadh, 1981.

[4] R. Al-Shalabi and M. Evens, "A computational morphology system for Arabic," Dept. of Computer Science and Applied Mathematics, Illinois Institute of Technology, Chicago, USA, n.d.

[5] M. Gheith and M. Mashour, "A computer based system for understanding Arabic language," Computer Science Department, Inst. of Statistical Study & Research, Cairo University, Egypt, n.d.

[6] Z. Khalaf, "Computerized implementation for processing Arabic sentences by interpretation synonymy relationships," M.Sc. thesis, Basra University, Iraq, 2001.


Figure (1). Flowchart of the parsing operations (boxes: User Interface; Input Stage; Lexical Rules; Lexicon Stage; Initial Descriptive; Transformational Rules; Deep Structure; Semantic; Parsing Stage; Errors).
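The input, segmentation, and lexicon search stages (A to C in Section III) at the start of this flowchart can be sketched in Python as below. This is a sketch under stated assumptions, not the system's code: the tiny lexicon maps each word directly to its Entrance instead of storing full Entrance [Word, Features] rules, only a period or semicolon is treated as a terminator (trailing spaces are simply dropped), and the function names and `SpellingError` are hypothetical.

```python
from typing import Dict, List, Tuple


class SpellingError(Exception):
    """Raised by the lexicon search stage when a word is not in the lexicon."""


TERMINATORS = (".", ";")


def read_sentence(raw: str) -> str:
    """Stage A: keep the text before the first terminator and drop trailing spaces."""
    cut = min((raw.index(t) for t in TERMINATORS if t in raw), default=len(raw))
    return raw[:cut].rstrip(" ")


def segment(sentence: str) -> List[str]:
    """Stage B: split on the space character; runs of spaces yield no empty words."""
    return [w for w in sentence.split(" ") if w]


def lexicon_search(words: List[str], lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Stage C: pair each word with its Entrance, stopping at the first unknown word."""
    tagged = []
    for w in words:
        if w not in lexicon:
            raise SpellingError(f"spelling error: {w!r} is not in the lexicon")
        tagged.append((w, lexicon[w]))
    return tagged


# Hypothetical three-word lexicon (word -> Entrance), for illustration only.
LEXICON = {"كتب": "Verb", "الولد": "Noun", "الدرس": "Noun"}
```

Because stage B splits only on the space character and discards the empty pieces, `segment("كتب   الولد")` returns two words however many spaces separate them; raising an exception in stage C mirrors the "report a spelling error and stop" behavior.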


Figure (2). The mechanism for parsing an Arabic sentence: Surface structure → Transformation Rules → Deep structure ("An agent ( ) used a tool ( ) to perform the verb ( ) to get the object ( )") → Sentence structure ("Verb ( ), Subject ( ), Object ( ), Tool ( )") → Parsing Stage.
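The path shown in Figure (2), from a surface structure through the transformation rules to a deep case frame and then to the sentence structure (Verb, Subject, Object, Tool), can be illustrated with the Python sketch below. It is a simplification, not the system's rule set: the only transformation is one movement rule that moves the verb to the front, turning a fronted-subject (S-V-O) order back into the canonical V-S-O order. The first bare noun after the verb is taken as the agent, the second as the object, and a noun introduced by a preposition as the tool. Words are English glosses tagged with Entrance labels, and all names are hypothetical. A fronted object is not handled; that would need the case and semantic features the paper relies on.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, Entrance), e.g. ("wrote", "Verb")


def front_verb(tokens: List[Token]) -> List[Token]:
    """Movement rule (assumed): move the single verb to the front.

    S-V-O becomes V-S-O; V-S-O is left unchanged. Other words keep their order.
    """
    verbs = [t for t in tokens if t[1] == "Verb"]
    if len(verbs) != 1:
        raise ValueError("expected exactly one verb")  # the verb is obligatory
    return verbs + [t for t in tokens if t[1] != "Verb"]


def deep_structure(tokens: List[Token]) -> Dict[str, Optional[str]]:
    """Fill the case frame of Figure (2): verb, agent, object and tool."""
    ordered = front_verb(tokens)
    frame: Dict[str, Optional[str]] = {
        "verb": ordered[0][0], "agent": None, "object": None, "tool": None}
    after_preposition = False
    for word, entrance in ordered[1:]:
        if entrance == "Preposition":
            after_preposition = True  # assumed to mean "with", marking the tool
        elif entrance == "Noun":
            if after_preposition:
                frame["tool"] = word
                after_preposition = False
            elif frame["agent"] is None:
                frame["agent"] = word   # first bare noun after the verb
            elif frame["object"] is None:
                frame["object"] = word  # second bare noun; further nouns are ignored
    return frame


def sentence_structure(frame: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Map case roles to grammatical functions, as in Figure (2)."""
    return {"Verb": frame["verb"], "Subject": frame["agent"],
            "Object": frame["object"], "Tool": frame["tool"]}
```

Because the movement rule normalizes word order first, "the boy wrote the lesson with the pen" (S-V-O) and "wrote the boy the lesson with the pen" (V-S-O) produce the same case frame, and both map to Subject "the boy", Object "the lesson", and Tool "the pen".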
