PC-I.doc - PC-I Form

                     Name of Project: Lexicon, Machine Translation & Text-to-Speech Software for
                                    Urdu
                     (Extendable to Regional Languages)

                                   (Revised)




                         Original Sanction Date: May 2003




                     Electronic Government Directorate
                     Ministry of IT & Telecommunication
                           Government of Pakistan
                                  Islamabad

                                  May 2005
                              TABLE OF CONTENTS


S.N.   Section        Description                                      Page No.

 1     Part ‘A’       Project Digest                                     2-5

 2     Part ‘B’       Project Description                                6-18

 3     Part ‘C’       Project Requirements                                19

 4     Annexure A     Comparative Cost Estimates, Total Expenditure,    20-23
                      Progress of work & Project History

 5     Annexure B     Summary of Capital Expenditures                     24
 6     Annexure I     Payment Schedule to FAST                          25-26

 7     Annexure II    Cost of Products & Services                         27
 8     Annexure III   Project Staff Salaries                              28
 9     Annexure IV    Operating Expenditures                              29
10     Annexure V     Maintenance Cost                                    30
11     Annexure VI    Annual Phasing of EGD Funds                         31

12     Annexure VII   EGD Administrative Expenses Details                 32




                                    Code Number of Project*

                                                        *(To be filled in Planning Commission)
                                                                             (Rupees in Million)

                                           PART ‘A’

                                       PROJECT DIGEST


1.   Name of Project:                  Lexicon, Machine Translation & Text-to-Speech
                                       Software for Urdu (Extendable to Regional
                                       Languages)

2.   Location:                         The project office would be located at the Electronic
                                       Government Directorate, Islamabad, with a camp
                                       office at FAST-NU Lahore.

3.   Authorities responsible for:
     a) Sponsoring                     Ministry of Information Technology and
                                       Telecommunications

     b) Execution                      Electronic Government Directorate

     c) Operation & Maintenance        Electronic Government Directorate

4.   Time required for completion      44 Months
     of Project (in months):

5.   a) Plan Provision
     i.   If the project is included   The project is covered by the Electronic Government
          in the current five-year     Programme, which is part of the Federal Cabinet’s
          plan, specify actual         approved IT Action Plan, which in turn is part of the
          allocation                   current Five Year Plan.
     ii. If not included in the        Not Applicable
          current plan, how is it
          now proposed to be
          accommodated
          (inter/intra-sector)
          adjustments in allocation
          or other resources may
          be indicated.
     iii. If the project is proposed   The project would be funded out of the block
          to be financed out of        allocation for E-Government projects during FY
          block provision for a        2002-03, 2003-04, 2004-05, 2005-06 and 2006-07.
          program, indicate.

6.   Relationship of the project       6.1 The E-Government initiative is an integral part of
     with the objectives of the        the Federal Cabinet’s approved National IT Policy
     sector. Indicate the              Action Plan. The main objective of the electronic
     contribution of the project,      government programme is to improve the quality of
     quantified if possible, to the    the government’s service delivery to citizens through
     targets in the five-year plan,    the use of IT. The provision of IT infrastructure for
     and the names of other            the automation of Urdu and Regional Languages, and
     projects (whether sanctioned      the development of software for them, is an area
     or under preparation), which      well emphasized in the IT Action Plan approved by
     would form part of an             the federal government.
     integrated program within the
     sector                           6.2 English is the official language of the Government
                                      of Pakistan (GoP). As a result, an immense amount of
                                      information has been produced in English to date, and
                                      more is produced by GoP every year. However, this
                                      information (printed or published on the internet) is
                                      not accessible to a vast majority of Pakistanis
                                      because they do not understand English. Due to the
                                      enormity of the task and the resources required, it is
                                      not possible to regenerate the material in Urdu, our
                                      national language, which is commonly understood all
                                      over Pakistan.

                                      It is possible to bring all this information to the
                                      general public of Pakistan by building a Machine
                                      Translation System. This computer program will
                                      take English text, convert it and display it in Urdu
                                      on the fly, giving most Pakistanis access to
                                      English-language material.

                                      Currently the E-Government Directorate has 368 MB of
                                      information published on its portal,
                                      www.Pakistan.gov.pk, and this is growing at a rate
                                      of 100 MB per month. It would be very costly to
                                      duplicate this effort for Urdu and build mirror sites.
                                      However, with this Machine Translation system
                                      attached to the portal, all this information will be
                                      viewable in both languages at the click of a button.
                                      In the absence of this facility, the efficacy of our
                                      web portal diminishes considerably.

                                      6.3 To bring the benefits and advancements of IT to
                                      the common man, it is essential to have an interface
                                      that is accessible to him. This is only possible if an
                                      oral interface is provided within the computer
                                      equipment, so that it is accessible to the common
                                      man, who makes up more than 80% of the masses of
                                      the country. The Government is pushing for
                                      e-government services. However, the masses cannot
                                      read or write, and therefore a verbal interface is
                                      required to reach them. Without the automation of
                                      Urdu and Regional Languages, goals such as
                                      e-government and e-commerce will not be achieved.

7.   Capital Cost of the Project
     Local:                           Rs. 38.9965 million


      F.E.C:                       Nil

      Total:                       Rs. 38.9965 million (Details are given in Annexures)

8.    Recurring Cost after         Rs. 0.09 million per year. For details of recurring
      completion of the Project:   expenses please see Annexure V.
                                   After completion of the project the recurring cost for
                                   2006-07 and onward will be borne by the Ministry of
                                   IT and Telecom.

9.    Date of commencement:        September 2003

10.   Proposed date of                 As per original PC-1     As per revised PC-1
      Completion:                      September 30, 2006       June 30, 2006

11.   Objectives of the project:   The project will result in the following components
                                   and applications:
                                   Components
                                   • Development of Urdu Lexicon Component
                                   • Development of English-to-Urdu Machine
                                     Translation Component
                                   • Development of Urdu TTS Component
                                   Applications
                                   • Development of Online Urdu Dictionary
                                   • Development of Online Text Translator
                                   • Development of Urdu Email and Website Reader




Prepared by:   Naila Yusuf
               Project Manager
               Electronic Government Directorate
               Ministry of Information Technology
               IT & Telecommunication Division
               Government of Pakistan
               Islamabad




Checked by:    Syed Raza Abbas Shah
               Director General (Projects)
               Electronic Government Directorate
               Ministry of Information Technology
               IT & Telecommunication Division
               Government of Pakistan
               Islamabad




Approved by:   Seerat Asghar
               Programme Coordinator
               Electronic Government Directorate
               Ministry of Information Technology
               IT & Telecommunication Division
               Government of Pakistan
               Islamabad




               Dated         May 2005




                                       PART 'B'

                               PROJECT DESCRIPTION

12.   Description of the Proposed   12.1 Brief history
      System:
                                    A dedicated, purely volunteer team comprising Urdu
                                    linguists, academicians, software developers and
                                    technologists spent a number of months
                                    brainstorming on the basics of speech recognition,
                                    machine translation, etc. Many government
                                    departments, such as the Pakistan Computer Bureau
                                    and the National Language Authority, as well as
                                    FAST, were also involved in the deliberations and
                                    decisions.

                                    Three software components will be developed as a
                                    result of this project and are explained in 12.2, 12.3
                                    & 12.4.

                                    12.2 Development of Lexicon

                                    A lexicon is an essential requirement for speech and
                                    language applications (speech synthesis, speech
                                    recognition, grammar checking, machine translation,
                                    etc.). In addition, the Lexicon is also required for
                                    desktop applications (spell checker, grammar
                                    checker, etc.). The Lexicon has to be carefully
                                    designed for scalability, expandability, performance
                                    and componentization, and to avoid redundancy of
                                    information. The issues include not only linguistics
                                    but also how to store multi-modal and multilingual
                                    information (e.g. for Machine Translation
                                    applications).

                                    The major factors considered for the development of
                                    Lexicon will be Linguistic content specification,
                                    Storage mechanism, Retrieval algorithms and Data
                                    entry.

                                    a) Linguistic Content Specification
                                    Before designing a lexicon, it has to be determined
                                    which information needs to be stored at various
                                    levels. Since the proposed lexicon will be designed
                                    for speech and language applications, it will be
                                    necessary to store the information at phonetic,
                                    phonological, morphological, syntactic and semantic
                                    levels. In addition, other information for desktop
                                    applications (e.g. synonyms, etymology, etc.) will
                                    also be required to be stored. The specification of
                                    content in the lexicon at these levels needs to be
                                    researched and defined.
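
As an illustration only, the levels of information named in this subsection could be grouped into one record per headword. The sketch below is written in Python; the class name `LexiconEntry`, its field names and the sample values are hypothetical and are not taken from the project specification, which is still to be researched and defined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """Hypothetical record holding one headword at several linguistic levels."""
    headword: str                      # word in Urdu script
    phonetic: str = ""                 # surface pronunciation (illustrative notation)
    phonological: str = ""             # underlying phonemic form
    morphological: Dict[str, str] = field(default_factory=dict)   # e.g. root, number
    syntactic: Dict[str, str] = field(default_factory=dict)       # e.g. part of speech
    semantic: List[str] = field(default_factory=list)             # glosses or senses
    synonyms: List[str] = field(default_factory=list)             # desktop-application data
    etymology: str = ""

    def filled_levels(self) -> List[str]:
        """Return the names of the levels that actually carry information."""
        levels = ["phonetic", "phonological", "morphological",
                  "syntactic", "semantic", "synonyms", "etymology"]
        return [name for name in levels if getattr(self, name)]


entry = LexiconEntry(
    headword="کتاب",                   # "kitab" (book); sample values are illustrative
    phonological="kitab",
    syntactic={"pos": "noun", "gender": "feminine"},
    semantic=["book"],
)
print(entry.filled_levels())
```

Keeping each level in its own field lets an application read only the levels it needs, and `filled_levels` gives a quick view of how complete an entry is.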




b) Storage Mechanism

This phase requires researching and comparing
standard and other storage policies in order to
conserve space, keep the lexicon easily extendable
(e.g. to add other dimensions of information), get
good retrieval performance, break the lexicon into
loosely coupled components and avoid redundancy.
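
One possible way to meet these goals is a normalized relational layout, in which each headword is stored once and every additional dimension of information is a row in a separate table. The following Python sketch uses the standard `sqlite3` module with an in-memory database; the table names, column names and romanized sample data are assumptions made for illustration, not a storage policy chosen by the project.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE headword (
        id   INTEGER PRIMARY KEY,
        form TEXT NOT NULL UNIQUE          -- each word is stored exactly once
    );
    CREATE TABLE feature (
        headword_id INTEGER NOT NULL REFERENCES headword(id),
        dimension   TEXT NOT NULL,         -- e.g. 'pos', 'phonemic', 'gloss'
        value       TEXT NOT NULL,
        PRIMARY KEY (headword_id, dimension, value)
    );
""")


def add_feature(form: str, dimension: str, value: str) -> None:
    """Store one piece of information about a word, creating the word if needed."""
    conn.execute("INSERT OR IGNORE INTO headword(form) VALUES (?)", (form,))
    (word_id,) = conn.execute(
        "SELECT id FROM headword WHERE form = ?", (form,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO feature VALUES (?, ?, ?)",
                 (word_id, dimension, value))


def features(form: str) -> dict:
    """Return all stored dimensions for a word as {dimension: [values]}."""
    rows = conn.execute(
        "SELECT dimension, value FROM feature "
        "JOIN headword ON headword.id = feature.headword_id "
        "WHERE form = ? ORDER BY dimension, value", (form,))
    result: dict = {}
    for dimension, value in rows:
        result.setdefault(dimension, []).append(value)
    return result


add_feature("kitab", "pos", "noun")
add_feature("kitab", "gloss", "book")
add_feature("kitab", "pos", "noun")        # duplicate is ignored, so no redundancy
print(features("kitab"))
```

In this layout a new dimension of information needs no schema change: it is simply a new value in the `dimension` column.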

c) Retrieval Algorithms

Language data stored in the lexicon must be
retrieved efficiently by applications. As most of the
processing required by these applications is near
real-time, efficient algorithms for data retrieval are
required.

In addition, these applications may also require
data searches based on incomplete information, and
therefore algorithms are required that can converge
using partial inputs.

Many such algorithms exist for general data.
However, they need to be tailored for speech and
language applications, and specifically for Urdu.
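
A common way to support searches on incomplete input is prefix lookup over a sorted word list. The Python sketch below uses binary search from the standard `bisect` module; it is a generic technique shown on romanized sample words, not the retrieval algorithm of the project, which would still need tailoring for Urdu.

```python
from bisect import bisect_left
from typing import List


def prefix_search(sorted_words: List[str], prefix: str) -> List[str]:
    """Return every word in sorted_words that starts with prefix."""
    start = bisect_left(sorted_words, prefix)   # index of the first word >= prefix
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break                                # sorted order: no later word can match
        matches.append(word)
    return matches


words = sorted(["kitab", "kitabcha", "kisan", "qalam", "kal"])
print(prefix_search(words, "kit"))
```

Because the list is sorted, all words sharing a prefix sit next to each other, so the search cost depends on the logarithm of the lexicon size plus the number of matches rather than on the full lexicon size.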

d) Data Entry

Once the specification of the data to be stored has
been decided, and the logical and physical storage
mechanisms are finalized, a significant amount of
linguistic data has to be entered into the lexicon and
verified for quality control. This portion requires
black-box testing of randomly sampled data for
accuracy.
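
The sampling step can be sketched as follows. This Python example draws a reproducible random sample of entries for manual checking and computes an accuracy figure from the reviewers' verdicts; the function names, the sample size, the fixed seed and the review outcome are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence


def draw_sample(entry_ids: Sequence[int], size: int, seed: int = 0) -> List[int]:
    """Pick `size` distinct entries at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(list(entry_ids), min(size, len(entry_ids)))


def sample_accuracy(verdicts: Dict[int, bool]) -> float:
    """Fraction of sampled entries that reviewers marked as correct."""
    if not verdicts:
        raise ValueError("no verdicts supplied")
    return sum(verdicts.values()) / len(verdicts)


sample = draw_sample(range(1, 10001), size=200)
# Hypothetical review outcome: pretend the first five sampled entries were wrong.
verdicts = {entry_id: index >= 5 for index, entry_id in enumerate(sample)}
print(len(sample), sample_accuracy(verdicts))
```

Reviewers only see the sampled entries and the expected content specification, which is what makes the check a black-box test.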

To identify the areas and initiate research studies
supporting Urdu and Regional Languages, Machine
Translation, Desktop Publishing, Optical Character
Recognition, Databases, Speech Synthesis, Internet
Browser/Publisher and email programs, e-commerce,
user interfaces, plug-ins for Star Office and MS
Office, and other directly needed software
programs.

To formulate standards for all the above-mentioned
areas as well as for new areas that will emerge in
due course with technological advancement in the
IT sector.

A limited customized lexicon will be developed.
However, a facility will also be provided for
integration with a separate lexicon. Integration with
a lexicon is an important part of building the system.
All the components of the system will interact with
the lexicon, which will be built as a separate module
of this project. It will be a multipurpose lexicon
offering a standard interface to other applications
and containing a wide variety of information beyond
the meanings of words. The proposed lexicon will
hold at least phonological, morphological and
syntactic information.

12.3 Machine Translation
English to Urdu Machine translation will be achieved
in the following phases:

a) Analysis of Urdu Linguistics

Before the translation can be automated, complete
knowledge of Urdu linguistics is required.   This
would entail inquiry into Urdu morphological,
syntactic and semantic structures. Grammars will
be developed and documented to cover these
aspects of Urdu.

b) Analysis of English Linguistics

As for Urdu, a similar exercise will be undertaken for
English. Much of this work has already been done by
Western linguists; however, these details will have to
be dug out, properly filtered and documented in the
desired format. Any missing information will also
need to be analyzed and developed.

c) Analysis of English-Urdu Translation Mechanism

After the grammars of these languages have been
documented, the translation rules from English to
Urdu will be investigated and documented. This
study will focus on how different English
morpho-syntactic structures map onto licensed Urdu
structures.

In addition, the semantic dimension of this
conversion will be taken into consideration, and
models that yield accurate domain-specific
translations will also be investigated and
documented.

d) Computational Modeling

Once the grammars have been finalized and the
English-to-Urdu mapping mechanism has been
developed, the Machine Translation Engine will be
designed and developed to automate this process
on the computer.
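
To illustrate what a structural mapping rule looks like, the Python sketch below reorders a parsed English clause from subject-verb-object order into the subject-object-verb order that Urdu normally uses. It is a toy transfer rule working on constituents that are already labelled; the `Clause` type and the function name are hypothetical, and the sketch performs no word translation or morphological generation.

```python
from typing import List, NamedTuple


class Clause(NamedTuple):
    """A simple English clause whose constituents are already identified."""
    subject: List[str]
    verb: List[str]
    obj: List[str]


def to_urdu_order(clause: Clause) -> List[str]:
    """Apply one transfer rule: English S-V-O becomes S-O-V for the target side."""
    return clause.subject + clause.obj + clause.verb


english = Clause(subject=["the", "student"], verb=["reads"], obj=["a", "book"])
print(to_urdu_order(english))
```

A real engine would hold many such rules, one per mapped structure, and would apply them after parsing and before generating the Urdu words.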


12.4   Text-to-Speech Software

For the Text-to-Speech component, a text-to-speech
(TTS) engine will be bought. The TTS engine will be
made to interact with the Urdu language model and
with the Lexicon. Efforts will be made to achieve the
best possible synthesis quality so that the
synthesized speech sounds natural.

a) Grammars (Urdu language models)

The rules of a language are usually encoded in
different kinds of grammars for computational
purposes. Nearly all the components of the system
will interact with different types of grammars,
which will be used to encode various linguistic rules
as well as mathematical machines (automata).
Some of the main grammars are briefly described
below.

b) Grammar for Aerab to vowel mapping

In languages like Urdu, Persian and Arabic, aerabs
(diacritical marks) are used in the orthography to
represent some vowel sounds. Therefore a grammar
will be needed to represent the aerab-to-vowel
mapping.
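
A small part of such a mapping can be written as a lookup table. The Python sketch below maps the three short-vowel aerabs (zabar, zer, pesh) to vowel symbols using their Unicode code points; the use of Unicode rather than UZT codes, the romanized vowel symbols and the function name are assumptions for illustration, and a full grammar would need context-dependent rules as well.

```python
# Unicode combining marks for the three short-vowel aerabs.
AERAB_TO_VOWEL = {
    "\u064E": "a",   # zabar (ARABIC FATHA)
    "\u0650": "i",   # zer   (ARABIC KASRA)
    "\u064F": "u",   # pesh  (ARABIC DAMMA)
}


def vowels_in(text: str) -> list:
    """Return the short vowels marked by aerabs, in the order they appear."""
    return [AERAB_TO_VOWEL[ch] for ch in text if ch in AERAB_TO_VOWEL]


# Letter dal with zer, followed by lam: the marked vowel is a short "i".
sample = "\u062F\u0650\u0644"
print(vowels_in(sample))
```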

c) Grammar for Stress Assignment

A grammar will be needed to represent the stress
patterns of Urdu words. This can be helpful in both
TTS and automatic speech recognition (ASR)
systems. This grammar is necessary in the speech
synthesis engine to make the speech sound natural
rather than robotic. It can also help in handling
variations in which a particular word is spoken with
different stress patterns.

d) Grammar for Syllabification

A Grammar will be needed to identify the syllables in
a particular word. The syllabification Grammar will
help in the speech synthesis system as well as the
lexicon.
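
The idea can be shown with a deliberately simplified rule set. In the Python sketch below, every vowel letter is a syllable nucleus, and when consonants stand between two vowels only the last of them starts the next syllable. The rules, the romanized input and the vowel inventory are assumptions for illustration (long vowels written with two letters are not handled) and do not represent a researched grammar of Urdu.

```python
from typing import List

VOWELS = set("aeiou")   # simplified romanized vowel inventory (assumption)


def syllabify(word: str) -> List[str]:
    """Split a romanized word into syllables using simplified onset rules."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(nuclei) < 2:
        return [word] if word else []
    syllables = []
    start = 0
    for left, right in zip(nuclei, nuclei[1:]):
        consonants_between = right - left - 1
        # The last consonant before the next vowel becomes its onset;
        # with no consonant in between, the boundary falls right before the vowel.
        boundary = right - 1 if consonants_between >= 1 else right
        syllables.append(word[start:boundary])
        start = boundary
    syllables.append(word[start:])
    return syllables


print(syllabify("kitab"))
print(syllabify("pakistan"))
```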

e) Grammar for Intonation Assignment

A grammar will be needed to assign intonation
patterns to sentences of Urdu. It can help in making
the synthesized speech close to natural by applying
the typical intonational patterns of the Urdu language.




f)   Grammar for Timing Rules

Generally in continuous speech, duration of different
sounds changes in different contexts. A grammar
will be needed to represent the difference of the
duration of different phonemes in different cases.

g) Grammar for Phonetic Variations

The sounds of different phonemes can be affected
by the phonemes preceding them, e.g. the sound of
the vowel ‘a’ changes slightly when it occurs after ‘n’.
There can be many other such cases to handle in
the Urdu language. A grammar will be needed to
handle such cases.
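
Context-dependent variation of this kind is often written as rewrite rules keyed on the preceding sound. The Python sketch below applies such rules to a phoneme sequence; the single rule and the `a~` variant symbol are placeholders that echo the example in the text and are not a finding about Urdu phonetics.

```python
from typing import Dict, List, Tuple

# (preceding phoneme, phoneme) -> variant symbol. Placeholder rule only.
CONTEXT_RULES: Dict[Tuple[str, str], str] = {
    ("n", "a"): "a~",
}


def apply_variations(phonemes: List[str]) -> List[str]:
    """Replace each phoneme by its variant when the preceding phoneme triggers a rule."""
    output = []
    previous = None
    for phoneme in phonemes:
        output.append(CONTEXT_RULES.get((previous, phoneme), phoneme))
        previous = phoneme          # context is the underlying, not the rewritten, phoneme
    return output


print(apply_variations(["n", "a", "m", "a"]))
```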

12.5 Type of Activities

The exercise of building this project will involve
these types of activities:

Research Activities
Designing and Development
Quality Assurance
Standards Conformance
Publications

a) Research Activities

Unfortunately, little work has been done to formalize
the rules of the Urdu language. No morphological,
syntactic, semantic or even phonological rules of
Urdu have been sufficiently formulated. Hence the
linguistic analysis of Urdu is yet to be done. The
proposed systems cannot be developed without
carrying out all of these research activities.

b) Design and Development

Activities under this category are mainly software
engineering activities. Designing the whole system
and its sub-modules, modeling the grammars,
designing and developing the TTS engine, etc. will
all be covered under this category.

Some of the main design and development activities
are briefly described below.

c) Developing Grammars

Modeling and developing grammars for various tasks
is an important part of the proposed systems. These
grammars will be built on the basis of the results of
the linguistic research on the Urdu language.


                         d) Develop SDK to Wrap Engine

                         To enable other application programmers to use our
                         system, it will be wrapped in an SDK layer. The SDK
                         will provide interfaces for all the major development
                         platforms.

                         This will ensure that the system can be used in
                         multiple applications and will accelerate the pace of
                         development of Urdu speech-driven systems.
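
A wrapper layer of this kind might look like the following Python sketch, in which a single facade class hides the engines behind a small, stable interface. All names here (`UrduLanguageSDK`, `lookup`, `speak`) and the stand-in engine function are hypothetical; the project's actual API is not defined in this document.

```python
from typing import Callable, Dict, Optional


class UrduLanguageSDK:
    """Hypothetical facade: applications call this class, not the engines directly."""

    def __init__(self,
                 lexicon: Dict[str, dict],
                 synthesize: Callable[[str], bytes]) -> None:
        self._lexicon = lexicon          # stand-in for the lexicon component
        self._synthesize = synthesize    # stand-in for the TTS engine

    def lookup(self, word: str) -> Optional[dict]:
        """Return the stored information for a word, or None if it is unknown."""
        return self._lexicon.get(word)

    def speak(self, text: str) -> bytes:
        """Return audio data for the text, as produced by the wrapped engine."""
        if not text:
            raise ValueError("text must not be empty")
        return self._synthesize(text)


# Stand-in engine used only to make the sketch runnable: it produces no real audio.
def fake_engine(text: str) -> bytes:
    return text.encode("utf-8")


sdk = UrduLanguageSDK({"kitab": {"pos": "noun"}}, fake_engine)
print(sdk.lookup("kitab"), len(sdk.speak("kitab")))
```

Because callers depend only on the facade, the engine behind it can be replaced or upgraded without changing application code.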

                         e) Quality Assurance

                         Both quality assurance and quality control
                         techniques will be used to assure the quality of the
                         system and of the publications.

                         f)   Standards Conformance

                         Urdu Zabta Takhti (UZT) is the standard code page
                         for Urdu characters. The Government of Pakistan
                         formalized this standard in 2000. UZT codes will
                         be used in the system wherever Urdu text is
                         manipulated.

                         g) Publications

                         With a lot of scope for research in this project, there
                         will be many potential publications.


13   Scope of Project:   The project will result in the following components
                         and applications:

                         Components

                                • Development of Urdu Lexicon Component
                                • Development of English-to-Urdu Machine
                                  Translation Component
                                • Development of Urdu TTS Component

                         Applications

                                • Development of Online Urdu Dictionary
                                • Development of Online Text Translator
                                • Development of Urdu Email and Website
                                  Reader




14   Justification of Project   To bring the benefits and advancements of IT to the
                                common man, it is essential to have an interface
                                that is accessible to him. This is only possible if
                                an oral interface is provided within the computer
                                equipment, so that it is accessible to the common
                                man, who makes up more than 80% of the masses
                                of the country. The Government is pushing for
                                e-government services. However, the masses cannot
                                read or write, and therefore a verbal interface is
                                required to reach them. Without the automation of
                                Urdu and Regional Languages, goals such as
                                e-government and e-commerce will not be achieved.

                                Benefits of Project

                                   • The development of the Lexicon and
                                     text-to-speech software will facilitate different
                                     e-Government and e-Commerce applications. It
                                     will also enable Urdu databases for records such
                                     as national I.D. cards, driving licenses and
                                     electoral rolls. It will also make possible
                                     computer-assisted translation from English to
                                     Urdu and vice versa.

                                   • To lead Pakistan’s research, development and
                                     implementation of Urdu (and Regional
                                     Languages) electronic standards for the local and
                                     global IT market.

                                   • To design and implement the Lexicon for
                                     databases, hand-held and cellular technologies,
                                     machine translation, voice-operated devices, and
                                     other hardware and software areas, so that Urdu
                                     usage is pervasive in local as well as global
                                     markets.

                                   • To conduct research in leading-edge technologies
                                     such as machine translation, optical character
                                     recognition, neutral (Urdu) user interfaces,
                                     voice-operated systems, industrial automation,
                                     etc. for Urdu and Regional Languages.

                                   • To achieve the e-governance goal at government
                                     level.

                                   • The provision of a Lexicon for Urdu and Regional
                                     Languages software will increase the use of
                                     computers by the public at large.

                                   • The provision of a Text-to-Speech System in Urdu
                                     and Regional Languages software will increase
                                     the use of computers by the public at large.




                         • This will extend IT access to more than 80% of
                           the public, compared with only about 10% at
                           present, because current facilities are available
                           only in English.

                         • It will broaden the consumer base for
                           e-Government and will increase application
                           utilization to more than 80% of the public.

                         • The wider use of Urdu and Regional Languages
                           software will generate more local software
                           development work, and the skill set used for
                           this development will be readily applicable to
                           other multilingual software.

                         • The groundwork so initiated will help and
                           support different applications at national and
                           international level.

15   Implementation   Administrative arrangements for implementation of
     Methodology:     the project

                      The Electronic Government Directorate (EGD) will
                      execute the project administratively. The project will
                      be entrusted to FAST, which is the only organization
                      in Pakistan that has the capability to undertake this
                      development work.
                      FAST-NU Lahore has established the Center for
                      Research in Urdu Language Processing (CRULP),
                      which is dedicated to conducting research and
                      development in various facets of the Urdu language.
                      These include speech and language processing, font
                      generation and recognition, desktop publishing,
                      machine translation, Internet publishing, and Urdu
                      text storage, retrieval and transmission strategies.
                      FAST-NU Lahore has been conducting research in this
                      area for the past one and a half years and has good
                      facilities and faculty to carry out this project. It is
                      recommended that FAST-NU Lahore develop the
                      project under the supervision of the Electronic
                      Government Directorate. For details of CRULP, see
                      Annexure IX.

                      The requirements will be based on:
                      • a component-based Lexicon with at least 50,000
                        headwords (and at least 10,000 words with specified
                        content, with at least 15 dimensions including
                        phonetic, phonological, morpho-syntactic and
                        syntactic information);
                      • a component-based TTS system with perceptually
                        intelligible speech output for the core 10,000 words
                        of the Lexicon and at least two intonation patterns
                        (declarative and interrogative);
                      • a complete API for the Lexicon;
                      • a complete API for TTS;
                      • a test application (Urdu UI) for the Lexicon, which
                        uses the API; and
                      • a test application (Urdu UI) for TTS, which uses the
                        API.

Specialized people required include:

A computer speech scientist qualified in text-to-
speech synthesis. The person should have
knowledge of phonetics, phonology and speech
processing. In addition, the person should have
experience of working on a text-to-speech system.

A computational linguist qualified in lexical
development. The person should have a strong CS
background with knowledge of grammar, modeling
of grammar, theory of computation, and theory of
compiling and parsing. In addition, knowledge of
linguistics and how it is computationally modeled is
also required.

The payment schedule will be in accordance with
national and international financial norms (see
Annexure VIII).

The product generated by the research and
development activities carried out through the
project will be platform independent, as far as
design and technology permit, and will work with
the Linux and Windows operating systems. The end
product will be the property of the Electronic
Government Directorate, Government of Pakistan. If
the selected organization intends to develop any
addition to the product after its delivery to EGD,
the organization must obtain permission from the
government.

Once the proposed software is ready it will be
distributed to all public sector organizations at a
very nominal cot or free of cost as decided by the
Competent authority for implementation.

The EGD will be responsible for maintaining the
project and for continuously following up on
upgrades of the software, so that future needs for
the Lexicon and Text-to-Speech are properly met.

Lexicon and TTS Project Outputs

The project will result in the following applications:

Online Urdu dictionary. This dictionary could be
used to access phonetic, phonological,
morphosyntactic and syntactic information about
Urdu words. It would provide wildcard searches.
The lexicon would be useful not only for the general
public and for the general promotion of the Urdu
language, but would also be an immensely useful tool
for linguistic search.

Urdu Email and Website Reader. This application
would install on a PC and would be able to read out
web pages in Urdu. It could be distributed free to
the general public to effectively dispense
information published on government (and other)
websites to people who cannot read. This will give a
real boost to e-governance projects and will greatly
enhance the accessibility of information on the
internet for Pakistani people. It will also have
other applications with significant social benefits.
For example, the application will be able to read
out emails sent in Urdu.

Urdu Lexicon Component. This component would be
available to developers and organizations to
incorporate in their programs, e.g. to do spell
checking and grammar checking in word processors.
This component will greatly boost Urdu desktop and
other publishing. In addition this component will
form the basis for further research and application
development for Urdu, e.g. Machine Translation,
Optical Character Recognition. These applications
will use the Lexicon component for functioning
effectively.

Online Text Translator
This application would install on a PC and would be
able to translate English text pages to Urdu. This
application could be distributed for free to general
public to effectively dispense information published
on government (and other) websites to people who
cannot read English. This will give a real boost to
e-governance projects and will greatly enhance the
accessibility of information on the internet for
Pakistani people. It will also have other
applications with significant social benefits. For
example, the vast repository of the Internet will
become accessible to the people of Pakistan through
instant translation into Urdu.

English-to-Urdu MT Component
This component would be available to developers
and organizations to incorporate in their programs
e.g. e-mail translator. This component could also be
incorporated with websites to transform web content
into Urdu.




      15
                                      Urdu TTS Component. This component would be
                                      available to developers and organizations to
                                      incorporate in their programs, e.g. companies like
                                      PIA, Pakistan Railways, PTCL 17 Inquiry, etc. can
                                      incorporate this component into their consumer
                                      response software to develop Automatic Response
                                      Systems (which would work without operators,
                                      twenty-four hours a day, seven days a week) for flight
                                      schedule information, train schedule information and
                                      telephone queries respectively.     This component
                                      could also be incorporated with websites to develop
                                      voice-commerce in Pakistan.

16.   a) Indicate the relationship    The E-Government initiative is an integral part of
         with other programs in the   Federal Cabinet’s approved National IT Policy Action
         same and other sectors       Plan. The main objective of the electronic
                                      government programme is to use IT to improve the
                                      quality of the government’s service delivery to
                                      citizens. The provision of IT infrastructure for the
                                      automation of Urdu and Regional Languages, and the
                                      development of software for them, is an area well
                                      emphasized in the IT Action Plan, which has been
                                      approved by the federal government.

                                      English is the official language of the Government
                                      of Pakistan. As a result, an immense amount of
                                      information has been produced in English to date
                                      and more is being produced by GoP every year.
                                      However, this information (printed or published on
                                      the internet) is not accessible to a vast majority of
                                      Pakistanis because they do not understand English.
                                      Due to the enormity of the task and the resources
                                      required, it is not possible to regenerate the
                                      material in Urdu, our national language, which is
                                      commonly understood all over Pakistan.

                                      It is possible to bring all this information to the
                                      general masses of Pakistan by making a Machine
                                      Translation System. This computer program will
                                      take English text, convert it and display it in
                                      Urdu on the fly, enabling most Pakistanis to have
                                      access to English-language material.

                                      Currently E-government Directorate has 368 MB of
                                      information       published     on       its     portal
                                      www.Pakistan.gov.pk, which is increasing at the
                                      growth rate of 100 MB per month. It will be very
                                      costly to duplicate this effort for Urdu and make
                                      mirror sites. However, with this Machine Translation
                                      system (attached with the portal) all this
                                      information will be viewable in both languages at the
                                      click of a button. As a matter of fact, in the absence
                                      of this facility the efficacy of our web portal
                                      diminishes considerably.

                                       To bring the benefits and advancement of IT to the
                                       common man, it is essential to have an interface
                                       accessible to the common man, who makes up more
                                       than 80% of the population of the country. This is
                                       only possible if an oral interface is provided within
                                       the computer equipment. The Government is pushing
                                       for e-government services. However, the masses
                                       cannot read or write, and a verbal interface is
                                       therefore required to reach them. Without the
                                       automation of Urdu and Regional languages, goals
                                       such as e-government and e-commerce will not be
                                       achieved.

      b) Mention immediate output      Nil
         in the form of studies and
         papers etc.:

      c) Administrative                Nil
         Arrangements:

17.   Give date when capital            As per original PC-1          As per revised PC-1
      expenditure estimates were        January 2003                  May 2005
      prepared:

18.   Give Summary of capital cost                                           Revised Cost
      covering the whole of the                                              approved in
      investment period:                                                    DDWP meeting
                                                                            on 18th Jan, 05
                                         S.N.  Description                                        Rs. (in million)
                                           1   Cost of Products / Services (Annexure-II)               2.779
                                           2   Project Staff Salaries (Annexure-III)                  28.3772
                                           3   Operating Expenditures (Annexure-IV)                    5.0616
                                           4   Administrative Cost for EGD (Annexure-V)                1.6317
                                           5   Maintenance cost for one year done by FAST
                                               (Annexure-VI)                                           0.09
                                           6   Contingency (5% of 1-5)                                 1.057
                                               Total Capital Expenditure (Rs.)                        38.9965
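
The total above can be cross-checked by summing the individual cost heads. The following is a minimal Python sketch and is not part of the sanctioned PC-I. It assumes the per-head figures as listed in Annexure-A (in million rupees); the names `cost_heads` and `total` are illustrative only.

```python
from decimal import Decimal

# Cost heads in million rupees, as listed in Annexure-A of this PC-I.
# Decimal is used so that the four-decimal figures add up exactly.
cost_heads = {
    "Cost of Products & Services": Decimal("2.779"),
    "Project Staff Salaries": Decimal("28.3772"),
    "Operating Expenditures": Decimal("5.0616"),
    "EGD Administrative Expenses": Decimal("1.6317"),
    "Miscellaneous & Contingencies": Decimal("1.057"),
    "Maintenance cost done by FAST": Decimal("0.09"),
}

# Sum of all heads, to compare with the stated total capital expenditure.
total = sum(cost_heads.values(), Decimal("0"))
print(total)  # 38.9965
```

The sum agrees with the stated total capital expenditure of Rs. 38.9965 million.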

19.   Basis of cost estimates. (Give   Market prices have been used for purposes of
      full details)                    establishing cost estimates for the project. Please
                                       see Annexure for details.

20.   Estimates of annual recurring    After completion of the project, the recurring cost for
      expenditure after completion     2007-08 and onward will be borne by the Ministry of
      of each phase of a project:      IT & Telecommunication. For details please see
                                       Annexure V.

21.   Annual Phasing of Physical      The project will be completed in three years and
      work and financial              eight months.
      requirements for the Project:

22.   Foreign Exchange                Nil
      Expenditure:

23.   Risk Analysis                   Considering the nature of the work, the element of
                                      risk is high. However, as the work is being entrusted
                                      to an organization that possesses sufficient
                                      professional expertise, the chances of failure are
                                      not high.

24.   Sources of rupee component      Out of PSDP Funding.
      of project:

25.   Results of the project:         Development of software as desired by EGD.




                                            18
                                          PART 'C'
                                   PROJECT REQUIREMENTS

23.   a) Manpower:
          i. Regular Manpower to       Nil
             be re-deployed from
             current resources:

          ii. Consultants:             Nil

      b) Give list of employment to    Nil
         be generated by gender:

      c) Give manpower required        See Annexure III & VII
         during the first year of
         implementation of the
         Project. Give details of
         specific skills required
         (scientists, lab/field
         workers, technicians etc.)
         separately for male and
         female and their grades:

24.   Physical and other facilities    Not Applicable
      required for the Project:

25.   Civil Works:                     Nil


26.   In case of imported material     Not applicable
      and equipment for execution,
      indicate:

27.   Beneficiary Participation:       Not applicable




                                              19
                                                Annexures


                                                                                      Annexure-A

1. Comparative cost estimate of the last sanctioned and revised schemes:         (Million Rs)

                                         Last Sanctioned Cost                 Revised Cost
                                                 Foreign                          Foreign
Items                                  Local     Exchange    Total      Local     Exchange    Total
a) Cost of Products & Services         2.779       Nil       2.779      2.779       Nil       2.779
b) Project Staff Salaries             28.3772      Nil      28.3772    28.3772      Nil      28.3772
c) Operating Expenditures              5.0616      Nil       5.0616     5.0616      Nil       5.0616
d) EGD Administrative Expenses         1.6317      Nil       1.6317     1.6317      Nil       1.6317
e) Miscellaneous & Contingencies       1.057       Nil       1.057      1.057       Nil       1.057
f) Maintenance cost done by FAST       0.09        Nil       0.09       0.09        Nil       0.09
                            Total:    38.9965               38.9965    38.9965               38.9965

Give reasons for revision in
cost estimates :
Items                                             Reasons for the revision
a) Cost of Products & Services
                                                  No Change

b) Project Staff Salaries                         No Change

c) Operating Expenditures                         No Change
d) EGD Administrative Expenses                    DDWP is requested to kindly give approval for the
                                                  creation of the post of Project Manager. In fact, it had
                                                  all along been the intention of the executing agency,
                                                  i.e. the Electronic Government Directorate, to create
                                                  the post of Project Manager and engage a professional
                                                  against the project management expense for successful
                                                  implementation of the project.

                                                  However, while preparing the PC-1, the expenditure
                                                  was inadvertently demanded under the head of “Project
                                                  Management Expense”. To rectify this mistake, DDWP
                                                  is requested to agree to the creation of the post of
                                                  Project Manager and to accord ex post facto approval
                                                  for the same from the date of approval of the original
                                                  PC-1, i.e. 28th April 2003, as the concerned project
                                                  manager has been on duty since June 2003.


                                                   20
e) Miscellaneous &
    Contingencies             No Change
f) Maintenance cost done by   No Change
FAST




                               21
2. Total expenditure incurred so far
Items                                                       Expenditure (In Million Rs.)
                                                                          Foreign
                                                    Local                 Exchange         Total
a) Cost of Products & Services                      2.779                 Nil              2.779
b) Project Staff Salaries                           17.965                Nil              17.965
c) Operating Expenditures                           2.4705                Nil              2.4705
d) EGD Administrative Expenses                      0.7                   Nil              0.7
e) Miscellaneous & Contingencies                    0.416                 Nil              0.416
f) Maintenance cost done by FAST                    Nil                   Nil              Nil
                          Total                     24.3305               Nil              24.3305
3. Progress of Work

                                      (a)           (b)                          (c)
                                      As per        Actual achievements          Reasons for delay
                                      schedule last
                                      sanctioned
Signing of the contract between       Yes           Completed in August 2003     Due to a few technical issues
EGD & FAST-NUCES                                                                 between EGD & FAST-NUCES

Hiring of Staff / consultants by      Yes           Completed in September 2003  None
FAST-NUCES

Purchase of Hardware & Software by    Yes           Completed in October 2003    None
FAST-NUCES

Development of:                       Yes           Completed in December 2003   None
1. Prototype 30,000 word lexicon
   for Urdu (containing headwords
   only)
2. Natural Language Processor of
   TTS

Development of:                       Yes           Completed in March 2004      None
1. Prototype 500 word lexicon with
   content specification
2. Its single word TTS synthesizer
3. Prototypes of API and Urdu User
   Interface (UI) for Lexicon and
   TTS system
4. Prototype Machine Translation
   Engine




                                            22
Development of:                       Yes           Completed in June 2004       None
1. 500 word lexicon with content
   specification
2. TTS system for common sentence
   structures
3. UI of Lexicon
4. UI of TTS system
5. Urdu & English Grammar for
   Machine Translation

Development of:                       Yes           Completed in April 2005      None
1. Lexicon with 20,000 headwords
   and 5,000 words with content
   specification
2. Urdu TTS system with existing
   functionality and professionally
   recorded di-phones
3. Grammar Rules for English-to-Urdu
   Transfer system for Machine
   Translation with 2000 word
   lexicon

4. Project History

                                           Cost                    Planned period of
                           Date      Local      F.E.     Total    completion in months   Reasons for revision
Original sanctioned        May-03    38.9965    Nil                        36
1st Revision               Jan-05    38.9965    Nil                        36             To bring project deliverables forward
                                                                                          and to re-appropriate project funds
2nd revision (Proposed)    Jun-05    38.9965    Nil                        36             Ref- Sec 1




                                               23
                                                                    Annexure-B
SUMMARY OF CAPITAL EXPENDITURE
                                      Approved Cost
                                      (as per revision by
                                      DDWP meeting held
S.No.  Description                    on 18th Jan, 05)     Revised Cost   Variation in Cost

CONTRACTOR’S EXPENDITURE
 1     Cost of Products & Services         2.779               2.779            0
 2     Project Staff Salaries             28.3772             28.3772           0
 3     Operating Expenditures              5.0616              5.0616           0
 4     Maintenance of website              0.09                0.09             0

EGD EXPENDITURE
 5     EGD Administrative Expenses         1.6317              1.6317           0
 6     Miscellaneous & Contingencies       1.057               1.057            0

       TOTAL CAPITAL EXPENDITURE (Rs.)    38.9965             38.9965           0




                                      24
CONTRACTOR’S EXPENSES
                                                                        Annexure-I

                           Deliverables Vs Payment Schedule

Deliverable Time      Deliverables details (Revised Plan)                Payment (In Millions)

30th September 2003   Nil                                                4.6344

1st December 2003     • Prototype 30,000 word lexicon for Urdu           4.6344
                        (containing headwords only)
                      • Natural Language Processor of TTS

1st March 2004        • Prototype 500 word lexicon with content          5.2964
                        specification
                      • Its single word TTS synthesizer
                      • Prototypes of API and Urdu User Interface
                        (UI) for Lexicon and TTS system
                      • Prototype Machine Translation Engine

25th June 2004        • 500 word lexicon with content specification      5.4398
                      • TTS system for common sentence structures
                      • UI of Lexicon
                      • UI of TTS system
                      • Urdu & English Grammar for Machine
                        Translation

15th April 2005       • Lexicon with 20,000 headwords and 5,000          3.1199
                        words with content specification
                      • Urdu TTS system with existing functionality
                        and professionally recorded di-phones
                      • Grammar Rules for English-to-Urdu Transfer
                        system for Machine Translation with 2000
                        word lexicon

14th August 2005      • Tested Lexicon with at least 50,000              3.1199
                        Headwords and 10,000 words with specified
                        content
                      • Online Lexicon Application (website)
                      • Urdu Email and Webpage Reader Prototype
                        Application based on already shipped TTS
                        system without intonation and duration
                        modeling
                      • Prototype English-Urdu Web page Translator
                        with basic translation and excluding
                        semantics and domains with 4000 word
                        lexicon

31st January 2006     • Tested TTS System with perceptually              6.7086
                        intelligible speech output with basic
                        duration and intonation model
                      • Complete API for Lexicon
                      • Machine Translation Engine with (limited)
                        Semantics with 10000 word lexicon

25th June 2006        • Tested TTS System with perceptually              3.2644
                        intelligible speech output with complete
                        duration model and intonation model
                      • Complete API for TTS
                      • Complete API for Machine Translation Engine
                      • Machine Translation Engine Enhanced for
                        three domains with domain specific lexica
                        added
                      • Linux support for the three engines

           Total   for   FY   2003-04:             20.005 Million
           Total   for   FY   2004-05:             6.2398 Million
           Total   for   FY   2005-06:             9.973  Million
           Total   for   FY   2006-07:             0.09   Million
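
As a cross-check, the eight milestone payments in this Annexure should add up to the contractor's three cost heads (Cost of Products & Services, Project Staff Salaries and Operating Expenditures). The following is a minimal Python sketch under that assumption and is not part of the sanctioned document; the variable names are illustrative only.

```python
from decimal import Decimal

# Milestone payments to FAST in million rupees, in schedule order (Annexure-I).
payments = ["4.6344", "4.6344", "5.2964", "5.4398",
            "3.1199", "3.1199", "6.7086", "3.2644"]

# Contractor cost heads in million rupees (Annexures II, III and IV).
contractor_heads = ["2.779", "28.3772", "5.0616"]

# Decimal keeps the four-decimal figures exact when added.
paid = sum(map(Decimal, payments))
budget = sum(map(Decimal, contractor_heads))

print(paid, budget)    # 36.2178 36.2178
print(paid == budget)  # True
```

Both sums come to Rs. 36.2178 million. The Rs. 0.09 million shown for FY 2006-07 is the separate maintenance charge and is not one of the milestone payments.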




                                                     26
                                                               Annexure-II
       Cost of Products/ Services

                                              Revised Cost as per DDWP
                                              meeting held on 18th Jan,05

                                                         Unit            Total
S.N.   Description                           Qty       Cost (Rs)       Cost (Rs)
       HARDWARE


 1     Server                                      2        300,000         600,000
 2     Computers                                 28          50,000       1,400,000
 3     Amplifier                                   1         50,000          50,000
 4     Mic.                                        2          7,000          14,000
 5     Anechoic Chambers                           1         50,000          50,000
 6     Rack Filter                                 1         65,000          65,000
 7     Printer                                     2        100,000         200,000

                                 Sub-Total                                2,379,000


       SOFTWARE / APPLICATIONS

       Application Development Platforms
 1     and Tools                                                          400,000


 2     TTS Engine & Development Tools        0                     0               0


 3     MT Engine & Development Tools         0                     0               0
                                 Sub-Total                                 400,000


        Total Cost of Products / Services                                 2,779,000




                                    27
                                                                                          Annexure-III
        PROJECT STAFF SALARIES
        Revised Cost as per DDWP meeting held on 18th Jan,05
S.No.   Description                 Min.        Monthly Salary     Months for which     Total Expenditure
                                    No.         Package per        provision is being
                                                Employee           made



 1      Project Manager                     1             30,000                  10                   300,000
                                                          40,000                  24                   960,000
 2      Project Secretary                   1             11,000                  34                   374,000


 3      Network Support Person              1             10,000                  34                   340,000

 4      Administrative support              1              5,500                  34                   187,000

 5      Visiting Expert from                1            100,000                    3                  300,000
        outside Pakistan
 6      Linguist for Syntax,                1             35,000                  24                   840,000
        Semantic, Morphology



 7      Speech Phonetician /                1             40,000                  34                 1,360,000
        Phonologist

 8      Urdu Language expert                1             35,000                  31                 1,085,000

 9      Senior Computational                1             60,000                  34                 2,040,000
        Linguist

 10     Senior Computer Speech              1             60,000                  34                 2,040,000
        Scientist
 11     Computer Speech Scientist           1             32,000                  34                 1,088,000

 12     Computational Linguist              1             30,000                  34                 1,020,000

 13     Research Officers                   8             22,000                  10                 1,760,000
                                           12             25,000                  24                 7,200,000
 14     Associate Research                  8             15,000                  10                 1,200,000
        Officers
                                           14             17,000                  24                 5,712,000
 15     Research Assistants                 3              5,600                  34                   571,200
        TOTAL (Rs.)                                                                                 28,377,200
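
Each row of the table above is the product of headcount, monthly salary package and months of provision. The following is a minimal Python sketch that recomputes the total on that basis. It is illustrative only and not part of the sanctioned document. It assumes a headcount of 1 for the second Project Manager row, where the table leaves that column blank.

```python
# One tuple per row of Annexure-III:
# (description, headcount, monthly salary in Rs, months of provision)
staff_rows = [
    ("Project Manager", 1, 30_000, 10),
    ("Project Manager", 1, 40_000, 24),  # headcount of 1 is an assumption
    ("Project Secretary", 1, 11_000, 34),
    ("Network Support Person", 1, 10_000, 34),
    ("Administrative support", 1, 5_500, 34),
    ("Visiting Expert from outside Pakistan", 1, 100_000, 3),
    ("Linguist for Syntax, Semantic, Morphology", 1, 35_000, 24),
    ("Speech Phonetician / Phonologist", 1, 40_000, 34),
    ("Urdu Language expert", 1, 35_000, 31),
    ("Senior Computational Linguist", 1, 60_000, 34),
    ("Senior Computer Speech Scientist", 1, 60_000, 34),
    ("Computer Speech Scientist", 1, 32_000, 34),
    ("Computational Linguist", 1, 30_000, 34),
    ("Research Officers", 8, 22_000, 10),
    ("Research Officers", 12, 25_000, 24),
    ("Associate Research Officers", 8, 15_000, 10),
    ("Associate Research Officers", 14, 17_000, 24),
    ("Research Assistants", 3, 5_600, 34),
]

# Row expenditure = headcount x monthly salary x months; total is the sum of rows.
total = sum(count * salary * months for _, count, salary, months in staff_rows)
print(f"{total:,}")  # 28,377,200
```

The result matches the stated total of Rs. 28,377,200, i.e. Rs. 28.3772 million.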




                                                - 28 -
                                                      Annexure-IV
       Operating Expenditure during life of project
       Revised Cost as per DDWP meeting held
S.N.   Description                                       Total Exp.
 1     Books                                               240,000
 2     Journals                                            150,000
 3     Technical Papers                                     75,000
 4     Printing & Photocopying                             180,000
 5     Mailing                                              30,000
 6     Stationery                                           45,000
 7     Operational Expenses                               2,931,600
 8     Audit & Administrative Support                      150,000
 9     Local Travel                                        360,000
10     Foreign Travel Tickets & Conferences                900,000
       TOTAL (Rs.)                                       5,061,600




                                        - 29 -
                                                              Annexure-V

MAINTENANCE EXPENDITURE FOR SIX MONTHS AFTER COMPLETION OF PROJECT
FOR 2006-07

S.N.   Description                            Monthly Cost   Annual Cost
                                                     (Rs.)         (Rs.)
 1     Service Charges for website and
       application maintenance                      15,000        90,000
       TOTAL (Rs.)                                  15,000        90,000




                                       - 30 -
ELECTRONIC GOVERNMENT DIRECTORATE’S EXPENSES



                                                                             Annexure-VI
                                ANNUAL PHASING OF EGD FUNDS

                          FY 2003-04            FY 2004-05     FY 2005-06      FY 2006-07


     EGD Administrative
         Expenses               0.54              0.54              0.54        0.0117


     Miscellaneous &
     Contingencies              0.32             0.3685            0.3685          0

            Total               0.86             0.9085            0.9085       0.0117


                    Total for   FY   2003-04:             0.86     Million
                    Total for   FY   2004-05:             0.9085   Million
                    Total for   FY   2005-06:             0.9085   Million
                    Total for   FY   2006-07:             0.0117   Million




                                       - 31 -
ELECTRONIC GOVERNMENT DIRECTORATE’S EXPENSES

                                                                          Annexure-VII
         ADMINISTRATIVE COST AT EGD
         Revised Cost as per DDWP meeting held
    S.   Description          Basic    House Rent   Utilities   Monthly   Months for which      Total Exp.
    No.                       Salary   Allowance    Allowance   Expense   provision is being         (Rs.)
                                                                          made
     1   Project Manager      16,129       7,258       1,613     25,000          14               350,000
         Salary               18,548       8,347       1,855     28,750          10               287,500
                              21,329       9,598       2,133     33,060          12               396,720
     2   Travel                                                                                   200,000
     3   Telephone, Fax,
         Internet                                                 4,200          44               184,800
     4   Stationery &
         office expenses                                          1,000          44                44,000
     5   Laptop                                                                                    80,000
     6   Advertisements,
         Miscellaneous                                                                             88,680

         TOTAL (Rs.)                                                                            1,631,700




                                         Page 32 of 32