PDSpread© - Heritage Document Picture Analysis in Spreadsheets

					                                                                                         PDSpread©
Copyright 2010 - Kedarnath Jonnalagadda - vaidika grAmam
smartxpark@yahoo.com

This is part of the book PDSpread© - Heritage Document Picture Analysis in Spreadsheets by Kedarnath Jonnalagadda.

Pictures of Heritage Documents are great for reading, but if you want to search, study and analyze them, you need analyzable
text: the kind of text that you can key in through the keyboard. To get analyzable text from pictures you have the traditional
"transcription" method, which is simply keying in the text manually. This is time-consuming and laborious, but it is the best
and most accurate method for mixed-language and mixed-script texts.

OCR (optical character recognition) technology is used to save time. But the recognition is not accurate, and we usually end
up spending more time correcting its mistakes than we would have spent typing the text in the traditional way. Adding to the
troubles is the widespread use of PDF files, which are absolutely and fundamentally flawed by the illusion on which they are
built: they carry keyed-in, analyzable text hidden behind the pictures! Is this text true to the picture? You have no clue.
While it may be admirable that you have such great trust in your fellow man, placing that much trust in machines is, in my
opinion, naive. PDSpread© is a great conflict resolver. We have no quarrel with either software or hardware. We just want to
get our work done, and our work is with Heritage Documents.

How is that achieved? Very simply, by using software that is sitting on your desktop or laptop: spreadsheet software.

1 If the picture of the page of text has columns, it is split into columns using software such as Gimp© from http://www.Gimp.org.
2 The picture is pasted into the spreadsheet. I am comfortable with Calc©, the spreadsheet component of OpenOffice.org
  from http://www.OpenOffice.org, but I am equally comfortable with Excel© of Microsoft Office©.
3 The OCR output is pasted into another area of the spreadsheet. It could be cells and cells away, but remember that you can
  split the window in a spreadsheet, and that is exactly what we want: pictures of text from Heritage Documents in one area,
  which I call the Museum Area, and the analyzable text in my Study Area. Now I can check the veracity of the transcribed text
  against the picture. I can also set up bookmarks and hyperlinks in both areas to jump quickly to either one.
4 When we think of PDSpread©, pictures and spreadsheets, we think simple, and we also think powerful: facilities for our
  serious work flow in from little-expected quarters and guide us toward better work.
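The bookmarks and hyperlinks mentioned in step 3 can be typed by hand, but for a long page it may help to generate them. The sketch below builds spreadsheet HYPERLINK formulas that jump from a line's picture in the Museum Area to its text in the Study Area and back. It is a minimal illustration, not part of PDSpread© itself. It assumes both areas are on one sheet with a simple name ("Sheet1" here) and one line per row. The helper names, sheet name, column letters and starting rows are all hypothetical. As far as we understand, Calc writes an in-document target as "#Sheet1.A1" and Excel as "#Sheet1!A1", and the argument separator (";" or ",") depends on the program and its settings, so check one generated formula in your own spreadsheet before filling the rest.

```python
def jump_formula(sheet, column, row, label, app="calc"):
    """Build a HYPERLINK formula that jumps to a cell in the same spreadsheet.

    Assumes a simple sheet name (no spaces or quotes) that needs no escaping.
    """
    if app == "calc":
        # Calc style: internal target "#Sheet.Cell", arguments separated by ';'
        return f'=HYPERLINK("#{sheet}.{column}{row}";"{label}")'
    if app == "excel":
        # Excel style: internal target "#Sheet!Cell", arguments separated by ','
        return f'=HYPERLINK("#{sheet}!{column}{row}","{label}")'
    raise ValueError(f"unknown app: {app!r}")


def paired_links(sheet, museum_col, museum_first_row, study_col,
                 study_first_row, n_lines, app="calc"):
    """Return (museum_side, study_side) formula pairs for lines 1..n_lines.

    The museum-side formula jumps to the line's text in the Study Area;
    the study-side formula jumps back to the line's picture in the Museum Area.
    """
    pairs = []
    for line in range(1, n_lines + 1):
        museum_row = museum_first_row + line - 1
        study_row = study_first_row + line - 1
        pairs.append((
            jump_formula(sheet, study_col, study_row, f"Study line {line}", app),
            jump_formula(sheet, museum_col, museum_row, f"Museum line {line}", app),
        ))
    return pairs


if __name__ == "__main__":
    # Hypothetical layout: pictures in column B, text in column H, both from row 48.
    for museum_side, study_side in paired_links("Sheet1", "B", 48, "H", 48, 2):
        print(museum_side, "|", study_side)
```

Each printed formula can be pasted into a cell beside the corresponding line. The two areas do not have to start on the same row, which suits a Study Area that is "cells and cells away".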

For example, a complete Sanskrit word list of any document is quite easily obtained. This is done by using an add-on called Linguist
for Writer©, the word-processing component of OpenOffice.org. Normally, spell checkers highlight (with a squiggly underline) words
that are not in the software's dictionary. An English dictionary will of course not have Sanskrit words, so all of these are
underlined. The Linguist add-on collects them all and gives a report, and this report is your Sanskrit word list. Of course, there
will be misspelt English words in the list too, and these need correcting!
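The Linguist add-on does this inside Writer. To make the principle concrete, here is a minimal Python sketch of the same idea, independent of any add-on. It splits the analyzable text into words, keeps those the dictionary does not recognize, and counts them, much like the word list beside the Study Area. The small `ENGLISH_WORDS` set is a hypothetical stand-in for a real spell-checker dictionary. The word-splitting rule (letters joined by internal apostrophes or hyphens) is an assumption chosen to keep transliterations such as "a'n-anna" whole, and it is not taken from Linguist.

```python
import re
from collections import Counter

# Hypothetical stand-in for a spell checker's English dictionary.
ENGLISH_WORDS = {"not", "one", "endless", "good", "seeing", "sleep", "to", "the"}

# A word is a run of letters (any script), optionally joined by internal
# apostrophes or hyphens, so "a'n-anna" and "a-tIkShNa" stay whole.
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def unrecognized_words(text, dictionary):
    """Return (word, count) pairs for words missing from the dictionary."""
    counts = Counter(
        word for word in WORD_RE.findall(text)
        if word.lower() not in dictionary
    )
    # Sort alphabetically ignoring case, like the printed word list.
    return sorted(counts.items(), key=lambda item: (item[0].lower(), item[0]))


if __name__ == "__main__":
    sample = "[an-eka] not one; [an-anta] endless; [a-sat] not good; [a-pashyat] not seeing"
    for word, count in unrecognized_words(sample, ENGLISH_WORDS):
        print(word, count)
```

Run on that sample line from the Study Area, the sketch reports a-pashyat, a-sat, an-anta and an-eka once each. As with the add-on's report, any misspelt English word would also show up and need correcting.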

This is just a column, and there are smiles and smiles to go.

45                      MUSEUM AREA                                                               STUDY AREA                                      Words not recognized by the software's English dictionary
46                                                                                                                                                generated by the Linguist add-on
47
48                                                                 1A             *3 1. a,the first letter of the alphabet; the                   a'n-anna                  1
49                                                                 2              first short vowel inherent in consonants. [- kAra], m.          a-bhinna                  1
50                                                                 3              the letter or sound [a].                                        a-brAhmaNA                1
51                                                                 4A             N 2. a (pragRuhya, q.v.), a vocative particle                   a-kupya                   1
52                                                                 5              [a Ananta], O Vishnu, T.; interjection of pity, [Ah]            a-pashyat                 1
53                                                                 6A             W 3. [a] before a vowel an, exc. [a-RuNin],                     a-pUpa                    1
54                                                                 7              a prefix corresponding to {Gk. ά, άν,[a'][a'n] Lat. in, Goth.   a-RuNin                   2
55                                                                 8              and Germ. un, Eng. in or un}, and having a negative·            a-saH                     1
56                                                                 9              or privative or contrary sense ([an-eka] not one;               a-spRuhyanti              1
57                                                                10              [an-anta] endless; [a-sat] not good; [a-pashyat] not            a-svaptum                 1
58                                                                11              seeing; rarely prefixed to Inf ([a-svaptum] not to              a-tad                     1
59                                                                12              sleep, T5ndyaBr.) and even to forms of the finite               a-tIkShNa                 2
60                                                                13              verb ([a-spRuhyanti] they do not desire, BhP.; Sis.)            a-yaj                     1
61                       14            and to pronouns ([a-saH]: not hc, Sis.; [a-tad] not that,      Aditya         1
62                       15            BhP.); occasionally denoting comparison (a-brAhmaNA]           an-anna        1
63                       16            like a Brahman, T.); sometimes disparagement                   an-anta        1
64                       17            ([a-yaj~ja] a miserable sacrifice) ; sometimes diminutiveness- an-eka         1
65                       18            (cf. [ a'-karNa], [an~udarA]); rarely an expletive             Ananta         1
66                       19            (cf. [a-kupya], [a-pUpa]. According to {Pan. vi, 2, 161},      asya           1
67                       20            the accent may be optionally either on the first               atra           1
68                       21            or last syllable in certain compounds formed with [a]          bhAgin         1
69                       22            (as [a-tIkShNa] or [a-tIkShNa'], [a'-shuci] or [a'-shuci],     bhAj           1
70                       23            [a'n-anna] or [an-anna') ; the same applies to stems ending    bhinna         1
71                       24            in [tRu] accentuated on the first syllable before [a] is pre-  BhP            2
72                       25            fixed; cf. also [a'-tUrta] and [a'-tUrta], [a'-bhinna] and     bhU            1
73                       26            [a-bhinna], &c.                                                bhUta          1
74                       27 A          ·!T 4. [a], the base of some pronouns and                      cf             4
75                       28            pronom. forms, in [asya], [atra],&c.                           dv             1
76                       29 A          W 5. a, the augment prefixed to the root in                    Eng            1
77                       30            the formation of the imperfect, aorist, and conditional        exc            1
78                       31            tenses (in the Veda often wanting, as in Homer, the            Gk             1
79                       32            fact being that originally the augment was only prefixed       hArin          1
80                       33            in principal sentences where it was accentuated,               hc             1
81                       34            whilst it was dropped in subordinate sentences where           heirship       1
82                       35            the root· vowel took the accent).                              ifc            1
83                       36 A          G 6. [a], [as], m.,N. of Vishnu, L. (especially                ikA            1
84                       37            as the first of the three sounds in the sacred syllable        Il             1
85                       38            [om]).                                                         ind            1
86                       39 AGÍhÉlÉç   W¥f!Il?( [a-RuNin], mfn. free from debt, L.                    Inf            1
87                       40 AÆzÉç      [a~Msh], cl. 10. P. [a~Mshayati], to divide,                   ja             1
88                       41            distribute, L.; also occasionally Ā. [a~Mshayate],· kalpanA                   1
89                       42             L.; also [a~MshApayati], L.                                   kAra           1
90                        43 AÆzÉ        WST [a'~Msha], [as], m. (probably fr. {√1.as},             karaNa             1
91                        44             perf [An-a~Msha], and not from the above {√a~Msh]   karNa              1
92                        45             fictitiously formed to serve as rt.), a share, portion,   kRu                1
93                        46             part, party; partition, inheritance; a share of booty ;    Lat                1
94                        47             earnest money; stake (in betting), {RV. v, 86, 5};         lat                2
95                        48             TāṇḍyaBr.; a lot (cf. 2.[prAs]); the denominator of   m                  9
 96                       49             a fraction; a degree of lat. or long.; a day, L.; {N. of   MBh                1
 97                       50             an Aditya}. [-karaNa], n. act of dividing. [-kalpanA],     mf                 1
 98                       51             f. or ·[-prakalpanA}, f. or ·[pradAna], n. allotment       mfn                5
 99                       52             of a portion. [-bhAgin] or [-bhAj], mfn. one               Msh                2
100                       53             who has a share, an heir, co-heir.[ — bhU], m. partner,    Msha               3
101                       54             associate, TS. [-bhUta]., mfn. forming part of. [-vat]     MshA               2
102                       55             (for [a~Mshumat] ?), m. a species of Soma plant, {Susr.}   Mshaka             2
103                       56             [-savarNana], n. reduction of fractions.[ — svara],        Mshala             2
104                       57             m. key-note or chief note in music. [-hara]. or            MshApayati         1
105                       58             [-hArin], mfn. taking a share, a sharer. [a~MshA~Msha]     MshAvataraNa       1
106                       59             m. part of a portion (of a deity). secondary incarnation.  Mshayate           1
107                       60             [a~MshA~Msi], ind. share by share. [a~MshAvataraNa]        Mshayati           1
108                       61             n. descent of part of a deity; partial incarnation;        MshI               1
109                       62             title of sections 64-67 of the first book of the           Mshi-tA            1
110                       63             MBh. [a~MshI-] {√ 1.[ kRu]}, to share.                     Mshin              1
111                       64             1. [a~Mshaka], mf. ([ikA])n. (ifc.) forming part.          Mshu               1
112                       65             2. [a~Mshaka], [as], m. a share; degree of lat. or long. ; Mshumat            1
113                       66             a co-heir, L.; (am), n. a day, L.                          Msi                1
114                       67             [a~Mshala] Sec [a~Mshala'] next col.                       om                 1
115                       68             [a~Mshin], mfn. having a share, {Yājñ.}. [a~Mshi-tA], f.   perf               1
116                       69             the state of a sharer or co-heir, heirship.                pradAna            1
117                       70 AÆzÉÑ       [a~Mshu'], [us], m. a filament (especially of the          pragRuhya          1
118                       71             Soma plant); a kind of Soma libation, SBr. ; thread;       prakalpanA         1
119                                                    prAs            1
120                                                    pronom          1
121                                                    q.v             1
122                                                    ri              1
123                                                    savarNana       1
124                                                    SBr             1
125                                                    shuci           2
126                                                    Susr            1
127                                                    svara           1
128                                                    tRu             1
129                                                    tUrta           2
130                                                    TāṇḍyaBr        1
131                                                    udarA           1
132                                                    un              2
133                                                    vi              1
134                                                    Yājñ            1