Citation Analysis, Mechanical Translation of Chemical Nomenclature

Essays of an Information Scientist, Vol. 2, p. 415-418, 1974-76. Current Contents, No. 5, p. 5-8, February 2, 1976.




February 2, 1976    Number 5
Reprinted from J. Chem. Inform. & Comp. Sci. 15(3): 153-55, 1975.

The microstructure of science is very different from its macrostructure. For example, I can confidently assert that “milestone” papers (those which are subjectively rated as important by a large number of scientists) are, on average, frequently cited. However, I cannot truthfully assert that every single “milestone” paper is highly cited. A few may have been almost totally ignored, for a variety of reasons. In fact, some portions of my own work that I regard most highly have been least cited. Thus it is painfully apparent to me that models which are valid and reliable in the macrostructure of science can crumble when the focus is narrowed to the microstructure of science.

How extensive this phenomenon is we have not yet been able to determine. I wonder whether Watson and Crick would agree that their 1953 paper in Nature [3] represents the pinnacle of their work? I know that Oliver Lowry [4] correctly asserts that his most important papers are not his most cited. But that does not say his most important are not heavily cited. Keep in mind that I am not saying citation analysis cannot detect the significant though infrequently cited paper. Back in 1964 we produced a “computerized” history of DNA [5] which showed some papers that were infrequently cited but were significant in breaking the genetic code.

Examples of this kind have given me reason to question the assertion by Cole [6] that there is no validity in the Ortega hypothesis [7]. This theory asserts that advances in science depend in part on the contributions of mediocre scientists. While we may all stand on the shoulders of giants, they in turn depend upon many average or less eminent scientists. Whether they depend upon dwarfs is another question.

All this is leading up to a discussion of some work I did which is rarely cited but which gave me fantastic satisfaction. I refer to a paper on mechanical translation of chemical nomenclature [8]. This was the subject of my doctoral dissertation. Since I’m so often asked why, I’d like to tell you how I happened to take a degree in linguistics rather than library science.

I entered the field of documentation, now information science, from chemistry by joining the Johns Hopkins University Indexing Project in 1951. I stayed until its demise in 1953. By the middle of 1954 I had already accumulated a master’s degree in library science and sufficient graduate credits to satisfy the minimum requirements for a Ph.D. But it proved impossible for me to find a faculty member at Columbia University who would approve for my dissertation topic the use of machine methods in scientific information. The only sympathetic ear was that of Professor Merrell Flood, but in order to take a degree with him, I would have had to take undergraduate training in industrial engineering. In retrospect, I see more clearly how relevant systems work has been in my career.

I tried to form an interdisciplinary faculty group, but I was not interested in spending ten years trying to satisfy an interfaculty group that would supervise my work. By that time my family had already been convinced I was going to be a student forever. I left Columbia disappointed. But in 1954, through my friend and colleague, Casimir Borkowski, I met Professor Zellig Harris at the University of Pennsylvania, Department of Linguistics. His work in structural linguistics was already well known to scholars, but in the field of scientific information he was unknown. In 1956, I wrote a paper on the application of
structural linguistics to mechanical indexing [9] and showed it to Harris. Though it was never published, Harris became sufficiently interested in the field of information retrieval to accept some huge grants from NSF over a ten-year period. Most of this work is now continued primarily by Naomi Sager at New York University [10]. Some of you may recall transformational and discourse analysis.

I suppose it was prestige that made me seek a Ph.D. I ultimately worked out a doctoral program with Professor Harris which commenced officially in 1958. We had agreed on the amount of course work and my ultimate dissertation topic. By then I was quite preoccupied with problems of chemical indexing. We were encoding all new steroids for the U.S. Patent Office under a contract with the Pharmaceutical Manufacturers Association.

By 1960, the Institute for Scientific Information (ISI) was publishing Index Chemicus. The original purpose of this service was to index compounds by molecular formula. So it was natural for me to want to find a way of calculating molecular formulas in the simplest way possible. Until that time everyone assumed that it was necessary to draw a structural diagram in order to calculate a molecular formula. Even Ascher Opler [11], who wrote the pioneering paper in 1956 on “New Speed to Structural Searches”, assumed this was the case. That is why he first wanted to represent the compound in a topological matrix which later was called a connectivity table.

My linguistic studies convinced me that the “meaning” of chemical nomenclature had to include enough information for calculating molecular formulas straight away. Otherwise, how could we do this so quickly in our heads for simple compounds? I told Professor Harris my theory and he accepted it as my doctoral thesis, the first in the new field of chemico-linguistics. Thanks to the recognition by Professor Allen Day of Penn’s Chemistry Department that it was a nontrivial problem, the topic was agreed upon in the graduate school. However, before I could work on my dissertation, I had to prove my theory worked. If it did not, I would have to choose another topic, no matter how long I spent on the research.

Recognizing that the dictionary work alone might take me several years unless I got help, I proposed that the theory be proven with respect to acyclic compounds. During the next few years I got into the detailed problems of discourse analysis for my target language, chemical nomenclature. The details are not essential to this story. When I was ready for actual computer trials, I got the help of John O’Connor in programming Univac I, which was then in use at Penn. But I found that I could never get time on the computer, so I had to buy time at the Franklin Institute computer center.

The outcome of all this was “an algorithm for translating chemical nomenclature into molecular formulas” [12]. When I submitted it to the department it was only ten pages. My substitute adviser was dumbfounded by this. Dissertations in linguistics are written by the pound, not the page. I spent a whole semester filling it out with interesting theoretical statements and formal analyses of chemical morphology, etc. By late 1960 I had made the first successful computer run in calculating a molecular formula directly from a systematic name [8]. I had done this manually hundreds of times earlier in the year.

As it turned out ISI was never able to finance the research necessary to complete this work. NSF was not very kindly disposed to us in those days. We also were up to our ears in the Genetics Citation Index project so I had to put chemical nomenclature work on the back burner. We never did input compound names for Current Abstracts of Chemistry (CAC); on the contrary, we now input Wiswesser Line Notation (WLN) for each compound and that is what we use to compute the molecular formula. However, the double bond checking routines that we used for so long were included in my algorithm.

About eight years ago I saw the proposal Chemical Abstracts made to NSF regarding chemical nomenclature translation research. Naturally I felt envious that they should get this support when it was clearly an operational development they needed more than ISI. That’s what made it applied for them and academic for us.

However, I was very glad someone was
doing this and read with mixed feelings the first reports of this research in 1967 [13]. A recent paper in the Journal of Chemical Documentation [14] shows that this work is finally coming to fruition, and I congratulate the CA group on their accomplishment.

Returning to the main point of my essay, here is a topic of research which has multi-million dollar economic significance. There are only a few people in the world interested in it, so the number of times this kind of work will be cited is bound to be small. Clearly it is the kind of thing that is less cited than, e.g., papers on WLN, but there is an important connecting thread. Perhaps historians will decide that Opler’s notion of a connectivity table for chemical compounds has been the most important concept in this field [11]. Most people seem to think that Sussenguth was the first one to use his concept [15]. But clearly none of these chemical information milestones has had any major discernible impact outside the field, and that is what the historian seeks and seems to find in large-scale citation analyses. This again demonstrates that the microstructure of science is very different from its macrostructure.

So much for the history of mechanical translation of nomenclature. Let me digress now to make some observations on the future of chemical and scientific publication. This has been much in the news these days, that is, C&E News! Joel Hildebrand, my freshman chemistry professor, has caused a lot of soul-searching with his rediscovery of the ancient idea of publication by abstract. I’ve had some contact with him in recent years and I know why he is making these proposals. Unlike James Stemmle, who in C&EN [16] seems worried that some important ideas will be lost to posterity if we adopt any changed systems, Hildebrand is trying to tell us that the system is overloaded with useless information; he is talking about information pollution on a huge scale. I have recently [17] asserted that the abuse of the page-charge system may be aggravating this pollution problem. And I regret to say Chemical Abstracts may be equally guilty. CA does this unwittingly in its hopeless aim to be complete. Consider that 25% of the abstracts in CA are of Russian material [18]. From our extensive citation analyses we know that this is absurd in relation to the significance of Russian research. They are polluting the waters of science with a lot of mediocre and unrefereed material. Probably another 10% of CA falls into this category. No doubt others do it too, but the data show clearly that the Russians are the worst offenders. Does anyone anywhere doubt the superiority of the Journal of the American Chemical Society over the Zhurnal Obshchei Khimii? How would you compare the abstracts of the ACS meetings to the abstracts of unpublished papers that the Russians are now loading into the Russian Journal of Physical Chemistry? Undoubtedly it gives the Russians significant political leverage to assert they account for 25% of CA’s coverage. Maybe they will even claim CA should pay them a royalty for abstracting without their permission. After all, CA abstracts do constitute a substitute for the original Russian material.

There is an important distinction to be made between unrefereed material appearing in high-priced journals and unrefereed material listed in a depository. Each abstract requires the same space and work. But at least someone was willing to pay for that so-called high-priced journal. If librarians are as indiscriminate as they are accused of being, then why aren’t they buying the original Russian journals and abstracts? I’m sure that Earl Coleman would be delighted if libraries bought his translation journals without the slightest evaluation. He knows how hard it is to sell the best that the Russians publish. He would court disaster to publish everything without regard to quality.

It is a rather interesting observation that 10% of CA’s budget is about $2 million. If they cut back on Russian material they would find the same $2 million they want the Russians to pay for pirating CA.

At ISI we have very mixed feelings about CA. On the one hand, we resent their high price because a chemistry department is generally apt to say that it can’t afford the Science Citation Index (SCI) but it must buy CA. If for no other reason, it couldn’t get ACS accreditation without it. On the
other hand, the higher CA’s price becomes, the more easily we can convince buyers that SCI or CAC is a good value. However, given my choice, I would much rather see CA priced lower. So I have a real concern for their cost-effectiveness. In fact, given my druthers, I would provide for CA a citation index to the chemical literature that would complement CA searches. The combined use of CA and SCI is happening increasingly, but it would be nice if we could accelerate the use of SCI by chemists as was suggested by the Hannay Committee many years ago [19].

The recent paper by Parry, Linford, and Rich [1] shows a clear trend toward such complementary use of large data bases. This will increase as the cost of on-line services declines.

I recently did a search of the CA data base using our Permuterm Subject Index (PSI) to identify pertinent search terms and then followed up the output from CA by checking the items retrieved in the SCI. This is frequently done when people use MEDLINE and SCISEARCH [2], but obviously the inclination to do so is tempered by the vast differences in per-hour rates.

In closing, I will mention miniprint, which has now come into the limelight. As the cost of paper goes up, CA and ISI may well have to adopt such methods. Whether users will accept miniprint more readily than microform is hard to determine, but there is a whole new technology opening up now that the “Oxford English Dictionary” has become so successful in this medium. Ralph Shaw and Albert Boni experimented with miniprint long ago. I just rediscovered it when I was thinking about ways to cut down on indexing costs. Maybe it’s still not too late for CA to try it. After all, the most successful publishing venture of the past decade has been in the miniprint edition of the “Oxford English Dictionary”.
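
The essay's central technical claim is that a systematic name carries enough information to yield a molecular formula without first drawing a structural diagram. The following Python fragment is only an illustrative sketch of that idea, not the 1961 algorithm, whose details the essay does not give. It assumes a tiny, hypothetical morpheme dictionary (stems meth- through dec-, the endings -ane, -ene, -yne and -ol, alkyl and halo prefixes) and handles acyclic names only. The function name `molecular_formula` and all table names are invented for this example. Locants are discarded because they fix positions, not atom counts.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary of name morphemes (an assumption, not from the essay).
STEMS = {"meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5,
         "hex": 6, "hept": 7, "oct": 8, "non": 9, "dec": 10}
MULT = {None: 1, "di": 2, "tri": 3, "tetra": 4}
HALO = {"fluoro": "F", "chloro": "Cl", "bromo": "Br", "iodo": "I"}
STEM_RE = "|".join(STEMS)  # "meth" is listed before "eth" so it is tried first

# Prefix: optional multiplier + (alkyl group | halogen)
SUBST = re.compile(rf"(di|tri|tetra)?(?:({STEM_RE})yl|(fluoro|chloro|bromo|iodo))")
# Parent chain: stem + bond ending (-an-, -en-, -yn-) + "e" or "-ol"
PARENT = re.compile(rf"({STEM_RE})a?(di|tri)?(an|en|yn)(?:e|e?(di|tri)?ol)")


def molecular_formula(name: str) -> str:
    """Return a Hill-order formula for a simple acyclic systematic name."""
    # Locants, commas, hyphens and spaces do not change the atom count.
    text = re.sub(r"[\d,\-\s]", "", name.lower())
    atoms = Counter()
    pos = 0
    while (m := SUBST.match(text, pos)):
        count = MULT[m.group(1)]
        if m.group(2):                      # alkyl: replaces H with CmH(2m+1)
            carbons = STEMS[m.group(2)]
            atoms["C"] += count * carbons
            atoms["H"] += count * 2 * carbons
        else:                               # halogen: replaces one H
            atoms[HALO[m.group(3)]] += count
            atoms["H"] -= count
        pos = m.end()
    parent = PARENT.fullmatch(text, pos)
    if parent is None:
        raise ValueError(f"name not covered by this sketch: {name!r}")
    n = STEMS[parent.group(1)]
    bonds = MULT[parent.group(2)] if parent.group(3) != "an" else 0
    if bonds >= n:
        raise ValueError(f"too many multiple bonds for the chain: {name!r}")
    per_bond = {"an": 0, "en": 2, "yn": 4}[parent.group(3)]  # H lost per bond
    hydroxyls = MULT[parent.group(4)] if text.endswith("ol") else 0
    atoms["C"] += n
    atoms["H"] += 2 * n + 2 - per_bond * bonds  # CnH(2n+2) minus unsaturation
    atoms["O"] += hydroxyls                     # -OH replaces H: adds O only
    if atoms["H"] < 0:
        raise ValueError(f"more substituents than hydrogens: {name!r}")
    others = sorted(k for k, v in atoms.items() if k not in ("C", "H") and v > 0)
    return "".join(s + (str(atoms[s]) if atoms[s] > 1 else "")
                   for s in ["C", "H"] + others)


print(molecular_formula("2,2-dimethylbutane"))  # C6H14
print(molecular_formula("1,3-butadiene"))       # C4H6
print(molecular_formula("2-chloropropane"))     # C3H7Cl
```

The double-bond arithmetic here (two hydrogens lost per double bond, four per triple bond) is the standard saturation rule; whether it resembles the "double bond checking routines" mentioned in the essay is not something the text lets us confirm. Cyclic names such as "cyclohexane" are deliberately rejected with a `ValueError`.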

1. Parry A A, Linford R G & Rich J I. Computer literature searches; a comparison of the performance of two commercial systems in an interdisciplinary subject. Inf. Sci. 8:179-87, 1974.
2. Garfield E. ISI’s SCISEARCH time-shared system trades time for money--but are you ready for this? Current Contents (CC) No. 40, 4 October 1972, p. 5-6.
3. Watson J D & Crick F H C. A structure for deoxyribose nucleic acid. Nature 171:737, 1953.
4. Lowry O. Personal communication to D.J.D. Price, quoted in: Garfield E. Citation frequency as a measure of research activity and performance. CC No. 5, 31 Jan 1973, p. 5-7.
5. Garfield E, Sher I H & Torpie R J. The use of citation data in writing the history of science. (Philadelphia: Institute for Scientific Information, 1964), 86 pp.
6. Cole J R & Cole S. The Ortega hypothesis. Science 178:368-75, 1972.
7. Ortega y Gasset J. The revolt of the masses. (New York: Norton, 1932), p. 84-85.
8. Garfield E. Chemico-linguistics; computer translation of chemical nomenclature. Nature 192:192, 1961.
9. Garfield E. Proposal for research in mechanical indexing. Unpublished manuscript, 1956.
10. Sager N. Syntactic formatting of science information. AFIPS Conf. Proc. 41:791-800, 1972.
11. Opler A & Norton T R. New speed to structural searches. Chem. Eng. News 34:2812-14, 1956.
12. Garfield E. An algorithm for translating chemical names to molecular formulas. (Philadelphia: Institute for Scientific Information, 1961), 68 pp.
13. Vander Stouw G G, Naznitsky I & Rush J E. Procedures for converting systematic names of organic compounds into atom-bond connection tables. J. Chem. Doc. 7:165-69, 1967.
14. Vander Stouw G G, Elliott P M & Isenberg A C. Automatic conversion of chemical substance names to atom-bond connection tables. J. Chem. Doc. 14:185-93, 1974.
15. Sussenguth E H. Graph theoretic algorithm for matching chemical structures. J. Chem. Doc. 5:36-43, 1965.
16. Stemmle J T. Control of scientific papers. Chem. Eng. News 53:33-34, 1975.
17. Garfield E. Page charges; for profit and non-profit journals; and freedom of the scientific press. CC No. 7, 17 February 1975, p. 5-7.
18. Baker D. World’s chemical literature continues to expand. Chem. Eng. News 49:37-40, 1971.
19. Anonymous. ACS report rates information system efficiency. Chem. Eng. News 47:45-46, 1969.



