Citation Analysis: Mechanical Translation of Chemical Nomenclature


Essays of an Information Scientist, Vol. 2, p. 415-418, 1974-76
Current Contents, #5, p. 5-8, February 2, 1976
February 2, 1976, Number 5
Reprinted from J. Chem. Inform. & Comp. Sci. 15(3):153-55, 1975.

The microstructure of science is very different from its macrostructure. For example, I can confidently assert that "milestone" papers (those which are subjectively rated as important by a large number of scientists) are, on average, frequently cited. However, I cannot truthfully assert that every single "milestone" paper is highly cited. A few may have been almost totally ignored, for a variety of reasons. In fact, some portions of my own work that I regard most highly have been least cited. Thus it is painfully apparent to me that models which are valid and reliable in the macrostructure of science can crumble when the focus is narrowed to the microstructure of science.

How extensive this phenomenon is we have not yet been able to determine. I wonder whether Watson and Crick would agree that their 1953 paper in Nature [3] represents the pinnacle of their work? I know that Oliver Lowry [4] correctly asserts that his most important papers are not his most cited. But that does not say his most important are not heavily cited. Keep in mind that I am not saying citation analysis cannot detect the significant though infrequently cited paper. Back in 1964 we produced a "computerized" history of DNA [5] which showed some papers that were infrequently cited but were significant in breaking the genetic code.

Examples of this kind have given me reason to question the assertion by Cole [6] that there is no validity in the Ortega hypothesis. [7] This theory asserts that advances in science depend in part on the contributions of mediocre scientists. While we may all stand on the shoulders of giants, they in turn depend upon many average or less eminent scientists. Whether they depend upon dwarfs is another question.

All this is leading up to a discussion of some work I did which is rarely cited but which gave me fantastic satisfaction. I refer to a paper on mechanical translation of chemical nomenclature. [8] This was the subject of my doctoral dissertation. Since I'm so often asked why, I'd like to tell you how I happened to take a degree in linguistics rather than library science.

I entered the field of documentation, now information science, from chemistry by joining the Johns Hopkins University Indexing Project in 1951. I stayed until its demise in 1953. By the middle of 1954 I had already accumulated a master's degree in library science and sufficient graduate credits to satisfy the minimum requirements for a Ph.D. But it proved impossible for me to find a faculty member at Columbia University who would approve for my dissertation topic the use of machine methods in scientific information. The only sympathetic ear was that of Professor Merrell Flood, but in order to take a degree with him, I would have had to take undergraduate training in industrial engineering. In retrospect, I see more clearly how relevant systems work has been in my career.

I tried to form an interdisciplinary faculty group, but I was not interested in spending ten years trying to satisfy an interfaculty group that would supervise my work. By that time my family had already been convinced I was going to be a student forever. I left Columbia disappointed. But in 1954, through my friend and colleague, Casimir Borkowski, I met Professor Zellig Harris at the University of Pennsylvania, Department of Linguistics. His work in structural linguistics was already well known to scholars, but in the field of scientific information he was unknown. In 1956, I wrote a paper on the application of
structural linguistics to mechanical indexing [9] and showed it to Harris. Though it was never published, Harris became sufficiently interested in the field of information retrieval to accept some huge grants from NSF over a ten-year period. Most of this work is now continued primarily by Naomi Sager at New York University. [10] Some of you may recall transformational and discourse analysis.

I suppose it was prestige that made me seek a Ph.D. I ultimately worked out a doctoral program with Professor Harris which commenced officially in 1958. We had agreed on the amount of course work and my ultimate dissertation topic. By then I was quite preoccupied with problems of chemical indexing. We were encoding all new steroids for the U.S. Patent Office under a contract with the Pharmaceutical Manufacturers Association.

By 1960, the Institute for Scientific Information (ISI) was publishing Index Chemicus. The original purpose of this service was to index compounds by molecular formula. So it was natural for me to want to find a way of calculating molecular formulas in the simplest way possible. Until that time everyone assumed that it was necessary to draw a structural diagram in order to calculate a molecular formula. Even Ascher Opler, [11] who wrote the pioneering paper in 1956 on "New Speed to Structural Searches", assumed this was the case. That is why he first wanted to represent the compound in a topological matrix which later was called a connectivity table.

My linguistic studies convinced me that the "meaning" of chemical nomenclature had to include enough information for calculating molecular formulas straight away. Otherwise, how could we do this so quickly in our heads for simple compounds? I told Professor Harris my theory and he accepted it as my doctoral thesis, the first in the new field of chemico-linguistics. Thanks to the recognition by Professor Allen Day of Penn's Chemistry Department that it was a nontrivial problem, the topic was agreed upon in the graduate school. However, before I could work on my dissertation, I had to prove my theory worked. If it did not, I would have to choose another topic, no matter how long I spent on the research.

Recognizing that the dictionary work alone might take me several years unless I got help, I proposed that the theory be proven with respect to acyclic compounds. During the next few years I got into the detailed problems of discourse analysis for my target language, chemical nomenclature. The details are not essential to this story. When I was ready for actual computer trials, I got the help of John O'Connor in programming Univac I, which was then in use at Penn. But I found that I could never get time on the computer, so I had to buy time at the Franklin Institute computer center.

The outcome of all this was "an algorithm for translating chemical nomenclature into molecular formulas." [12] When I submitted it to the department it was only ten pages. My substitute adviser was dumbfounded by this. Dissertations in linguistics are written by the pound, not the page. I spent a whole semester filling it out with interesting theoretical statements and formal analyses of chemical morphology, etc. By late 1960 I had made the first successful computer run in calculating a molecular formula directly from a systematic name. [8] I had done this manually hundreds of times earlier in the year.

As it turned out ISI was never able to finance the research necessary to complete this work. NSF was not very kindly disposed to us in those days. We also were up to our ears in the Genetics Citation Index project so I had to put chemical nomenclature work on the back burner. We never did input compound names for Current Abstracts of Chemistry (CAC); on the contrary, we now input Wiswesser Line Notation (WLN) for each compound and that is what we use to compute the molecular formula. However, the double bond checking routines that we used for so long were included in my algorithm.

About eight years ago I saw the proposal Chemical Abstracts made to NSF regarding chemical nomenclature translation research. Naturally I felt envious that they should get this support when it was clearly an operational development they needed more than ISI. That's what made it applied for them and academic for us. However, I was very glad someone was
doing this and read with mixed feelings the first reports of this research in 1967. [13] A recent paper in the Journal of Chemical Documentation [14] shows that this work is finally coming to fruition, and I congratulate the CA group on their accomplishment.

Returning to the main point of my essay: here is a topic of research which has multi-million dollar economic significance. There are only a few people in the world interested in it, so the number of times this kind of work will be cited is bound to be small. Clearly it is the kind of thing that is less cited than, e.g., papers on WLN, but there is an important connecting thread. Perhaps historians will decide that Opler's notion of a connectivity table for chemical compounds has been the most important concept in this field. [11] Most people seem to think that Sussenguth was the first one to use his concept. [15] But clearly none of these chemical information milestones has had any major discernible impact outside the field, and that is what the historian seeks and seems to find in large-scale citation analyses. This again demonstrates that the microstructure of science is very different from its macrostructure.

So much for the history of mechanical translation of nomenclature. Let me digress now to make some observations on the future of chemical and scientific publication. This has been much in the news these days, that is, C&E News! Joel Hildebrand, my freshman chemistry professor, has caused a lot of soul-searching with his rediscovery of the ancient idea of publication by abstract. I've had some contact with him in recent years and I know why he is making these proposals. Unlike James Stemmle, who in C&EN [16] seems worried that some important ideas will be lost to posterity if we adopt any changed systems, Hildebrand is trying to tell us that the system is overloaded with useless information; he is talking about information pollution on a huge scale. I have recently [17] asserted that the abuse of the page-charge system may be aggravating this pollution problem. And I regret to say Chemical Abstracts may be equally guilty. CA does this unwittingly in its hopeless aim to be complete. Consider that 25% of the abstracts in CA are of Russian material. [18] From our extensive citation analyses we know that this is absurd in relation to the significance of Russian research. They are polluting the waters of science with a lot of mediocre and unrefereed material. Probably another 10% of CA falls into this category. No doubt others do it too, but the data show clearly that the Russians are the worst offenders. Does anyone anywhere doubt the superiority of the Journal of the American Chemical Society over the Zhurnal Obshchei Khimii? How would you compare the abstracts of the ACS meetings to the abstracts of unpublished papers that the Russians are now loading into the Russian Journal of Physical Chemistry? Undoubtedly it gives the Russians significant political leverage to assert they account for 25% of CA's coverage. Maybe they will even claim CA should pay them a royalty for abstracting without their permission. After all, CA abstracts do constitute a substitute for the original Russian material.

There is an important distinction to be made between unrefereed material appearing in high-priced journals and unrefereed material listed in a depository. Each abstract requires the same space and work. But at least someone was willing to pay for that so-called high-priced journal. If librarians are as indiscriminate as they are accused of being, then why aren't they buying the original Russian journals and abstracts? I'm sure that Earl Coleman would be delighted if libraries bought his translation journals without the slightest evaluation. He knows how hard it is to sell the best that the Russians publish. He would court disaster to publish everything without regard to quality.

It is a rather interesting observation that 10% of CA's budget is about $2 million. If they cut back on Russian material they would find the same $2 million they want the Russians to pay for pirating CA.

At ISI we have very mixed feelings about CA. On the one hand, we resent their high price because a chemistry department is generally apt to say that it can't afford the Science Citation Index (SCI) but it must buy CA. If for no other reason, it couldn't get ACS accreditation without it. On the
other hand, the higher CA's price becomes, the more easily we can convince buyers that SCI or CAC is a good value. However, given my choice, I would much rather see CA priced lower. So I have a real concern for their cost-effectiveness. In fact, given my druthers, I would provide for CA a citation index to the chemical literature that would complement CA searches. The combined use of CA and SCI is happening increasingly, but it would be nice if we could accelerate the use of SCI by chemists as was suggested by the Hannay Committee many years ago. [19]

The recent paper by Parry, Linford, and Rich [1] shows a clear trend toward such complementary use of large data bases. This will increase as the cost of on-line services declines.

I recently did a search of the CA data base using our Permuterm Subject Index (PSI) to identify pertinent search terms and then followed up the output from CA by checking the items retrieved in the SCI. This is frequently done when people use MEDLINE and SCISEARCH, [2] but obviously the inclination to do so is tempered by the vast differences in per-hour rates.

In closing, I will mention miniprint, which has now come into the limelight. As the cost of paper goes up, CA and ISI may well have to adopt such methods. Whether users will accept miniprint more readily than microform is hard to determine, but there is a whole new technology opening up now that the "Oxford English Dictionary" has become so successful in this medium. Ralph Shaw and Albert Boni experimented with miniprint long ago. I just rediscovered it when I was thinking about ways to cut down on indexing costs. Maybe it's still not too late for CA to try it. After all, the most successful publishing venture of the past decade has been in the miniprint edition of the "Oxford English Dictionary".

1. Parry A A, Linford R G & Rich J I. Computer literature searches; a comparison of the performance of two commercial systems in an interdisciplinary subject. Inf. Sci. 8:179-87, 1974.
2. Garfield E. ISI's SCISEARCH time-shared system trades time for money--but are you ready for this? Current Contents (CC) No. 40, 4 October 1972, p. 5-6.
3. Watson J D & Crick F H C. A structure for deoxyribose nucleic acid. Nature 171:737, 1953.
4. Lowry O. Personal communication to D.J.D. Price, quoted in: Garfield E. Citation frequency as a measure of research activity and performance. CC No. 5, 31 Jan 1973, p. 5-7.
5. Garfield E, Sher I H & Torpie R J. The use of citation data in writing the history of science. (Philadelphia: Institute for Scientific Information, 1964), 86 pp.
6. Cole J R & Cole S. The Ortega hypothesis. Science 178:368-75, 1972.
7. Ortega y Gasset J. The revolt of the masses. (New York: Norton, 1932), p. 84-85.
8. Garfield E. Chemico-linguistics; computer translation of chemical nomenclature. Nature 192:192, 1961.
9. Garfield E. Proposal for research in mechanical indexing. Unpublished manuscript, 1956.
10. Sager N. Syntactic formatting of science information. AFIPS Conf. Proc. 41:791-800, 1972.
11. Opler A & Norton T R. New speed to structural searches. Chem. Eng. News 34:2812-14, 1956.
12. Garfield E. An algorithm for translating chemical names to molecular formulas. (Philadelphia: Institute for Scientific Information, 1961), 68 pp.
13. Vander Stouw G G, Naznitsky I & Rush J E. Procedures for converting systematic names of organic compounds into atom-bond connection tables. J. Chem. Doc. 7:165-69, 1967.
14. Vander Stouw G G, Elliott P M & Isenberg A C. Automatic conversion of chemical substance names to atom-bond connection tables. J. Chem. Doc. 14:185-93, 1974.
15. Sussenguth E H. Graph theoretic algorithm for matching chemical structures. J. Chem. Doc. 5:36-43, 1965.
16. Stemmle J T. Control of scientific papers. Chem. Eng. News 53:33-34, 1975.
17. Garfield E. Page charges; for profit and non-profit journals; and freedom of the scientific press. CC No. 7, 17 February 1975, p. 5-7.
18. Baker D. World's chemical literature continues to expand. Chem. Eng. News 49:37-40, 1971.
19. Anonymous. ACS report rates information system efficiency. Chem. Eng. News 47:45-46, 1969.