
Scientometrics (2010) 85:111–127

DOI 10.1007/s11192-010-0252-2

Trends in research foci in life science fields over the last 30 years monitored by emerging topics

Ryosuke L. Ohniwa • Aiko Hibino • Kunio Takeyasu

Received: 21 October 2009 / Published online: 11 June 2010


© Akadémiai Kiadó, Budapest, Hungary 2010

Abstract We report here a simple method to identify ‘emerging topics’ in the life sciences.
First, keywords were selected from MeSH terms on PubMed by filtering the terms
based on the increment rate of their appearance; they were then sorted into groups dealing
with the same topics by ‘co-word’ analysis. These groups were defined as ‘emerging
topics’. The survey of the emerging keywords with high increment rates of appearance
between 1972 and 2006 showed that emerging topics changed dramatically year by year, and
that the major shift of the topics occurred in the late 90s: from topics covering technical and
conceptual aspects of molecular biology to the more systematic ‘-omics’-related
and nanoscience-related aspects. We further investigated trends in emerging topics within
various sub-fields in the life sciences.

Keywords Trends in life science · Emerging topics · MeSH terms · PubMed · Co-word analysis

Introduction

Millions of research topics, including technologies, methodologies and scientific concepts,


are currently being studied throughout the world, and new topics are emerging, maturing,

Electronic supplementary material The online version of this article (doi:10.1007/s11192-010-0252-2)
contains supplementary material, which is available to authorized users.

R. L. Ohniwa (✉)
Institute of Basic Medical Sciences, Graduate School of Comprehensive Human Sciences,
University of Tsukuba, Tennoh-dai, Tsukuba 305-8575, Japan
e-mail: ohniwa@md.tsukuba.ac.jp

A. Hibino
Japan Society for the Promotion of Science, Interdisciplinary Cultural Studies, Graduate School of Arts
and Science, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan

K. Takeyasu
Laboratory of Plasma Membrane and Nuclear Signaling, Graduate School of Biostudies,
Kyoto University, Yoshidahonmachi, Sakyo-ku, Kyoto 606-8501, Japan

123
112 R. L. Ohniwa et al.

converging and fading out every day. In this constantly changing research environment, it
is valuable to survey research trends in order to identify new issues as they emerge. This
critical information can help scientists to decide the direction of their future research and
allows governments and agencies to implement policies quickly and distribute research
funds more efficiently. In particular, it is important to identify research topics with a high
potential to expand and develop in the near future; we call such research topics
‘emerging topics’ when they are identified at an early stage of development.
Two serious obstacles preventing the identification of emerging topics are the vast size
of scientific communities and the huge amount of research produced by each community—
over 800,000 life science articles were published in 2008. Thus, it is difficult to follow all
the information about research trends, and it is not practical to manually identify important
and emerging topics and judge whether each topic is emergent. To fill this gap, a simple
method to identify emerging topics would be beneficial.
Metrical approaches, first developed in the field of scientometrics, have provided useful
scoring systems that permit the identification of trends in given research areas—e.g.,
number of researchers, amount of grants, number of articles, number of citations. Among
the various metrical methods, content analysis of original articles can link quantitative
studies (e.g., number of papers) to qualitative studies (e.g., content of papers) (Callon et al.
1983, 1986; Leydesdorff 1995), and the keywords attached to the articles are often used for
content analysis. The number of articles containing certain keywords is a measure of the
research activity related to those keywords, and transitions in the frequency of particular
keywords can be regarded as indicators of research trends (Callon et al. 1986).
Since a single keyword represents only one aspect of a research topic, a set of
keywords is required to give a precise view of the topic (Fig. 1).
A research topic can be regarded as “an issue of a coherent set of subject-related research
problems, concepts and methods upon which attention is focused by scientific researchers,”
irrespective of the social and intellectual background of the researchers involved. This
concept is based on the notion of scientific specialty by Braam et al. (1991). Thus,
‘topics’ can be regarded as assemblies of individual keywords.

Fig. 1 From keywords to research trends. [Diagram: overall research trends are built from research topics, which are assembled from elements (keywords): materials (chemicals, organisms, anatomy, etc.), concepts, phenomena and processes, problems, diseases, methods, techniques and devices]

Trends in life sciences 113

Several studies have developed methodologies for investigating emerging research
fronts. Chen (2006) and Braam et al. (1991) sought to identify emerging research foci
using a co-citation-based approach. Small (2006) and Upham and Small (2010) also proposed
a method for identifying emerging research fronts based on the analysis of co-citation
among papers. Although this method makes it possible to clarify the dynamic
activities at the research fronts, which are defined as clusters of “core papers”, it is
difficult to identify an ‘emerging’ research front before the papers have become cited
highly enough to be listed as “core papers”.
In contrast to the analysis based on co-citation, ‘co-word’ based analysis has the
advantage of directly investigating the contents of a research topic. Lee (2008) provided a
method to measure the latest research trends in scientific and technological documents
using ‘co-word analysis’. Ding et al. (2001), also using co-word analysis, reported changes
in a thematic domain in the field of information retrieval (IR) during the period between
1987 and 1997. These approaches are interesting and useful for identifying
newly developing research topics. However, as Lee (2008) indicated, the co-word analysis
approach is limited by the lack of a validated way to choose the keywords to be analyzed
(‘hub’ words in Lee’s case). Currently, the procedure for selecting these keywords
from the documents depends heavily on each analyst, making it difficult to avoid bias. Thus,
an objective criterion for the selection of keywords that represent emerging scientific
activities is required.
The increment rate of published articles with particular keywords may serve as such a
criterion, since it is reasonable to assume that the number of scientific articles related to a
field increases dramatically after the field begins to receive more attention (Price 1963).
Previous studies tried to identify emerging topics by analyzing the changes in the number
of related articles (Noyons et al. 1999; Tseng et al. 2009). These studies showed that the
increment rate was a good benchmark for evaluating the emergence of keywords. However, they
still share the methodological limitation mentioned above: the analysts themselves first
selected the set of keywords for their analyses.
To eliminate the bias of the analysts, we should first measure the increment rates of
“all” the keywords handled in a literature database, and select the emerging keywords
based on the increment rates. Keywords selected in this manner, if identified while their
total number of articles is still relatively low, can be regarded as ‘emerging
keywords’, the elements of ‘emerging topics’ (Fig. 1). In addition, analysis at the level of
individual topics, each represented by an assembly of emerging keywords, makes it possible
to clarify the overall trends in scientific activities without missing potentially valuable
information on emerging topics.
In the present study, we first established a simple method to identify ‘emerging
keywords’ in the life sciences, one of the largest fields in modern science. Following
the selection of the ‘emerging keywords’, we investigated ‘emerging topics’ by ‘co-word
analysis’. The survey of ‘emerging topics’ over the past 30 years demonstrated how
and when ‘emerging topics’ turned over. Finally, we investigated sub-field-specific
movements within the life sciences by analyzing various journal communities. For these
analyses, we applied the perspective factor (PF), which we previously proposed to
evaluate journal quality (Ohniwa et al. 2004). Because PF estimates the number of
emerging keywords in a journal, it can be used to capture the rise and fall
in the abundance of emerging topics in the sub-fields to which journal communities
belong.


Results and discussion

Extraction of emerging keywords

The PubMed (MEDLINE) database and MeSH (Medical Subject Headings) terms are suitable
sources for the extraction of emerging keywords. PubMed, a literature database run by the
National Library of Medicine (NLM), contains about 10 million articles (Fig. 2) and is
large enough to cover almost all fields of life sciences. MeSH is a popular keyword
thesaurus developed by NLM and is typically used on PubMed to support literature
searches (Schulman 2000). Because over 10 terms representing each article’s content are
assigned and controlled by professionals educated by NLM, we can assume that the use of
terms remains consistent among authors.
Two types of terms are not appropriate for use as emerging keywords. First,
MeSH includes many keywords that represent the style or the background of articles
rather than their research content. For instance, many articles contain ‘English Abstracts’
and/or ‘Research Support, U.S. Gov’t, P.H.S.’. These terms represent the style of the
articles and the funding that supports the authors’ activities. Second, quite general terms,
such as ‘Animals’ and ‘Proteins’, are included in MeSH. MeSH terms are organized
hierarchically, and such general terms are located at the top of the hierarchy. To
keep these terms out of the investigation of emerging keywords, we
excluded all the MeSH terms classified in four trees of the hierarchy that are irrelevant to
research topics. In addition, we excluded the terms in the top three levels of the MeSH
hierarchical tree (see details in “Materials and methods” section). Through this operation,
193,234 terms were selected from a total of 230,100 MeSH terms from 1971 to 2008. The
remaining terms were categorized into various trees such as Materials, Phenomena, Diseases,
Techniques and Devices, which are suitable for the investigation of research topics (Fig. 1 and
“Materials and methods” section). We used these selected terms for the following
analyses.
We defined ‘emerging keywords in a certain year’ as MeSH terms whose increment of
appearance frequency is in the top 5% of all terms in a given year. This
increment is calculated as the ratio of the MeSH term frequency in years 3 and 4 to its
frequency in years 1, 2, 3 and 4, in which year 2 is the target year and years 1 to 4 are four
consecutive years (year 1 precedes and years 3 and 4 follow year 2; see “Materials and
methods” section). This
operational definition enabled us to neglect terms that already had a high frequency of
appearance and those with an extremely low frequency, which have little development
potential and are likely to simply fade away. Thus, the emerging keywords extracted were

Fig. 2 The number of articles and MeSH terms on PubMed. [Plot: number of articles (left axis, 0–900,000) and MeSH terms per article (right axis, 0–14) by year, 1971–2006]


MeSH terms with relatively low frequencies of appearance and high increment rates; i.e., a
high potential to develop in the future. In total, we selected 63,798 distinct emerging
keywords (N = 4,808,377) from 1972 to 2006 (Supplemental Table C).
The 20 most frequent emerging keywords in each
5-year period are summarized in Table 1. We found transition in the keywords, rather than
consistency. While 19 keywords were listed in two consecutive periods, the other 102
keywords appeared in only one period, suggesting that the ‘emerging topics’ implied by
these emerging keywords have turned over dynamically in the past 30 years.

Dynamics of emerging research foci: transition of emerging topics

As stated in the Introduction, a topic should be regarded as an assembly of keywords,
although some keywords are themselves sufficient to indicate a topic (Fig. 1). Since the
set of keywords used in an article should together form certain topics, ‘co-word
analysis’, which estimates the frequency of co-appearance of keywords in an article and
visualizes the connections among the keywords, can be used to elucidate the topics implied by
the keywords. The visualization of co-appearances of the top 20 emerging keywords
showed that 2–5 clusters were formed as networks of the keywords, and that the
number of keywords and links in the clusters differed from period to period
(Table 2). In 1982–1986, 2 clusters were formed by 11 keywords with 42 links.
Both clusters exhibited spider-web-like networks (clustering coefficient C = 0.42), suggesting
that two centered emerging topics were present in 1982–1986. Similar clusters were
also found in 2002–2006. In contrast, there were 5 clusters in 1977–1981, but all the
clusters were composed of only 2 or 3 keywords without closed links (C = 0). All the
clusters in 1997–2001 exhibited the same type of network, suggesting that there was
no centered emerging topic in these periods. The remaining periods included both
spider-web-like and non-closed networks. It is noteworthy that clusters with spider-web-like
networks appeared just after the periods in which such clusters had disappeared, implying that
the saying “the darkest hour is always just before the dawn” applies to
the emergence of centered emerging topics.
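The contrast between closed (spider-web-like) and non-closed networks can be illustrated with a small calculation. The sketch below assumes the average local (Watts–Strogatz) clustering coefficient on an undirected graph; the text does not state which variant was reported, so this choice, the function name and the toy keyword links are all hypothetical.

```python
def clustering_coefficient(edges):
    """Average local clustering coefficient of an undirected graph.

    Assumption: this is the Watts-Strogatz average of per-node values;
    the paper does not say which definition was used for Table 2.
    `edges` is an iterable of (keyword_a, keyword_b) pairs.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    local_values = []
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            # A node with a single neighbour cannot close a triangle.
            local_values.append(0.0)
            continue
        # Count links that exist among the neighbours of this node.
        closed = sum(
            1 for u in nbrs for v in nbrs if u < v and v in neighbours[u]
        )
        local_values.append(2 * closed / (k * (k - 1)))

    return sum(local_values) / len(local_values) if local_values else 0.0


# Hypothetical toy networks: a closed triangle versus an open chain.
closed_cluster = [("Genes", "Cloning"), ("Cloning", "RNA"), ("Genes", "RNA")]
open_cluster = [("Aging", "Naloxone"), ("Naloxone", "Protein Kinases")]
print(clustering_coefficient(closed_cluster))
print(clustering_coefficient(open_cluster))
```

Under this definition a chain of two or three keywords without closed links gives C = 0, which matches the description of the 1977–1981 and 1997–2001 clusters.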
To make the topics represented by the clusters more vivid, we arbitrarily assigned topic
names to the clusters according to the keywords involved in them (Fig. 3). For instance, in
2002–2006, there are 2 clusters that contain 7 and 4 keywords, respectively. In
the smaller group, ‘Nanotechnology’ had the largest number of links and was surrounded
by related terms. Thus, we named it ‘Nano-scale analyses’. The larger group is composed of
keywords related to computer analyses, such as ‘Algorithms’ and ‘Computational Biology’,
and keywords related to large-scale analyses, such as ‘Gene Expression Profiling’ and
‘Oligonucleotide Array Sequence Analysis’. Therefore, we named it ‘Large-scale
analyses with computational techniques’. When we focused on the centered emerging
topics in the last 30 years, a big shift appeared in the late 90s: from ‘Identification and
manipulation of genes and proteins’ to ‘Large-scale analyses with computational techniques’
and ‘Nano-scale analyses’.
It is worth emphasizing that a single emerging keyword can by itself represent the
activity in a scientific topic at an early stage of its development. Indeed, the emerging
keywords extracted by our method can reveal innovative topics before, or at the time, they
are recognized by Nobel Prizes. For instance, ‘Apoptosis’, which represents the research topic of
programmed cell death, ranked among the top 20 emerging keywords in 1992–2001, and,
in 2002, Sydney Brenner was awarded the Nobel Prize in Physiology or Medicine for
research on apoptosis. ‘RNA, Small Interfering’, representing the


Table 1 The 20 most frequently used emerging keywords

1972–1976 Num
Binding Sites 4,948
Drug Evaluation 4,639
T-Lymphocytes 4,571
Protein Conformation 4,009
Histocompatibility Antigens 3,804
Lymphocyte Activation 3,164
Ultrasonography 3,160
Carbon Radioisotopes 3,144
Prostaglandins 3,108
Prolactin 3,108
Muscle Contraction 3,099
Syndrome 2,748
Lymphocytes 2,486
Molecular Conformation 2,480
Receptors, Cell Surface 2,343
Tomography, X-Ray 2,201
Mice, Inbred Strains 2,083
Decision Making 2,032
Drug Therapy, Combination 1,996
Structure-Activity Relationship 1,963
Others 352,920

1977–1981 Num

Rats, Inbred Strains 7,830


Tomography, X-Ray Computed 7,028
Mice, Inbred BALB C 4,394
Chromatography, High Pressure Liquid 4,145
Membrane Proteins 3,825
DNA Restriction Enzymes 3,570
Aging 3,544
Immunoenzyme Techniques 3,429
Microscopy, Electron 3,195
Electrophoresis, Polyacrylamide Gel 2,844
Reference Values 2,381
Mice, Inbred C57BL 2,331
Antigen-Antibody Complex 2,282
Mice, Inbred Strains 2,273
Ultrasonography 2,151
Receptors, Cell Surface 2,122
DNA, Recombinant 1,921
Peptide Fragments 1,745
Protein Kinases 1,711
Naloxone 1,672
Others 341,961

1982–1986 Num

Rats, Inbred Strains 14,118


Antibodies, Monoclonal 8,626
Cloning, Molecular 7,776
Nucleic Acid Hybridization 6,363
Acquired Immunodeficiency Syndrome 6,296
RNA, Messenger 6,088
Immunoenzyme Techniques 5,328
Genes 5,142
Enzyme-Linked Immunosorbent Assay 4,800
Magnetic Resonance Spectroscopy 4,691
Transcription, Genetic 4,159
Chromatography, High Pressure Liquid 4,050
Genes, Bacterial 3,570
Interleukin-2 3,520
DNA Restriction Enzymes 3,054
Ion Channels 2,980
Reference Values 2,866
Antineoplastic Combined Chemotherapy Protocols 2,629
Immunochemistry 2,500
Cyclosporins 2,416
Others 405,347

1987–1991 Num

Recombinant Proteins 13,725


Polymerase Chain Reaction 7,390
RNA, Messenger 6,419
DNA-Binding Proteins 5,511
Sequence Homology, Nucleic Acid 5,319
Blotting, Western 4,935
Chromosome Mapping 4,822
Restriction Mapping 4,611
Tumor Cells, Cultured 4,610
DNA Probes 4,594
Signal Transduction 4,402
Blotting, Northern 4,339
Molecular Structure 4,300
Acquired Immunodeficiency Syndrome 3,977
Tumor Necrosis Factor-alpha 3,913
Proto-Oncogene Proteins 3,874
Transfection 3,780
Magnetic Resonance Imaging 3,705
Substrate Specificity 3,198
Cloning, Molecular 3,167
Others 510,916

1992–1996 Num

Rats, Sprague-Dawley 15,545


DNA-Binding Proteins 8,882
Apoptosis 7,552
Enzyme Inhibitors 7,368
DNA Primers 7,235
Protein Binding 7,003
Oligodeoxyribonucleotides 6,906
Sequence Homology, Amino Acid 6,808
Anti-Bacterial Agents 6,774
Polymerase Chain Reaction 6,531
Mice, Transgenic 6,103
DNA, Complementary 5,665
Antineoplastic Agents 5,094
Antiviral Agents 4,977
DNA Probes 4,695
Rats, Wistar 4,502
Recombinant Fusion Proteins 4,098
Cell Death 4,025
Gene Expression Regulation, Developmental 3,785
Trans-Activators 3,663
Others 623,105

1997–2001 Num

Severity of Illness Index 15,849


Apoptosis 10,099
Phylogeny 7,208
Reverse Transcriptase Polymerase Chain Reaction 6,680
Protein Binding 6,447
Protein Structure, Tertiary 6,373
Mice, Knockout 5,807
Algorithms 5,345
Evidence-Based Medicine 5,138
Environmental Monitoring 5,100
Green Fluorescent Proteins 4,851
Mitogen-Activated Protein Kinases 4,692
Catalysis 4,481
Biotechnology 4,232
Stereoisomerism 4,159
Oxidative Stress 4,009
Mice, Transgenic 3,993
Blotting, Western 3,938
Environmental Exposure 3,904
Nuclear Magnetic Resonance, Biomolecular 3,807
Others 798,489

2002–2006 Num

Computer Simulation 22,966


Algorithms 20,145
Surface Properties 14,329
Models, Chemical 12,040
Cell Line 11,203
Polymorphism, Single Nucleotide 11,132
RNA, Small Interfering 7,675
Ecosystem 7,634
Particle Size 6,826
Insulin Resistance 6,636
Water Pollutants, Chemical 6,597
Behavior, Animal 6,344
Imaging, Three-Dimensional 6,112
Oligonucleotide Array Sequence Analysis 6,073
Gene Expression Profiling 6,040
Microscopy, Electron, Transmission 5,364
Nanotechnology 5,286
Computational Biology 5,115
Phylogeny 4,938
Spectrometry, Mass, Electrospray Ionization 4,498
Others 1,014,233

Table 2 The number of clusters, edges, nodes and cluster coefficient of the top 20 most frequently used emerging keywords

Period       Clusters   Edges   Nodes in clusters   Cluster coefficient
1972–1976    3          24      13                  0.23
1977–1981    5          10      8                   0.00
1982–1986    2          42      11                  0.42
1987–1991    2          34      14                  0.16
1992–1996    3          30      14                  0.11
1997–2001    5          14      12                  0.00
2002–2006    2          30      11                  0.34

research target of the regulation of translation and the techniques for gene silencing, was
listed among the top 20 emerging keywords in 2002–2006, and was valued in 2002. Thus, it
is possible to pick up topics that are later highly valued as innovative by analyzing
the emerging keywords themselves.

Re-activation of studies on ancient research topics

It might be reasonable to assume that novel keywords are easily identified as emerging
keywords. To investigate the novelty of the emerging topics, the year when each emerging
keyword first appeared on PubMed was investigated using the NLM MeSH browser (see
‘‘Materials and methods’’ section). A comparison between the year of the first appearance


Fig. 3 The networks of the top 20 emerging keywords. Only keywords linked with other
keywords are shown. The labels on the clusters are the topic names that we added. Technical
terms in the labels are as follows: ‘Lymphocyte’: a type of white blood cell in the vertebrate
immune system; ‘Oncogene’: a gene related to cancer; ‘Ontogenesis’: the biological process by
which an organism develops its shape


[Fig. 4 plot: three-dimensional chart with axes ‘Initial year of the appearance’ (up to 1970 through 2006), ‘Year as the emerging term’ (1972–2006) and ‘Frequency of the term’ (0–120,000)]

Fig. 4 Initial year of emerging topics on PubMed. The year when a particular MeSH term first appeared on
PubMed (initial year) was compared with the year when a MeSH term was identified as an emerging
keyword (emerging year). The X, Y, and Z axes represent the initial year, emerging year and frequency of
emerging keywords, respectively. The solid ellipse indicates emerging keywords for which the initial and
emerging years are identical. The dashed ellipse indicates emerging keywords with initial years before 1970

(initial year) and the year logged as an emerging keyword (emerging year) revealed that
the initial year of many keywords was indeed identical to the emerging year (Fig. 4, solid
ellipse). Keywords of this type increased in the early 80s and gradually decreased from
the mid-90s. The period of increase coincides with the period when identification
and manipulation techniques for genes and proteins were beginning to emerge. Thus, the
development of molecular biology seems to have yielded plenty of novel results.
Another interesting finding is that the majority of keywords first appeared before the
70s and were identified as emerging keywords later (Fig. 4, dashed ellipse). For instance,
‘Histones’, which are essential proteins that pack genomic DNA in eukaryotic cells, first
appeared as a MeSH term in 1963 and re-emerged in 2001–2003. Additional
examples include ‘Phylogeny’ (appeared in 1967), ‘Microscopy, Fluorescence’ (1963) and
‘Models, Molecular’ (1967), all of which are listed among the top 50 emerging keywords
in the late 90s. The number of such emerging keywords has increased dramatically since
the mid-90s. This trend is likely a result of techniques having developed sufficiently to
solve various old but fundamental issues that could not be addressed before the 90s. In
recent years, the rediscovery and refocusing of old research topics seem to promote
important breakthroughs in the development of the life sciences.

Emerging keywords as an index for surveying historical dynamics in sub-fields of life science research

The historical dynamics of sub-fields of the life sciences, such as molecular biology,
microbiology and physiology, can reveal the trends of life science research in
more detail. The number of emerging keywords in a particular category is a good
benchmark for evaluating the activity of a field. Here, we focused on
journal communities, because a journal community can be regarded as the unit in which the
boundaries of a professional discipline are established through knowledge production
(Fujigaki 1998). Therefore, the analysis of journal communities will illustrate the trends in
the sub-fields. Here, we applied the index named PF (Perspective Factor), which we


Fig. 5 PF of journals in specific fields. a Biochemistry, molecular biology and cell biology, b genetics,
c microbiology, d development, e immunology, f bioinformatics, g neuroscience and h physiology

have previously proposed as an index to evaluate journal quality (Ohniwa et al. 2004,
2007), to the measurement of the abundance of emerging keywords in particular scientific
communities. The PF value of a given community c in year y, $PF_{c\,\mathrm{in}\,y}$, is defined as
follows:
$$PF_{c\,\mathrm{in}\,y} = A_{c\,\mathrm{in}\,y} / B_{c\,\mathrm{in}\,y}$$


Table 3 Top 5 PF journals for the past 3 years


Journals PF

2006
Small (Weinheim an der Bergstrasse, Germany) 3.31
Journal of nanoscience and nanotechnology 3.05
Nano letters 1.94
Nanomedicine (London, England) 1.92
Conservation biology: the journal of the Society for Conservation Biology 1.91
2005
Journal of nanoscience and nanotechnology 3.90
Journal of microencapsulation 2.89
Nano letters 2.24
Journal of contaminant hydrology 2.15
Water research 2.05
2004
IEEE transactions on pattern analysis and machine intelligence 5.06
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society 3.57
Journal of nanoscience and nanotechnology 3.29
IEEE computer graphics and applications 2.86
IEEE transactions on medical imaging 2.67

PF values of all journals on PubMed were calculated between 2004 and 2006, and the 5 journals with the
best PF values were selected

Fig. 6 PF values of Nature, Science and PNAS. Analyzed articles were restricted to life science issues (see “Materials and methods” section). [Plot: PF value (0–2) by year, 1972–2006, for Science, PNAS and Nature]

where $A_{c\,\mathrm{in}\,y}$ is the number of emerging keywords handled in community c in year y, and
$B_{c\,\mathrm{in}\,y}$ is the number of articles published by community c in year y. A historical survey of
PF can identify the periods in which a community yielded many emerging
keywords.
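As a rough illustration of the PF definition, the sketch below computes A/B for one community and one year. It assumes that A counts every occurrence of an emerging keyword across the community's articles (each keyword counted once per article); the text does not spell this out, and the function name and sample data are hypothetical.

```python
def perspective_factor(articles, emerging_keywords):
    """PF = A / B for one community (e.g. a journal) in one year.

    articles: list of sets of MeSH terms, one set per article.
    emerging_keywords: set of terms identified as emerging in that year.
    Assumption: A is the total count of emerging-keyword occurrences,
    with each keyword counted once per article.
    """
    if not articles:
        return 0.0
    a = sum(len(set(terms) & emerging_keywords) for terms in articles)
    b = len(articles)
    return a / b


# Hypothetical journal-year with three articles.
emerging = {"Nanotechnology", "Particle Size", "Algorithms"}
journal_articles = [
    {"Nanotechnology", "Particle Size", "Surface Properties"},
    {"Nanotechnology", "Humans"},
    {"Humans", "Mice"},
]
print(perspective_factor(journal_articles, emerging))
```

Because PF is normalized by the number of articles, communities of very different sizes can be compared on the same scale.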
To clarify field-specific trends, we surveyed the PF values of journals with high Impact
Factors, assuming that such top-level journals represent active biological sub-fields (Fig. 5).
In the fields of molecular biology, biochemistry and cell biology (MBC), the PF values of
various journals increased in the early 80s and decreased after the mid-90s. The PF values of
journals related to genetics and microbiology increased in the 80s and decreased thereafter. PF

Fig. 7 Summary of trends in life sciences revealed by emerging keywords

values of journals related to development and immunology decreased in the late-90s, while
those of journals related to the field of bioinformatics reached a peak in 2003. PF values of
journals related to physiology and neuroscience remained relatively low throughout the
period studied. The present data thus illustrate sub-field-specific fluxes and refluxes in
various research topics over the last 30 years.
We now focus on two interesting issues: (i) the identification of the fields that
currently contain many emerging topics and (ii) the PF trends in the leading journals of
the life sciences as a whole. For issue (i), we selected the journals with PF values in the top 5
from 2004 to 2006 (Table 3), and found that journals handling nanoscience consistently ranked
high. This is consistent with the emerging topics found in 2002–2006 (Fig. 3). For
issue (ii), we focused on the journals Nature, Science and Proceedings of the National
Academy of Sciences of the United States of America (PNAS). They cover almost all
sub-categories of the life sciences
and are believed to lead worldwide life science research, and the impact factors of these
journals are quite high. The annual transition of PF values for these journals between 1973
and 2006 showed that the absolute PF values had reached a peak by 1997 and
decreased thereafter (Fig. 6).

Concluding remarks

In this study, we proposed a simple method for the identification of emerging keywords,
and reported its successful application to the historical dynamics of life science
research. The novelty of the method is that the emerging keywords are
selected based on their increment rates after calculating the rates of “all”
the keywords handled in a literature database. The keywords selected in this manner should
be free from the bias of the analysts.
Our historical survey of the emerging topics suggests two stages in the development of
life science research over the past 30 years: a ‘progressive stage’ from the 80s to the 90s
and a ‘re-evaluation stage’ from the late 90s to the present (Fig. 7). During the progressive
stage, various novel topics implied by emerging keywords were constantly and continuously
discovered (Fig. 4). The analysis showed that the ‘progressive stage’ was likely to be

facilitated by the development of techniques for the identification and manipulation of genes
and proteins (Fig. 3). Indeed, the PF values showed that molecular biology and related
fields were active in this period (Fig. 5). In the ‘re-evaluation stage’, many old topics
were refocused as emerging topics (Fig. 4). During the ‘re-evaluation stage’, which
overlapped with the ‘post-genomic era’, various techniques such as computational techniques,
large-scale analyses and nano-scale analyses were developed (Fig. 3). These novel
techniques may enable old unsolved research topics to be re-focused and re-studied,
and are expected to promote drastic changes in the future development of the life
sciences.
The advantage of our procedure is that the principle of extracting emerging keywords is
simple and objective enough to exclude the analysts’ manipulation or intentions. Combined
with ‘co-word’ analysis, the selected keywords reveal active topics or fields as well as the
history of particular topics.

Materials and methods

Datasets

We collected the MeSH terms attached to articles published between 1971 and 2008 from
PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed) in April 2009. A total of
13,954,189 articles were included in the analysis. To identify the set of MeSH terms
attached to each article, we extracted the terms tagged as <NameOfSubstance> and
<DescriptorName> from the XML-style data, and then eliminated duplicate terms
within each article. Then, to eliminate terms unrelated to research topics, we excluded
all MeSH terms under the hierarchies of “[M] Named Groups”, “[N] Health Groups”,
“[V] Publication Characteristics”, “[Z] Geographicals”, “[L01.453] Information
Services”, “[L01.178] Communications Media”, “[L01.143] Communication”, “[L01.346]
Information Centers” and “[L01.737] Publishing”. We further excluded the terms occupying
the 1st, 2nd and 3rd levels of the MeSH tree to avoid contamination
by very general terms that are not suitable for clarifying topics at the research level.
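The extraction and exclusion steps described above might be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each article is available as an XML fragment, that a separate mapping from each term to its MeSH tree numbers exists, that the category letter counts as the 1st level (so a tree number with two dot-separated parts sits at the 3rd level), and that a term is kept if at least one of its tree numbers survives. The helper names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hierarchies excluded in the text: whole categories and L01 sub-trees.
EXCLUDED_CATEGORIES = {"M", "N", "V", "Z"}
EXCLUDED_SUBTREES = ("L01.453", "L01.178", "L01.143", "L01.346", "L01.737")


def article_terms(article_xml):
    """Return the de-duplicated set of terms attached to one article."""
    root = ET.fromstring(article_xml)
    terms = set()
    for tag in ("DescriptorName", "NameOfSubstance"):
        for element in root.iter(tag):
            if element.text:
                terms.add(element.text.strip())
    return terms


def tree_level(tree_number):
    """Assumed level: category letter = 1, 'D12' = 2, 'D12.776' = 3, ..."""
    return 1 + len(tree_number.split("."))


def is_excluded(tree_number):
    if tree_number[0] in EXCLUDED_CATEGORIES:
        return True
    for subtree in EXCLUDED_SUBTREES:
        if tree_number == subtree or tree_number.startswith(subtree + "."):
            return True
    # Drop very general terms in the top three levels.
    return tree_level(tree_number) <= 3


def keep_term(tree_numbers):
    """Assumption: keep a term if any of its tree numbers survives."""
    return any(not is_excluded(number) for number in tree_numbers)


# Hypothetical article fragment containing one duplicated term.
sample = (
    "<PubmedArticle><ChemicalList><Chemical>"
    "<NameOfSubstance>Histones</NameOfSubstance></Chemical></ChemicalList>"
    "<MeshHeadingList>"
    "<MeshHeading><DescriptorName>Histones</DescriptorName></MeshHeading>"
    "<MeshHeading><DescriptorName>Apoptosis</DescriptorName></MeshHeading>"
    "</MeshHeadingList></PubmedArticle>"
)
print(sorted(article_terms(sample)))
```

Collecting terms into a set removes a term that appears under both tags in the same article, which corresponds to the elimination of duplicates described in the text.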

Emerging MeSH keywords

Among MeSH terms, we defined emerging keywords as follows. First, the increment rate
(I) of MeSH term a in year b was calculated as:
$$I_{a\,\mathrm{in}\,b} = X_{a\,\mathrm{in}\,b} / Y_{a\,\mathrm{in}\,b}$$
where $X_{a\,\mathrm{in}\,b}$ is the total number of appearances of MeSH term a on PubMed in years
b + 1 and b + 2, and $Y_{a\,\mathrm{in}\,b}$ is the total number of appearances of MeSH term a in years
b − 1, b, b + 1 and b + 2. The terms ranked in the top 5% of $I_{a\,\mathrm{in}\,b}$ in year b were
defined as emerging keywords.
As an example, the distribution of $I_{a\,\text{in}\,2000}$ is shown in Supplemental Figure A. In 2000,
521,392 articles were published, carrying a total of 6,118,069 MeSH terms (56,234 distinct terms).
For each MeSH term, we calculated $X_{a\,\text{in}\,2000}/Y_{a\,\text{in}\,2000}$, where $X_{a\,\text{in}\,2000}$ is the number of
articles published in 2001 and 2002 containing the MeSH term $a$, and $Y_{a\,\text{in}\,2000}$ is the
number of articles published in 1999, 2000, 2001 and 2002 containing the MeSH term $a$.
The MeSH terms meeting the criterion $X_{a\,\text{in}\,2000}/Y_{a\,\text{in}\,2000} > 0.577$ were ranked in the top
5%. The average increment rate of the extracted emerging keywords has remained
almost constant over the past 30 years (Supplemental Figure B).
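The increment rate and the top 5% selection can be illustrated with a short sketch. The function names, the toy counts and the handling of ties and rounding (`math.ceil`, at least one term kept) are assumptions made for illustration. The text specifies only the ratio and the 5% cut-off.

```python
import math


def increment_rate(counts_by_year, year):
    """Return X / Y for one term, or None if the term never appears in the window.

    X: appearances in year+1 and year+2.
    Y: appearances in year-1, year, year+1 and year+2.
    """
    x = sum(counts_by_year.get(y, 0) for y in (year + 1, year + 2))
    y_total = sum(counts_by_year.get(y, 0) for y in range(year - 1, year + 3))
    return x / y_total if y_total else None


def emerging_keywords(counts, year, top_fraction=0.05):
    """Rank terms by increment rate and keep the top fraction (assumed rounding up)."""
    rates = {}
    for term, by_year in counts.items():
        rate = increment_rate(by_year, year)
        if rate is not None:
            rates[term] = rate
    ranked = sorted(rates, key=rates.get, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:n_top]


if __name__ == "__main__":
    # Hypothetical yearly counts, not data from the study.
    toy_counts = {
        "steady term": {1999: 10, 2000: 10, 2001: 10, 2002: 10},
        "rising term": {1999: 1, 2000: 1, 2001: 4, 2002: 4},
    }
    print(increment_rate(toy_counts["rising term"], 2000))
    print(emerging_keywords(toy_counts, 2000, top_fraction=0.5))
```

A term whose yearly count does not change has a rate of exactly 0.5, so the reported cut-off of 0.577 for the year 2000 corresponds to terms appearing noticeably more often in the two following years than in the two years up to and including 2000.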
The definition of emerging MeSH terms proposed here is similar to Thomson
Reuters’ “fast breaking papers” (http://sciencewatch.com/dr/fbp/) and “fast moving fronts”
(http://sciencewatch.com/dr/fmf/), which are based on the percentage increase in
citations and the percentage increase in the number of core papers, respectively. These methods
were discussed in Small (2006) and Upham and Small (2010). As mentioned in the
Introduction section, our approach focuses on keywords rather than citations so as to
investigate the content of research topics in depth.

Co-word analysis with emerging keywords

The 20 most frequently appearing emerging keywords were collected for each period
and examined for co-appearance in the same articles. The co-appearance
of the keywords was visualized with the Pajek package (Batagelj and Mrvar 2002). To
eliminate weak relations among keywords, the threshold for drawing an edge was set at 5%
of the number of keywords linked by the edge.
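A co-word count of this kind can be sketched as below. The function name and toy data are hypothetical, and the threshold rule in particular is an assumption: the sketch keeps an edge when the co-appearance count reaches the given fraction of the appearances of the less frequent of the two keywords, which is only one possible reading of the 5% criterion. The visualization in Pajek is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def coword_edges(articles, keywords, threshold=0.05):
    """Count keyword co-appearances and keep pairs above a relative threshold.

    articles: iterable of term sets, one per article.
    keywords: the emerging keywords to consider.
    """
    keywords = set(keywords)
    appearances = Counter()
    pair_counts = Counter()
    for terms in articles:
        # Sorting gives every pair a stable (a, b) order with a < b.
        present = sorted(keywords & set(terms))
        appearances.update(present)
        pair_counts.update(combinations(present, 2))

    edges = {}
    for (a, b), n in pair_counts.items():
        # Assumed rule: compare with the less frequent keyword of the pair.
        if n >= threshold * min(appearances[a], appearances[b]):
            edges[(a, b)] = n
    return edges


if __name__ == "__main__":
    # Hypothetical articles, each reduced to its set of terms.
    toy_articles = [
        {"A", "B"}, {"A", "B"}, {"A", "C"}, {"A"}, {"C"}, {"C"}, {"C"},
    ]
    print(coword_edges(toy_articles, ["A", "B", "C"], threshold=0.5))
```

The resulting dictionary of weighted pairs could then be exported as an edge list for a network tool.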

Initial year of MeSH term appearance on PubMed

When the topics represented by MeSH terms appear or are rearranged on PubMed, the
NLM MeSH browser saves this information in a History Note for MeSH terms, tagged
as <NameOfSubstance>, or in the Date of Entry for Supplementary Concept Records,
tagged as <DescriptorName>. The earliest year listed in either the History Note or Date of
Entry for a particular topic was considered to be that topic’s initial year.
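Taking the earliest listed year can be sketched as follows. The function name and the example strings are hypothetical, and the sketch assumes that years appear in the notes as four-digit numbers separated from other digits, which may not hold for every record format.

```python
import re

# Four-digit years from 1800 to 2099, not embedded in a longer run of digits.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def initial_year(*notes):
    """Return the earliest four-digit year found in any note, or None."""
    years = [
        int(match)
        for note in notes
        if note
        for match in YEAR_PATTERN.findall(note)
    ]
    return min(years) if years else None


if __name__ == "__main__":
    # Hypothetical History Note and Date of Entry strings.
    print(initial_year("1991(1975)", "1990-05-01"))
```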

References

Batagelj, V., & Mrvar, A. (2002). Pajek—analysis and visualization of large networks. Lecture Notes in
Computer Science, 2265, 477–478.
Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1991). Mapping of science by combined co-citation and
word analysis. I. Structural aspects. Journal of the American Society for Information Science, 42(4),
233–251.
Callon, M., Courtial, J. P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks—
an introduction to co-word analysis. Social Science Information Sur Les Sciences Sociales, 22,
191–235.
Callon, M., Law, J., & Rip, A. (1986). Mapping the dynamics of science and technology: Sociology of
science in the real world. London: The Macmillan Press.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific
literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
Ding, Y., Chowdhury, G. G., & Foo, S. (2001). Bibliometric cartography of information retrieval research
by using co-word analysis. Information Processing and Management, 37, 817–842.
Fujigaki, Y. (1998). Filling the gap between discussions on science and scientists’ everyday activities:
Applying the autopoiesis system theory to scientific knowledge. Social Science Information, 37(1),
5–22.
Lee, W. H. (2008). How to identify emerging research fields using scientometrics: An example in the field of
Information Security. Scientometrics, 76, 503–525.
Leydesdorff, L. (1995). The challenge of scientometrics. Leiden, The Netherlands: DSWO Press, Leiden
University.
Noyons, E., Moed, H., & van Raan, A. F. J. (1999). Integrating research performance analysis and science
mapping. Scientometrics, 46(3), 591–604.

Ohniwa, R. L., Denawa, M., Kudo, M., Nakamura, K., & Takeyasu, K. (2004). Perspective factor: a novel
indicator for the assessment of journal quality. Research Evaluation, 13, 175–180.
Ohniwa, R. L., Hibino, A., & Takeyasu, K. (2007). Perspective factor; past, present and future of life sciences.
Proceedings of the International Society for Scientometrics and Informetrics 2007, II, pp. 908–909.
Price, D. J. D. (1963). Little science, big science. New York: Columbia University Press.
Schulman, J. (2000). Using medical subject headings (MeSH) to examine patterns in American medicine:
preliminary consideration of vocabulary change as a metric. http://www.nlm.nih.gov/mesh/patterns.html.
Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595–610.
Tseng, Y. H., Lin, Y. I., Lee, Y. Y., Hung, W. C., & Lee, C. H. (2009). A comparison of methods for
detecting hot topics. Scientometrics, 81(1), 73–90.
Upham, S. P., & Small, H. (2010). Emerging research fronts in science and technology: patterns of new
knowledge development. Scientometrics, 83, 15–38.
