
World Patent Information 29 (2007) 20-25

www.elsevier.com/locate/worpatin

Subject analysis and search strategies - Has the searcher become the bottleneck in the search process?
Evert Nijhof
ASML Netherlands B.V., P.O. Box 324, 5500 AH, Veldhoven, The Netherlands

This article is based on the author's presentation at the International Patent Information Conference, IPI-ConfEx,
in March 2006, in Athens, Greece.

Abstract
Many searches are not successful because the searcher either fails to properly identify the subject at hand - and consequently starts
searching for the wrong documents - or makes a fatal mistake while combining keywords and/or classes during the
search itself. This can be avoided by taking a structured approach to searching: (1) Distinguish between analysis of the technical subject,
identification of selection criteria and the selection and use of search terms. (2) When analyzing a technical subject, some guidance may
be provided by recognizing that the problem is often defined as a cause and effect relationship. The solution can be seen as a combination
of action and subject, the subject being used to perform the action, and the action being directed at the cause. (3) While searching, a
major pitfall is to try to come up with complete sets of keywords and classes for each essential feature and then to combine them for
searching. The resultant queries fail to recognize that each search term is useful to a different extent. The article has particular relevance
to searches in the patent field.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Subject analysis; Problem; Solution; Cause; Effect; Action; Subject; Search strategies; Queries; Precision; Recall; Hits; Patent searching

1. Introduction
During the past five to ten years we have seen tremendous developments to improve coverage and consistency
of databases and to enhance the quality of search platforms, both by patent offices and by vendors of patent
information. Much work remains to be done. Poor indexing and abstracting can still defeat a searcher. Nevertheless, these developments have reached a point where, for
many subject areas, it is not the quality of the tools or the comprehensiveness of databases but the skills of the searcher
that may have become the bottleneck in the search process.
In the past, searchers had access to only a limited number
of sources of information. As a result, the quality of
searches was limited: one could not blame the searcher
for not finding information that he (or she) did not have
access to in the first place. Today, however, the searcher
has access to almost any information in the world from
behind his desktop. The challenge for the searcher has
become to actually retrieve relevant publications from databases that he (or she) does have access to. That not all
searches are successful is perhaps not remarkable when
we take a closer look at all the opportunities the searcher
has to go astray:

Search Request Intake.
Analysis of Technical Subject.
Identification of Selection Criteria - Essential Features.
Search Terms - Keywords, Classes, Company and Inventor Names.
Adaptation of Strategies during the Search.
Selection of Databases.
Rejection and Selection of Documents (Culling).
Search Report.

E-mail address: evert.nijhof@asml.com
0172-2190/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.wpi.2006.07.013

We have evaluated many of the searches that have been
conducted for ASML, both in-house and by independent
search firms. Among the results were some fine examples
of what can go wrong: validity searches being conducted
where the real need was in fact a clearance search, up-front
restriction to time frames when doing a validity search, and relevant documents being rejected as not relevant; i.e., the
searcher does have a relevant document in front of him
but does not recognize it as being relevant to the subject
at hand. In other examples relevant documents were
cited for the wrong reasons; of course less serious than not
citing the relevant document at all, but it looks rather
sloppy and it doesn't give the customer much confidence in the quality of the search. The majority of
mistakes, however, are made in the process of (1) finding out what the technical subject is, (2) the identification
of selection criteria and especially (3) the selection and
use of search terms. As a result, highly relevant documents
are being missed. The question is of course: why?
It appears that searchers often fall into the trap of
starting to search before they have a crystal clear view
of what it is that they should be looking for. It may
seem wise to start searching as soon as you have pictured the general idea and to trust that you'll get a better
understanding of the subject while searching. Generally
speaking, it is not. Please note that even when your analysis of the technical subject is only slightly off target, you
are not only bound to miss relevant documents because
your target is wrong, you will not recognize relevant documents either, even if, by chance, you do have them in
front of you. To find out what it is that you should
be looking for you should first understand the invention
in all its aspects and then, in a separate step, identify the
features that are essential for selection of relevant documents. Let's start with step one, analysis of the technical
subject.

2. Analysis of technical subject
When analyzing a technical subject, one will mostly be
able to identify a problem, a solution and a technical field.
Often this is more complex than it may seem, especially
when having to distil the invention from four pages of an
invention disclosure.
For analysis, some guidance may be provided by recognizing that the problem is often defined as a cause and
effect relationship (effect ≈ symptom ≈ problem). Cause
and effect together form the basis for defining the object
of the invention. The solution can be seen as a combination of action and subject, the subject being used to perform the action, and the action being directed at the
cause. For example [1], if the object were to have clean
clothes, the solution to the problem would be "washing
with soap": the action (washing) and the subject (soap) together form the solution. This seems to
be a rather simple subject. But if you think this analysis
stops here, think again. The questions that should come
to mind are: What is the problem really? What is the cause
of the problem?
Superficially it seems that the problem is dirty clothes.
But why is that a problem? It could also be the cause of a
problem. Problems associated with dirty clothes could be:
sloppy image of the wearer, bad hygiene, contamination
of furniture, etc. On the other hand, dirty clothes could
indeed be regarded as a problem in itself. In that case the
question is: What could be the cause of that problem? Kids
playing in the garden, cycling in the rain or eating cherry
cheese cake with bare hands. In fact, each technical subject
can be broken down into a series of causes and effects. To
stick with the example: having a garden → kids playing
in the garden → dirty clothes → dirty furniture → ..., each
being a cause of the next effect. It is this train of causes
and effects that may make it very difficult to pinpoint the
problem that the invention is intended to solve.
It may help to recognize that any invention or solution
is, by definition, directed at the cause of a problem and
not at the effect(s) or symptom(s) that characterise a problem. As washing with soap is directed at the dirty clothes,
the dirty clothes are, by definition, the cause of a problem.
Before starting to search, one should be aware of all possible series of causes and effects associated with the particular invention. This provides additional keywords that can
be used for searching and may help to improve the quality
of the search.
The above-mentioned type of analysis may not seem to be
applicable to all subjects. It is, however, applicable to many
more subjects than one may at first think. At ASML we
have yet to come across the first example which does not
fit this type of analysis. It does require quite some practice
though, and discussion with colleagues, but it certainly
helps to improve subject analysis.
Once we understand the technical subject in all its
aspects, the next step is to identify selection criteria.

3. Identification of selection criteria - essential features
Selection criteria are used for review of documents (culling), not necessarily for searching. Selection criteria help to
match your search to your client's needs.
In step one we have analysed the technical subject and
have identified the following five aspects:
1. the action,
2. the direct object of the action (the cause),
3. the subject,
4. the object of the invention,
5. the technical field.

It is important to note that not all of these aspects necessarily need to be regarded as an essential feature for selection. That may sound strange but it really depends on the
type of search request at hand. It may also depend on how
the search is progressing. For instance, the client may want
to learn about all possible solutions to the problem at
hand. That discards the subject as an Essential Feature. Also,
the client often does not require publications to mention
the same object of the invention as for the problem at
hand. Publications showing similar embodiments, regardless of their intended use or application, may be relevant
as well. Ideally you select publications that have all identified features. If, however, such publications are not found
during the search, the client probably wants to learn about
the next best thing. What that is depends on the search
request at hand.
At ASML we believe it to be very fruitful to verify our
five-aspect analysis and subsequent identification of the
essential features with the client, prior to searching.
Only once the searcher has a crystal clear
view of what he should be looking for can he start searching. And this is where it becomes really tricky.
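As a minimal sketch, the five aspects and the choice of essential features could be recorded as shown below. The names SubjectAnalysis and essential_features are hypothetical and are not taken from the article; the values come from the "washing with soap" example, except the technical field, which is an assumption because the article does not name one.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SubjectAnalysis:
    """The five aspects identified when analysing a technical subject."""
    action: str
    cause: str                 # direct object of the action
    subject: str               # means used to perform the action
    object_of_invention: str
    technical_field: str


def essential_features(analysis, dropped=()):
    """Return the aspects kept as selection criteria, as a name -> value dict.

    `dropped` names the aspects that are not essential for this
    particular search request (ideally agreed with the client).
    """
    names = [f.name for f in fields(analysis)]
    unknown = set(dropped) - set(names)
    if unknown:
        raise ValueError(f"unknown aspect(s): {sorted(unknown)}")
    return {n: getattr(analysis, n) for n in names if n not in dropped}


laundry = SubjectAnalysis(
    action="washing",
    cause="dirty clothes",
    subject="soap",
    object_of_invention="clean clothes",
    technical_field="laundry",  # assumed for illustration only
)

# The client wants to learn about all possible solutions: drop the subject.
criteria = essential_features(laundry, dropped=("subject",))
print(criteria)
```

Writing the dropped aspects down explicitly makes it easier to verify the analysis with the client before any searching starts.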
4. Selection and use of search terms
If there is one single message in this article, it is the message of this section. In the first place, the theory proposed
in this section is intended to be a guide to effective searching: i.e., find the documents that you are looking for. Secondly, the same theory is also a guide to efficient
searching: i.e., find relevant documents fast.
In general [2], the searcher should start as precisely as
possible. Search the unusual terms. Search by AND-ing
precise words (or phrases) and/or classes. Then expand
carefully (to obtain higher recall if and when necessary):
start searching abstracts or claims before turning to full
text searching. Work your way down from the high precision words and classes to the lower precision words and
classes. Citation searches and company or inventor name
searches may also help to carefully expand the collection
of publications for review. If you do not recognize these
guidelines as being a sound basis for searching you are
bound to miss relevant documents.
But even if the searcher does recognize these guidelines
as being a sound basis for searching, at some point during
the search, he might end up doing the following: He tries to
come up with complete sets of keywords (K) and classes
(C) for each essential feature (EF) and combines them
for searching:
EF1 = K1_EF1 or K2_EF1 or ... (or C1_EF1 or C2_EF1 or ...).
EF2 = K1_EF2 or K2_EF2 or ... (or C1_EF2 or C2_EF2 or ...).
EF3 = K1_EF3 or K2_EF3 or ... (or C1_EF3 or C2_EF3 or ...).
Possible queries:
(EF1) and (EF2).
(EF1) and (EF3).
(EF2) and (EF3).
(EF1) and (EF2) and (EF3).
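The query pattern above can be sketched in a few lines of Python. This only illustrates the approach the searcher might end up taking; the function names and the placeholder terms (K1, C1, ...) are hypothetical, and the query syntax is generic rather than that of any particular search platform.

```python
from itertools import combinations


def or_set(terms):
    """OR together all keywords/classes collected for one essential feature."""
    return "(" + " OR ".join(terms) + ")"


def exhaustive_queries(features):
    """AND together the OR-sets of every combination of two or more features."""
    names = list(features)
    queries = []
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            queries.append(" AND ".join(or_set(features[name]) for name in combo))
    return queries


# Placeholder term lists; real lists would be much longer and still incomplete.
features = {
    "EF1": ["K1", "K2", "C1"],
    "EF2": ["K3", "K4"],
    "EF3": ["K5", "C2"],
}

queries = exhaustive_queries(features)
for query in queries:
    print(query)
```

With three essential features this yields the four queries listed above. Every one of them silently depends on each OR-set being complete.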

Contrary to the "start small, expand carefully"
approach, these queries are aimed at high recall from the
very early stages of the search. The biggest problem with
these queries, however, is that the number of hits is often
too large to be able to review them all (an efficiency problem). Another problem is that the sets of keywords are supposed to be complete. They rarely are (an effectiveness
problem). As a result, these queries may prevent the
searcher from finding some relevant documents at all! It
is important to note that such queries fail to recognize that
not all keywords and classes are equal: some search terms
are more useful than others.
Some remarks about the use of keywords and the use of
classes: both have their advantages and drawbacks. Many
searchers seem to focus on the drawbacks of keywords.
Some go as far as discarding them altogether in favor of
classes. Stephen Adams has written an article on this topic:
"Patent Searching Without Words - Why Do It, How To
Do It?" [3]. Despite the obvious drawbacks I want to
emphasize the power of keywords. I hope this article, especially this section, helps to illustrate how a searcher can get
the most out of keywords (although the theory described is
equally applicable to, but often less useful for, classes).
Searchers who rely on classes alone are bound to miss relevant documents.
4.1. Theory: not all search terms are equal
Each search term is useful to a different extent. This is
determined by precision (= % relevant of retrieved records),
recall (= % retrieved of relevant records) and, most importantly, the number of hits for that search term. If for a certain
single search term the number of hits is low enough, you can
use this term without combining it with (terms for) other
essential features. The beauty is that for this search term,
you do not have to worry about the completeness of the lists
of terms for the other essential features.
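To make the three quantities concrete, here is a small sketch that computes them for a single search term. The record identifiers are invented for illustration, and in a real search the full set of relevant records is of course not known in advance.

```python
def precision(retrieved, relevant):
    """Share of the retrieved records that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Share of the relevant records that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Hypothetical record identifiers, for illustration only.
hits = {"D1", "D2", "D3", "D4"}   # records returned by one search term
relevant = {"D2", "D4", "D7"}     # records the searcher is actually looking for

print("hits:", len(hits))
print("precision:", precision(hits, relevant))
print("recall:", recall(hits, relevant))
```

Here the term returns four hits, half of which are relevant, and it finds two of the three relevant records. Of the three quantities, only the number of hits can be read directly from the search platform, which is one practical reason to decide on it first.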
Expanding this concept further, search terms can be distinguished as follows:
Type 1: to be searched without combining it with any other Essential Feature
Type 2: to be combined with exactly one other Essential Feature
Type 3: to be combined with two other Essential Features
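One possible reading of this distinction, sketched in Python: once each term has been assigned a type, the queries follow mechanically. The function plan_queries and the data layout are hypothetical; the term names use the KX-Y notation (type X, essential feature Y) of the article's Table 1, and the sketch assumes that terms are only AND-ed with terms of the same type from other essential features.

```python
from itertools import combinations, product


def plan_queries(terms):
    """Build queries from {essential_feature: {term_type: [terms]}}.

    Type 1 terms are searched alone, type 2 terms are AND-ed with type 2
    terms of exactly one other essential feature, and type 3 terms with
    type 3 terms of two other essential features. Terms for the same
    essential feature are never AND-ed with each other.
    """
    queries = []
    for n in (1, 2, 3):  # term type == number of essential features combined
        for efs in combinations(sorted(terms), n):
            groups = [terms[ef].get(n, []) for ef in efs]
            queries += [" AND ".join(combo) for combo in product(*groups)]
    return queries


terms = {
    "EF1": {1: ["K1-1"], 2: ["K2-1"], 3: ["K3-1"]},
    "EF2": {1: ["K1-2"], 2: ["K2-2"], 3: ["K3-2"]},
    "EF3": {1: ["K1-3"], 2: ["K2-3"], 3: ["K3-3"]},
}

planned = plan_queries(terms)
for query in planned:
    print(query)
```

For this input the plan contains seven queries: three single-term searches, three pairs and one triple. A type 1 term never appears in a combination, so it does not depend on the term lists of the other essential features being complete.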

4.2. Example
The example concerns a novelty search for the use of
blades in the projection system (of a microlithographic
exposure apparatus) to reduce flare. The term flare, also
known as stray light, is used to indicate light that has a negative effect on imaging quality. The searcher spent eleven
hours on this search and did not find the publication shown
in Box 1.
The publication was properly classified and flare was
mentioned as the problem to be solved in the full text.


Box 1. US2003020893: abstract and drawing from a relevant document not found.

Please notice a variable aperture stop 32 and shielding
plates 50: exactly what the searcher should have been looking for but did not find.
In this case the essential features had been identified as
the reduction of flare and the use of blades in a projection
system. When discussing this with the searcher it became
clear that he had understood this all too well. He understood this so well that he used both flare terms and blade
terms, but only in combination (AND) with each other.
And that was the pitfall for this search. Have you ever tried
to come up with a complete list of equivalent keywords or
phrases for blade? In this context you would have to include
such phrases as "light intercepting means", "pupil filter",
"aperture stop", ... whatever you come up with: that list
will never be complete. In this particular case the searcher
had not thought of "shielding plate" as a synonym for blade.
Can you blame him for that? Perhaps not. But he can be
blamed for the fact that he did not circumvent that problem
altogether. The problem of finding all synonyms for each
essential feature can be reduced dramatically by distinguishing between the different types of keywords, types 1, 2 and 3.
Now, how could that have helped in this case?
Flare is a very precise term when used within the context
of microlithography. If the searcher were to search a worldwide abstract database, searching the standard ECLA class
for microlithography, using the word flare, he would find
42 hits. Only 42 hits! You don't have to combine with other
keywords with so few hits. You should be glad that you
have such a beautiful type one search term at your disposal. Don't waste it by combining with other terms! The
number of hits is so low that you might even consider
doing the same search in full text, and guess what: only
162 hits and still a high precision result set. Number 15
of the mentioned 162 hits is the reference shown, easily
identified by the front page drawing.
This way it took less than 30 minutes to find this relevant publication, where the first searcher failed to locate
the document at all in eleven hours of searching.
The point is NOT that the methodology used helps to
locate relevant documents so quickly (although that is a very
convenient side effect of this methodology); the point is
that use of this methodology prevents the searcher from
taking a wrong turn, as a result of which he will not be able
to retrieve some relevant documents at all. Therefore, the
searcher should be aware of this methodology not only
for novelty or patentability searches but also for clearance
or freedom to operate searches.
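The pitfall in this example can be reproduced on a toy scale. The mini-corpus below is invented purely for illustration (it is not the actual search data), and the substring matching stands in for whatever a real search platform does.

```python
# Hypothetical mini-corpus; the texts are invented for illustration only.
corpus = {
    "A": "projection system with shielding plates that reduce flare",
    "B": "projection lens in which blades block stray light and flare",
    "C": "wafer stage with interferometer position measurement",
    "D": "reticle handler with movable blades for masking",
}


def search(term):
    """Return ids of documents whose text contains the term (case-insensitive)."""
    return {doc_id for doc_id, text in corpus.items() if term.lower() in text.lower()}


def search_any(terms):
    """OR-combination: union of the hits for each term."""
    return set().union(*(search(t) for t in terms))


flare_terms = ["flare", "stray light"]
# Incomplete on purpose: "shielding plate" is missing, as in the example.
blade_terms = ["blade", "light intercepting means", "pupil filter"]

combined = search_any(flare_terms) & search_any(blade_terms)  # flare AND blade terms
type_one = search("flare")                                    # precise term searched alone

print(sorted(combined))  # ['B']
print(sorted(type_one))  # ['A', 'B']
```

Document A, the one with the shielding plates, is lost as soon as the flare terms are AND-ed with the incomplete blade list, but it is retrieved when "flare" is searched on its own.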
4.3. Dos and don'ts
As mentioned earlier, in general it makes good sense to
start searching precisely and small and subsequently
expand carefully. This is particularly true when trying to
put this theory of different search terms into practice. That
is because, generally speaking, the number of hits will be
lower and/or the result sets more precise in smaller search
environments. As a result, you will have more search terms
of type 1 at your disposal in these smaller environments.
Examples of smaller search environments are abstract
databases versus full text databases, or small databases dedicated to one particular subject area versus large databases
covering all technologies. Smaller environments can also be
created by limiting the search to certain classification
codes.
Once a suitable database has been selected, the searcher
should start a search with search terms that are presumed
to be of type 1: i.e., with keywords or classes that, if taken
alone, yield a small enough number of hits to review them
all. When in doubt, presume the term to be of type 1. This
has to be checked for each search term for each essential feature. If the number of hits for a search term is too large,
then apparently the term is of type 2, or even type 3. Once all presumed type 1 terms have been tested, you have identified
the type 1 terms. Presume the rest to be of type 2. You can do the
same for each combination of type 2 terms. What remains
is type 3. Due to time constraints it may not be possible to
properly distinguish between types 2 and 3.
Sometimes it is not necessary either. Be aware though: in
many cases one should distinguish at least between type
1 and the other types (or type 2 and the other types if type 1 terms don't
seem to exist at all for a particular subject).
Table 1 is intended to illustrate some don'ts.
KX-Y indicates keywords (or classes) of type X for
essential feature Y.

Table 1
Matrix of essential features and types of search term

Keywords, phrases    Essential Features
or classes           EF1     EF2     EF3
Type 1               K1-1    K1-2    K1-3
Type 2               K2-1    K2-2    K2-3
Type 3               K3-1    K3-2    K3-3

Do NOT combine any terms for the same essential feature (i.e., don't do any AND operations within the same
cell or between cells in the same column). This is a cumbersome way of stating the obvious: that one should not be
AND-ing synonyms or phrases aimed at describing the
same essential feature (regardless of whether they're of the
same type or of another type). Caveat: this happens more
often than one may think, especially when the searcher
does not have a crystal clear view of the subject at hand.
Do NOT combine terms of a certain type with terms of
another type of another essential feature (i.e., don't do any
AND operations combining cells of different rows in different columns). If you do that, you fail to recognize the
whole point of this theory.
One last note: If the subject at hand does not seem to
provide the searcher with type 1 or type 2 search terms,
the use of phrases rather than single keywords may help.
For instance, where the use of the keyword "magnet" in
a certain context may result in a relatively large, noisy result
set, the use of the phrase "permanent magnet" may result
in a smaller and more precise result set. The difference in
precision and number of hits may just enable the searcher
to use the phrase as a type one search term. As a result
he does not have to worry about the completeness of the
lists of search terms for the other essential features, thereby
minimizing the risk of missing relevant documents for that
particular search term.

5. Concluding remarks
For some of the readers of this article the presented subject matter is no more than common sense. If you are one
of those readers, I congratulate you. For others, the proposed way of searching may seem to be just another alternative route one can take while searching. This is the most
frequently heard response from people to my presentation
at the IPI-ConfEx 2006 in Athens. For your consideration:
these are the same people that failed to recognize the main
points of the presentation.
Some people say that patent searching is an art. To me,
such a statement suggests that searching cannot be captured in standard processes and theories. I sincerely hope
that this article suggests otherwise. Considerable care is
needed at each stage of a patent search to find the documents that the searcher should be looking for and, if possible, to find them fast.
Last but not least, the recommended procedures and
theories are perhaps interesting reading material, but it
requires training and management to be able to properly
use them in practice.
Editor's note:
The following four documents are noted as of particular
relevance to the development of patent search strategies
and of the searchers to implement these strategies:
Van der Drift J. Effective strategies for searching existing patent rights. World Patent Information. 1991;13(2):67-71.
Fletcher J M. Quality and risk assessment in patent searching and analysis. In: Proceedings of the 4th international chemical information meeting and exhibition, Montreux, 19-21 October 1992. Recent advances in Chemical Information II, 1993, p. 147-56.
Deboys J. Decision pathways in patent searching and analysis. World Patent Information. 2004;26(1):83-90.
Adams S. Certification of the patent searching profession - a personal view. World Patent Information. 2004;26(1):79-82.

Acknowledgements
As noted earlier, this article is based on the
author's presentation with the same title at the IPI-ConfEx
2006 in Athens. Both the article and presentation are based
on the evaluation of many searches that have been conducted
for ASML, both in-house and by independent search firms. I
would like to thank my searcher colleagues at ASML for
their input and discussion and in particular for their enthusiasm about the impact of the proposed structured approach
on the quality of their work. Without that, neither the presentation nor the article would have seen the light of day.


References
[1] Trippe Anthony J. Patinformatics: identifying haystacks from space. Searcher vol. 10, No. 9, October 2002. Available from: <http://www.infotoday.com/searcher/oct02/trippe.htm>.
[2] Mucs A, Kosicki T. Search Strategies within the EPOQUE-Database Collections Workshop 1. EPO Seminar on Search & Documentation Working Methods 2005. Available from: <http://www.european-patent-office.org/dg1/searchseminar/2005/_pdf/sfa_2005_ws_01_kosicki.pdf>.
[3] Adams Stephen. Patent Searching Without Words - Why Do It, How To Do It? FreePint, Issue 130, 6 February 2003. Available from: <http://www.freepint.com/issues/060203.htm#feature>.


Evert Nijhof obtained his degree in mechanical
engineering in 1991 at Delft University of Technology in The Netherlands. In his subsequent
work, he held posts at CSM, the Dutch Ministry of
Economic Affairs, and ASML. A diverse spectrum
of employers, but with one thing in common:
research and patents. In 1999 he started working as
a patent searcher at ASML. Three years later, in
2002, he became Manager of the ASML Corporate
IP Search & Literature Group. In addition, also in
2002, he started his own search firm: NPS Patent
Searches.
