
Journal of Chromatography A, 1158 (2007) 111–125

Analysis of recent pharmaceutical regulatory documents
on analytical method validation

Eric Rozet a, Attilio Ceccato b, Cedric Hubert a, Eric Ziemons a, Radu Oprean c,
Serge Rudaz d, Bruno Boulanger b, Philippe Hubert a,∗

a Laboratory of Analytical Chemistry, Bioanalytical Chemistry Research Unit, Institute of Pharmacy,
University of Liège, CHU, B36, B-4000 Liège, Belgium
b Lilly Development Centre, rue Granbompré 11, B-1348 Mont-Saint-Guibert, Belgium
c Analytical Chemistry Department, Faculty of Pharmacy, University of Medicine and Pharmacy Iuliu Hatieganu,
13 Emil Isac Street, RO-3400 Cluj-Napoca, Romania
d Laboratory of Pharmaceutical Analytical Chemistry, School of Pharmacy, University of Geneva,
20 Bd. d'Yvoy, 1211 Geneva 4, Switzerland

Available online 1 April 2007

Abstract
All analysts face the same situation: method validation is the process of proving that an analytical method is acceptable for its intended
purpose. To resolve this problem, the analyst refers to regulatory or guidance documents, and the validity of the analytical
methods therefore depends on the guidance, terminology and methodology proposed in these documents. It is thus of prime importance to have
clear definitions of the different validation criteria used to assess this validity. It is also necessary to have methodologies in accordance with these
definitions, and consequently to use statistical methods consistent with these definitions, with the objective of the validation and with the objective of
the analytical method. The main purpose of this paper is to outline the inconsistencies between some definitions of the criteria and the experimental
procedures proposed to evaluate those criteria in recent documents dedicated to the validation of analytical methods in the pharmaceutical field,
together with the risks and problems that arise when trying to cope with contradictory, and sometimes scientifically irrelevant, requirements and definitions.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Validation; Guidelines; Terminology; Methodology; Accuracy profile

1. Introduction
The demonstration of the ability of an analytical method to
quantify is of great importance to ensure quality, safety and
efficacy of pharmaceuticals. Consequently, before an analytical method can be implemented for routine use, it must first
be validated to demonstrate that it is suitable for its intended
purpose. While the need to validate methods is obvious, the
procedures for performing a rigorous validation program are
generally not defined. Although regulatory documents allow selecting
the validation parameters that should be established, three main
questions remain: (a) How to interpret the
regulatory definitions of the parameters? (b) What should be
the specific procedure to follow to evaluate a particular parameter?
(c) What is the appropriate acceptance criterion for a given parameter?

∗ Corresponding author. Tel.: +32 4 366 43 16; fax: +32 4 366 43 17.
E-mail address: Ph.hubert@ulg.ac.be (P. Hubert).
0021-9673/$ – see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.chroma.2007.03.111

Furthermore, method validation is not specific to the
pharmaceutical industry, but concerns most industrial fields involving
either biology or chemistry. Even though each field of work has
its own characteristics and issues, the main criteria to fulfil are
similar or should be similar since the validation of an analytical method is independent of the industrial sector, matrix of the
samples or analytical technology employed. A harmonized validation terminology should be adopted to allow discussions and
comparisons of validation issues between scientists of different
fields. This consensus on terminology is not yet available even
if an attempt was made [1,2]. However, if it is desirable to have
a harmonization between the different fields interested in analytical validation, it is interesting to note that, even within the
pharmaceutical field, all the laboratories are not using the same
terminology while they should use similar definitions to describe
validation criteria. The terminology used between different official documents such as the Food and Drug Administration (FDA)


guide on validation of bioanalytical methods [3], ICH Q2R1
[4], ISO [5,6], IUPAC [7] and AOAC [8] differs. Furthermore, in some cases inhomogeneous terminology can be found
throughout the same document depending on the section where
it is mentioned. Therefore, the knowledge and understanding of
these significant differences in terminology and definitions are
essential since the methodologies proposed to fulfil the definition
criteria can lead to confusion when preparing the validation protocol and the experimental design. Furthermore the subsequent
statistical interpretation of the results obtained and the final decision about the validity of the analytical procedure depends on
the consistent and adequate definition of the criteria assessed.
This leads to highly critical consequences, since the validated
analytical method will be used daily in routine analysis (batch
release, stability assessment, establishment of shelf life, pharmacokinetic or bioequivalence studies, etc.) to make decisions of
the utmost business and public health consequence. Therefore,
the main objective of this review is to reveal the inconsistencies between the definitions of the validation criteria and the
experimental procedures proposed to evaluate those criteria, as
well as the statistical tools needed to support the decision about
the validity of the analytical procedure. The main points discussed in this review are: (a) the distinction that can be made
concerning specificity and selectivity; (b) the clarification of the
linearity concept and the difference with the response function;
(c) the definition of precision, trueness and accuracy; (d) the
discussion about the decision rules to adopt from a statistical
point of view; (e) the definition of the dosing range in which
the analytical method may be used and, last but not least, (f)
the determination of the limit of quantification (LOQ). Finally,

the risks and problems when trying to cope with inconsistent,


sometimes scientifically irrelevant, requirements and definitions
are highlighted.
2. Specificity or selectivity
The first criterion for an analyst when evaluating an analytical method is its capability of delivering signals
or responses that are free from interferences and give true
results. This ability to discriminate the analyte from interfering components has for many years been confusedly expressed
as "selectivity" or "specificity" of a method, depending on the area
of expertise of the authors.
The terms "selectivity" and "specificity" are often used
interchangeably, although their meanings are different. This
concept was extensively discussed by Vessman in different
papers [9–13]. He particularly pointed out that organizations
such as IUPAC, WELAC or ICH define specificity and/or
selectivity in different manners (Table 1). However, a clear
distinction should be made, as proposed by Christian [14]: "A
specific reaction or test is one that occurs only with the substance
of interest, while a selective reaction or test is one that can occur
with other substances but exhibits a degree of preference for
the substance of interest. Few reactions are specific, but many
exhibit selectivity." This is consistent with the concept that
selectivity is something that can be graded while specificity is
an absolute characteristic. Some attempts to quantify selectivity
can be found in the literature [15–19]. For many analytical
chemists, it is commonly accepted that specificity is something
exceptional since there are, in fact, few methods that respond

Table 1
Definitions of selectivity and specificity in different international organizations

IUPAC [14]: Selectivity (in analysis): 1. (qualitative): The extent to which other substances interfere with the determination of a
substance according to a given procedure. 2. (quantitative): A term used in conjunction with another substantive (e.g. constant,
coefficient, index, factor, number) for the quantitative characterization of interferences. Specific (in analysis): A term which
expresses qualitatively the extent to which other substances interfere with the determination of a substance according to a given
procedure. "Specific" is considered to be the ultimate of "selective", meaning that no interferences are supposed to occur. The term
"specificity" is not mentioned.

WELAC [20]: Selectivity of a method refers to the extent to which it can determine particular analyte(s) in a complex mixture without
interference from other components in the mixture. A method which is perfectly selective for an analyte or group of analytes is said
to be specific.

ISO: Not defined.

ICH [4]: Specificity is the ability to assess unequivocally the analyte in the presence of components which may be expected to be
present. Typically these might include impurities, degradants, matrix, etc. Lack of specificity of an individual analytical procedure
may be compensated for by other supporting analytical procedure(s).

AOAC [8]: Test for interferences (specificity): (a) Test effect of impurities, ubiquitous contaminants, flavours, additives, and other
components expected to be present and at unusual concentrations. (b) Test nonspecific effects of matrices. (c) Test effects of
transformation products, if method is to indicate stability, and metabolic products, if tissue residues are involved.

IUPAC: International Union of Pure and Applied Chemistry; WELAC: Western European Laboratory Accreditation Cooperation; ICH: International Conference on
Harmonization; ISO: International Organization for Standardization; AOAC: Association of Official Analytical Chemists.


to only one analyte. Considering these elements, the IUPAC


definition stating that specificity can be considered as
"the ultimate of selectivity" seems rational regarding the
situation in the pharmaceutical industry [9]. It must be noted
that WELAC probably provides the clearest definition of
selectivity by saying that a method which is perfectly selective
for an analyte is said to be specific [20]. As recommended
by IUPAC and WELAC, the term "selectivity" should be
promoted in analytical chemistry, particularly in separation
techniques, and the term "specificity" should be discouraged.
3. Response function and linearity
The response function for an analytical procedure is the existing relationship, within a specified range, between the response
(signal, e.g. area under the curve, peak height, absorption) and
the concentration (quantity) of the analyte in the sample. The
calibration curve should be described preferably by a simple
monotonic (i.e. strictly increasing or decreasing) response function that gives reliable measurements, i.e. accurate results as
discussed thereafter. The response function or standard curve
is widely and frequently confused with the linearity criterion. The linearity criterion refers to the relationship between
the quantity introduced and the quantity back-calculated from
the calibration curve, while the response function refers to the
relationship between the instrumental response and the concentration. Because of this confusion, it is very common to
see laboratory analysts trying to demonstrate that the response
function is linear in the classical sense, i.e. that a conventional least-squares linear model is adequate. As demonstrated by several
authors, systematically forcing a linear function is not required,
often irrelevant and may lead to large errors in measured results
(e.g. for bioanalytical methods using LC–MS/MS or ligand-binding assays), where the linear range can differ from the
working or dosing range [21,22]. A significant source of bias
and imprecision in analytical measurements can be caused by
an inadequate choice of the statistical model for the calibration curve. This confusion is even contained and maintained in
the ICH document. In the terminology part of Q2R1 (formerly
Q2A), linearity is correctly defined as the "... ability (within
a given range) to obtain test results which are directly proportional to the concentration (amount) of analyte in the sample."
But later, in the methodology section (formerly Q2B), it is mentioned that "Linearity should be evaluated by visual inspection
of a plot of signals as a function of analyte concentration or content." The text indicates clearly that it is the signal, and no longer
the result, that matters for linearity. The document clearly
confounds, on the one hand, linearity and calibration curve and,
on the other hand, test results and signal. The continuation of
the text is self-explanatory: "If there is a linear relationship, test
results should be evaluated by appropriate statistical methods,
for example, by calculation of a regression line by the method
of least squares." For an analyst, the test results are, without
ambiguity, the back-calculated measurements evaluated by the
regression line, which is in fact the calibration curve, established
using appropriate statistical methodologies. Last but not least,
the fact that no linearity is needed between the quantity and the
signal is paradoxically contained in the last sentence of that
section devoted to linearity: "In some cases, to obtain linearity
between assays and sample concentrations, the test data may
have to be subjected to a mathematical transformation prior to
the regression analysis." Indeed, if any kind of mathematical
transformation can be applied to concentration and/or signal to make their relationship look like a straight line, what
is the very purpose of requiring linearity? Clearly, the intent
of that section was, confusedly, to suggest that in order to use the classical least-squares linear function it is sometimes convenient to
apply transformations to the data when the visual plot of signal
versus concentration does not look straight. It is indeed a good
trick, widely used to establish the standard curve, but that
trick should not be interpreted as a scientific necessity to have
a linear relationship between the concentration and the signal. Fortunately, since 1995 understanding has evolved, so that
the FDA guidance on Bioanalytical Method Validation issued in
May 2001 [3] no longer contains the word "linearity" but
only "calibration/standard curve", without particular restriction
except that "The simplest model that adequately describes the
concentration–response relationship should be used."
Nevertheless, the same confusion in concept and wording
between response function and linearity of results can still be
found in the recent book by Ermer and Miller [23]. While those
authors indicate that some analytical procedures have "intrinsic non-linear response function, such as quantitative TLC ...",
they continue to use the linearity terminology to express the
calibration curve.
In the same context, HPLC methods coupled to spectrophotometric detection (UV) are usually linear according to
the Lambert–Beer law, while immunoassays are typically non-linear. However, even for HPLC–UV methods covering a large
dynamic range, advanced models, such as quadratic models or
log–log models, could be necessary. Indeed, it is important to
model properly the whole procedure, including all the handling
and preparation of samples, which does not necessarily remain
linear over a large range of concentration even if the detector
response is, according to the Lambert–Beer law. It has to be noted
that the complete analytical procedure should be modeled by
an overall appropriate response function. As long as the model
remains monotonic and allows an accurate measurement, that is
all that is required.
Another very important aspect that has been largely
neglected and ignored in the analytical literature is the fit-for-purpose principle [21]. The central idea is very logical: the
purpose of an analytical procedure is to give accurate measurements in the future, so a standard curve must be evaluated on its
ability to provide accurate measurements. A significant source of
bias and imprecision in analytical measurements can be caused
by an inadequate choice of the statistical model for the calibration curve. Statistical criteria such as R2, lack-of-fit or any
other statistical test to demonstrate the quality of fit of a model
are only informative and barely relevant for the objective of the
assay [21,24–27]. To that end, several authors [1,2,28] have
introduced the use of the accuracy profile based on tolerance intervals (or prediction intervals) to decide whether a calibration
model will give quality results. The models should be retained


Fig. 1. Accuracy profiles of the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml) using (A) linear regression model, (B)
weighted linear regression model with a weight equal to 1/X2, (C) linear regression model after logarithm transformation, (D) quadratic regression. The dotted lines
represent the acceptance limits (−15%, 15%); the dashed lines the connected 95% tolerance intervals. When the tolerance intervals are included in the acceptance
limits, the assay is able to quantify accurately; otherwise it is not. The continuous line represents the estimated relative bias line.

or rejected based on the accuracy of the back-calculated results,
regardless of the statistical properties. This approach has already
been used by several authors, such as Streel et al. [29] for the validation of an LC–MS/MS assay for the quantitative determination
of loperamide in plasma.
As can be seen from Fig. 1 and indicated by the authors,
the weighted linear regression provides the
best accuracy profile for the procedure, as obtained by joining the extremes of
the 95% tolerance intervals, i.e. the intervals that will contain
95% of the future individual results. Conversely, the simple linear
model, the quadratic model or even a model with a log–log transformation are not suitable because they do not contribute as well to
the ultimate goal of the assay, i.e. providing accurate results in
the future. Indeed, the tolerance intervals for those three models
are not included in the acceptance limits as fully as with the
selected model. Nevertheless, as can be seen in Fig. 2, when
looking at the quality of fit as usually practiced, the four models
exhibited an R2 > 0.999 for all series. This figure, representing
in a way the quality of the linear fit [4], does not show any
difference from one model to another. This contrasts with the
accuracy profile figure, where a major difference exists in the
quality of the results depending on the model selected as standard
curve.
Another example to illustrate the difference between response
function, linearity and fit-for-purpose accuracy profile can be
obtained with a high-performance thin-layer chromatographic
assay (HPTLC; Fig. 3) and an enzyme-linked immunosorbent
assay (ELISA) published in [30] (Fig. 4). Indeed as can be seen,
using a quadratic response function for the HPTLC assay or
a non-linear standard curve such as the weighted four parameter logistic model for the ELISA, the graphic of the signal
(Figs. 3.a.1 and 4.a.1) does not look linear, while the results as a
function of the concentration are linear (Figs. 3.a.2. and 4.a.2.).
The same apply with the accuracy profile (Fig. 3.a.3. and 4.a.3)
that clearly shows that when using this standard curve, the assay
is able to quantify over a large range. This property is not fulfiled
with the other linear models for both type of assay. In both cases,
the fit of the model is acceptable, but none of these two linear
models show acceptable linearity or accuracy according to ICH
definition.
The standard curve model selection based on the obtained
accuracy of the results was difficult to envisage few years ago


Fig. 2. Response functions for the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml) for series 2 only, using (A) linear regression
model (R2 = 0.9991), (B) weighted linear regression model with a weight equal to 1/X2 (R2 = 0.9991), (C) linear regression model after logarithm transformation
(R2 = 0.9997), (D) quadratic regression (R2 = 0.9991).

because it requires a lot of computation and is a post-data-acquisition scenario, e.g. evaluation of all the putative calibration
models before making a choice. Currently, computational
power is no longer a limitation and the selection of a model is
perfectly aligned with the objective of the method.
Having stressed the difference between response function and
linearity makes it possible to apply the concept of linearity not only to
relative but also to absolute analytical methods, such as titration, for which the results are not obtained by back-calculation
from a calibration curve. Attempts to provide a response function are therefore of no use and impracticable, as there is no
signal or response, whereas the linearity of the results can be
assessed.
Statistical models for calibration curves can be either linear or
non-linear in their parameter(s), as opposed to linear in shape:
a quadratic model Y = α + βX + γX2 is linear in this sense because it
is a linear combination of its parameters, even if its graphical representation looks
curved on an X–Y plot. The choice between these two families
of models will depend on the type of method and/or the range
of concentrations of interest. When a narrow range is considered, an unweighted linear model is usually adequate, while a
larger range may require a more complex or weighted model.

Weighting may be important because a common feature of most
analytical methods is that the variance of the signal is a function
of the level or quantity to be measured.
In case of heterogeneous variances of the signal across the
concentration range, which is frequent, it is natural to observe
that weighting significantly improves the accuracy of the results,
particularly at low concentration levels. When observations are
not weighted, an observation more distant from the curve than
the others has more influence on the curve fit. As a consequence,
the curve fit, and therefore the back-calculated results, may not be good
where the variances are smaller. Regardless of the model type, it
is assumed that all observations fitted to a model are completely
independent. In reality, replicates are often not independent for
many analytical procedures because of the steps followed in the
preparation and analysis of samples. In such cases, replicates
should not be used separately. Models are typically applied
on either a linear scale or a log scale of the assay signal and/or
the calibrator concentrations. The linear scale is used in case
of homogeneous variance across the concentration range, and
the log scale is usually recommended when variance increases
with increasing response, because this suggests that the response
is log-normally distributed. The most commonly used types of


Fig. 3. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained with a high-performance thin-layer chromatography assay using
(left, a) a quadratic regression model, (right, b) a linear regression model. For the linearity profiles and the accuracy profiles, the dotted lines represent the acceptance
limits (−10%, 10%); the dashed lines the connected 95% tolerance intervals. When the tolerance intervals are included in the acceptance limits, the assay is able
to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile
the continuous line represents the estimated bias line.

polynomial models include simple linear regression (with or


without an intercept) and quadratic regression models. The
model parameters are estimated using the restricted maximum
likelihood method, which is equivalent to the ordinary least-squares
method when the data are normally distributed.

This being said, and because of fitting techniques, the experimental design, i.e. the way the concentration values are spread
over the range, may significantly impact the precision of the
results or inverse predictions that the response function will provide. As shown by François et al. [31], depending on the model


Fig. 4. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained with an immunoassay using (left, a) a weighted 4-parameter
logistic model, (right, b) a linear regression on the most linear part of the response. For the linearity profiles and the accuracy profiles, the dotted lines represent
the acceptance limits (−30%, 30%); the dashed lines the connected 95% tolerance intervals. When the tolerance intervals are included in the acceptance limits,
the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the
accuracy profile the continuous line represents the estimated bias line.

that will be used for the response function, there are designs that
give more precise measurements than others. As a good general
rule of thumb for optimally choosing the concentration values,
they show that for most models used in assays, from linear
to four-parameter logistic models, having standard points at the
extremes of the range and equally spreading the replicated standard points over the range in between gives excellent results in
general, particularly when the model has not yet been clearly


identified, which is the case during the validation phase. They also
stress the importance of having replicates, particularly at the
extremes, because of the leverage of those points on the fitting. Once the model has been identified, optimal designs
can be envisaged to further improve slightly the precision of the
measurements.
4. Accuracy, trueness and precision
4.1. Trueness
As can be seen from the following definition of trueness found
in the ISO documents [5,6], the International vocabulary of basic
and general terms in metrology (VIM) [32] or the Eurachem document on the Fitness for Purpose of Analytical Methods [33],
trueness is a concept that is related to systematic errors. The ISO
5725-Part 1 (General Principles and Definitions) definition of
trueness (Section 3.7) is: "The closeness of agreement between
the average value obtained from a large series of test results
and an accepted reference value. The measure of trueness is
usually expressed in terms of bias. Trueness has been referred
to as 'accuracy of the mean'. This usage is not recommended."
Indeed, it is expressed as the distance between the average value of
a series of measurements (x̄i) and a reference value (μT). This
concept is measured by a bias, relative bias or recovery:

Bias = x̄i − μT

Relative bias (%) = 100 × (x̄i − μT) / μT

Recovery (%) = 100 × x̄i / μT = 100 + relative bias (%)
The ISO 5725 documents unambiguously state what trueness is and how to measure it. Applying this concept to the
validation experiments, measuring several times independent validation standards, for instance i standards, for which the
true value of analyte concentration or amount (μT) is known,
allows their predicted concentrations or amounts, xi, to be computed.
Therefore, it is possible to compute the mean value of these
predicted results (x̄i) and consequently to estimate the bias, relative bias or recovery. Those values are well estimated, as they
are routinely computed during the validation step of an analytical
procedure. Trueness is related to the systematic errors of the
analytical procedure [2,5,6,34]. Trueness thus refers to a characteristic or a quality of the analytical procedure and not to a
result generated by this procedure. This nuance is fundamental,
as we will see thereafter.
However, when looking for trueness in the regulatory documents for the validation of pharmaceutical analytical
procedures, this concept is not defined per se. Nonetheless, on conscientiously
reading both the ICH Q2R1 [4] and the FDA Bioanalytical
Method Validation [3] documents, references to this concept can
be found. In ICH Q2R1 part 1, the use
of trueness appears: "The accuracy of an analytical procedure
expresses the closeness of agreement between the value which
is accepted either as a conventional true value or an accepted
reference value and the value found. This is sometimes termed
trueness." For the FDA Bioanalytical Method Validation document, this reference is made in the Glossary: "The
degree of closeness of the determined value to the nominal or
known true value under prescribed conditions. This is sometimes termed trueness." Here a mix can be seen between trueness
and, by extension, accuracy of the mean (as opposed to accuracy of the results). ISO documents also specify that this use
of the terminology "accuracy" should be avoided and replaced by
trueness.
When comparing those two last quotes of trueness
in the ICH or FDA documents to the definition in the ISO documents, the main difference is that both documents talk about the
distance between the true value and the value found or the determined value, whereas the trueness definition of ISO looks
at the distance between the average value and the true value. It
is essential to distinguish between a result and
an average value. The results of an analytical procedure are its
very objective. When examining a quality control sample, the
result impacts the decision to release a batch. When unknown
samples are determined, the results give information about the
therapeutic effect of a drug or about the pathological or physiological state of a patient, and so on. What matters is to ensure
that each unknown or known sample will be determined adequately. The average value only gives the central location of the
distribution of results for the same true content, not the position
of each individual result. By extension, the bias, relative bias or
recovery will locate the distribution of the results produced by the
analytical procedure relative to the accepted true value.
This incoherence of definition is not only found from one
document to another but also in different sections of a single document, especially the ICH Q2R1 document. In part II, Section 4.3,
"Recommended data" relative to accuracy: accuracy "should
be reported as percent recovery by the assay of known added
amount of analyte in the sample or as the difference between the
mean and the accepted true value together with the confidence
intervals." This is coherent with the definition of trueness from
the ISO documents and not with the corresponding definition
in part I of the ICH document. In other documents, confusion
between trueness and accuracy is also observed [7,35].
When assessing the acceptability of the bias, relative bias or recovery, the methodology most widely used is to apply the following Student t-test:

H0: x̄ − μT = 0

H1: x̄ − μT ≠ 0

for which a significance level α is set, generally at 0.05 in the pharmaceutical field. This means that it is accepted that the null hypothesis H0 will be wrongly rejected 5 times out of 100, i.e. we accept to erroneously consider the bias different from 0 in 5 cases out of 100. When the computed Student statistic is higher than the corresponding theoretical quantile, or equivalently when the p-value is smaller than α, the null hypothesis is rejected. There is then high confidence that the bias is different from 0, since the significance level is fixed by the analyst.
Another way to interpret this test is to check whether the 0% relative bias or 100% recovery is included in the 1 − α confidence interval of the relative bias or recovery, respectively. If these values lie outside their corresponding confidence interval, then the null hypothesis is rejected. However, when the null hypothesis is not rejected, the only conclusion which can be drawn is not that the bias, relative bias or recovery is equal to 0, 0% or 100%, but that the test could not demonstrate that the bias, relative bias or recovery is different from 0 or 100. As clearly demonstrated in numerous publications [27,36–38], the β risk, which is the probability of wrongly accepting the null hypothesis, is not fixed
by the user in this situation. Furthermore, this approach can conclude that the bias is significantly different from 0 even though it may be analytically acceptable [27,36–38]. It will also systematically consider the bias as not different from 0 when the variability of the procedure is relatively high. In fact, the Student t-test used this way is a difference test which answers the question: Is the bias of my analytical procedure different from 0? However, the question the analyst wishes to answer during the validation step is: Is the bias of my analytical procedure acceptable? The test answering this last question is an equivalence or interval hypothesis test [27,36–38]. In this type of test, the analyst has to select acceptance limits for the bias, relative bias or recovery, i.e. limits within which, if the true bias, relative bias or recovery of the analytical procedure lies, the trueness of the procedure is acceptable. Several authors have recommended this type of test to assess the acceptability of a bias [27,38]. Indeed, a perfectly unbiased procedure is utopian, and the bias obtained during the validation experiment is only an estimate of the true, unknown bias of the analytical procedure. Nevertheless, this interval hypothesis test, while statistically correct, still does not answer the real analytical question: the very purpose of validation is to validate the results a method will produce, not the method itself. We will come back to this objective in Section 5 and explain in more detail the connections between good results and good methods.
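The contrast between the difference test and the equivalence (interval hypothesis) test can be sketched in a few lines of code. This is an illustrative sketch only: the function name, the data and the ±2% acceptance limit on the relative bias are hypothetical, and SciPy is assumed to be available. It uses the standard two one-sided tests (TOST) formulation, in which equivalence at level α is declared when the 1 − 2α confidence interval of the relative bias lies entirely within the acceptance limits.

```python
from math import sqrt
from scipy import stats

def bias_tests(measured, true_value, limit_pct=2.0, alpha=0.05):
    """Contrast the classical difference t-test with an equivalence
    (interval hypothesis, TOST) test on the relative bias (%).
    limit_pct is the acceptance limit for the relative bias, e.g. +/-2%."""
    n = len(measured)
    rel = [100.0 * (x - true_value) / true_value for x in measured]
    mean_bias = sum(rel) / n
    sd = sqrt(sum((r - mean_bias) ** 2 for r in rel) / (n - 1))
    se = sd / sqrt(n)

    # Difference test: H0: bias = 0 vs H1: bias != 0
    t_stat = mean_bias / se
    p_diff = 2 * stats.t.sf(abs(t_stat), n - 1)

    # Equivalence (TOST): the 1 - 2*alpha confidence interval of the
    # relative bias must lie entirely inside [-limit_pct, +limit_pct]
    t_q = stats.t.ppf(1 - alpha, n - 1)
    ci_low, ci_high = mean_bias - t_q * se, mean_bias + t_q * se
    equivalent = (-limit_pct < ci_low) and (ci_high < limit_pct)
    return mean_bias, p_diff, (ci_low, ci_high), equivalent
```

With illustrative data such as six recoveries of [100.9, 101.2, 100.7, 101.0, 100.8, 101.1] against a true value of 100, the difference test declares a significant bias (small variability makes a 0.95% mean bias "significant"), while the equivalence test accepts the procedure because the whole confidence interval lies within ±2%: exactly the situation described above.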
4.2. Precision
Contrary to trueness, homogeneous definitions of precision can be found in the regulatory documents. For instance, the
ICH Q2R1 Part 1 definition of precision is: The precision of
an analytical procedure expresses the closeness of agreement
(degree of scatter) between a series of measurements obtained
from multiple sampling of the same homogeneous sample under
the prescribed conditions. This definition of precision is consistent with its definition in the FDA Bioanalytical Method
Validation, ISO, Eurachem, IUPAC, FAO and AMC documents.
As stated in all documents, precision is expressed as the standard deviation (s), the variance (s²) or the relative standard deviation (RSD), also called the coefficient of variation (CV). It measures the random
error linked to the analytical procedure, i.e. the dispersion of
the results around their average value. The estimate of precision
is independent of the true or specified value and the mean or
trueness estimate. Each document makes reference to different
precision levels. For the ICH Q2R1 and ISO documents, three levels can be assessed:

(1) Repeatability, which expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision.
(2) Intermediate precision, which expresses within-laboratory variations: different days, different analysts, different equipment, etc.
(3) Reproducibility, which expresses the precision between laboratories (collaborative studies, usually applied to standardization of methodology).
The repeatability conditions involve the re-execution of the entire procedure, from the selection and preparation of the test portion of the laboratory sample onwards, and not only the replicate instrumental determinations on a single prepared test sample. The latter is the instrumental precision, which does not include the repetition of the whole analytical procedure.
The FDA document also distinguishes within-run, intra-batch precision or repeatability, which assesses precision during a single analytical run, and between-run, inter-batch precision or repeatability, which measures precision with time, and may involve different analysts, equipment, reagents, and laboratories. As can be seen, the same word, namely repeatability, is used for both components of variability, which is certainly a source of confusion for the analyst. Furthermore, this document places on the same level the variability within a single laboratory and the variability between different laboratories.
The validation of an analytical procedure is performed by a single laboratory, since that laboratory has to demonstrate that the procedure is suitable for its intended purpose. The evaluation of laboratory-to-laboratory method adequacy is usually performed with the objective of standardizing the procedure or of evaluating the performance of several laboratories in a proficiency test, also called a ring test, and is regulated by specific documents and rules.
To correctly evaluate the two components of variability of an analytical procedure during the validation phase, an analysis of variance (ANOVA) per investigated concentration level is recommended. As long as the design is balanced, i.e. the same number of replicates per series at a given concentration level, the least-squares estimates of the variance components can be used. When this condition is not met, maximum likelihood estimates of these components should be preferred [2,30].
From the ANOVA table, the repeatability or within-run precision and the between-run precision are obtained as follows:

MSM_j = [1/(p − 1)] Σ_{i=1..p} n (x̄_{ij,calc} − μ̂_j)²

where x̄_{ij,calc} is the average of the calculated concentrations of the jth concentration level in the ith series, p is the number of series and n is the number of replicates per series, with

μ̂_j = [1/(pn)] Σ_{i=1..p} Σ_{k=1..n} x_{ijk,calc}

where x_{ijk,calc} is the calculated concentration obtained from the selected response function, and

MSE_j = [1/(p(n − 1))] Σ_{i=1..p} Σ_{k=1..n} (x_{ijk,calc} − x̄_{ij,calc})²

If MSE_j < MSM_j, then:

σ̂²_{W,j} = MSE_j

σ̂²_{B,j} = (MSM_j − MSE_j)/n

Else:

σ̂²_{W,j} = [1/(pn − 1)] Σ_{i=1..p} Σ_{k=1..n} (x_{ijk,calc} − μ̂_j)²

σ̂²_{B,j} = 0

The intermediate precision is computed as follows:

σ̂²_{IP,j} = σ̂²_{W,j} + σ̂²_{B,j}

where σ̂²_{W,j} is the within-run or repeatability variance and σ̂²_{B,j} is the between-run variance.

Table 2
Experimental design of four runs taking into account days, operators and equipment as sources of variability

Run 1: Day 1, Operator 1, Equipment 2
Run 2: Day 1, Operator 2, Equipment 1
Run 3: Day 2, Operator 1, Equipment 1
Run 4: Day 2, Operator 2, Equipment 2
It is important to note that misapplications of well-known variance formulae are still widespread and can lead to dramatic overestimation of the variance components [36,39].
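As an illustration, the balanced one-way ANOVA estimates described above can be written directly from the formulae. This is a minimal sketch (the function name and data are hypothetical):

```python
def variance_components(series):
    """One-way ANOVA variance component estimates for a balanced design.
    series: list of p lists, each holding the n back-calculated
    concentrations of one run at a given concentration level.
    Returns (within-run variance, between-run variance)."""
    p, n = len(series), len(series[0])
    grand = sum(x for run in series for x in run) / (p * n)
    run_means = [sum(run) / n for run in series]
    # Between-run and within-run mean squares
    msm = n * sum((m - grand) ** 2 for m in run_means) / (p - 1)
    mse = sum((x - m) ** 2
              for run, m in zip(series, run_means)
              for x in run) / (p * (n - 1))
    if mse < msm:
        var_w, var_b = mse, (msm - mse) / n
    else:
        # Pool all results; the between-run variance is set to zero
        var_w = sum((x - grand) ** 2 for run in series for x in run) / (p * n - 1)
        var_b = 0.0
    return var_w, var_b
```

For instance, three series of two replicates, [[10, 12], [14, 16], [12, 14]], give a within-run variance of 2.0 and a between-run variance of 3.0, hence an intermediate-precision variance of 5.0.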
As can be seen in the regulatory documents, what makes the difference between repeatability and intermediate precision is the concept of series or runs. These series or runs involve at least different days, possibly with different operators and/or different pieces of equipment. A run or series is a period during which analyses are executed under repeatability conditions that remain constant. The rationale for selecting the different factors which will compose the runs/series is to mimic the conditions that will be encountered during the routine use of the analytical procedure. The analytical procedure will obviously not be used on a single day, so including the day-to-day variability of the analytical procedure is mandatory. Then, during routine use, will the analytical procedure be used by only one operator, and/or on only one instrument? Depending on the answers to these questions, different factors representative of how the procedure will be used in routine analysis are introduced in the validation protocol, leading to a representative estimate of the variability of the analytical procedure.
Once the appropriate factors have been selected, an experimental design can be set up to optimize the number of runs or series needed to account for the main effects of these factors within a cost-effective analysis time. For example, if the selected factors are days, operators and equipment, each at two levels, then a fractional factorial design allows four runs or series to be executed in only 2 days. The design is shown in Table 2.
Having computed the variance components, one interesting parameter to examine is the ratio R_j, with

R_j = σ̂²_{B,j} / σ̂²_{W,j}

Indeed, this parameter shows how large the series-to-series (or run-to-run) variance is compared to the repeatability variance. High values of R_j, e.g. greater than 4, could suggest either a problem with the variability of the analytical procedure, whose results may vary from one run to another, leading to the redevelopment of the method, or indicate that too few series (runs) were used during the validation process to obtain a reliable estimate of the between-series variance σ̂²_{B,j}.
In this last situation, all the results within a run are highly correlated with each other, providing little effective information with regard to the run-to-run results. The effective sample size of the validation is consequently smaller than the actual sample size used in the design of the validation experiments. The term effective sample size indicates that when the results within a run are correlated, there is in fact less information to judge the quality of results than in a situation where all results are fully independent of the run they belong to. This is an important feature to take into account when defining the degrees of freedom to be used in the computation of a confidence interval or of a tolerance interval. Indeed, if the results of repeated experiments are correlated, computing the degrees of freedom from the total number of experiments performed will artificially narrow the confidence interval or the tolerance interval. The Satterthwaite degrees of freedom [40] embody this concept of effective sample size by bounding the degrees of freedom between a minimum value, the number of series, and a maximum value, the total number of experiments (series × replicates) [41].
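One common formulation of these degrees of freedom, for the intermediate-precision variance written as the linear combination σ̂²_IP = MSM/n + ((n − 1)/n)·MSE of the two ANOVA mean squares, is sketched below. This is illustrative only; the function name is hypothetical and the exact form used should be checked against [40,41].

```python
def satterthwaite_df(msm, mse, p, n):
    """Satterthwaite approximation of the degrees of freedom for the
    intermediate-precision variance  var_IP = MSM/n + ((n-1)/n)*MSE
    from a balanced one-way ANOVA with p series of n replicates.
    MSM has p-1 degrees of freedom, MSE has p*(n-1)."""
    a = msm / n               # contribution of the between-run mean square
    b = (n - 1) / n * mse     # contribution of the within-run mean square
    return (a + b) ** 2 / (a ** 2 / (p - 1) + b ** 2 / (p * (n - 1)))
```

For example, with MSM = 8, MSE = 2, p = 3 and n = 2, the approximation gives 3 effective degrees of freedom, fewer than the pn − 1 = 5 that would be available if all results were independent of the run they belong to.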
Precision is commonly expressed as the percent relative standard deviation (RSD). The classical formula is:

RSD (%) = 100 √(σ̂²) / x̄

where σ̂² is the estimated variance and x̄ is the estimated average value. When an RSD is reported, the corresponding variance is used, e.g. repeatability or intermediate precision. The computed RSD is therefore the ratio of two random variables, giving a new parameter with high uncertainty. However, in the case of the validation of an analytical procedure, because the true or reference value is known, the denominator should be replaced by the corresponding true value μ_T. The RSD computed in this way depends only on the estimated precision (estimated variances), regardless of the estimated trueness.
This being said, the use of relative estimates is convenient for direct reading but nevertheless raises a series of questions: what matters most for the results, the (absolute) variance or the relative standard deviation? Imagine that a bioanalytical method is used to support a pharmacokinetic study. In that case the results are used to fit the non-linear PK model, and what matters is either the variance of the results or the variance of the logarithms of the results, but not the RSD at all. Remember that a procedure is validated for its intended use. So what is the relevance of making a decision on the acceptance of a method based on the RSD when only the variance of its results is important with regard to the intent of its use? This distinction becomes particularly important when dealing with the LOQ. Indeed, since the RSD is the SD divided by the true concentration value, the RSD becomes large at the lower end simply because the SD is divided by a small number,
not because the method becomes less precise. A good example can be seen by comparing the same information in Fig. 3.a.2, on an absolute scale, and Fig. 3.a.3, on a relative scale. In Fig. 3.a.2 the distance between the two dashed lines represents a multiple of the intermediate precision in absolute value, while in Fig. 3.a.3 it is the same value expressed in relative terms. While on the latter figure (a.3) the relative intermediate precision (RSD) explodes at the smallest concentration, leading to the conclusion that the results are not precise enough at that level, it is also clear that, in this example, the absolute intermediate precision improves at the smallest concentration because the intermediate-precision SD is smaller. The contradiction arises because the SD has been divided by a small number, not because the measurements are less precise; quite the contrary. This raises questions on the meaning and definition of the LOQ. Indeed, why ignore or discard the results at those low levels when they are obtained with a variance much smaller than results at high concentrations? Once again, the answer lies in the intended use of the results: for supporting stability or pharmacokinetic studies, not only is it irrelevant to discard those very precise measurements at small concentrations, but they are also very useful, for example, in accurately estimating the half-life or the pharmacokinetics of metabolites. Only the variance or the SD matters, not the RSD. So, while common practice evaluates a method with respect to the relative expression of precision, scientists in the laboratories should carefully consider the absolute, fundamental variance before discarding data, and question whether doing so serves the objectives of the study.
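A tiny numeric illustration of this point (the concentration levels and SD values below are invented for the example): the absolute SD improves towards the low end, yet the RSD explodes there because the denominator shrinks.

```python
# Hypothetical intermediate-precision SDs at three true concentration levels.
levels = [1.0, 10.0, 100.0]   # true concentrations (arbitrary units)
sds    = [0.3, 0.8, 2.0]      # absolute SD actually improves at the low end

# RSD computed against the true concentration, as recommended above
rsds = [100.0 * sd / conc for sd, conc in zip(sds, levels)]

for conc, sd, rsd in zip(levels, sds, rsds):
    print(f"level {conc:6.1f}: SD = {sd:.2f}, RSD = {rsd:.1f}%")
```

The SD is smallest at the lowest level, yet the RSD there is by far the largest (30% versus 2% at the highest level): the measurements are not less precise, the denominator is simply smaller.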
4.3. Accuracy
In document ICH Q2R1 part 1 [4], accuracy is defined as: . . .
the closeness of agreement between the value which is accepted
either as a conventional true value or an accepted reference value
and the value found. This definition corresponds to that of the ISO documents [5,6] and of the VIM [32], which state that accuracy is: the closeness of agreement between a test result and the accepted reference value. Furthermore, a note added to the ISO definition specifies that accuracy is the combination of random error and systematic error, or bias. From this, and as specified by the Analytical Methods Committee (AMC) [34], it is easily understood that accuracy rigorously applies to results and not to analytical methods, laboratories or operators. The AMC also points out that accuracy should be used in that way in formal writing.
Therefore, accuracy denotes the absence of error of a result. Similar definitions of accuracy are found in the Eurachem document
[33].
The total measurement error of the results obtained from an
analytical procedure is related to the closeness of agreement
between the value found, i.e. the result, and the value that is
accepted either as a conventional true value or an accepted reference value. The closeness of agreement observed is based on
the sum of the systematic and random errors, namely the total
error linked to the result. Consequently, the measurement error
is the expression of the sum of trueness (or bias) and precision
(or standard deviation), i.e. the total error. As shown below, each measurement X has three components: the true sample value μ_T, the bias of the method (estimated by the mean of several results) and the precision (estimated by the standard deviation or, in most cases, the intermediate precision). Equivalently, the difference between an observation X and the true value is the sum of the systematic and random errors, i.e. the total error or measurement error:

X = μ_T + bias + precision

X − μ_T = bias + precision

X − μ_T = total error

X − μ_T = measurement error

X − μ_T = accuracy
However, when looking at the section corresponding to accuracy in part 2 of the ICH Q2R1 document, the recommended data to document accuracy are presented as: accuracy should be reported as percent recovery by the assay of known added amount of analyte in the sample or as the difference between the mean and the accepted true value together with the confidence intervals. This no longer refers to accuracy but to the trueness definition of the ISO 5725 document, because it is the average of several results, as opposed to a single result as for accuracy, that is compared to the true value, as already stated previously. This section consequently refers to systematic errors, whereas accuracy as defined in ICH Q2R1 part 1 and ISO 5725 part 1 corresponds to the evaluation of the total measurement error. In the FDA Bioanalytical Method Validation document [3], accuracy is defined as . . . the closeness of mean test results obtained by the method to the true value (concentration) of the analyte. (. . .) The mean value should be within 15% of the actual value except at LLOQ, where it should not deviate by more than 20%. The deviation of the mean from the true value serves as the measure of accuracy. As already mentioned in the previous sections, this definition corresponds to the analytical method trueness. For bioanalytical methods, earlier reviews have already stressed this confusion between accuracy and trueness [1,2,27,38].
For most uses it does not matter whether a deviation from the true value is due to random error (lack of precision) or to systematic error (lack of trueness), as long as the total amount of error remains acceptable. Thus, the concept of total analytical error, or accuracy as a function of random and systematic errors, is essential. Furthermore, every analyst wants to ensure that the total amount of error of the method will not affect the interpretation of the test result and compromise the subsequent decision [1,2,21,42–46]. Decisions based on the separate evaluation of the trueness and precision criteria cannot achieve this. Only the evaluation of the accuracy of the results, which takes the total error concept into account, gives guarantees to both laboratories and regulatory bodies on the ability of the method to achieve its purpose.
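A quick sanity check makes this point numerically. Assuming normally distributed relative errors (an illustrative assumption; the function name and the numbers are not taken from the cited documents), one can compute the proportion of individual results expected within ±15% of the true value for a method whose bias and CV each separately satisfy a 15% criterion:

```python
from math import erf, sqrt

def prob_within_limits(bias_pct, cv_pct, limit_pct=15.0):
    """Probability that a single result falls within +/-limit_pct of the
    true value, assuming normally distributed relative errors with the
    given bias (%) and CV (%). Illustrative model only."""
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    return (phi((limit_pct - bias_pct) / cv_pct)
            - phi((-limit_pct - bias_pct) / cv_pct))

# A method whose bias (14%) and CV (14%) each pass the classical
# 15% criteria separately:
p = prob_within_limits(14.0, 14.0)
print(f"expected proportion of results within +/-15%: {p:.3f}")
```

With a 14% bias and a 14% CV, only about half of the individual results are expected within the ±15% acceptance limits, even though trueness and precision each pass their separate criteria; this is the decision problem discussed in Section 5.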
5. Decision rule
Most of the regulatory documents do not make any recommendation on acceptance limits to help the analyst decide whether an analytical procedure is acceptable. They insist, with the confusions already mentioned, on the criteria that need to be examined, estimated and reported, but few rules are proposed on how to decide. It is the laboratory's responsibility to justify the decision of accepting and using an analytical method [3]. The only exception found concerns the FDA document on bioanalytical methods, which clearly indicates in the pre-study validation part: The mean value should be within 15% of the theoretical value, except at LLOQ, where it should not deviate by more than 20%. The precision around the mean value should not exceed 15% of the CV, except for LLOQ, where it should not exceed 20% of the CV. Later, when referring to in-study validation, the same document indicates: Acceptance criteria: At least 67% (4 out of 6) of quality control (QC) samples should be within 15% of their respective nominal value, 33% of the QC samples (not all replicates at the same concentration) may be outside 15% of nominal value. In certain situations, wider acceptance criteria may be justified. These two sections, relating to pre-study and in-study acceptance criteria, summarize very well the deep confusion that exists and that triggers many debates in conferences on validation. The proposed objective is that, for bioanalytical methods, measurements must be sufficiently close to their true value, i.e. less than 15% away. This is clearly indicated here: QC samples should be within 15% of their respective nominal value. As suggested in the section on accuracy, this objective is not aligned at all with the previous rule for (pre-study) validation, which imposes limits on the method's performance, not on the results, namely the mean and the precision that must be better than 15% (20% at the LLOQ).
The objective of a quantitative analytical method is to quantify as accurately as possible each of the unknown quantities that the laboratory will have to determine. In other terms, what all analysts expect from an analytical procedure is that the difference between the measurement or observation (X) and the unknown true value μ_T of the test sample be small, i.e. inferior to an acceptance limit λ defined a priori:

−λ < X − μ_T < λ  ⇔  |X − μ_T| < λ

The acceptance limit λ can differ depending on the requirements of the analyst and the objective of the analytical procedure. This objective is linked to the requirements usually admitted by practice (e.g. 1% or 2% on bulk, 5% on pharmaceutical specialties, 15% for biological samples, or whatever limits are predefined according to the intended use of the results).
Therefore, the aim of the validation phase is to generate enough information to guarantee that the analytical method will provide, in routine, measurements close to the true value without being affected by other elements present in the sample, assuming everything else remains reasonably similar. In other words, the validation phase should demonstrate that this will be fulfilled for a large proportion of the results.
As already mentioned, the difference between a measurement
X and its true value is composed of a systematic error (bias or
trueness) and a random error (variance or precision). The true
values of these performance parameters are unknown but they
can be estimated based on the (pre-study) validation experiments
and the reliability of these estimates depends on the adequacy
of these experiments (design, size).
Consequently, the objective of the validation phase is to evaluate whether, given or conditionally on the estimates of the bias (μ̂_M) and standard deviation (σ̂_M), the expected proportion of measures that will fall within the acceptance limits, later in routine, is greater than a predefined proportion, say β, i.e.:

E_{μ̂_M, σ̂_M} {P[|X − μ_T| < λ] | μ̂_M, σ̂_M} ≥ β

However, there exists no exact solution to estimate this expected proportion. An easy solution to circumvent this aspect and make a reliable decision, as already proposed by other authors [1,28,47–49], is to compute the β-expectation tolerance interval [50]:

E_{μ̂_M, σ̂_M} {P_X [μ̂_M − kσ̂_M < X < μ̂_M + kσ̂_M | μ̂_M, σ̂_M]} = β

where the factor k is determined so that the expected proportion of the population falling within the interval is equal to β. If the β-expectation tolerance interval obtained in this way is totally included within the acceptance limits [−λ, +λ] (e.g. [−15%, +15%] for bioanalytical methods or [−5%, +5%] for analytical methods used for batch release), then the expected proportion of measurements within the same acceptance limits is greater than or equal to β.
Most of the time, an analytical procedure is intended to quantify over a range of quantities or concentrations. Consequently, during the validation phase, samples are prepared to adequately cover this range, and a β-expectation tolerance interval is calculated at each level. The accuracy profile is simply obtained by connecting the lower limits and by connecting the upper limits, as can be seen in Fig. 1 or at the bottom of Fig. 3. The inclusion of the measurement error profile within the acceptance limits [−λ, +λ] at key levels must be examined before declaring that the procedure is valid over a specific range of values. β will usually be chosen above 80% and, as shown by Boulanger et al. [43,44], choosing 80% for β during pre-study validation guarantees that 90% of the runs will later be accepted in routine when the 4-6-λ (e.g. 4-6-15) rule is used in routine.
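For the simplest case of m independent results, the β-expectation tolerance interval coincides with a prediction interval for one future result, which allows a compact sketch. This is illustrative only: the function name and data are hypothetical, SciPy is assumed to be available, and the full one-way random-effects case used in validation designs additionally requires Mee's correction and Satterthwaite degrees of freedom, as described in [50] and related work.

```python
from math import sqrt
from scipy import stats

def beta_expectation_ti(results, beta=0.90):
    """Beta-expectation tolerance interval for m i.i.d. normal results.
    In this simple (single-run) case it coincides with a prediction
    interval for one future result: mean +/- t * s * sqrt(1 + 1/m)."""
    m = len(results)
    mean = sum(results) / m
    s = sqrt(sum((x - mean) ** 2 for x in results) / (m - 1))
    t_q = stats.t.ppf((1 + beta) / 2, m - 1)
    half = t_q * s * sqrt(1 + 1.0 / m)
    return mean - half, mean + half
```

The decision rule is then to check that this interval, expressed in relative terms, lies entirely within [−λ, +λ]. For example, six recoveries of [99, 101, 100, 98, 102, 100] (%) give roughly [96.9, 103.1] at β = 90%, well inside ±15% acceptance limits.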

In that way, the pre-study validation decision and the (in-study) routine decision rule for the acceptance of runs become aligned with their respective risks, which is not the case with the rules proposed by the FDA guide [3]. Indeed, having a mean (trueness) smaller than 15% and a precision (CV%) smaller than 15% does not guarantee at all that most future results will be within [−15%, +15%]. There are two statistical errors behind this classical assumption, which could be summarized as good methods give good results. First, as pointed out in the section on accuracy, the difference between the result and its true value is composed of the systematic error (trueness) plus the random error (precision). So if, for example, a method shows an estimated mean bias of 14% and an estimated precision of 14% as well, it is obvious that most results will likely fall outside the acceptance limits and most runs will be rejected. Second, predicting what will happen later in routine depends largely on the quality of the estimates of the mean and precision, i.e. primarily on the number of observations collected and the conditions of the experiments. If the mean and the precision are estimated from too few measurements during pre-study validation, or under conditions (operators, days, etc.) not representative of routine use, there is little confidence that the true bias or the true precision are in fact not greater than the acceptance criteria. The tolerance interval approach, which is a prediction interval, avoids those two pitfalls and correctly estimates the expected proportion of good results depending on the performance criteria and the quality (size, design) of the performed experiments. While the tolerance interval approach can prevent decisions based on poor data, it remains the responsibility of the analyst to ensure that the experimental conditions used during the (pre-study) validation reflect what will be practiced in routine. The subtle difference between the method and the procedure should be stressed here: in the validation experiments, the various operational aspects or potential sources of variance must be included to anticipate what could happen later in routine. The most classical factors are the operators, the column lot, different set-ups, independent preparations of samples, etc., in order to simulate or mimic as closely as possible the daily practice and the set of procedures around the use of the method. As already indicated, it is the whole procedure or practice that must be validated, not only the method in its most restrictive sense.

6. Dosing range

For any quantitative method, it is necessary to determine the range of analyte concentrations or property values over which the method may be applied. The ICH Q2R1 part 1 document defines the range of an analytical procedure as the interval between the upper and lower concentration (amounts) of analyte in the sample (including these concentrations) for which it has been demonstrated that the analytical procedure has a suitable level of precision, accuracy and linearity. The FDA Bioanalytical Method Validation definition of the quantification range is the range of concentration, including ULOQ and LLOQ, that can be reliably and reproducibly quantified with accuracy and precision through the use of a concentration-response relationship, where LLOQ is the lower limit of quantitation and ULOQ is the upper limit of quantitation. Thus, the above-mentioned definitions are quite similar because, for both of them, the range is linked to the linearity and the accuracy (trueness + precision). Moreover, both documents specify that the range depends on the specific application of the procedure. ICH Q2R1 part 2 states that the specified range is established by confirming that the analytical procedure provides an acceptable degree of linearity, accuracy and precision when applied to samples containing amounts of analyte within or at the extremes of the specified range of the analytical procedure. IUPAC defines the range as a set of values of measurands for which the error of a measuring instrument is intended to lie within specified limits.

The range should be anticipated at an early stage of method development, and its selection is based on prior information about the sample in a particular study. The chosen range determines the number of standards used in constructing a calibration curve. ICH Q2R1 part 2 recommends the following minimum specified ranges for different studies:

(i) for the assay of a drug substance or a finished (drug) product: normally from 80 to 120% of the test concentration;
(ii) for content uniformity: a minimum of 70-130% of the test concentration, unless a wider, more appropriate range, based on the nature of the dosage form (e.g. metered dose inhalers), is justified;
(iii) for dissolution testing: 20% over the specified range;
(iv) for the determination of an impurity: from the reporting level of an impurity to 120% of the specification.

Therefore, the dosing range is the concentration or amount interval over which the total error of measurement, or accuracy, is acceptable. It is essential to demonstrate the accuracy of the results over the entire range. Consequently, and in order to fulfil these definitions, the ICH proposal to perform six measurements at the 100% level of the test concentration only, to assess the precision of the analytical method, should be applied with precaution in order to remain in accordance with the definition of the range. Accuracy, and therefore trueness and precision, should be evaluated experimentally and be acceptable over the whole range targeted for the application of the analytical procedure.

7. Limit of quantitation

The ICH considers that the quantitation limit is a parameter of quantitative assays for low levels of compounds in sample matrices, and is used particularly for the determination of impurities and/or degradation products. ICH Q2R1 part 1 defines the quantitation limit of an individual analytical procedure as the lowest amount of analyte in a sample which can be quantitatively determined with suitable precision and accuracy. The limit of quantitation (or quantitation limit) is often called LOQ. Both terms are used in regulatory documents, their meaning being exactly the same. The ICH document defines only one limit of quantitation, but the quantification range of the analytical procedure has two limits: LLOQ and ULOQ. In the definitions of the quantitation limit(s)
excerpted from IUPAC and Eurachem, one can understand that there is more than one limit of quantification: quantification limits are performance characteristics that mark the ability of a chemical measurement process to adequately quantify an analyte. In the Eurachem document, however, only the LLOQ, called the quantification limit, is discussed.
The FDA Bioanalytical Method Validation document distinguishes between the two limits and defines the lower limit of quantification (the lowest amount of an analyte in a sample that can be quantitatively determined with suitable precision and accuracy) and the upper limit of quantification (the highest amount of an analyte in a sample that can be quantitatively determined with precision and accuracy). As can be seen, the only difference between these definitions is the substitution of the word lowest with highest.
ICH Q2R1 part 2 proposes exactly the same approaches to estimate the (lower) quantification limit as for the detection limit. A first approach is based on the well-known signal-to-noise (S/N) ratio. A 10:1 S/N ratio is considered by the ICH document to be sufficient to discriminate the analyte from the background noise. The main problem appears when the measured signal is not the signal used to quantify the analyte. For example, in chromatography with spectral detection, the measured signal represents absorption units, i.e. the signal height, whereas peak areas are generally used for quantitation. The quantitation limit therefore does not express the lowest quantifiable level of the analyte, but the lowest quantified absorbance. The problem becomes more complicated in electrophoresis, where the signal is usually considered as the ratio between the peak area and the migration time.
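As a rough numerical illustration of the 10:1 criterion, the S/N-based LOQ concentration can be estimated from the baseline noise and a height-based sensitivity. This is a minimal sketch, not a procedure from the cited documents: the noise values, the sensitivity figure and the use of the baseline standard deviation as "noise" are all illustrative assumptions.

```python
import statistics

def sn_based_loq(baseline, sensitivity, target_sn=10.0):
    """Estimate an S/N-based LOQ concentration.

    baseline    : blank-signal values sampled in a peak-free region
    sensitivity : peak height per unit concentration (height-based!)
    target_sn   : 10:1 for the LOQ, 3:1 for the detection limit
    """
    noise = statistics.stdev(baseline)       # noise taken as baseline SD
    return target_sn * noise / sensitivity   # concentration giving S/N = target

# Illustrative numbers: baseline noise ~0.2 mAU, sensitivity 5 mAU per ng/mL
baseline = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1]
loq = sn_based_loq(baseline, sensitivity=5.0)
```

Note that, as discussed above, the result depends directly on how the noise is measured, which is precisely why this estimate varies between instruments and operators.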
The other approaches proposed by the ICH Q2R1 part 2 document are based on the standard deviation of the response and the slope, and are similar to the approach used for the detection limit computation. The computations for the detection limit (DL) and the quantitation limit (QL) are similar, the only difference being the multiplier of the standard deviation of the response:

DL = 3.3 σ/S,  QL = 10 σ/S

where σ is the standard deviation of the response and S is the slope of the calibration curve.
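These formulas can be sketched numerically from a calibration data set, taking σ as the residual standard deviation of an ordinary least-squares straight-line fit, which is one of the options mentioned in ICH Q2R1 part 2. The calibration data below are invented for illustration only.

```python
# DL = 3.3*sigma/S and QL = 10*sigma/S, with sigma taken here as the
# residual standard deviation of a straight-line calibration fit.
def dl_ql(concentrations, responses):
    n = len(concentrations)
    mx = sum(concentrations) / n
    my = sum(responses) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, responses))
    slope = sxy / sxx                       # S: slope of the calibration curve
    intercept = my - slope * mx
    # residual standard deviation (n - 2 degrees of freedom)
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(concentrations, responses))
    sigma = (ss_res / (n - 2)) ** 0.5
    return 3.3 * sigma / slope, 10 * sigma / slope

conc = [1, 2, 4, 6, 8, 10]                  # illustrative concentrations
resp = [2.1, 3.9, 8.2, 11.8, 16.1, 19.9]    # illustrative responses
dl, ql = dl_ql(conc, resp)
```

As the text points out, the value obtained this way depends on the calibration range used to estimate σ, which is one reason different approaches yield different limits.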
The same problems as explained previously arise for detection in chromatography or electrophoresis. On the one hand, the ICH Q2R1 part 2 document assumes that the calibration is linear, which is not always true. On the other hand, two ways of measuring σ are proposed: one based on the standard deviation of the blank and one based on the calibration curve. Neither alternative offers an adequate solution: the former assumes that the signal units are the same as the units measured for the calibration, and the latter assumes that the LOQ range is already known.
Other problems with these methods of estimating limits of quantitation are that they assume that there is a measurable noise, which is not always the case. Furthermore, when applicable, these approaches depend on the manner in which the noise is measured, and vary from one instrument to another or with internal operational settings such as the signal data acquisition rate or the detector time constant. The LOQ estimated using the signal-to-noise ratio is thus extremely subjective [51,52] and equipment dependent. The approaches using the standard deviation of the intercept should be used with care, as the estimation of this intercept depends on the range of the calibration curve: the intercept is only well estimated if the concentrations used are sufficiently small. Furthermore, each of these approaches provides a different value of the lower limit of quantitation [51,52]. This is highly problematic, as it does not allow comparison of the LOQs obtained by different laboratories using the same analytical procedure.
Another approach to estimate the lower limit of quantitation is proposed by Eurachem, based on a target RSD [33]. The RSDs at concentration levels close to the expected LOQ are plotted versus concentration, and a curve is fitted to this plot. The concentration at which the curve crosses the target RSD is the LOQ. This approach alleviates most of the problems raised against the previous approaches, as it is no longer equipment or operator dependent. Still, however, none of these approaches fulfils the definition of the LOQ: even with this last approach, only the precision of the analytical procedure is assessed, without estimation of trueness or of the overall accuracy (trueness + precision) as required.
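The target-RSD idea can be sketched as a simple interpolation: measure the RSD at several low concentration levels and find where the RSD-versus-concentration curve crosses the target. In this minimal sketch the "fitted curve" is just a piecewise-linear interpolation and the 20% target is an illustrative choice, not a value prescribed by the Eurachem document; the data are invented.

```python
def loq_from_target_rsd(levels, rsds, target=20.0):
    """Concentration at which the RSD-vs-concentration curve first drops
    to the target RSD, using piecewise-linear interpolation.
    `levels` must be sorted in increasing concentration (RSD decreasing)."""
    for (c1, r1), (c2, r2) in zip(zip(levels, rsds),
                                  zip(levels[1:], rsds[1:])):
        if r1 >= target >= r2:    # crossing lies in this segment
            return c1 + (r1 - target) * (c2 - c1) / (r1 - r2)
    raise ValueError("target RSD not crossed within the studied range")

levels = [0.5, 1.0, 2.0, 5.0]      # concentrations near the expected LOQ
rsds   = [38.0, 24.0, 14.0, 6.0]   # observed RSD (%) at each level
loq = loq_from_target_rsd(levels, rsds, target=20.0)
```

Because the crossing point depends only on replicate measurements at each level, the result is indeed free of the instrument-dependent noise measurement criticised above.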
In our opinion, the best way to compute both quantitation limits (LLOQ and ULOQ) is the use of the accuracy profile approach [1,2,29,30,43-45,47-49], which fulfils the LOQ requirement by demonstrating that the total error of the result is known and acceptable at these concentration levels, i.e. that both systematic and random errors are at an acceptable level.
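A minimal numerical sketch of this idea: at each validation level, build a two-sided interval on the relative total error and keep the levels at which it stays within the acceptance limits; the lowest and highest such levels bound the LLOQ and ULOQ. The interval below is a crude mean ± k·s approximation rather than the β-expectation tolerance interval used in the accuracy profile literature [1,2], and the acceptance limit, k factor and data are all illustrative assumptions.

```python
import statistics

def within_acceptance(measured, nominal, acc_limit=15.0, k=2.0):
    """True if the (approximate) interval on the relative error (%) at this
    level lies entirely within +/- acc_limit (crude mean +/- k*s interval)."""
    rel_err = [100.0 * (m - nominal) / nominal for m in measured]
    mean, sd = statistics.mean(rel_err), statistics.stdev(rel_err)
    return abs(mean) + k * sd <= acc_limit

# Replicate measurements at each nominal level (invented data)
validation = {
    0.5: [0.38, 0.61, 0.47, 0.55],    # too imprecise: level fails
    1.0: [0.97, 1.04, 0.95, 1.02],
    5.0: [4.9, 5.1, 5.05, 4.95],
    10.0: [9.8, 10.1, 10.05, 9.9],
}
ok_levels = sorted(c for c, meas in validation.items()
                   if within_acceptance(meas, c))
lloq, uloq = ok_levels[0], ok_levels[-1]
```

Unlike the approaches discussed earlier, this directly combines trueness (the mean error) and precision (the spread) into one decision per level, which is the point of the total-error criterion.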
8. Conclusion
For analysts, method validation is the process of proving that an analytical method is acceptable for its intended purpose. In order to resolve this very important issue, analysts refer to regulatory or guidance documents, which can differ on several points. The validity of an analytical method is therefore partially dependent on the chosen guidance, terminology and methodology. It is thus essential to have clear definitions of the validation criteria used to assess this validity, to have methodologies in accordance with these definitions, and consequently to use statistical methods consistent with these definitions, with the objective of the validation and with the objective of any analytical method.
Revising the definitions and methodologies of regulatory documents to eliminate contradictory, and sometimes scientifically irrelevant, requirements and definitions is recommended and should be rapidly implemented.
Acknowledgements
Thanks are due to the Walloon Region and the European
Social Fund for a research grant to E.R. (First Europe Objective
3 project No. 215269).
References
[1] Ph. Hubert, J.-J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.-A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, J. Pharm. Biomed. Anal. 36 (2004) 579.
[2] Ph. Hubert, J.-J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.-A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 13 (2003) 101.
[3] Guidance for Industry: Bioanalytical Method Validation, US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Rockville, May 2001.
[4] International Conference on Harmonization (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use, Topic Q2 (R1): Validation of Analytical Procedures: Text and Methodology, Geneva, 2005.
[5] ISO 5725, Application of the Statistics-Accuracy (Trueness and Precision) of the Results and Methods of Measurement, Parts 1 to 6, International Organization for Standardization (ISO), Geneva, 1994.
[6] ISO 3534-1: Statistics, Vocabulary and Symbols, International Organization for Standardization (ISO), Geneva, 2006.
[7] M. Thompson, S.L.R. Ellison, R. Wood, Pure Appl. Chem. 74 (2002) 835.
[8] Association of Official Analytical Chemists (AOAC), Official Methods of Analysis, vol. 1, 15th ed., AOAC, Arlington, VA, 1990, p. 673.
[9] J. Vessman, J. Pharm. Biomed. Anal. 14 (1996) 867.
[10] J. Vessman, Accred. Qual. Assur. 6 (2001) 522.
[11] B.-A. Persson, J. Vessman, Trends Anal. Chem. 17 (1998) 117.
[12] B.-A. Persson, J. Vessman, Trends Anal. Chem. 20 (2001) 526.
[13] J. Vessman, R.I. Stefan, J.F. Van Staden, K. Danzer, W. Lindner, D.T. Burns, A. Fajgelj, H. Muller, Pure Appl. Chem. 73 (2001) 1381.
[14] A.D. McNaught, A. Wilkinson, IUPAC Compendium of Chemical Terminology, second ed., Blackwell, Oxford, 1997.
[15] H. Kaiser, Fresenius Z. Anal. Chem. 260 (1972) 252.
[16] D.L. Massart, B.G. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman, Chemometrics, Elsevier, Amsterdam, 1988, p. 115.
[17] A. Lorber, K. Faber, B.R. Kowalski, Anal. Chem. 69 (1997) 69.
[18] K. Faber, A. Lorber, B.R. Kowalski, J. Chemometrics 11 (1997) 419.
[19] K. Danzer, Fresenius J. Anal. Chem. 369 (1997) 397.
[20] WELAC Guidance Documents WG D2, Eurachem/Western European Laboratory Accreditation Cooperation (WELAC) Chemistry, Teddington, UK, first ed., 1993.
[21] J.W. Lee, V. Devanarayan, Y.C. Barrett, R. Weiner, J. Allinson, S. Fountain, S. Keller, I. Weinryb, M. Green, L. Duan, J.A. Rogers, R. Millham, P.J. O'Brian, J. Sailstad, M. Khan, C. Ray, J.A. Wagner, Pharm. Res. 23 (2006) 312.
[22] UK Department of Trade and Industry, Manager's Guide to VAM, Valid Analytical Measurement Programme, Laboratory of the Government Chemist (LGC), Teddington, UK, 1998; http://www.vam.org.uk.
[23] J. Ermer, J.H.M. Miller, Practical Method Validation in Pharmaceutical Analysis, Wiley-VCH, Weinheim, 2005.
[24] Analytical Methods Committee, Analyst 113 (1988) 1469.
[25] S.V.C. de Souza, R.G. Junqueira, Anal. Chim. Acta 552 (2005) 25.
[26] J. Ermer, H.-J. Ploss, J. Pharm. Biomed. Anal. 37 (2005) 859.
[27] C. Hartmann, J. Smeyers-Verbeke, D.L. Massart, R.D. McDowall, J. Pharm. Biomed. Anal. 17 (1998) 193.
[28] D. Hoffman, R. Kringle, J. Biopharm. Stat. 15 (2005) 283.
[29] B. Streel, A. Ceccato, R. Klinkenberg, Ph. Hubert, J. Chromatogr. B 814
(2005) 263.
[30] Ph. Hubert, J.J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N.
Cohen, P.A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie,
N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 16 (2006)
87.
[31] N. Francois, B. Govaerts, B. Boulanger, Chemometr. Intell. Lab. Syst. 74
(2004) 283.
[32] ISO VIM. DGUIDE 99999.2, International Vocabulary of Basic and General terms in Metrology (VIM), third ed., ISO, Geneva, 2006 (under
approval step).
[33] The Fitness for Purpose of Analytical Methods, Eurachem, Teddington,
1998.
[34] Analytical Methods Committee, AMC Technical Brief 13, Royal Society of
Chemistry, London, 2003; http://www.rsc.org/Membership/Networking/
InterestGroups/Analytical/AMC/TechnicalBriefs.asp.
[35] Food and Agriculture Organization of the United Nations (FAO), Codex
Alimentarius Commission, Procedural Manual, 15th ed., Rome, 2005.
[36] H. Rosing, W.Y. Man, E. Doyle, A. Bult, J.H. Beijnen, J. Liq. Chrom. Rel.
Technol. 23 (2000) 329.
[37] J. Ermer, J. Pharm. Biomed. Anal. 24 (2001) 755.
[38] C. Hartmann, D.L. Massart, R.D. McDowall, J. Pharm. Biomed. Anal. 12
(1994) 1337.
[39] C.R. Jensen, Qual. Eng. 14 (2002) 645.
[40] F. Satterthwaite, Psychometrika 6 (1941) 309.
[41] B. Boulanger, P. Chiap, W. Dewe, J. Crommen, Ph. Hubert, J. Pharm.
Biomed. Anal. 32 (2003) 753.
[42] B. DeSilva, W. Smith, R. Weiner, M. Kelley, J. Smolec, B. Lee, M. Khan,
R. Tacey, H. Hill, A. Celniker, Pharm. Res. 20 (2003) 1885.
[43] B. Boulanger, W. Dewe, Ph. Hubert, B. Govaerts, C. Hammer, F. Moonen,
Accuracy and Precision (total error vs. 4/6/30), AAPS Third Bioanalytical Workshop: Quantitative Bioanalytical Methods Validation and
ImplementationBest Practices for Chromatographic and Ligand Binding Assays, Arlington, VA, 13 May 2006; http://www.aapspharmaceutica.
com/meetings/meeting.asp?id=64.
[44] B. Boulanger, W. Dewe, A. Gilbert, B. Govaerts, M. Maumy-Bertrand,
Chemometr. Intell. Lab. Syst. 86 (2007) 198.
[45] J.W.A. Findlay, W.C. Smith, J.W. Lee, G.D. Nordblom, I. Das, B.S.
DeSilva, M.N. Khan, R.R. Bowsher, J. Pharm. Biomed. Anal. 21 (2000)
1249.
[46] H.T. Karnes, G. Shiu, V.P. Shah, Pharm. Res. 8 (1991) 421.
[47] Ph. Hubert, J.J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N.
Cohen, P.A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie,
N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 16 (2006)
28.
[48] M. Feinberg, B. Boulanger, W. Dewe, Ph. Hubert, Anal. Bioanal. Chem.
380 (2004) 502.
[49] A.G. Gonzalez, M.A. Herrador, Talanta 70 (2006) 896.
[50] R.W. Mee, Technometrics 26 (1984) 251.
[51] J. Vial, A. Jardy, Anal. Chem. 71 (1999) 2672.
[52] J. Vial, K. Le Mapihan, A. Jardy, Chromatographia 57 (2003) S-303.
