Abstract
Although property valuation models have become an important paradigm in real estate market research, the results of the most well-known approaches are limited due to various data-related problems such as the non-linearity of relationships, the presence of noise, or the absence of necessary information. This paper focuses on overcoming these obstacles. We introduce an automated system for property valuation that combines artificial neural network models with a geographic information system, and both tools have shown their potential usefulness in the field of economic research. The artificial neural network models used in this work are the multilayer perceptron, the radial basis function, and Kohonen's maps.
© 2007 Elsevier B.V. All rights reserved.
1. Introduction

The task of valuating housing properties has been largely developed within real estate market analysis. This regression problem has usually been tackled econometrically through hedonic and repeat-sales models, both belonging to the transaction-based approach [23]. The hedonic pricing model was first developed by Rosen [18] in 1974 and constitutes a linear regression approach in which the property price is determined as the weighted sum of the different characteristics of which the property is made up. The other approach, the repeat-sales model, was introduced by Bailey et al. [3] in 1963. This model has been applied far less than the hedonic model due to the difficulty of finding the required information to implement it.

More recently, there have been some successful attempts from the geostatistics field [8]. Additionally, since the pioneering work of Borst [5] in 1991, artificial neural network (ANN) models have become a very attractive alternative to the more traditional econometric models. The main advantage of these techniques is the ability to deal with non-linear relationships or initially unknown models. Other works that must be mentioned are: Tay and Ho [20], Do and Grudnitski [6], Worzala et al. [22], McCluskey [14], Nguyen and Cripps [16]. This latter work concludes that ANN performs better than multiple-regression analysis if a sufficient training data size is provided. On the other hand, whichever approach is used, the analysis can be improved through integration with a geographic information system (GIS). As Thurston [21] stated, ANN linked to GIS can be used to simulate how the human brain processes spatial data problems. There are many applications in which ANN coupled to GIS has turned out to be very useful, for instance: land use, oceanography, forestry, consumer movement, airport noise evaluation and so on.

The aim of this work is to show how different models of ANN and a geographic information system can be combined to constitute a very powerful tool for economic research, specifically for the design of an automated property appraisal system and for other complex tasks related to the real estate market (e.g., the objective assignment of a quality level to each property, which clearly has a large impact on the market value). In order to reach these goals, we will use three of the most well-known ANN models [4,10,13,17]: the multilayer perceptron (MLP), radial basis function networks (RBF), and self-organizing feature maps (SOFM), also known as Kohonen's maps [12]. The first two models represent

Corresponding author. Tel.: +34 967 599 200; fax: +34 967 599 220.
E-mail addresses: Noelia.Garcia@uclm.es (N. García), Matias.Gamez@uclm.es (M. Gámez), Esteban.Alfaro@uclm.es (E. Alfaro).
0925-2312/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2007.07.031
ARTICLE IN PRESS
734 N. García et al. / Neurocomputing 71 (2008) 733–742
Table 2
Regression statistics
29 cases in the total sample, the agencies had not labelled the level of quality and there was not enough information to assign it (home improvements, condition of the floors, carpentry, windows, etc.), and so we used the remaining 562 cases to develop a network capable of estimating the missing data. In order to measure the true error and to select the best networks in terms of their generalization capacity, the available samples were divided into three
different sets: the training set (288 cases), the validation set (137 cases), and the test set (137 cases). After many trials, [...] (14 nodes in the input layer, a hidden layer with six nodes, and one node for each quality level in the output [...] capable of classifying 82% of the cases. Since the MLP results were more satisfactory than those of Kohonen's maps, missing data were substituted from MLP predictions (see footnote 6).

Fig. 4. Error histogram for the test set (vertical axis: frequencies).
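The three-way division of the 562 labelled cases described above can be sketched as follows. This is a minimal illustration only: the text does not say how cases were allocated to each set, so the random shuffle, the `seed` value and the function name `split_cases` are assumptions.

```python
import random


def split_cases(case_ids, n_train, n_val, n_test, seed=0):
    """Shuffle the cases and cut them into three disjoint sets.

    Assumption: a simple random allocation; the paper does not state
    the allocation procedure that was actually used.
    """
    if n_train + n_val + n_test != len(case_ids):
        raise ValueError("set sizes must add up to the number of cases")
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seed chosen arbitrarily
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Set sizes reported in the text: 288 + 137 + 137 = 562 cases.
train_ids, val_ids, test_ids = split_cases(range(562), 288, 137, 137)
print(len(train_ids), len(val_ids), len(test_ids))  # 288 137 137
```

The training set is used to fit each candidate network, the validation set to choose among candidates, and the test set only to estimate the error of the chosen network on unseen cases.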
Table 3
Sensitivity analysis

3. Estimation of free housing prices

Footnote 5. Although self-organizing feature maps were primarily designed to solve clustering tasks, they can also be used for supervised classification tasks. Once the SOFM has been trained, we can label each competition node. The labels allow us to compute the error from the percentage of correctly classified cases, as is usual in supervised classification. In this work, one restriction has been imposed: at least 50% of the cases where one node is the winner must belong to the same class for this node to be labelled with this class. With this restriction, we attempt to strike a balance between the number of nodes that remain unlabelled and the trust placed in the labelled neurons. In this case, the result is that nine cases in the training set were not classified, one in the validation set and six in the test set.

Footnote 6. Nevertheless, an exhaustive comparison was carried out which concluded that agreement between both procedures exceeded 75%.

Footnote 7. All variables were pre-processed before being introduced into the network. The numerical variables were scaled to produce new variables in the range 0–1. The qualitative variables were encoded by the two-state technique in a single input variable, except for the variable quality, which was converted using the one-of-N method. This technique uses a set of variables, one for each possible nominal value. In this case, there were three categories of quality, so the total number of variables changes from 14 to 16.

Footnote 8. The Delta-Bar-Delta rule, proposed in [11], is an improvement of standard back propagation. The objective is to accelerate the convergence of the learning process from the following idea: since the error surface may have different gradients along the direction of each weight, it might be desirable to allow the learning rates to differ for each adjustable parameter in the network and to allow these rates to be adaptable during the epochs
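The pre-processing described in footnote 7 can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names, the sample ages and the quality labels "low" and "high" are assumptions (the text confirms only that there are three quality levels, one of which is "standard").

```python
def min_max_scale(values):
    """Rescale a numeric column linearly to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def two_state(flag):
    """Encode a yes/no attribute in a single 0/1 input variable."""
    return 1.0 if flag else 0.0


def one_of_n(value, categories):
    """Encode a nominal value as one indicator variable per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


# Hypothetical labels for the three quality levels.
QUALITY_LEVELS = ["low", "standard", "high"]

scaled_ages = min_max_scale([2, 17, 40])  # made-up ages in years
lift_input = two_state(True)
quality_inputs = one_of_n("standard", QUALITY_LEVELS)
print(quality_inputs)  # [0.0, 1.0, 0.0]

# Replacing the single quality variable by three indicators
# turns 14 variables into 16.
n_inputs = 14 - 1 + len(QUALITY_LEVELS)
print(n_inputs)  # 16
```

One-of-N encoding avoids imposing an arbitrary numeric spacing between the quality levels, at the cost of two extra inputs.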
However, it is worth pointing out that in centrally located areas, the oldest properties command the highest prices. On the other hand, the response had a non-linear behaviour for properties further from the centre, in such a way that the highest prices were found for both extremes of age. The response graph for surface and age shows that for the largest properties, the price tends to decrease with age, whereas for medium-sized properties, the price only decreases with age until a certain point, after which it starts to increase. This suggests an interesting non-linear behaviour in the effect of age, as the univariate response graph in Fig. 6 also shows.

Fig. 6 shows that when the other characteristics are kept at a constant value equal to their mean, the lowest price is reached for 22-year-old houses. This result is in line with the one shown in [6] where the authors found evidence

(footnote continued) assumption: e.g. Mok, Chan and Cho [15]; Atack and Margo [2] or Dunse and Jones [7]. Nevertheless, there have been a considerable number of studies that have replaced the monocentric assumption with the non-monocentric or polycentric one, assuming the presence of a group of sub-centres that makes it impossible to observe an inverse relation between the value and the distance to the main centre. However, we consider the polycentric assumption to be more plausible in cities which are larger than Albacete, which has nearly 160 000 inhabitants living in 1234 km².
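A univariate response graph of the kind described for Fig. 6 can be computed by varying one input over a grid while every other input is held at its mean. The sketch below shows the procedure only. The function `toy_price_model` is a hypothetical stand-in for a trained network; its coefficients and the mean values are invented, and its minimum is placed at 22 years by hand simply to mirror the figure reported in the text.

```python
def univariate_response(model, feature_means, feature, grid):
    """Evaluate `model` along `grid` for one feature, others fixed at their means."""
    curve = []
    for value in grid:
        inputs = dict(feature_means)  # copy so the means are not modified
        inputs[feature] = value
        curve.append((value, model(inputs)))
    return curve


def toy_price_model(x):
    """Hypothetical stand-in for a trained network (invented coefficients)."""
    return 100_000 + 400 * x["surface"] + 50 * (x["age"] - 22) ** 2


means = {"surface": 95.0, "age": 20.0}  # made-up mean values
curve = univariate_response(toy_price_model, means, "age", range(0, 61))

# Locate the age at which the response curve reaches its lowest price.
age_at_min, min_price = min(curve, key=lambda point: point[1])
print(age_at_min)  # 22
```

Plotting `curve` would give the response graph; with a real trained model in place of the toy function, the location of the minimum is an empirical result rather than a built-in property.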
Fig. 9. Zoom into the area where the property in the example is located.
initial page with a brief description of the most important buttons.

The process can be summarized as follows. The first step is to click on the maptools button so as to load the different layers of the geographical information system (streets, blocks, etc.) and also the necessary tools for map navigation. The following step is to click on the Albacete map button to visualize it and to use the mouse and the zoom buttons to move through the city in search of a property whose price needs to be estimated. Once the house has been located, we must click on the point estimation button and place the pointer in the exact location. A data editor such as the one shown on the left-hand side of Fig. 8 then appears so that the property characteristics may be entered. It is worth mentioning that variables such as the location or the distance to the city centre are shown automatically thanks to the use of the GIS.

Let us take as an example the estimation of the price of a flat located at 10 Cristobal Lozano Street. We imagine that the flat is 17 years old, 95 m², and standard quality, and has three bedrooms, a full bathroom, a toilet, a lift, no balcony, a heating system, one parking space, and no storage room. The estimated price given by the neural network system, in the example 127 592 €, appears in a box on the location of the property in the map (Fig. 9).

Finally, a more complete output can be ordered by clicking on the Make a report button. In this report (Fig. 10), we can find the main data for the example and the respective estimated price.

5. Conclusions

In this work, we have presented the construction of an automated valuation system through the combination of an artificial neural model and the geographical information system. The combination of ANN and GIS has proved to be a very powerful and useful tool for the task of real estate valuation and these results could surely be extended to any problem dealing with spatial data.

With regard to the particular results of our empirical work, the objective has been reached satisfactorily, and the MLP model has performed better than SOFM in the quality level assignation problem. Comparing the performance of MLP and RBF models for property price estimation, the best results were achieved by the MLP estimating the total price. This network yielded an R² of 0.92 and a relative mean error of 5.65%. Maybe the reason for these results is the size of the available sample (591 records) that is small for RBF network requirements.

The sensitivity analysis showed that one of the most important variables was the distance to the central business district (in this work, Plaza Gabriel Lodares-Gablod) with a negative slope according to the monocentric assumption. Another important result was the non-linear behaviour on the effect of age in the housing price. In this sense, it is worth mentioning the capability of neural models to extract non-linear relationships that cannot be detected by more traditional models.

Our work is by no means finished, and in the future we will explore the possibilities of the GIS and include additional inputs in the analysis such as accessibility, existence of green spaces, distance to educational centres, and other social, economic and geographical factors. For this, it is imperative that geographical information is regularly updated so that data relating to the housing market is easily available. This is the only way to keep the automated valuation system up-to-date, for if it is not, the system will lose its usefulness.

References

[1] W. Alonso, Location and Land Use, Harvard University Press, Cambridge, MA, 1964.
[2] J. Atack, R.A. Margo, Location, location, location! The price gradient for vacant urban land: New York, 1835–1900, J. Real Estate Financ. Econ. 16 (2) (1998) 151–172.
[3] M.J. Bailey, R.F. Muth, H.O. Nourse, A regression method for real estate price index construction, J. Am. Stat. Assoc. 58 (1963) 933–942.
[4] C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.
[5] R.A. Borst, Artificial neural networks: the next modelling/calibration technology for the assessment community?, Prop. Tax J. 10 (1) (1991) 69–94.
[6] A.Q. Do, G. Grudnitski, A neural network analysis of the effect of age on housing values, J. Real Estate Res. 8 (2) (1993) 253–264.
[7] N. Dunse, C. Jones, A hedonic price model of office rents, J. Prop. Val. Invest. 16 (3) (1998) 297–312.
[8] M. Gámez, J.M. Montero, N. García, Kriging methodology for regional economic analysis: estimating the housing price in Albacete, Int. Adv. Econ. Res. 6 (3) (2000) 438–450.
[9] N. García, Diseño de Redes Neuronales Artificiales para el Mercado Inmobiliario. Aplicación a la ciudad de Albacete, Ph.D. Thesis, Department of Business and Economics Sciences, University of Castilla-La Mancha, 2004.
[10] S. Haykin, Neural Networks. A Comprehensive Foundation, Prentice Hall, Englewood Cliffs, NJ, 1994.
[11] R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks 1 (4) (1988) 295–307.
[12] T. Kohonen, Self-Organizing Maps, second ed., Springer, Berlin, Heidelberg, 1997.
[13] B. Martín del Brío, A. Sanz, Redes Neuronales y Sistemas Borrosos, RA-MA, 1997.
[14] W. McCluskey, R. Borst, An evaluation of MRA, comparable sales analysis, and ANNs for the mass appraisal of residential properties in Northern Ireland, Assess. J. 4 (1) (1997) 47–55.
[15] H.M.K. Mok, P.P.K. Chan, Y.S. Cho, A hedonic price model for private properties in Hong-Kong, J. Real Estate Financ. Econ. 10 (1995) 37–48.
[16] N. Nguyen, A. Cripps, Predicting housing value: a comparison of multiple regression analysis and artificial neural networks, J. Real Estate Res. 22 (3) (2001) 314–326.
[17] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1999.
[18] S. Rosen, Hedonic prices and implicit markets: product differentiation in pure competition, J. Polit. Econom. 82 (1) (1974) 34–55.
[19] E.M. Sabella, Determining the relationship between the property's age and its market value, Assessors J. 9 (1974) 81–85.
[20] D.P.H. Tay, D.K.K. Ho, Artificial intelligence and the mass appraisal of residential apartments, J. Prop. Val. Invest. 10 (1991/1992) 524–540.
[21] J. Thurston, GIS & Artificial Neural Networks: Does your GIS Think?, GISVision Magazine, 2002.
[22] E. Worzala, M. Lenk, A. Silva, An exploration of neural networks and its application to real estate valuation, J. Real Estate Res. 10 (2) (1995) 185–201.
[23] C.Y. Yiu, C.S. Tam, A review of recent empirical studies on property price gradients, J. Real Estate Lit. 12 (3) (2004) 307–322.

Noelia García teaches Statistics at the Faculty of Economic and Business Sciences in the University of Castilla-La Mancha. She received her degree in Economics at the University of Madrid (UAM) in 1996 and completed her Ph.D. in Economics in 2004 on the construction of an intelligent and automated system for property valuation through the combination of neural nets and a geographic information system (GIS). Her current research deals with spatial statistics and the combination of classifiers (decision trees and neural nets) for solving hot topics in Economics.

Esteban Alfaro teaches Statistics at the Faculty of Economic and Business Sciences in the University of Castilla-La Mancha. He completed his degree in Business in 1999 and received his Ph.D. in Economics in 2005, both in the University of Castilla-La Mancha. His thesis dealt with the application of ensemble classifiers to corporate failure prediction. His current research deals with spatial statistics and the combination of classifiers (decision trees and neural nets) for solving hot topics in Economics.