After analyzing the data for a while and considering many variables, I realized how
burdensome the task of analysis is. I have been committed to writing insights and
notes on managerial relevance for the articles you provide, while absorbing the
knowledge you share. But the time the task actually takes, and the decisions
involved in selecting the features that lead to an acceptable model, struck me
today when I had to think like a data scientist.
Introduction
Hotel pricing is a complex phenomenon involving a myriad of characteristics that
must be taken into account when analyzing the data and determining the correct
price for a particular room. For example, it would be ludicrous to price a room in
a non-metro, non-tourist city at a level that would make even visitors to metro or
tourist cities glower.
The objective of the analysis is to crunch the data for the given 42 cities, some
metro and some non-metro, thus covering cities that meet many such criteria, and
to yield a model that, given a new city and a hotel with a given set of features,
can roughly predict the price of a room from those characteristics. The key is to
embrace the relevant characteristics and leave the irrelevant ones at the margin.
Some hotels have more than one type of double occupancy room.
For simplicity, we picked the cheapest room with double
occupancy.
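Picking the cheapest double-occupancy room per hotel is a simple group-wise minimum. The sketch below is only an illustration, not the dataset's actual preprocessing: the data frame `rooms`, its columns `HotelName`, `RoomType` and `RoomRent`, and the rents in it are all hypothetical.

```r
# Hypothetical room listings: several double-occupancy room types per hotel.
rooms <- data.frame(
  HotelName = c("Hotel A", "Hotel A", "Hotel B", "Hotel B", "Hotel B"),
  RoomType  = c("Deluxe", "Standard", "Suite", "Standard", "Deluxe"),
  RoomRent  = c(4500, 3200, 9000, 2800, 3900)
)

# Keep one row per hotel: the minimum RoomRent across its room types.
cheapest <- aggregate(RoomRent ~ HotelName, data = rooms, FUN = min)
print(cheapest)
```

If the listings also carry a date, adding it to the grouping (for example `RoomRent ~ HotelName + Date`) keeps the cheapest room per hotel per day.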
External Factors
Many external factors can potentially influence the RoomRent. The dataset captures some of
these external factors, as explained below.
Internal Factors
Many hotel features can influence the RoomRent. The dataset captures some of these internal
factors, as explained below.
I first drew a correlation diagram to get a basic idea of how to frame the
variables and how each relates to fluctuations in room prices. I won’t discuss
the technicalities of the approach in detail, but I would like to keep the reader
in the thick of the developing analysis, and visuals are often the most effective
way to do that.
Here are the correlation diagrams, which I split into two parts so that the
variables are easier to follow.
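The numbers behind such a diagram can be approximated in a few lines of base R. This is a minimal sketch, not the author's code: the function name `rent_correlations` and the toy rows are hypothetical, and it assumes the data sit in a data frame like the `pricingtrain` object used for the model, with a numeric RoomRent column and numeric or 0/1 predictors.

```r
# Hypothetical helper: Pearson correlation of every numeric column with the target.
rent_correlations <- function(df, target = "RoomRent") {
  # Keep only numeric columns; cor() cannot handle text columns such as city names.
  num <- df[vapply(df, is.numeric, logical(1))]
  # Full correlation matrix, then the column for the target variable.
  r <- cor(num, use = "pairwise.complete.obs")[, target]
  # Drop the target's correlation with itself and sort by absolute strength.
  r <- r[names(r) != target]
  r[order(abs(r), decreasing = TRUE)]
}

# Hypothetical toy rows for illustration only; these are not the article's data.
toy <- data.frame(
  RoomRent   = c(2000, 3500, 5000, 8000, 12000, 2500),
  StarRating = c(2, 3, 3, 4, 5, 2),
  FreeWifi   = c(1, 1, 0, 1, 0, 0),
  CityName   = c("A", "B", "C", "D", "E", "F")
)
print(round(rent_correlations(toy), 3))
```

A correlation near zero only rules out a linear, one-variable association with RoomRent; it does not rule out an effect that shows up through an interaction with another variable.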
With these you may be able to relate many of the variables, but the correlations
with room rent left me flabbergasted: some variables showed no correlation
whatsoever despite an apparently logical link to price.
What I took away from the analysis is that some variables we logically associate
with room rent are instead related to occupancy. We tend to think along the lines
of “the higher the occupancy, the cheaper the hotel”, which may not be true,
because certain conditions must hold before such a sweeping conclusion can be
drawn. The variables that related to occupancy rather than to room price were as
follows:
Airport
Free Wifi service
Weekend incidence
Has Breakfast
From this I drew up the following logical relations, which at first glance give
you a preview of the model developed later in this report.
Case 1)
Consider the variables Metro and Tourist Attraction. I can summarize them by
letting the Tourist Attraction variable determine the room price, with its
interaction with the Metro variable acting as a minor modifying metric.
Case 2)
Consider the variables New Year Eve and Tourist Attraction. Following the same
line of thought, I would characterize New Year Eve as an influencing metric. The
logic is intuitive: hotel prices tend to escalate around the New Year because a
slew of travelers creates sheer demand even for modest rooms.
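Both cases become interaction terms in an R model formula. A minimal sketch of what those terms do, using base R's `model.matrix()` on a hypothetical grid of 0/1 indicators (the grid is illustrative, not data from the article): each `a:b` term turns into a column equal to the product of the two indicators, so it is nonzero only when both conditions hold.

```r
# Hypothetical grid of every Metro / Tourist / New Year Eve combination.
grid <- expand.grid(
  IsMetroCity          = c(0, 1),
  IsTouristDestination = c(0, 1),
  IsNewYearEve         = c(0, 1)
)

# Same structure as Cases 1 and 2: a tourism main effect plus metro-by-tourism
# and New-Year-by-tourism products, with no metro or New Year main effect.
X <- model.matrix(
  ~ IsMetroCity:IsTouristDestination + IsTouristDestination +
    IsNewYearEve:IsTouristDestination,
  data = grid
)
print(X)
```

In R formulas `a:b` adds only the product term, whereas `a*b` expands to `a + b + a:b`. Because model7 uses `:` without separate IsMetroCity or IsNewYearEve terms, those two variables enter the model only through their interactions.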
Thus my linear model included such interaction terms in addition to the other
variables. Here is a glimpse of the model and its summary characteristics:
> summary(model7)

Call:
lm(formula = RoomRent ~ HasSwimmingPool + HotelCapacity + Population +
    (IsMetroCity:IsTouristDestination) + IsTouristDestination +
    StarRating + HasSwimmingPool:IsNewYearEve +
    IsNewYearEve:IsTouristDestination,
    data = pricingtrain)

Residuals:
   Min     1Q Median     3Q    Max
-13653  -2356   -651   1062 309334

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                       -8.345e+03  4.428e+02 -18.845  < 2e-16 ***
HasSwimmingPool                    1.797e+03  1.961e+02   9.163  < 2e-16 ***
HotelCapacity                     -1.113e+01  1.252e+00  -8.893  < 2e-16 ***
Population                        -6.864e-05  3.256e-05  -2.108   0.0350 *
IsTouristDestination               2.307e+03  2.033e+02  11.351  < 2e-16 ***
StarRating                         3.699e+03  1.333e+02  27.749  < 2e-16 ***
IsMetroCity:IsTouristDestination  -1.447e+03  3.474e+02  -4.165 3.14e-05 ***
HasSwimmingPool:IsNewYearEve       1.768e+03  4.471e+02   3.954 7.74e-05 ***
IsTouristDestination:IsNewYearEve  7.727e+02  3.213e+02   2.405   0.0162 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
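Written out as an equation, the printed estimates give the following fitted model. The coefficients are copied from the summary output and rounded as printed; the hat marks a predicted value.

```latex
\begin{aligned}
\widehat{\mathrm{RoomRent}} = {} & -8345 + 1797\,\mathrm{HasSwimmingPool} - 11.13\,\mathrm{HotelCapacity} \\
& - 0.00006864\,\mathrm{Population} + 2307\,\mathrm{IsTouristDestination} + 3699\,\mathrm{StarRating} \\
& - 1447\,(\mathrm{IsMetroCity} \times \mathrm{IsTouristDestination}) \\
& + 1768\,(\mathrm{HasSwimmingPool} \times \mathrm{IsNewYearEve}) \\
& + 772.7\,(\mathrm{IsTouristDestination} \times \mathrm{IsNewYearEve})
\end{aligned}
```

Read this way, New Year Eve acts only through its two interactions (+1768 for a hotel with a pool, +772.7 in a tourist destination), and a tourist destination that is also a metro city gets a tourism premium of 2307 - 1447 = 860 rather than 2307.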
Conclusion:
I would like to offer the following statement to help hotel managers decide when
a room can be rated among the most expensive. My conclusion is a profile, derived
from the model, according to which hotels can be clustered. It is as follows:
A hotel that has a swimming pool, considerable capacity and a high star rating for
comfort, and that is located in a town with the amenities of a cosmopolitan city
but which is mainly a tourist attraction, should consider itself the wheat rather
than the chaff. If rooms are being sold around the New Year, the hotel company
might even turn around a downslide in revenue, if there was one.
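The profile above can be checked against the model by plugging values into the fitted coefficients. With the fitted object available, `predict(model7, newdata = ...)` would do this directly; the sketch below instead recomputes the prediction by hand from the rounded estimates printed in the summary, so it runs without the data. The helpers `regressors()` and `predict_rent()` and the example profile (100 rooms, a population of one million, a five-star hotel with a pool in a non-metro tourist city) are hypothetical.

```r
# Estimates copied from summary(model7) as printed (rounded).
coefs <- c(
  "(Intercept)"                       = -8.345e+03,
  "HasSwimmingPool"                   =  1.797e+03,
  "HotelCapacity"                     = -1.113e+01,
  "Population"                        = -6.864e-05,
  "IsTouristDestination"              =  2.307e+03,
  "StarRating"                        =  3.699e+03,
  "IsMetroCity:IsTouristDestination"  = -1.447e+03,
  "HasSwimmingPool:IsNewYearEve"      =  1.768e+03,
  "IsTouristDestination:IsNewYearEve" =  7.727e+02
)

# Hypothetical helper: regressor values for one hotel-night; each interaction
# term is the product of its two indicators, as lm() builds it.
regressors <- function(pool, capacity, population, metro, tourist, stars, new_year) {
  c(
    "(Intercept)"                       = 1,
    "HasSwimmingPool"                   = pool,
    "HotelCapacity"                     = capacity,
    "Population"                        = population,
    "IsTouristDestination"              = tourist,
    "StarRating"                        = stars,
    "IsMetroCity:IsTouristDestination"  = metro * tourist,
    "HasSwimmingPool:IsNewYearEve"      = pool * new_year,
    "IsTouristDestination:IsNewYearEve" = tourist * new_year
  )
}

# Hypothetical helper: linear prediction = sum of coefficient * regressor.
predict_rent <- function(x) sum(coefs[names(x)] * x)

# Hypothetical profile: pool, 5 stars, 100 rooms, non-metro tourist city.
ordinary_night <- regressors(pool = 1, capacity = 100, population = 1e6,
                             metro = 0, tourist = 1, stars = 5, new_year = 0)
ny_night <- regressors(pool = 1, capacity = 100, population = 1e6,
                       metro = 0, tourist = 1, stars = 5, new_year = 1)

predict_rent(ordinary_night)                            # about 13072.36
predict_rent(ny_night) - predict_rent(ordinary_night)   # 2540.7 (= 1768 + 772.7)
```

In the printed output, HotelCapacity and IsMetroCity:IsTouristDestination both carry negative estimates, so under this model a larger capacity, or a metro location for a tourist city, lowers the predicted rent; the New Year premium for this profile comes entirely from the two New Year interactions.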