
Analysis of Hotel Pricing Data

I am Karan Jain, pursuing my B.Tech in Manufacturing Process and Automation
Engineering (M.P.A.E.) at N.S.I.T., New Delhi. I wish to present my findings for the
hotel pricing data set that you generously provided to me to draw relevant
insights from.

After analyzing the data for a while and considering many variables, I realized how
burdensome a task analysis is. I have been committed to writing insights and
managerial relevance for the articles that you provide, while absorbing the
knowledge that you share. The actual time the task takes, and the difficulty of
deciding which features to select to arrive at an acceptable model, struck me
today when I had to think like a data scientist.

Introduction
Hotel pricing is a complex phenomenon: a myriad of characteristics must be
factored into the analysis to determine the correct price for a particular room.
For example, it would be ludicrous to price a hotel room in a non-metro,
non-tourist city at a level at which even guests in metro or tourist cities
would glower.

The objective of the analysis is therefore to crunch the data for the given 42
cities, some metro and some non-metro, and thus covering many such criteria, to
yield a model that, given a new city and a hotel with a given set of features, can
more or less predict the price of a particular room from these characteristics.
Retaining the relevant characteristics while leaving the irrelevant ones at the
margin is the key here.

Describing the Dataset


The data set provided involves the following variables:

• Dependent Variable

DECISION VARIABLE   UNITS    MEANING
RoomRent            Rupees   Rent for the cheapest room, double occupancy, in Indian Rupees.
                             Some hotels have more than one type of double occupancy room.
                             For simplicity, we picked the cheapest room with double
                             occupancy.

• External Factors
Many external factors can potentially influence the RoomRent. The dataset captures some of
these external factors, as explained below.

VARIABLE               UNITS    MEANING
Date                   Text     We have hotel room rent data for the following 8 dates for each
                                hotel: {Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}.
                                If a hotel is sold out on a given date, assume that the price of
                                the hotel room on the date it is sold out is the maximum price
                                from the sample of dates for which prices are available.
IsWeekend              Dummy    We use ‘0’ to indicate week days, ‘1’ to indicate weekend dates
                                (Sat / Sun)
IsNewYearEve           Dummy    ‘1’ for Dec 31, ‘0’ otherwise
CityName               Text     Name of the City where the Hotel is located, e.g. Mumbai
Population             Number   Population of the City in 2011 (See Table A1 below)
CityRank               Dummy    Rank order of City by Population (e.g. Mumbai = 0, Delhi = 1,
                                and so on); (See Table A1)
IsMetroCity            Dummy    ‘1’ if CityName is {Mumbai, Delhi, Kolkatta, Chennai}, ‘0’
                                otherwise
IsTouristDestination   Dummy    We use ‘1’ if the city is primarily a tourist destination, ‘0’
                                otherwise. For example, Goa and Agra are primarily tourist
                                destinations. We assume that most people who visit Goa and
                                Agra and stay in their hotels are in these cities primarily for
                                tourism.
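The sold-out rule given for the Date variable can be sketched in R as follows. This is only a minimal illustration, assuming sold-out dates are recorded as NA in RoomRent; the toy data frame and the helper name impute_sold_out are hypothetical and not part of the dataset.

```r
# Toy data: two hotels, with sold-out dates assumed to be coded as NA.
rents <- data.frame(
  HotelName = c("A", "A", "A", "B", "B", "B"),
  Date      = c("Dec 24", "Dec 25", "Dec 31", "Dec 24", "Dec 25", "Dec 31"),
  RoomRent  = c(4000, NA, 6500, 2500, 3000, NA)
)

# Hypothetical helper: replace each missing rent with the highest
# rent observed for the same hotel on the dates that have a price.
impute_sold_out <- function(df) {
  df$RoomRent <- ave(df$RoomRent, df$HotelName, FUN = function(x) {
    x[is.na(x)] <- max(x, na.rm = TRUE)
    x
  })
  df
}

imputed <- impute_sold_out(rents)
print(imputed)
```

A hotel with no priced date at all would need separate handling, since there is no maximum to fall back on.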

• Internal Factors
Many Hotel Features can influence the RoomRent. The dataset captures some of these internal
factors, as explained below.

VARIABLE           UNITS    MEANING
HotelName          Text     e.g. Park Hyatt Goa Resort and Spa
StarRating         Number   e.g. 5
Airport            km       Distance between Hotel and closest major Airport
HotelAddress       Text     e.g. Arrossim Beach, Cansaulim, Goa
HotelPincode       Number   e.g. 403712
HotelDescription   Text     e.g. 5-star beachfront resort with spa, near Arossim Beach
FreeWifi           Dummy    ‘1’ if the hotel offers Free Wifi, ‘0’ otherwise
FreeBreakfast      Dummy    ‘1’ if the hotel offers Free Breakfast, ‘0’ otherwise
HotelCapacity      Number   e.g. 242. (enter ‘0’ if not available)
HasSwimmingPool    Dummy    ‘1’ if they have a swimming pool, ‘0’ otherwise

Getting Started in Interpreting Results to Go About Calculating Hotel Room Prices:
At first, one may find oneself in a dense jungle of unstructured data, with this
deluge of information on 42 cities and as many as 12 distinct features to help
decide a cogent room price, failing which all the effort goes down an
erroneous path.

I first drew a correlation diagram to get a basic idea of how the variables relate
to one another and to the fluctuation in room prices. I won’t discuss the
technicalities of the approach in detail, but I would like to keep the reader in
the thick of the developing situation, which is often done most effectively
through visuals.

Here are the correlation diagrams, which I split into two parts so that one can
have a better understanding of the variables.
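A correlation view of this kind can be computed with base R's cor(). The sketch below uses a small synthetic data frame, since the original pricing file is not reproduced here; the values are made up purely for illustration and are not the report's results.

```r
# Synthetic stand-in for the pricing data (illustrative values only).
set.seed(1)
n <- 200
toy <- data.frame(
  StarRating      = sample(1:5, n, replace = TRUE),
  HasSwimmingPool = rbinom(n, 1, 0.4),
  FreeWifi        = rbinom(n, 1, 0.8)
)
# Rent is constructed to depend on star rating and pool, but not on wifi.
toy$RoomRent <- 1500 * toy$StarRating + 2000 * toy$HasSwimmingPool +
  rnorm(n, sd = 1000)

# Pairwise Pearson correlations, rounded for readability.
corr <- round(cor(toy), 2)
print(corr)

# Correlation of each predictor with RoomRent, strongest first.
predictors <- setdiff(rownames(corr), "RoomRent")
with_rent <- sort(corr[predictors, "RoomRent"], decreasing = TRUE)
print(with_rent)
```

Reading down the RoomRent column of such a matrix is what separates the variables that move with rent from those that do not.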
With these diagrams you may be able to relate many of the variables, but the
correlations with room rent left me flabbergasted: some of the variables showed
no correlation whatsoever, despite the logical case for a relation.
What I took from the analysis is that there are some variables which we
logically relate to room rent but which are instead related to occupancy. We tend
to think along the lines of "the higher the occupancy, the cheaper the hotel",
which may not be true, since that conclusion holds only under certain
conditions. The variables which related to occupancy and not to room price were
as follows:

• Airport
• Free Wifi service
• Weekends incidence
• Has Breakfast

In layman's terms, these translate to: whether the hotel is near an airport,
whether the hotel provides free Wi-Fi, whether the date of stay falls on a
weekend, and whether the establishment provides complimentary breakfast.

I arrived at the following table of relations, which at first glance gives a
preview of the model developed later in this report.

            Star     Capacity   Swimming   Tourist       New Year    Metro   Population
            Rating              Pool       Destination   Incidence
Room Rent   +ve      +ve        +ve        +ve           +ve         -ve     -ve

The aforementioned table indicates the direction of each variable's relation
with Room Rent.

Linear Model Formulated:


I now had the burden of using the 7 features above to arrive at a quality
equation whose predicted price deviates least from the actual one. I also wanted
to include interactions between variables, to capture the nuanced
inter-relations and reduce the errors that come from treating each variable in
isolation.

Case 1)

Consider the variables Metro and Tourist Attraction. I can combine these
by using Tourist Attraction as the main variable for determining the room
price, with Metro entering through an interaction term as a minor
influencing factor.

Case 2)

Consider the variables New Year Eve and Tourist Attraction. Following the
same line of thought, I would treat New Year's Eve as the influencing factor.
This is logical: hotel prices escalate around the New Year owing to sheer
demand, even for modest rooms, on account of the slew of travelers.
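In R's formula syntax, an interaction such as IsMetroCity:IsTouristDestination between two 0/1 dummies adds a column equal to their product, so it is 1 only for cities that are both metro and tourist destinations. A small sketch with made-up rows (not taken from the dataset) shows the design matrix this produces:

```r
# Four made-up city types covering every metro/tourist combination.
cities <- data.frame(
  IsMetroCity          = c(0, 0, 1, 1),
  IsTouristDestination = c(0, 1, 0, 1)
)

# Main effect for tourism plus the metro-by-tourism interaction,
# mirroring how these two terms enter the model formula.
X <- model.matrix(~ IsMetroCity:IsTouristDestination + IsTouristDestination,
                  data = cities)
print(X)

# Locate the interaction column by the ":" in its name, then compare it
# with the elementwise product of the two dummies.
interaction_col <- X[, grep(":", colnames(X), fixed = TRUE)]
product_col <- cities$IsMetroCity * cities$IsTouristDestination
print(all(interaction_col == product_col))
```

The coefficient on such a column therefore adjusts the tourist-destination effect only for metro cities, which is the "minor influencing factor" role described in Case 1.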

Thus my linear model included such interaction terms in addition to the other
variables. Here is a glimpse of my model and its summary characteristics:
> summary(model7)

Call:
lm(formula = RoomRent ~ HasSwimmingPool + HotelCapacity + Population +
    (IsMetroCity:IsTouristDestination) + IsTouristDestination +
    StarRating + HasSwimmingPool:IsNewYearEve +
    IsNewYearEve:IsTouristDestination,
    data = pricingtrain)

Residuals:
   Min     1Q Median     3Q    Max
-13653  -2356   -651   1062 309334

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                       -8.345e+03  4.428e+02 -18.845  < 2e-16 ***
HasSwimmingPool                    1.797e+03  1.961e+02   9.163  < 2e-16 ***
HotelCapacity                     -1.113e+01  1.252e+00  -8.893  < 2e-16 ***
Population                        -6.864e-05  3.256e-05  -2.108   0.0350 *
IsTouristDestination               2.307e+03  2.033e+02  11.351  < 2e-16 ***
StarRating                         3.699e+03  1.333e+02  27.749  < 2e-16 ***
IsMetroCity:IsTouristDestination  -1.447e+03  3.474e+02  -4.165 3.14e-05 ***
HasSwimmingPool:IsNewYearEve       1.768e+03  4.471e+02   3.954 7.74e-05 ***
IsTouristDestination:IsNewYearEve  7.727e+02  3.213e+02   2.405   0.0162 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6908 on 9915 degrees of freedom
Multiple R-squared: 0.1795, Adjusted R-squared: 0.1788
F-statistic: 271.1 on 8 and 9915 DF, p-value: < 2.2e-16
The ‘7’ appended to ‘model’ indicates the number of times I failed to yield a
coherent analysis, failed to substitute a variable with a better one, or ended up
adding a redundant variable.
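To make the fitted coefficients concrete, the sketch below plugs them into a small R function and evaluates it for one hotel. The coefficient values are copied from the summary output; the function name predict_rent and the example hotel (5-star, pool, 100 rooms, non-metro tourist city of 1.5 million people) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(model7) output.
coefs <- c(
  "(Intercept)"                       = -8.345e+03,
  "HasSwimmingPool"                   =  1.797e+03,
  "HotelCapacity"                     = -1.113e+01,
  "Population"                        = -6.864e-05,
  "IsTouristDestination"              =  2.307e+03,
  "StarRating"                        =  3.699e+03,
  "IsMetroCity:IsTouristDestination"  = -1.447e+03,
  "HasSwimmingPool:IsNewYearEve"      =  1.768e+03,
  "IsTouristDestination:IsNewYearEve" =  7.727e+02
)

# Hypothetical helper: evaluate the linear equation for one hotel.
# Interaction terms are products of the corresponding 0/1 dummies.
predict_rent <- function(pool, capacity, population, tourist, stars,
                         metro, new_year_eve) {
  x <- c(1, pool, capacity, population, tourist, stars,
         metro * tourist, pool * new_year_eve, tourist * new_year_eve)
  sum(coefs * x)
}

# Same hypothetical hotel on an ordinary date and on Dec 31.
regular <- predict_rent(pool = 1, capacity = 100, population = 1.5e6,
                        tourist = 1, stars = 5, metro = 0, new_year_eve = 0)
nye     <- predict_rent(pool = 1, capacity = 100, population = 1.5e6,
                        tourist = 1, stars = 5, metro = 0, new_year_eve = 1)
print(c(regular = regular, new_year_eve = nye))
```

For this hotel the equation gives about 13,038 rupees on an ordinary date and about 15,579 rupees on New Year's Eve; the difference of 2,540.7 rupees is the sum of the two New Year interaction coefficients. Two cautions follow from the summary itself: the HotelCapacity coefficient is negative, so each extra room lowers the predicted rent slightly in this fit, and with a multiple R-squared of 0.1795 the model explains only about 18% of the variation in RoomRent, so such predictions are rough.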

Conclusion:

I would like to offer the following statement to help a hotel manager judge when
a room can be rated among the most expensive. It is drawn from the model and is
my conclusion:

A hotel which has a swimming pool, considerable capacity and a high star rating
for comfort, and is located in a town which has the amenities of a cosmopolitan
city but is mainly a tourist attraction, should count itself as wheat rather
than chaff. If the date of booking happens to be New Year's Eve, the hotel
company might even turn around a downslide in revenue, if there was one.
