
Results from Metrics Research

Metrics definition
Objective
Methodology
Desktop Site Measurements
    onLoad
        Analysis
        Action
        Value Distribution
    SpeedIndex
        Analysis
        Action
        Value Distribution
    Time to First Byte (TTFB)
        Analysis
        Action
        Value Distribution
    Total number of Requests
        Analysis
        Action
        Value Distribution
    PageSpeed
        Analysis
        Action
        Value Distribution
    VisualComplete
        Analysis
        Action
        Value Distribution
    Total Bytes
        Analysis
        Action
        Value Distribution
    Number of Domains
        Analysis
        Action
        Value Distribution
    Section Conclusion
Mobile
    onLoad
        Analysis
        Action
        Value Distribution
    SpeedIndex
        Analysis
        Action
        Value Distribution
    Total number of Requests
        Analysis
        Action
        Value Distribution
    VisualComplete
        Analysis
        Action
        Value Distribution
    Section Conclusion
Conclusion
Appendix
    Further Reading
    People to Follow for Performance
    R Program code

By: Akshay Ranganath, Enterprise Architect

Metrics definition
I've used the various metrics as defined on the WebPageTest website. Here's a brief
summary of the metrics.
Load Time
The Load Time is measured as the time from the start of the initial navigation until the beginning of
the window load event (onload).
Fully Loaded
The Fully Loaded time is measured as the time from the start of the initial navigation until there was
2 seconds of no network activity after Document Complete. This will usually include any activity
that is triggered by JavaScript after the main page loads.
First Byte
The First Byte time (often abbreviated as TTFB) is measured as the time from the start of the initial
navigation until the first byte of the base page is received by the browser (after following redirects).
Start Render
The Start Render time is measured as the time from the start of the initial navigation until the first
non-white content is painted to the browser display.
Speed Index
The Speed Index is a calculated metric that represents how quickly the page rendered the
user-visible content (lower is better). More information on how it is calculated is available in the
WebPageTest documentation.
DOM Elements
The DOM Elements metric is the count of the DOM elements on the tested page as measured at
the end of the test.

Objective
The purpose of this research is to help website stakeholders arrive at the right combination
of metrics to measure and record performance. By the right combination, I mean the set of
metrics that provides value from the different perspectives related to performance.
Another objective is to identify metrics that are relatively rich and independent of other
metrics. Identifying such metrics helps optimize the effort of measuring and recording a
performance budget.
Do note that each business is different, and the critical metrics will vary with the business
objectives. For example, Twitter has described that they define effectiveness by the time to
first tweet. WebPageTest provides the ability to define and track custom metrics. For a more
in-depth look at custom metrics, do watch this webinar; the presentation from the webinar is
posted here.
This study is for those who are relatively new to the world of performance budgets and are
looking for an answer to the question: "I have limited time and budget for measuring
performance. What are the top 3-4 measurements that will give the most bang for the buck?"

Methodology
For computing the results, the HTTPArchive database was used. From the data, all the
non-null values were extracted and compared for correlation. In the desktop result set, there
were no null values for the metrics used in the study. For the mobile site crawl, however, the
data set is sparse, with null values that varied by metric. The idea here is to look at patterns;
hopefully we can revisit the study once HTTPArchive starts to gather more data for mobile
sites.
Correlation between values was computed using two measures:
Pearson correlation: The Pearson product-moment correlation coefficient (sometimes
referred to as the PPMCC, PCC or Pearson's r) is a measure of the linear correlation
(dependence) between two variables X and Y, giving a value between +1 and -1
inclusive, where +1 is total positive correlation, 0 is no correlation, and -1 is total
negative correlation.
Spearman correlation: Spearman's rank correlation coefficient, or Spearman's rho,
named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as
r_s, is a nonparametric measure of statistical dependence between two variables. It
assesses how well the relationship between two variables can be described using a
monotonic function. If there are no repeated data values, a perfect Spearman
correlation of +1 or -1 occurs when each of the variables is a perfect monotone
function of the other.
In my analysis, a correlation above +/-0.7 is considered significant, and a value below
+/-0.4 is considered not significantly correlated. I have chosen +/-0.7 and +/-0.4 as
thresholds to keep the analysis simple: many of the metrics exhibit their highest correlations
at around +/-0.7, and empirically, metric pairs with values below +/-0.4 are the ones we
tend to consider independent. For example, onLoad and the number of DOM elements are
relatively independent.
If two values are not significantly correlated, it implies relative independence between
the two variables.
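The two correlation measures and the +/-0.7 / +/-0.4 cut-offs described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the study; the sample values below are made up and are not from the HTTPArchive run.

```python
from statistics import mean, pstdev

def pearson(x, y):
    # Pearson r: covariance divided by the product of the standard deviations
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def spearman(x, y):
    # Spearman rho: Pearson r computed on the ranks of the values
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson(ranks(x), ranks(y))

def classify(r):
    # Apply the study's thresholds: >= +/-0.7 significant, <= +/-0.4 independent
    if abs(r) >= 0.7:
        return "significant"
    if abs(r) <= 0.4:
        return "relatively independent"
    return "moderate"

# Hypothetical paired samples in ms; not actual HTTPArchive values
onload = [268, 8442, 14310, 27510, 102800]
speed_index = [200, 3956, 6163, 11330, 104200]
print(classify(pearson(onload, speed_index)))
print(classify(spearman(onload, speed_index)))
```

For these made-up samples, both correlations fall in the "significant" band.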

Desktop Site Measurements


The following discussion is based on the HTTPArchive database run for March 15, 2015.

onLoad
Analysis
This is the event typically measured by most 3rd-party synthetic testing tools. Since it
is so widespread, it makes sense to measure this metric.
onLoad is closely correlated with visualComplete and SpeedIndex. There is also a decent
correlation between onLoad and total requests, indicating that a site slows down as the
number of requests increases. It will be an interesting number to watch during the transition
from HTTP/1.1 to HTTP/2: HTTP/2 (or H2) multiplexes responses over a single TCP
connection, which could help reduce the total number of round-trips.

Action
Always measure onLoad, since it is one of the most widely used metrics and allows
performance comparison across different measurement sources such as synthetic tests, RUM
and WebPageTest.
Do note that this is considered a very old metric and is not representative of the user's
perceived performance. Over time, reduce the emphasis on this metric and start to adopt
newer metrics that are closer to the performance that matters for your site. (See notes for
more details.)

Value Distribution
All values in milliseconds (ms)
Min.     1st Quartile   Median   3rd Quartile   Max.
268      8442           14310    27510          102800
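The five-number summaries shown in these Value Distribution tables can be reproduced from a raw sample with Python's statistics module. The sample below is hypothetical, chosen so its quartiles happen to line up with the table above; `method="inclusive"` uses the same interpolation as R's default (type 7) quantile.

```python
from statistics import quantiles

# Hypothetical onLoad samples in ms (not the actual HTTPArchive data set)
onload_ms = [268, 3200, 8442, 9100, 14310, 15000, 27510, 40210, 102800]

# "inclusive" interpolates between closest ranks, like R's default quantile()
q1, q2, q3 = quantiles(onload_ms, n=4, method="inclusive")
summary = {
    "Min.": min(onload_ms),
    "1st Quartile": q1,
    "Median": q2,
    "3rd Quartile": q3,
    "Max.": max(onload_ms),
}
for name, value in summary.items():
    print(f"{name:>13}: {value}")
```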

SpeedIndex
Analysis
SpeedIndex is closely correlated with visualComplete, renderStart and onLoad. It is loosely
correlated with TTFB and PageSpeed.

Action
Measure SpeedIndex, as it is closely related to the rendering of content, especially
above-the-fold content. Being a single number, it is easy to compare and leaves little to
subjective interpretation. The biggest drawback is that this metric is not available across all
testing products.
Value Distribution
For an extremely performance-oriented site, the ideal target is ~1000. Refer to this blog
post by Lara Hogan, where she explains the design and use of a performance budget very well.
Here's the distribution of the SpeedIndex:
Values are unitless scores, not times (lower is better).
Min      1st Quartile   Median   3rd Quartile   Max
200      3956           6163     11330          104200

Time to First Byte (TTFB)


Analysis
TTFB appears to have no strong correlation with the other metrics. At most, it can impact the
start render time. All other metrics are relatively independent of this value.

Action
If this metric is collected for a static resource served through a CDN, it can help measure
the performance of the CDN to some extent. If the page is dynamic, it helps determine the
health of the connection and the time spent on the back-end.
Since this is the only metric that can expose the time spent on the back-end, or the relative
effectiveness of the CDN, it should be part of the performance budget.
Value Distribution
All values in milliseconds (ms)
Min.     1st Quartile   Median   3rd Quartile   Max.
67       545            943      1867           60920

Total number of Requests


Analysis
Total number of requests appears to be correlated with non-performance metrics like 3rd
party domains. However, it does show a decent correlation with fullyLoaded, visualComplete
and onLoad.

Action
If other metrics like onLoad are already being measured, then this metric may be of limited
value. It would be helpful when a customer appears to be relying on too many 3rd-party
tags and we have reason to believe these 3rd parties are causing a performance lag.
WebPageTest has an option to test front-end SPOF; more information is available here and
here.

Value Distribution
All values in units
Min.     1st Quartile   Median   3rd Quartile   Max.
44       75             119      n/a            1715

PageSpeed
Analysis
Google PageSpeed is supposed to measure the network-independent aspects of page
performance: the server configuration, the HTML structure of a page, and its use of external
resources such as images, JavaScript, and CSS. This is clearly borne out by its very low
correlation with the other metrics. It is also important to note that the correlations are
mostly negative, indicating that a lower value of a time metric corresponds to a higher
PageSpeed score.

Action
Google PageSpeed values are relatively independent of the other metrics and yet reflect the
site structure. Since these are measures that need to be implemented by the developers of
the website, it is a very important metric and should be part of the performance budget
toolkit. Beyond a single number, PageSpeed can also identify issues in the page design, such
as blocking JavaScript and stylesheets, that would be harder to spot with other metrics.
Value Distribution
Scores are unitless, ranging from 0 to 100 (higher is better).
Min.     1st Quartile   Median   3rd Quartile   Max.
n/a      71             82       89             100

VisualComplete
Analysis
visualComplete tries to measure the time taken to render the above-the-fold (ATF) content.
It is closely related to fullyLoaded (when everything is loaded), onLoad and SpeedIndex.
This makes sense empirically as well: unless a page has a lot of lazy-loaded or deferred
content, visualComplete will be close to the fullyLoaded time.

Action
Since the value of this metric tracks SpeedIndex and onLoad, measuring it separately
would be of limited value. However, if you want to compare the performance of a page
before and after introducing lazy loading, then use the pair visualComplete and fullyLoaded
to measure the effectiveness of your implementation.
Value Distribution
All values in milliseconds (ms)
Min.     1st Quartile   Median   3rd Quartile   Max.
n/a      6700           11900    21400          104200

Total Bytes
Analysis
Total bytes downloaded is not very strongly associated with any metric, though it does
have a decent correlation with fullyLoaded, visualComplete and onLoad.
It is interesting to note that total bytes has a non-linear, monotonic relationship (a higher
Spearman than Pearson correlation) with fullyLoaded, visualComplete and onLoad.
Empirically, this means the load-time metrics consistently increase as total bytes increase,
but not proportionally: a two-unit increase in total bytes might cause only a one-unit
increase in fullyLoaded, or a one-unit increase in total bytes might cause a two-unit increase
in fullyLoaded.
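The distinction between the two correlation measures matters here: a relationship can be perfectly monotonic (Spearman rho = 1) without being linear (Pearson r < 1). A small Python illustration with made-up numbers, where load time grows faster than linearly with page weight:

```python
from statistics import mean, pstdev

def pearson(x, y):
    # Pearson r measures *linear* association
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def spearman_no_ties(x, y):
    # With no repeated values, Spearman rho has the closed form
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference
    n = len(x)
    rank = lambda v: {val: i + 1 for i, val in enumerate(sorted(v))}
    rx, ry = rank(x), rank(y)
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical values in arbitrary units; not HTTPArchive data
total_bytes = [1, 2, 3, 4, 5, 6]
fully_loaded = [2, 4, 8, 16, 32, 64]   # doubles at each step: monotone, not linear

print(round(pearson(total_bytes, fully_loaded), 3))   # < 1: not linear
print(spearman_no_ties(total_bytes, fully_loaded))    # 1.0: perfectly monotonic
```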

Action
This metric could help uncover sudden bloat in page size, especially due to images or new
JavaScript libraries. It would be a good catch-all metric to track if your performance budget
allows for an extra metric. Scott Jehl has an excellent article on the fact that a heavy page
need not mean a bad user experience.
Value Distribution
All values in bytes
Min.     1st Quartile   Median    3rd Quartile   Max.
n/a      608200         1275000   2005000        36770000

Number of Domains
Analysis
A higher number of domains appears to indicate a heavier website. Similarly, a higher
number of domains also indicates a slightly higher fullyLoaded time. However, the correlation
is not very strong.

Action
This metric would be helpful to track the number of shards and third parties. Generally
speaking, the number of 3rd parties must be controlled through a strict testing process.
Enforcing a policy of always loading the 3rd parties asynchronously, or deferring them until
after onLoad, should ensure that the number of 3rd parties has minimal impact on perceived
performance. Catchpoint has an article on the impact of 3rd parties and the issue with SPOF
when 3rd-party tags aren't optimally placed.

It would be a good metric to track for ensuring compliance in limiting 3rd parties. However,
measuring this metric will not provide any useful information on the perceived performance
for the user.
Value Distribution
All values in units
Min.     1st Quartile   Median   3rd Quartile   Max.
n/a      11             20       n/a            395

Section Conclusion
HTTPArchive has a lot of data captured for desktop websites. SpeedIndex clearly has a lot of
correlation to perceived-performance metrics like pageLoad, startRender and visualComplete.
There are a lot of metrics associated with the number of domains, number of requests and
number of DOM elements. However, keeping in mind that we have a restricted budget, the
recommendation is to measure SpeedIndex, onLoad and PageSpeed scores.

If there is a lot of push to add more 3rd-party metrics, then please measure the number of
domains and the total number of requests. The impact these metrics have on performance can
then be documented and shown to the right business owners. This will provide good
discussion points for rationalizing the use of 3rd parties and keeping only the services of those
that really matter.

Mobile
Analyzing the metrics for mobile is a bit harder. HTTPArchive does not collect many
measures, like PageSpeed, time to first byte (TTFB) and the DOM-related numbers, for
mobile. The number of sites crawled is also much smaller (4,000+) compared with desktop
(400,000+).
Just because the numbers aren't available in HTTPArchive does not mean they are not
measurable or unimportant. The missing metrics are just as important for mobile devices.

onLoad
Analysis
onLoad is the granddad of performance metrics. As Steve Souders mentions in his blog,
it is not very effective for a lazy-loading, AJAX-based, Web 2.0 application. However, it is
the metric that is supported by almost everyone. It is closely related to the fullyLoaded
metric and has a good relationship with SpeedIndex.

Action
As this is a metric that is universally available and reported, it makes sense to continue
tracking it and to have specific performance budgets for it. However, care should be taken in
defining the target value for this metric: spending too much time optimizing it may cost the
end-user experience.

Here's an example highlighting this issue from a slide in one of the SpeedCurve+SOASTA
presentations (ATF stands for above-the-fold): for Amazon, the page is quite usable by 2s,
whereas onLoad fires only at 9s. At the other extreme, in the case of Gmail, onLoad fires at
3.9s, whereas the emails become visible only about a second later.

Value Distribution
Based on the explanation above, onLoad cannot really have a fixed target value; the measure
will vary with the website implementation. If the site is relatively static, with very few
lazy-loading or AJAX-based features, it should aim for a low value. If there is a lot of
dynamic content, with clever logic handling below-the-fold content through lazy-loading
and other techniques, this metric can have a higher value.
All values in milliseconds (ms)
Min.     1st Quartile   Median   3rd Quartile   Max.
584      9916           15700    23700          61580

SpeedIndex
Analysis
This metric is closely related to the visual metrics renderStart and visualComplete. There
is a stronger-than-linear (monotonic) relationship between this metric and both
visualComplete and fullyLoaded.

Action
Because this metric ties together different visual aspects, such as the loading of
above-the-fold content and the delivery of an actionable site, it should always be part of the
metrics collection set.
Value Distribution
Values are unitless scores, not times (lower is better).
Min      1st Quartile   Median   3rd Quartile   Max
1000     6210           9220     10650          91860

Total number of Requests


Analysis
Compared to the desktop results, the total number of requests has a more direct correlation
with total bytes and with visual metrics like fullyLoaded and visualComplete. However, if
onLoad is being measured, then this metric may not be too important.
One interesting use case for this metric would be during the adoption of H2. Thanks to
better management of a single TCP connection, the number of requests (from a single
domain) is not supposed to have a major impact on page performance. However, this
assertion may not hold entirely true for mobile devices. Until better studies are available,
tracking this metric would provide insight for early adopters.

Action
Track this metric during H2 adoption. Beyond this use case, it may not be a very valuable
metric to focus on.
Value Distribution
All values in units
Min      1st Quartile   Median   3rd Quartile   Max
1000     6210           9220     10650          91860

VisualComplete
Analysis
visualComplete appears to be closely related to both SpeedIndex and onLoad.

Action
Since the recommendation is to measure both SpeedIndex and onLoad, this metric by itself
will not add value and can be ignored in the performance budget.
Value Distribution
All values in milliseconds (ms)
Min      1st Quartile   Median   3rd Quartile   Max
n/a      9000           15000    23000          97000

Section Conclusion
The crawl data from HTTPArchive for mobile websites is much sparser. PageSpeed
scores are not available for mobile devices, but that does not reduce their importance.
From just the available data, the best metrics to measure are SpeedIndex and onLoad (for
compatibility). Apart from these, the number of compressed objects (numCompressed) and
the number of domains (numDomains) would be useful to measure, since opening connections
to different domains is always expensive for a mobile device.
With the growing importance of mobile devices, I am sure the future crawls will improve and
start to provide much better reporting. Once that is available, I hope to redo this part of the
research.

Conclusion
Based on the study, the following metrics stand out in terms of richness and the ability
to provide different perspectives on the data:
SpeedIndex (perceived performance)
onLoad (backward compatibility)
Google PageSpeed (network-independent optimization)
TTFB (back-end effectiveness, CDN efficiency)
Total domains (3rd-party bloat)
Depending on your appetite for data, consider measuring at least the first three.
Do note that each website is different and has a special purpose. The best metric is the one
that measures the effectiveness of that critical action. If none of these metrics suits your
needs, do consider developing a custom metric that helps your business.

Appendix
Further Reading
Raw speed score correlation spreadsheet:
https://docs.google.com/a/akamai.com/spreadsheets/d/1yUvYlJmt2DBrmO0DIxO9y
wXEyz_8CmoesWHAYpRQmeM/edit?usp=sharing
WebPageTest definition of metrics:
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics
General concept of performance budgeting:
https://en.wikipedia.org/wiki/Performance-based_budgeting
Performance budget blog by Tim Kadlec:
http://timkadlec.com/2013/01/setting-a-performance-budget/
Performance budget at Etsy by Lara Callender Hogan:
https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/
Grunt task for performance budgeting by Tim Kadlec:
https://github.com/tkadlec/grunt-perfbudget
Performance budgeting using the Grunt task, explained by Tim Kadlec:
http://timkadlec.com/2014/05/performance-budgeting-with-grunt/
An easy to understand overview of Performance Budget by Catherine Farman:
http://www.sitepoint.com/automate-performance-testing-grunt-js/
Collection of tools to help in performance tuning:
http://perf-tooling.today/tools
Webinar "Creating Meaningful Metrics That Get Your Users to Do the Things You
Want":
http://www.oreilly.com/pub/e/3390
Lara Hogan's blog post on the importance of a performance budget:
https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/

Chris Coyier's summary of Tim Kadlec's performance budget blog:
https://css-tricks.com/fast-fast-enough/
A nice comment from Paul Irish on Performance Budget:
http://timkadlec.com/2014/01/fast-enough/#comment-1200946500
A huge collection of articles, tools and videos related to performance:
http://perf.rocks/
Testing for Front-End SPOF by Patrick Meenan:
http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html
Frontend SPOF by Steve Souders:
http://www.stevesouders.com/blog/2010/06/01/frontend-spof/
Metrics reporting:
Catchpoint:
http://www.catchpoint.com/
Keynote:
http://www.keynote.com/
SpeedCurve:
http://speedcurve.com/
Sitespeed.io free dashboard:
http://dashboard.sitespeed.io/

People to Follow for Performance

Steve Souders: @souders
Scott Jehl: @scottjehl
Tim Kadlec: @tkadlec
Lara Hogan: @lara_hogan
Guy Podjarny: @guypo
Paul Irish: @paul_irish
Ilya Grigorik: @igrigorik
Perf Planet: @perfplanet
Hashtags: #webperf #perfmatters

R Program code
Sample R program code to compute the correlation metrics:
# Read the tab-separated output file, which has a header row
data <- read.csv('op.txt', sep="\t", header=TRUE)
# Pearson (linear) and Spearman (rank) correlations between two columns
cor(data$R1, data$R2, method='pearson')
cor(data$R1, data$R2, method='spearman')

Sample R program to extract the min, max, median and quartiles:
# summary() prints Min., 1st Qu., Median, Mean, 3rd Qu. and Max.
summary(data$R1)