Sunday 29 November 2015

Notes about Web Performance

Render-tree construction, layout, and paint
  • The DOM and CSSOM trees are combined to form the render tree.
  • The render tree contains only the nodes required to render the page.
  • Layout computes the exact position and size of each object.
  • Paint is the last step that takes in the final render tree and renders the pixels to the screen.
--> https://developers.google.com/web/fundamentals/performance/critical-rendering-path/render-tree-construction?hl=en
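
Here is a minimal Python sketch of the render-tree step. It assumes a hypothetical toy DOM (`Node` with only a tag name and children) and a toy CSSOM reduced to a tag-to-style dictionary. Real selector matching, the cascade, layout and paint are all left out. Nodes that never produce output (the assumed `NON_VISUAL_TAGS` set) and nodes with `display: none` are dropped together with their subtrees.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy DOM node: only a tag name and children.
@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)

# Assumed, non-exhaustive set of tags that never produce visual output.
NON_VISUAL_TAGS = {"head", "meta", "link", "script", "style", "title"}

def build_render_tree(node: Node, cssom: dict[str, dict[str, str]]) -> Optional[dict]:
    """Walk the DOM, attach each node's style and skip everything that is not rendered."""
    if node.tag in NON_VISUAL_TAGS:
        return None
    style = cssom.get(node.tag, {})  # toy CSSOM: styles keyed by tag name only
    if style.get("display") == "none":
        return None  # the node and its whole subtree are left out
    children = [build_render_tree(child, cssom) for child in node.children]
    return {
        "tag": node.tag,
        "style": style,
        "children": [c for c in children if c is not None],
    }

dom = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [Node("p"), Node("span"), Node("div", [Node("p")])]),
])
cssom = {"span": {"display": "none"}, "p": {"font-weight": "bold"}}
render_tree = build_render_tree(dom, cssom)
```

In this example the `head` subtree and the `span` disappear, while `body`, the `div` and both `p` elements end up in the render tree.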

 
Render blocking CSS
  • CSS is treated as a render-blocking resource, which means that the browser will hold rendering of any processed content until the CSSOM is constructed.
--> https://developers.google.com/web/fundamentals/performance/critical-rendering-path/render-blocking-css?hl=en


DOM blocking scripts
  • Executing a synchronous inline or external script blocks DOM construction.
  • The script is executed at the exact point where it appears in the document. When the HTML parser encounters a script tag, it pauses DOM construction and yields control to the JavaScript engine; once the script has finished running, the browser picks up where it left off and resumes DOM construction.
  • JavaScript execution blocks on the CSSOM: the browser delays script execution until it has finished downloading the stylesheets and constructing the CSSOM, and during this time DOM construction is also blocked.
  • This is because JavaScript can query and modify both the DOM and the CSSOM.
--> https://developers.google.com/web/fundamentals/performance/critical-rendering-path/adding-interactivity-with-javascript?hl=en
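
The blocking chain above (script waits for CSSOM, parser waits for script) can be illustrated with a toy timeline. This is a simplified sketch, not how a browser is implemented: the function name, item kinds and millisecond costs are hypothetical, and it ignores the download time of external scripts as well as the speculative resource fetching real browsers do.

```python
def dom_construction_time(items: list[tuple[str, int]]) -> int:
    """Time (ms) at which the HTML parser finishes, under a simplified model.

    items are processed in document order; each is (kind, ms):
      "html"   - parser work that takes ms
      "css"    - a stylesheet whose download + CSSOM construction starts now
                 and takes ms; it does not pause the parser by itself
      "script" - a synchronous inline script that runs for ms
    """
    now = 0          # current parser time
    cssom_ready = 0  # when all stylesheets seen so far are downloaded and parsed
    for kind, ms in items:
        if kind == "html":
            now += ms
        elif kind == "css":
            cssom_ready = max(cssom_ready, now + ms)
        elif kind == "script":
            # The script waits for the CSSOM, and the parser waits for the script.
            now = max(now, cssom_ready) + ms
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return now

with_css = [("html", 10), ("css", 100), ("html", 10), ("script", 5), ("html", 10)]
without_css = [("html", 10), ("html", 10), ("script", 5), ("html", 10)]
```

With the stylesheet, the parser sits idle for 90 ms waiting for the CSSOM before the 5 ms script can run, so parsing ends at 125 ms instead of 35 ms.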


Analyzing critical rendering path performance

--> https://developers.google.com/web/fundamentals/performance/critical-rendering-path/analyzing-crp?hl=en



Navigation Timing API

  • domInteractive - The moment just after the browser has finished parsing the document, including scripts inserted in the "traditional" blocking way, i.e. without a defer or async attribute.
  • domContentLoaded - The time just before the DOMContentLoaded event is fired, which is just after the browser has finished downloading and executing all the scripts that have defer set and no async attribute.
  • domComplete - The point when all resources (e.g. images) required by the page have been downloaded and processed; this is the point when the loading spinner can stop spinning in the browser.
--> http://kaaes.github.io/timing/info.html

"The document is marked as “interactive” when the user agent stops parsing the document. Meaning, the DOM tree is ready."

"The user agent fires the DOMContentLoaded (DCL) event once any scripts marked with "defer" have been executed, and there are no stylesheets that are blocking scripts. Meaning, the CSSOM is ready"

"If you add a script and tag it with “defer”, then you unblock the construction of the DOM: the document interactive state does not have to wait for execution of JavaScript. However, note that this same script will be executed before DCL is fired."

"DCL does not have to wait for execution of async scripts"

"The DCL event is also a critical milestone. Many popular libraries, such as JQuery, will begin executing their code once it fires."

--> http://calendar.perfplanet.com/2012/deciphering-the-critical-rendering-path/
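
The quotes above boil down to a small decision table. The Python sketch below encodes it for classic scripts; the function name and return strings are hypothetical labels. It assumes, as in the HTML spec, that defer and async only take effect on external scripts and that async wins when both attributes are set.

```python
def milestone_waiting_for(external: bool, defer: bool, is_async: bool) -> str:
    """Earliest page milestone that has to wait for a classic script to execute.

    Simplified reading of the notes above:
      - parser-blocking scripts run before parsing ends, i.e. before domInteractive
      - defer scripts run after parsing but before DOMContentLoaded (DCL) fires
      - async scripts are not waited for by either milestone
    """
    if not external:
        # For classic inline scripts the defer and async attributes are ignored.
        return "domInteractive"
    if is_async:
        return "none"  # async wins when both attributes are set
    if defer:
        return "DOMContentLoaded"
    return "domInteractive"
```

For example, `milestone_waiting_for(external=True, defer=True, is_async=False)` gives `"DOMContentLoaded"`: the document becomes interactive without waiting for that script, but DCL does not fire until it has run.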


Notes about domInteractive

Fonts: Chrome and Firefox use a three-second timeout when waiting for fonts; if the font doesn't arrive within three seconds, a default font is used. IE11 displays the critical content immediately using a default font. In Chrome, Firefox and IE11, the content is re-rendered when the font file finishes downloading.

--> http://www.stevesouders.com/blog/2015/08/07/dominteractive-is-it-really/
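
The font behaviour described above can be modelled as a function of time. This is a sketch with stated assumptions: times are in seconds from when the font is requested, `timeout=3.0` stands for Chrome/Firefox and `timeout=None` for IE11 as of the 2015 note, and the "invisible" state before the timeout is an assumption inferred from the contrast with IE11, which shows the content immediately.

```python
from typing import Optional

def text_rendering(t: float, font_arrives_at: float, timeout: Optional[float]) -> str:
    """How critical text is shown at time t, in a simplified model of the note above."""
    if t >= font_arrives_at:
        return "web font"      # content is re-rendered once the font file arrives
    if timeout is None or t >= timeout:
        return "default font"  # IE11 immediately, Chrome/Firefox after the timeout
    return "invisible"         # assumption: text is hidden while waiting for the font
```

With a font that takes 5 seconds, Chrome/Firefox (`timeout=3.0`) would show nothing for 3 seconds, then the default font, then the web font at 5 seconds; IE11 (`timeout=None`) shows the default font from the start.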


Document.readyState

--> https://developer.mozilla.org/en-US/docs/Web/API/Document/readyState


Custom Metrics

--> https://speedcurve.com/blog/user-timing-and-custom-metrics/


Optimizing the Critical Rendering Path

--> https://www.youtube.com/watch?v=YV1nKLWoARQ#t=704


Speed Index

--> https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index
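
The WebPageTest documentation defines Speed Index as the area above the visual-progress curve, i.e. the integral of (1 - visual completeness / 100) over time until the page is visually complete. A minimal Python sketch, assuming the visual completeness samples are already available (the function name and sample values are hypothetical) and that completeness stays constant between samples:

```python
def speed_index(samples: list[tuple[float, float]]) -> float:
    """Speed Index (ms) from (time_ms, visual_completeness_percent) samples.

    Assumes samples are sorted by time, start at t=0, the last one is 100 %,
    and completeness stays constant until the next sample (a step function).
    """
    total = 0.0
    for (t0, vc0), (t1, _) in zip(samples, samples[1:]):
        total += (1 - vc0 / 100) * (t1 - t0)  # unrendered fraction x interval length
    return total

progressive = [(0, 0), (1000, 75), (4000, 100)]  # 75 % painted after 1 s, done at 4 s
all_at_once = [(0, 0), (2000, 100)]              # blank until everything appears at 2 s
```

The progressive page scores 1750 and the all-at-once page 2000, so the page that finishes later still has the better (lower) Speed Index because most of it becomes visible early.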

Further Reads

--> https://www.igvita.com/2014/05/20/script-injected-async-scripts-considered-harmful/

Thursday 20 August 2015

Downloading an entire website using wget

wget --recursive --no-clobber --page-requisites --html-extension --convert-links --no-parent [url]

See: http://www.linuxjournal.com/content/downloading-entire-web-site-wget

Friday 16 January 2015

TCP overview and Wireshark example

Here is a summary of some TCP concepts described in the book "High Performance Browser Networking":

TCP provides an effective abstraction of a reliable network running over an unreliable channel:
  • retransmission of data
  • in-order delivery
  • congestion control & avoidance
  • data integrity
Every TCP connection starts with a three-way handshake ("SYN" -> "SYN ACK" -> "ACK"), so every new connection incurs a full roundtrip of latency before any application data can be transferred.

TCP Fast Open: Allows data transfer within the SYN packet (Linux 3.7+ kernels)

Receive window (rwnd): Each side of the TCP connection has its own rwnd, which communicates the size of the available buffer space to hold incoming data. Each ACK packet carries the latest rwnd value of its sender.

TCP window scaling: The rwnd field in the TCP header is 16 bits wide, which places an upper limit of 65,535 bytes (about 65 KB) on the rwnd; "window scaling" raises the maximum rwnd to 1 GB.

  > Check if window scaling is enabled: sysctl net.ipv4.tcp_window_scaling
  > Enable window scaling: sysctl -w net.ipv4.tcp_window_scaling=1
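
The arithmetic behind window scaling: the advertised 16-bit value is shifted left by a shift count negotiated in the handshake, and the TCP window scale option allows a shift of at most 14. A small Python sketch (function and constant names are hypothetical); the scaling factor of 128 in the Wireshark example below corresponds to a shift count of 7.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the rwnd field in the TCP header is 16 bits wide
MAX_SCALE_SHIFT = 14       # largest shift count allowed by the window scale option

def effective_rwnd(window_field: int, scale_shift: int) -> int:
    """Receive window in bytes = advertised 16-bit value << negotiated shift count."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= scale_shift <= MAX_SCALE_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << scale_shift
```

Without scaling the maximum is `effective_rwnd(0xFFFF, 0)` = 65,535 bytes; with the maximum shift it is 1,073,725,440 bytes, roughly 1 GB.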

Congestion window (cwnd): Sender-side limit on the amount of data the sender can have in flight before receiving an ACK from the receiver. The cwnd is not exchanged between sender and receiver; it is a private variable of the sender.

The maximum amount of data in flight is the minimum of the receive window and the congestion window.

Slow Start: Avoids overwhelming the underlying network. The cwnd starts at 4 or 10 network segments (10 was specified in RFC 6928 in April 2013 and is the default since the Linux 2.6.39 kernel); one segment carries 1,460 bytes when the Maximum Transmission Unit is 1,500 bytes. For every received ACK the sender can increase its cwnd by one segment, so the cwnd doubles every roundtrip. Regardless of the available bandwidth, every TCP connection must go through the slow-start phase.

Upon packet loss the cwnd is adjusted to avoid overwhelming the network.
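
A rough Python sketch of slow start, counting how many segments go out per roundtrip. It assumes no packet loss, no delayed ACKs, one RTT per burst and sizes measured in whole segments; the function name and its parameters are hypothetical.

```python
def slow_start_rounds(total_segments: int, init_cwnd: int = 10,
                      rwnd_segments: int = 10**9) -> list[int]:
    """Segments sent in each roundtrip during slow start.

    Each ACKed segment grows cwnd by one segment, so cwnd doubles every roundtrip;
    the amount in flight is capped by min(cwnd, rwnd).
    """
    bursts = []
    cwnd = init_cwnd
    remaining = total_segments
    while remaining > 0:
        burst = min(cwnd, rwnd_segments, remaining)
        bursts.append(burst)
        remaining -= burst
        cwnd += burst  # one extra segment per ACK received for this burst
    return bursts
```

`slow_start_rounds(70)` gives `[10, 20, 40]`, three roundtrips, which is the same 10/20/40 burst pattern visible in the Wireshark example below; with an initial cwnd of 4 the same 70 segments need five roundtrips (`[4, 8, 16, 32, 10]`).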

Slow Start Restart (SSR): Resets the cwnd of a connection after it has been idle for a defined period of time. Should be disabled on a server.

  > Check SSR: sysctl net.ipv4.tcp_slow_start_after_idle
  > Disable SSR: sysctl -w net.ipv4.tcp_slow_start_after_idle=0

Head-of-line blocking: If one packet is lost, all subsequent packets must be held in the receiver's TCP buffer until the lost packet is retransmitted and arrives.
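
Head-of-line blocking follows directly from in-order delivery. The Python sketch below simulates a receiver buffer, assuming segments are numbered 0, 1, 2, ... (the function name is hypothetical): for each arriving segment it returns what is released to the application at that moment.

```python
def deliver_in_order(arrivals: list[int]) -> list[list[int]]:
    """For each arriving segment number, the segments released to the application."""
    buffered: set[int] = set()
    next_expected = 0
    released_per_arrival = []
    for seq in arrivals:
        if seq >= next_expected:  # ignore duplicates of already delivered segments
            buffered.add(seq)
        released = []
        while next_expected in buffered:  # release only a contiguous run
            buffered.remove(next_expected)
            released.append(next_expected)
            next_expected += 1
        released_per_arrival.append(released)
    return released_per_arrival
```

If segment 1 is lost and its retransmission arrives after segments 2, 3 and 4, then `deliver_in_order([0, 2, 3, 4, 1])` gives `[[0], [], [], [], [1, 2, 3, 4]]`: segments 2 to 4 sit in the buffer even though they arrived intact.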


Wireshark Example:
---------------------------------

The file "sfchronicle.pcap" is a Wireshark capture of the network traffic of one HTTP GET request (http://www.sfchronicle.com/) between Munich, Germany and San Francisco, USA.

When the file is opened in Wireshark one can see that:
  • The latency/round-trip time between the "SYN" and the "SYN ACK" packet is about 210ms
  • The first receive window size specified by the client was 29312 bytes (229 multiplied by the window scaling factor of 128)
  • In later "ACK"s the client dynamically increased the receive window size
  • The server has an initial congestion window of 10 segments. After sending 10 packets the server had to wait for an "ACK" before being able to send more packets; this introduces another 200ms of latency.
  • The server then increased its cwnd with every received "ACK" and could send 20 packets in the next burst before it had to wait again for "ACK"s; another 185ms of latency.
  • In the next burst it could send about 40 packets and then had to wait once more, for 165ms. With the following burst it finally managed to transmit all the data.
The overall transaction took 1 second, with most of the time being latency.

I used "Captcp" to create a nice TCP throughput diagram from the pcap file.



Further Reads

--> http://calendar.perfplanet.com/2015/tcp-download-breakpoints/

Thursday 15 January 2015

Solr "queryResultCache": queryResultWindowSize vs queryResultMaxDocsCached

I did some tests to find out the difference between "queryResultWindowSize" and "queryResultMaxDocsCached"

Example config for the scenarios:

  <queryResultWindowSize>4</queryResultWindowSize>   
  <queryResultMaxDocsCached>16</queryResultMaxDocsCached>

Note: In all the scenarios I always used the same query, but I restarted Solr between the scenarios to flush the cache.

-------------------------------------------------------------     
Scenario 1:
We have a page size of 2 and go from one page to the next
------------------------------------------------------------- 
Start:  0 Rows: 2 -> Executes Query, returns docs 0-1 and caches docs 0-3
Start:  2 Rows: 2 -> Retrieves docs 2-3 from cache and returns them
Start:  4 Rows: 2 -> Executes Query, returns docs 4-5 and caches docs 0-7
                                         (Note: It replaces the existing cache entry for this query
                                          -> There is always only one cache entry for a query)
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them
Start: 12 Rows: 2 -> Executes Query, returns docs 12-13 and caches docs 0-15
Start: 14 Rows: 2 -> Retrieves docs 14-15 from cache and returns them
Start: 16 Rows: 2 -> Executes Query, returns docs 16-17 and does NOT cache anything because "queryResultMaxDocsCached" setting of 16 has been exceeded

------------------------------------------------------------- 
Scenario 2:
We have a page size of 2 but start with a high "start" parameter
------------------------------------------------------------- 
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them
Start:  0 Rows: 2 -> Retrieves docs 0-1 from cache and returns them
Start:  2 Rows: 2 -> Retrieves docs 2-3 from cache and returns them
Start:  4 Rows: 2 -> Retrieves docs 4-5 from cache and returns them
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start: 12 Rows: 2 -> Executes Query, returns docs 12-13 and caches docs 0-15
Start: 14 Rows: 2 -> Retrieves docs 14-15 from cache and returns them
Start: 16 Rows: 2 -> Executes Query, returns docs 16-17 and does NOT cache anything because "queryResultMaxDocsCached" setting of 16 has been exceeded

------------------------------------------------------------- 
Scenario 3:
We start with a high "rows" query parameter
------------------------------------------------------------- 
Start:  0 Rows: 8 -> Executes Query, returns docs 0-7 and caches docs 0-7
Start:  0 Rows: 4 -> Retrieves docs 0-3 from cache and returns them
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them

From this I conclude:

1) The Solr "queryResultCache" always caches from the first document of the query result - not from the "start" query-parameter

2) "queryResultWindowSize" setting: In our example the "windows" were documents 0-3, 4-7, 8-11, 12-15, ... The "start" + "rows" query-parameters determine which "window" is used and the "window" in turn determines the end document to be cached (the upper end of the window)

3) "queryResultMaxDocsCached" setting: A threshold indicating if the query result should be cached or not. If the end document to be cached (the upper end of the window) is higher than the "queryResultMaxDocsCached" setting the query result will not be cached.  (In my opinion the name of the parameter is very unfortunate)
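
Conclusions 1) to 3) can be turned into a small model that replays the scenarios above. This is a Python sketch of the observed behaviour, not Solr's source code: the class, its method and the returned strings are hypothetical, and it keeps one entry per query that always starts at document 0.

```python
class QueryResultCacheModel:
    """Toy model of the queryResultCache behaviour observed in the scenarios above."""

    def __init__(self, window_size: int, max_docs_cached: int) -> None:
        self.window_size = window_size          # like queryResultWindowSize
        self.max_docs_cached = max_docs_cached  # like queryResultMaxDocsCached
        self.cached_docs: dict[str, int] = {}   # query -> number of docs cached (0 .. n-1)

    def request(self, query: str, start: int, rows: int) -> str:
        needed = start + rows
        if self.cached_docs.get(query, 0) >= needed:
            return "hit"
        # Round the requested end up to the upper end of its window (integer ceil).
        window_end = -(-needed // self.window_size) * self.window_size
        if window_end <= self.max_docs_cached:
            self.cached_docs[query] = window_end  # replaces any existing entry for this query
            return f"miss, cached 0-{window_end - 1}"
        return "miss, not cached"

# Replaying scenario 1: page size 2, paging forward from start=0.
model = QueryResultCacheModel(window_size=4, max_docs_cached=16)
scenario_1 = [model.request("q", start, 2) for start in range(0, 18, 2)]
```

The same model reproduces the suggestion below: with a window of 20, a threshold of 40 and 10 rows per page, requests at start 0, 10, 20, 30 and 40 give a miss (caching 0-19), a hit, a miss (caching 0-39), a hit, and finally an uncached miss.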

Here is a suggestion for when your default page size ("rows" query-parameter) is 10:
  • A "queryResultWindowSize" setting of 20 will load the first two pages into the cache when the "start" query-parameter is 0 (or any value up to and including 10).
  • A "queryResultMaxDocsCached" setting of 40 will also allow the third and fourth pages to be cached (when the "start" query-parameter is 20, or any value up to and including 30). From the fifth page on, the query result will not be cached.

I also read (did not verify):

The Solr "queryResultCache" caches the document ids and optionally the scores (if you ask for the scores). That means that either 4 or 8 bytes per document are cached.
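
If the unverified 4/8 bytes per document figure holds, the payload of the cache can be estimated with simple arithmetic. The Python sketch below is a rough upper bound under that assumption; it ignores per-entry overhead such as the cached query keys, and the cache size of 512 entries used in the example is purely hypothetical.

```python
def query_result_cache_bytes(entries: int, docs_per_entry: int, with_scores: bool) -> int:
    """Rough upper bound for the doc-id/score payload of the queryResultCache.

    Assumes (unverified) 4 bytes per doc id, or 8 bytes per doc if scores are cached too.
    """
    bytes_per_doc = 8 if with_scores else 4
    return entries * docs_per_entry * bytes_per_doc
```

With a hypothetical 512 entries and the suggested "queryResultMaxDocsCached" of 40, this gives 81,920 bytes without scores and 163,840 bytes (160 KB) with scores.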

Wednesday 14 January 2015

Solr Cache Autowarming

The Solr index is incrementally updated, i.e. changes are always written to new files. Upon a hard commit a new searcher is created which has a reference to the previous index segments and any new index segments.

The old searcher (pointing to the old index segments) continues handling queries while the new searcher is loaded.

Caches are tied to a specific version of the index; therefore new caches have to be autowarmed for the new searcher based on the values of the old cache. Make sure that the warmup time of a searcher is shorter than the time between hard commits. The warmup time can be checked on the Solr Admin's "Plugins/Stats" page in the "CORE" section.

New documents only become available in search results after the commit plus the time the new searcher needs to warm up.

On the Solr Admin's "Plugins/Stats" page in the "CACHE" section one can see the "hitratio" and the "warmupTime" of each cache. Aim for a high hit ratio with a small cache size.
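
A simplified Python sketch of the autowarming idea: the new cache is seeded by re-executing the most recently used keys of the old cache against the new index, roughly what Solr's `autowarmCount` cache attribute controls. The functions and their arguments are hypothetical; `hit_ratio` is computed here as hits divided by lookups, and `warmup_fits` encodes the rule that warmup must finish before the next hard commit.

```python
from collections import OrderedDict
from typing import Callable

def autowarm(old_cache: "OrderedDict[str, object]", autowarm_count: int,
             execute: Callable[[str], object]) -> "OrderedDict[str, object]":
    """Seed a new searcher's cache from the most recently used keys of the old cache.

    old_cache is ordered from least to most recently used; execute runs a query
    against the new index.
    """
    recent_keys = list(old_cache.keys())[-autowarm_count:] if autowarm_count > 0 else []
    new_cache: "OrderedDict[str, object]" = OrderedDict()
    for key in recent_keys:
        new_cache[key] = execute(key)  # recomputed, not copied: the index has changed
    return new_cache

def hit_ratio(hits: int, lookups: int) -> float:
    return hits / lookups if lookups else 0.0

def warmup_fits(warmup_ms: float, commit_interval_ms: float) -> bool:
    """The warmup must finish before the next hard commit opens another searcher."""
    return warmup_ms < commit_interval_ms
```

The guard on `autowarm_count > 0` matters because a slice like `[-0:]` would return the whole list instead of nothing.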