Friday 16 January 2015

TCP overview and Wireshark example

Here is a summary of some TCP concepts described in the book "High Performance Browser Networking":

TCP provides an effective abstraction of a reliable network running over an unreliable channel:
  • retransmission of data
  • in-order delivery
  • congestion control & avoidance
  • data integrity
Every TCP connection starts with a three-way handshake ("SYN" -> "SYN ACK" -> "ACK"), so any new connection incurs a full round trip of latency before any application data can be transferred.

TCP Fast Open: Allows data transfer within the SYN packet (Linux 3.7+ kernels)

Receive window (rwnd): Each side of the TCP connection has its own rwnd which communicates the size of the available buffer space to hold incoming data. Each ACK packet carries the latest rwnd value for each side.

TCP window scaling: The 16 bits allocated to the window size field place an upper limit of 65,535 bytes (64 KB) on the rwnd, but "window scaling" raises the maximum rwnd to about 1 GB.

  > Check if window scaling is enabled: sysctl net.ipv4.tcp_window_scaling
  > Enable window scaling: sysctl -w net.ipv4.tcp_window_scaling=1
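
The two limits above follow from simple arithmetic. Here is a minimal Python sketch of it. The function name "scaled_window" is made up for illustration, and the maximum shift count of 14 is taken from the TCP window scale option (RFC 7323), not from the book summary above.

```python
# The window size field in the TCP header is 16 bits wide.
MAX_WINDOW_FIELD = 2**16 - 1   # 65,535

# The window scale option allows a left shift of at most 14 bits (RFC 7323).
MAX_SHIFT = 14


def scaled_window(window_field: int, shift: int) -> int:
    """Return the effective receive window in bytes."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit into 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


print(scaled_window(MAX_WINDOW_FIELD, 0))          # 65535 (no scaling)
print(scaled_window(MAX_WINDOW_FIELD, MAX_SHIFT))  # 1073725440 (about 1 GB)
```

A scaling factor of 128 corresponds to a shift count of 7, because 2**7 = 128.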

Congestion window (cwnd): Sender-side limit on the amount of data the sender can have in flight before receiving an ACK from the receiver. cwnd is not exchanged between sender and receiver. It is a private variable of the sender.

The max amount of data in flight is the min of the receive window and the congestion window.

Slow Start: Avoids overwhelming the underlying network. The cwnd starts at 4 or 10 network segments (10 was specified in April 2013 and is used as of the Linux 2.6.39 kernel); a segment carries 1,460 bytes when the Maximum Transmission Unit is 1,500 bytes. For every received ACK the sender can increase its cwnd by one segment. No matter the available bandwidth, every TCP connection must go through the slow start phase.
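
To get a feeling for how the window grows, here is a small Python simulation of slow start. It is only a sketch under simplifying assumptions: no packet loss, every segment is acknowledged individually (no delayed ACKs), and the sender transmits one burst per round trip. The function name "slow_start_bursts" and its parameters are hypothetical. The amount in flight is capped by the minimum of cwnd and rwnd, as described above.

```python
from typing import List, Optional


def slow_start_bursts(total_segments: int,
                      initial_cwnd: int = 10,
                      rwnd_segments: Optional[int] = None) -> List[int]:
    """Return the number of segments sent in each round trip."""
    bursts = []
    cwnd = initial_cwnd
    remaining = total_segments
    while remaining > 0:
        # Data in flight is limited by min(cwnd, rwnd).
        window = cwnd if rwnd_segments is None else min(cwnd, rwnd_segments)
        burst = min(window, remaining)
        bursts.append(burst)
        remaining -= burst
        # Each ACK grows cwnd by one segment, so a fully
        # acknowledged burst roughly doubles the window.
        cwnd += burst
    return bursts


print(slow_start_bursts(150, initial_cwnd=10))  # [10, 20, 40, 80]
print(slow_start_bursts(150, initial_cwnd=4))   # [4, 8, 16, 32, 64, 26]
```

Under these assumptions, transferring 150 segments needs four round trips with an initial cwnd of 10 but six with an initial cwnd of 4.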

Upon packet loss the cwnd is adjusted to avoid overwhelming the network.

Slow Start Restart (SSR): Resets the cwnd of a connection after it has been idle for a defined period of time. Should be disabled on a server.

  > Check SSR: sysctl net.ipv4.tcp_slow_start_after_idle
  > Disable SSR: sysctl -w net.ipv4.tcp_slow_start_after_idle=0

Head-of-line blocking: If one packet is lost, all subsequent packets must be held in the receiver's TCP buffer until the lost packet is retransmitted and arrives.
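
The effect can be illustrated with a toy reassembly buffer in Python. This is a simplified model, not how a real TCP stack is implemented: segments are numbered 0, 1, 2, ... instead of using byte sequence numbers, and the function name "deliver_in_order" is made up for this example.

```python
from typing import Iterable, List, Tuple


def deliver_in_order(arrivals: Iterable[int]) -> List[Tuple[int, List[int]]]:
    """For each arriving segment, list what can be handed to the application."""
    buffered = set()
    next_expected = 0
    log = []
    for seq in arrivals:
        buffered.add(seq)
        delivered = []
        # Hand over data only while there is no gap in the sequence.
        while next_expected in buffered:
            buffered.remove(next_expected)
            delivered.append(next_expected)
            next_expected += 1
        log.append((seq, delivered))
    return log


# Segment 1 is lost at first and arrives last (as a retransmission).
for seq, delivered in deliver_in_order([0, 2, 3, 4, 1]):
    print(seq, delivered)
```

Segments 2, 3 and 4 sit in the buffer and nothing reaches the application until segment 1 arrives. At that point segments 1 to 4 are delivered at once.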


Wireshark Example:
---------------------------------

The file "sfchronicle.pcap" is a Wireshark capture of the network traffic for one HTTP GET request (http://www.sfchronicle.com/) between Munich, Germany and San Francisco, USA.

If opened in Wireshark one can see that:
  • The latency/round-trip time is about 210ms between the "SYN" and the "SYN ACK" packet
  • The first receive window size specified by the client was 29312 bytes (229 multiplied by the specified window scaling factor of 128)
  • In later "ACK"s the receive window size was dynamically increased by the client
  • The server has an initial congestion window of 10. After sending 10 packets the server had to wait for an "ACK" before being able to send more packets - this introduces another 200ms of latency.
  • Then the server increased its cwnd with every received "ACK" and could send 20 packets in the next burst before it had to wait again for "ACK"s - another 185ms latency
  • In the next burst it could send about 40 packets and then had to wait once more for 165ms. With the following burst it finally managed to transmit all the data
The overall transaction took 1 second with most of the time being latency.
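
As a quick sanity check, the numbers from the list above can be added up in a few lines of Python. The values are the rounded ones read from the capture, and the total of 1000ms is the approximate overall duration, so the result is only a rough estimate. The variable and function names are chosen for this example.

```python
def effective_rwnd(window_field: int, scale_factor: int) -> int:
    """Receive window in bytes: advertised value times scaling factor."""
    return window_field * scale_factor


handshake_rtt_ms = 210            # "SYN" -> "SYN ACK"
ack_waits_ms = [200, 185, 165]    # server waiting for "ACK"s between bursts
total_ms = 1000                   # approximate duration of the transaction

waiting_ms = handshake_rtt_ms + sum(ack_waits_ms)

print(effective_rwnd(229, 128))   # 29312
print(waiting_ms)                 # 760
print(waiting_ms / total_ms)      # 0.76
```

So roughly three quarters of the transaction was spent waiting for packets to cross the network rather than sending data.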

I used "Captcp" to create a nice TCP throughput diagram from the pcap file.



Further Reading

--> http://calendar.perfplanet.com/2015/tcp-download-breakpoints/

Thursday 15 January 2015

Solr "queryResultCache": queryResultWindowSize vs queryResultMaxDocsCached

I did some tests to find out the difference between "queryResultWindowSize" and "queryResultMaxDocsCached".

Example config for the scenarios:

  <queryResultWindowSize>4</queryResultWindowSize>   
  <queryResultMaxDocsCached>16</queryResultMaxDocsCached>

Note: All scenarios use the same query, but I restarted Solr between the scenarios to flush the cache.

-------------------------------------------------------------     
Scenario 1:
We have a page size of 2 and go from one page to the next
------------------------------------------------------------- 
Start:  0 Rows: 2 -> Executes Query, returns docs 0-1 and caches docs 0-3
Start:  2 Rows: 2 -> Retrieves docs 2-3 from cache and returns them
Start:  4 Rows: 2 -> Executes Query, returns docs 4-5 and caches docs 0-7
                                         (Note: It replaces the existing cache entry for this query
                                          -> There is always only one cache entry for a query)
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them
Start: 12 Rows: 2 -> Executes Query, returns docs 12-13 and caches docs 0-15
Start: 14 Rows: 2 -> Retrieves docs 14-15 from cache and returns them
Start: 16 Rows: 2 -> Executes Query, returns docs 16-17 and does NOT cache anything because "queryResultMaxDocsCached" setting of 16 has been exceeded

------------------------------------------------------------- 
Scenario 2:
We have a page size of 2 but start with a high "start" parameter
------------------------------------------------------------- 
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them
Start:  0 Rows: 2 -> Retrieves docs 0-1 from cache and returns them
Start:  2 Rows: 2 -> Retrieves docs 2-3 from cache and returns them
Start:  4 Rows: 2 -> Retrieves docs 4-5 from cache and returns them
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start: 12 Rows: 2 -> Executes Query, returns docs 12-13 and caches docs 0-15
Start: 14 Rows: 2 -> Retrieves docs 14-15 from cache and returns them
Start: 16 Rows: 2 -> Executes Query, returns docs 16-17 and does NOT cache anything because "queryResultMaxDocsCached" setting of 16 has been exceeded

------------------------------------------------------------- 
Scenario 3:
We start with a high "rows" query parameter
------------------------------------------------------------- 
Start:  0 Rows: 8 -> Executes Query, returns docs 0-7 and caches docs 0-7
Start:  0 Rows: 4 -> Retrieves docs 0-3 from cache and returns them
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them

From this I conclude

1) The Solr "queryResultCache" always caches from the first document of the query result - not from the "start" query-parameter

2) "queryResultWindowSize" setting: In our example the "windows" were documents 0-3, 4-7, 8-11, 12-15, ... The "start" + "rows" query-parameters determine which "window" is used and the "window" in turn determines the end document to be cached (the upper end of the window)

3) "queryResultMaxDocsCached" setting: A threshold indicating if the query result should be cached or not. If the end document to be cached (the upper end of the window) is higher than the "queryResultMaxDocsCached" setting the query result will not be cached.  (In my opinion the name of the parameter is very unfortunate)
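
The three conclusions can be condensed into a small Python model. This is not Solr code, only a sketch of the behaviour observed in the scenarios above. The class name "QueryResultCacheModel" is hypothetical, it models a single query, and it assumes that an existing cache entry is kept when a larger result is not cached (the tests above do not show what happens in that case).

```python
class QueryResultCacheModel:
    """Toy model of the "queryResultCache" entry for one query."""

    def __init__(self, window_size: int, max_docs_cached: int) -> None:
        self.window_size = window_size
        self.max_docs_cached = max_docs_cached
        self.cached_docs = 0   # docs 0 .. cached_docs-1 are in the cache

    def request(self, start: int, rows: int) -> str:
        end = start + rows
        if end <= self.cached_docs:
            return "hit"
        # Round the end of the request up to the end of its window.
        window_end = -(-end // self.window_size) * self.window_size
        if window_end <= self.max_docs_cached:
            # Caching always starts at document 0 and replaces the old entry.
            self.cached_docs = window_end
            return "miss, cached docs 0-%d" % (window_end - 1)
        # Assumption: the previous cache entry stays untouched.
        return "miss, not cached"


# Replay of scenario 1: page size 2, going from one page to the next.
cache = QueryResultCacheModel(window_size=4, max_docs_cached=16)
for start in range(0, 18, 2):
    print(start, cache.request(start, rows=2))
```

The replay reports a miss for the start values 0, 4, 8, 12 and 16 and a hit for the pages in between, which matches scenario 1. The last request (start 16) is a miss that is not cached.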

Here is a suggestion for when your default page size ("rows" query-parameter) is 10:
  • A "queryResultWindowSize" setting of 20 will load the first two pages into the cache when the "start" query-parameter is 0 (or anything up to and including 10).
  • A "queryResultMaxDocsCached" setting of 40 will also allow caching the third and the fourth page (when the "start" query-parameter is 20, or anything up to and including 30). From the fifth page on the query result will not be cached.

I also read (did not verify):

The Solr "queryResultCache" caches the document ids and optionally the scores (if you ask for the scores). That means that either 4 or 8 bytes per document are cached.

Wednesday 14 January 2015

Solr Cache Autowarming

The Solr index is incrementally updated, i.e. changes are always written to new files. Upon a hard commit a new searcher is created which has a reference to the previous index segments and any new index segments.

The old searcher (pointing to the old index segments) continues handling queries while the new searcher is loaded.

Caches are tied to a specific version of the index, therefore new caches have to be autowarmed for the new searcher based upon values of the old cache. You have to make sure that the warmup time for a searcher is shorter than the time between hard commits. The warmup time can be checked in the Solr Admin's "Plugins/Stats"-page in the "CORE"-section.

New documents only become available in a search after the commit plus the time the searcher needs to warm up.

In the Solr Admin's "Plugins/Stats"-page in the "CACHE"-section one can see the "hitratio" and the "warmupTime" for a cache. Aim for a high hit ratio with a small cache size.