* Well, almost 🙂
We are always reading articles (mostly from vendors of libraries or database servers) full of graphs claiming to be the best and fastest. But benchmarks are just that: benchmarks, tests in a clinical environment.
Inspired, amongst other things, by this post by A. Bouchez, our systems have been stress tested continuously from very early on. But then again, if we can serve 1500 thumbnails per second from the cheapest virtual Windows server possible to rent, who cares? More importantly: none of our valued users do.
I went looking for a web service/REST recorder for browsers, as I wanted to start adding such test runs for our product, First Lane Document Handling Services. I found and tried a couple, and not one of them worked out of the box. Furthermore, not one vendor had found a generic way of running sequences over several pages that would also be able to stress the server. This seems to be because it is difficult to judge when the browser has finished executing and rendering for each request; there are a lot of different ways to produce client-side code for browsers.
Building a tool
Rolling up one's sleeves and hacking together a complete environment is often faster and more cost-efficient than buying a tool, learning its scripting, settings and bugs, and getting it to do what you want.
In order to put stress on the server, I wanted to simulate a “true”, or at least “truer”, session, and that has to include the most important and heavy operations, in the correct “sequence”. I wanted a tool that could increase the load and be modified to stress test new functionality too.
I ended up with a Windows client application that spawns a thread for each parallel sequence of server accesses; I'll call these channels from here on.
This is how the application looks. One important note about the tool, from now on called the Stress Hound, is that there are no timers whatsoever. Everything that happens is managed by threads waiting for and/or signalling each other. Thus the capacity of the processor running the Stress Hound is utilized in full. This was made possible in a cost-efficient timeframe thanks to the RTC library.
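The actual tool is a Windows/RTC application, but the core idea, one thread per channel coordinated purely by signalling rather than timers, can be sketched in a few lines of Python (all names here are my own, not the tool's):

```python
import threading

def do_request_sequence(channel_id):
    # Stand-in for the scripted server requests of one channel.
    return f"channel {channel_id} done"

def run_channel(channel_id, start_signal, results, lock):
    # Each channel is its own thread. It blocks on a shared event, so
    # all channels fire at the same moment; no timers are involved.
    start_signal.wait()
    outcome = do_request_sequence(channel_id)
    with lock:
        results[channel_id] = outcome

start_signal = threading.Event()
results, lock = {}, threading.Lock()
threads = [threading.Thread(target=run_channel,
                            args=(i, start_signal, results, lock))
           for i in range(6)]
for t in threads:
    t.start()
start_signal.set()   # release every channel at exactly the same moment
for t in threads:
    t.join()
```

A real channel would of course perform HTTP requests instead of returning a string, but the waiting-and-signalling skeleton is the same.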
Channel Request Sequence Script
The sequences to be executed, and which return values should be reused in other requests, are handled by a simple script where each server request has the following parameters:
Method, Filename, Payload, Response Handling
Response handling is hard-coded at the moment, because writing beautiful code that no one will use is stupid. The handling mainly extracts a couple of values from the server's response and stores the ones I deem necessary for further processing in the channel.
Script entries that use values retrieved from a response will have to wait for the corresponding request(s) to finish, and that produces a kind of ordering of the requests. So to start each channel, the tool will (yet again) create a thread (or get one from a pool) for each of the requests that do not need any precondition values.
It is important that the requests are fired off in parallel, as much as possible, just as happens in real-life usage.
A common precondition value could be the session id, if the server produces such. As soon as a response comes in, the channel object is updated with the particular information retrieved, and any request(s) that could not be issued yet will be.
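The precondition mechanism can be sketched like this, assuming (with names of my own invention) a channel object that stores extracted values and lets pending requests block until a value such as the session id arrives:

```python
import threading

class Channel:
    """Stores values extracted from responses; a request that needs a
    value blocks on its event until an earlier response provides it."""
    def __init__(self):
        self._values = {}
        self._events = {}
        self._lock = threading.Lock()

    def provide(self, name, value):
        # Called from response handling, e.g. when a session id arrives.
        with self._lock:
            self._values[name] = value
            event = self._events.setdefault(name, threading.Event())
        event.set()   # releases every request waiting for this value

    def require(self, name):
        with self._lock:
            event = self._events.setdefault(name, threading.Event())
        event.wait()
        return self._values[name]

ch = Channel()
results = []

def search():
    sid = ch.require("session_id")   # blocks until login has answered
    results.append(f"search with {sid}")

worker = threading.Thread(target=search)
worker.start()
ch.provide("session_id", "abc123")   # pretend the login response came in
worker.join()
```

The search thread can be started immediately; it simply sleeps until the value it depends on exists, which is exactly the ordering-without-timers behaviour described above.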
Some options to enjoy
I have some options that I can change before each run (aside from the sequence script):
* The number of channels
* Server Address, Port and whether to use SSL or not
* User Id or User Name
* Password
* If User Id should be increased for each Channel
* If the tool should wait between starting channels
* If searches (a specific script item) should be done with words or no words
The last one is admittedly very specific to the application being stressed.
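For illustration, the option list could be modelled as a plain record; every field name below is my invention, mirroring the list above rather than the tool's actual settings:

```python
from dataclasses import dataclass

@dataclass
class RunOptions:
    channels: int = 6
    server_address: str = "localhost"
    port: int = 443
    use_ssl: bool = True
    user_id: str = "user1"
    password: str = ""
    increase_user_per_channel: bool = False
    wait_between_channels_ms: int = 0
    search_with_words: bool = True

def user_for_channel(opts, channel_index):
    # When enabled, channel 0 logs in as user1, channel 1 as user2, ...
    if not opts.increase_user_per_channel:
        return opts.user_id
    base = opts.user_id.rstrip("0123456789")
    start = int(opts.user_id[len(base):] or 0)
    return f"{base}{start + channel_index}"

opts = RunOptions(channels=12, increase_user_per_channel=True)
print(user_for_channel(opts, 3))   # → user4
```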
The stressed ones
The stressed hardware
The workstation runs Windows 7 x64 on an Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40 GHz. It has 4 cores and will run 8 threads. The database resides on a physically spinning hard drive (!).
The stressing hardware
The laptop runs Windows 8 x64 on an Intel(R) Core(TM) i7 CPU M620 @ 2.67 GHz. It has 2 cores and will run 4 threads.
The stressed software
The application, First Lane Document Services, was set up with a borrowed data set containing 8,500 documents. This results in 140,000 distinct exact words (105,000 distinct words after applying our unique lossy algorithm by Jens Edlund at the KTH Speech, Music and Hearing Center, Stockholm) and 2,200,000 word-to-document references.
In this setup the server consists of three executable services working together. They are currently compiled as 32-bit targets because they are not very memory-intensive.
There’s one Firebird 2.5 RDBMS and one Firebird 3.0 Beta 2 RDBMS, both running 64-bit executables with a shared cache. The setup is such that I can switch between these two database versions using different ports; in doing so I also switch client libraries. No tweaking of performance settings has been done.
The Stress Hound will mark each channel that had any error (lost connection, timeout, HTTP status code other than 200 and so on) and stop it immediately. It will look like this:
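The stop-on-first-error behaviour amounts to something like the following sketch, where the lambdas stand in for the tool's real HTTP calls:

```python
def run_channel(script):
    """Run a channel's requests in order; mark the channel and stop
    immediately at the first failure (a non-200 status, a timeout,
    a lost connection raising an exception, and so on)."""
    for name, do_request in script:
        try:
            status = do_request()
        except Exception as exc:        # lost connection, timeout, ...
            return ("error", name, repr(exc))
        if status != 200:
            return ("error", name, f"HTTP {status}")
    return ("ok", None, None)

# A channel whose second request fails with a server error:
script = [("login", lambda: 200), ("search", lambda: 500)]
print(run_channel(script))   # → ('error', 'search', 'HTTP 500')
```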
In order to have something to compare to, and to validate the script, we start one channel (using Firebird 2.5).
This shows what we expected, and what our users currently expect. The server says that the search itself took 593 milliseconds. The time from the start of the search until the “user” was presented with the result was 969 milliseconds (1344 − 375), and the whole run, including information that the user gets after the initial result, took 1734 milliseconds.
Because the server pools database connections and disconnects them after a period of time, I will do two runs in succession. Starting with 6 channels, this is the result of the second run.
This was, as expected, around a second faster than the first run (not shown); the Stress Hound also needs some processor cycles to get going and create threads.
But, alas, the search time has risen! The average search time with six users clicking the search button at exactly the same time, on the same network (highly unlikely), rose from 593 milliseconds to an average of 2.9 seconds!
This is of course not acceptable at all. We chose Firebird for a slew of reasons; one very important one was to be able to add functionality without breaking existing code (also performance-wise). Thus the system relies on Firebird's ACID compliance and functional flexibility. And when we started to develop First Lane Document Services, we knew that the Firebird project would come up with a scalable SMP version. In plain English: Firebird 3.0 can run database connections in separate threads.
So, let’s try it!
Same as above, this is the result after the second run using six channels. The average search time has fallen from 2.9 seconds to 993 milliseconds. This is quite logical, as the workstation is capable of 8 threads and is not just running Firebird but also the servers and my desktop (Chrome has already spawned thirty processes), oh, and the development VM…
Running some more tests
In the beginning of this post I mentioned that we can put some pauses between the channels, and I also mentioned that we run the tests with random search words.
Below are the results from running 5 sessions using Firebird 2.5 and then using Firebird 3.0.
Because the laptop really cannot press this server fully, I've decided to run multiple tests (below are just 5 runs each) over a longer period of time and with more random search words, once I get my hands on a more modern laptop or have the chance to set up multiple laptops with cots for hounds.
Firebird 3.0 shows that it got access to the necessary number of threads when running with only six channels; that's the dip between the first two numbers. But when running 12 channels, some threads apparently must wait for others to finish. The time shown is the time from the start of the first channel to the end of the channel completed last, including any post-processing.
This is the average time the server reported that the searches took, so it excludes network communication to the clients, as well as login, session handling and the serving of other web resources. The last run is much faster here because the channels start 150 milliseconds apart, but the total time did not decrease much because the wait is included in that time.
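The two timings can be kept apart like this (assumed bookkeeping, not the actual tool): each per-channel clock starts when that channel starts, while the total runs from the first channel's start to the last channel's completion, so a start stagger lowers the per-channel times but stays inside the total:

```python
import threading
import time

def run_stress(channels, stagger_ms, work):
    per_channel = [0.0] * channels

    def channel(i):
        started = time.monotonic()
        work(i)                        # the channel's request sequence
        per_channel[i] = time.monotonic() - started

    t0 = time.monotonic()
    threads = []
    for i in range(channels):
        if i:                          # wait between starting channels
            time.sleep(stagger_ms / 1000.0)
        t = threading.Thread(target=channel, args=(i,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    total = time.monotonic() - t0      # the stagger is included here
    return total, per_channel
```

Each channel's interval lies inside the total interval, so the total can never shrink below the stagger plus the slowest remaining channel, which is why the staggered run above improved the averages far more than the total.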
An observant reader will react: why is the second run so much faster than the first try after switching to Firebird 3.0? I wrote “993 milliseconds” above, and the diagram shows less than 500 milliseconds using 6 channels. That's because this time I have thrown in some AND-ed search words, which effectively limits the amount of data that needs to be processed. OR-ed searches are another matter.
I have no idea what people think about serving 12 concurrent users with full-text partial search matches, spanning around a hundred million characters, on between one and three words, in a second. It did not immediately sound very impressive to me, so I googled around a bit.
Unfortunately I could not find anything comparable. Most write-ups satisfied themselves with the first unordered hit; that I can do much, much faster. Since users can freely sort the result set, the searches presented above need to smell all the data. Furthermore, users are presented with the number of hits, so the count needs to be calculated too. But this does not have to mean that the system will perform slower: statistics can help in deciding in what order to filter the complete set of data.
I will continue to take care of my Stress Hound, taking it for walks in the morning and raising it into a full-blown beast.
Back to A. Bouchez's post about writing fast multi-threaded applications; he rounds it off with the following quotes:
- Make it right before you make it fast. Make it clear before you make it faster. Keep it right when you make it faster. — Kernighan and Plauger, The Elements of Programming Style.
- Premature optimization is the root of all evil. — Donald Knuth, quoting C. A. R. Hoare
- The key to performance is elegance, not battalions of special cases. The terrible temptation to tweak should be resisted. — Jon Bentley and Doug McIlroy
- The rules boil down to: “1. Don’t optimize early. 2. Don’t optimize until you know that it’s needed. 3. Even then, don’t optimize until you know what’s needed, and where.” — Herb Sutter
I especially like the last one!
Because the really good news for our customers is that:
it’s only NOW that the performance tuning will start for real!