One of the challenges with our application is performance. I wanted a way of determining which changes to the app, network, hardware, etc. were having a positive or negative impact on performance. By wrapping timing functions around the most timing-sensitive operations in the regression library, I created a performance benchmark.
The Timing Wrapper
A timing wrapper can be as simple as this:
start = Time.now
# Do your operations here
elapsed = Time.now - start
I started this way, then decided I wanted a little more capability, so I came up with a hierarchical timer that can time activities nested within larger activities. A likeness of the library I use is available here.
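To give a sense of the idea, here is a minimal sketch of that kind of hierarchical timer, written so it can be called as time(label) { ... } the way the test below does. The globals and method names here are my own invention for illustration, not the library linked above:

# A minimal sketch of a hierarchical timer; not the library linked above.
$timing_depth   = 0
$timing_results = []   # [label, nesting depth, seconds]

# Time a labeled block; blocks timed inside it are recorded one level deeper.
def time(label)
  $timing_depth += 1
  start = Time.now
  yield
ensure
  $timing_results << [label, $timing_depth - 1, Time.now - start]
  $timing_depth -= 1
end

# Print each result indented by its nesting level.
def report_times
  $timing_results.each do |label, depth, secs|
    puts "#{'  ' * depth}#{label}: #{'%.3f' % secs}s"
  end
end

time('whole run') do
  time('step one') { sleep 0.1 }
  time('step two') { sleep 0.2 }
end
report_times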
Watir
Watir is a very nice, open-source Ruby library that lets you drive an IE session from a Ruby program and then scrape the pages to determine whether a proper result was generated. I started with Selenium, but ran into problems with https and a few other things that Watir was able to handle. The one area where Selenium really beat Watir was in the quality of its recorder; Watir does have a recorder, but I did not find it very useful. But pretty quickly you learn to write the scripts by hand, with a little help from the IE Developer Toolbar.
A simple test might look something like this:
def list_page_to_page
  # Try some list features
  [['User 1', 'List 1', 'Company 1 page 1 -> page 2'],
   ['User 2', 'List 2', 'Company 2 page 1 -> page 2'],
   ['User 3', 'List 3', 'Company 3 page 1 -> page 2'],
  ].each do |user_name, list, label|
    login(user_name, $password, @test_site)
    @ie.link(:text, list).click
    # Now try paging...
    time(label) { @ie.link(:url, /page=2/).click }
    logout
  end
end
This example logs in as three different users, clicks a link containing particular text, and then times how long the page takes to return when clicking the link that goes to page two of a multi-page list.
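The login and logout calls (along with @ie and @test_site) come from helpers elsewhere in the suite. Purely as an illustration, and with placeholder field names and link text, such helpers might look something like this in Watir:

require 'watir'

# Hypothetical helpers; the field names and the logout link text are
# placeholders for whatever the application actually uses.
def login(user_name, password, site)
  @ie ||= Watir::IE.new
  @ie.goto(site)
  @ie.text_field(:name, 'username').set(user_name)
  @ie.text_field(:name, 'password').set(password)
  @ie.button(:name, 'login').click
end

def logout
  @ie.link(:text, 'Logout').click
end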
Hudson
For a test like this to be valuable, it needs to be run on a regular basis. While cron or Scheduled Tasks can do this, I much prefer Hudson, the continuous integration engine we use to verify that our commits do not break our JUnit tests. Hudson normally fires on actions like Subversion commits, but it can also schedule builds on a regular basis. So I set the performance task to run every hour, and it just cranks away, sending me an email if there are any problems. While there are a lot of continuous integration engines available, both open-source and commercial, I am extremely pleased with both the quality of Hudson and its rate of development. I'll be writing a lot more about Hudson as soon as I get a chance. Check it out!
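For reference, the "Build periodically" trigger in the Hudson job configuration takes a cron-style schedule, so an hourly run is just a one-liner along these lines:

# fields: minute hour day-of-month month day-of-week
0 * * * *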
Give Me My Screen Back!
Since Watir actually fires up an instance of IE to do its work, it can be very disconcerting when a test fires off on your desktop machine while you are working on something else. I did have Watir minimize IE as soon as it started, but it would still steal focus from me, and sometimes things I was typing in an editor would end up in some form field in IE and break the test.
I tried using another box, but it turned out to be slow enough that the tests didn't run very well. What finally worked was using a virtual machine via VMware. Their VMware Server is free, though you do have to sign up to get activation codes; so far, the amount of spam has been quite reasonable. I have a fairly beefy machine at work (Intel dual core, 2 GB RAM), and I don't notice any degradation in performance with the VM running.
Graphing the Results
While I could export the results to Excel (or better yet, IBM Lotus Symphony), I ended up using the powerful JFreeChart package. It offers an amazing amount of control in creating just the graph you want. I looked for a Ruby graphing package, but everything I found was pathetic compared to JFreeChart. Since I am not much of a Java fan, I wrote the code in JRuby. This also allowed me to pull in information from other sources, including our Apache logs and Zabbix. Here is a sample chart:
This particular graph shows that on a weekend with almost no traffic to the site (pink line), there were still performance hits on both staging (blue line) and production (red line) around 4 AM and 4 PM. These periods also corresponded to increased CPU load on the server (black line).
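For the curious, the JRuby side is not complicated. Here is a rough sketch of the kind of chart code involved, assuming the JFreeChart jars are on the classpath; the series name, data points, and file name are made-up placeholders, not our actual numbers:

require 'java'

java_import 'org.jfree.data.time.TimeSeries'
java_import 'org.jfree.data.time.TimeSeriesCollection'
java_import 'org.jfree.data.time.Hour'
java_import 'org.jfree.chart.ChartFactory'
java_import 'org.jfree.chart.ChartUtilities'

# Build a time series; the real script pulls its values from the
# benchmark results, the Apache logs, and Zabbix.
series = TimeSeries.new('page 1 -> page 2 (seconds)')
series.add(Hour.new(4, 1, 6, 2008), 1.8)
series.add(Hour.new(16, 1, 6, 2008), 2.4)
series.add(Hour.new(4, 2, 6, 2008), 1.9)

dataset = TimeSeriesCollection.new(series)
chart = ChartFactory.createTimeSeriesChart(
  'Page load time', 'Time', 'Seconds', dataset, true, false, false)

# Write the chart out as a PNG for the report.
ChartUtilities.saveChartAsPNG(java.io.File.new('performance.png'), chart, 800, 400)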
Summary
By using several open-source tools and a few snippets of code, I am able to record and graph the performance of our system over time and understand the impact that changes to our system and operating environment have on our user experience. I recommend adding performance testing to your standard regression test suite. The immediate feedback is very helpful in heading off design decisions with negative performance implications.