30 October 2007

Using Watir and Hudson to Monitor Performance

Currently, I have a full suite of Watir regression tests that I use routinely during the development of our web app. For those who may not be familiar with it, Watir is a Ruby library which allows programmatic control of Internet Explorer to navigate through an application under test (AUT) and determine if the application is performing properly. I'll have more to say about Watir another time.

One of the challenges with our application is performance. I wanted a way to determine which changes to the app, network, hardware, etc. were having a positive or negative impact on performance. By wrapping timing functions around the most performance-sensitive operations in the regression library, I created a performance benchmark.

The Timing Wrapper

A timing wrapper can be as simple as this:

start = Time.now
# Do your operations here
elapsed = Time.now - start

I started this way, then decided I wanted a little more capability, so I came up with a hierarchical timer that makes it easy to time activities nested within larger activities. A likeness of the library I use is available here.
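As a rough sketch of the idea (this is not the linked library; the class name and the dotted-label scheme are inventions of this example):

```ruby
# A minimal hierarchical timer sketch: nested time() calls produce
# dotted labels like "suite.page", so inner timings roll up under
# their enclosing activity.
class HierTimer
  def initialize
    @results = {}   # full dotted label => elapsed seconds
    @stack   = []   # labels of timers currently running
  end

  # Time a block; nesting is tracked via the label stack.
  def time(label)
    @stack.push(label)
    full  = @stack.join('.')
    start = Time.now
    yield
  ensure
    @results[full] = Time.now - start
    @stack.pop
  end

  attr_reader :results
end
```

Used like `timer.time('suite') { timer.time('page') { ... } }`, this records both `"suite.page"` and `"suite"`, so you can see where time went inside a larger operation.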

Watir

Watir is a very nice, open-source Ruby library that allows driving an IE session from a Ruby program and then scraping the pages to determine if a proper result was generated. I started with Selenium, but ran into problems with https and a few other things that Watir was able to handle. The one area where Selenium really beat Watir was in the quality of the recorder. But pretty quickly you learn to write the scripts by hand, with a little help from the IE Developer Toolbar. Watir does have a recorder, but I did not find it very useful.

A simple test might look something like this:

def list_page_to_page
  # Try some list features
  [['User 1', 'List 1', 'Company 1 page 1 -> page 2'],
   ['User 2', 'List 2', 'Company 2 page 1 -> page 2'],
   ['User 3', 'List 3', 'Company 3 page 1 -> page 2'],
  ].each do |user_name, list, label|
    login(user_name, $password, @test_site)
    @ie.link(:text, list).click

    # Now try paging...
    time(label) { @ie.link(:url, /page=2/).click }
    logout
  end
end


This example logs in as three different users, clicks a link with particular text in it, and then times how long the page takes to return when following a link to page two of a multi-page list.
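The `login` and `logout` helpers used above might look roughly like this. This is only a sketch using the classic Watir::IE API; it requires the watir gem and IE on Windows, and the field and button names are assumptions, not our app's real names:

```ruby
# Sketch only: assumes the watir gem, IE on Windows, and hypothetical
# form field names ('username', 'password', 'login').
require 'watir'

def login(user_name, password, site)
  @ie ||= Watir::IE.new
  @ie.goto(site)
  @ie.text_field(:name, 'username').set(user_name)
  @ie.text_field(:name, 'password').set(password)
  @ie.button(:name, 'login').click
end

def logout
  @ie.link(:text, 'Logout').click
end
```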

Hudson

For a test like this to be valuable, it needs to be run on a regular basis. While cron or Scheduled Tasks can do this, I much prefer Hudson, the continuous integration engine we use for verifying our commits do not break our JUnit tests. Hudson normally fires on actions like Subversion commits, but it can also schedule builds on a regular basis. So I set the performance task to run every hour and it just cranks away, sending me an email if there are any problems. While there are a lot of continuous integration engines available, both open-source and commercial, I am extremely pleased with both the quality of Hudson and the rate of development. I'll be writing a lot more about Hudson as soon as I get a chance. Check it out!
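For reference, Hudson's "Build periodically" trigger takes a cron-style expression; an hourly schedule like the one described above would look something like this (a config fragment, assuming the standard five-field cron syntax Hudson uses):

```
# minute  hour  day-of-month  month  day-of-week
0 * * * *
```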



Give Me My Screen Back!

Since Watir actually fires up an instance of IE to do its work, it can be very disconcerting when a test fires off on your desktop machine while you are working on something else. I did have Watir minimize IE as soon as it started, but it would still steal focus from me, and sometimes things I was trying to type in an editor would end up in some form field in IE and break the test.

I tried using another box, but it turned out to be slow enough that the tests didn't run very well. What finally worked was using a virtual machine via VMware. Their VMware Server is free, though you do have to sign up to get activation codes. So far, the amount of spam has been quite reasonable. I have a fairly beefy machine at work (Intel dual core, 2 GB RAM), and I don't notice any degradation in performance.

Graphing the Results

While I could export the results to Excel (or better yet, IBM Lotus Symphony), I ended up using the powerful JFreeChart package. It offers an amazing amount of control in creating just the graph you want. I looked for a Ruby graphing package, but everything I found was pathetic compared to JFreeChart. Since I am not much of a Java fan, I wrote the code in JRuby. This also allowed me to pull in information from other sources, including our Apache logs and Zabbix. Here is a sample chart:
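The JRuby side of this can be sketched roughly as follows. This assumes JRuby with jfreechart.jar (and its jcommon dependency) on the classpath; the series name, data point, and output filename are made up for illustration:

```ruby
# Requires JRuby with jfreechart.jar and jcommon.jar on the classpath.
require 'java'

java_import 'org.jfree.data.time.TimeSeries'
java_import 'org.jfree.data.time.TimeSeriesCollection'
java_import 'org.jfree.data.time.Minute'
java_import 'org.jfree.chart.ChartFactory'
java_import 'org.jfree.chart.ChartUtilities'

series = TimeSeries.new('page load (s)')
# In practice these points come from the timer logs; this value is made up.
series.add(Minute.new(java.util.Date.new), 1.2)

dataset = TimeSeriesCollection.new(series)
chart = ChartFactory.create_time_series_chart(
  'Performance', 'Time', 'Seconds', dataset, true, true, false)
ChartUtilities.save_chart_as_png(java.io.File.new('perf.png'), chart, 800, 400)
```

Because it runs on the JVM, the same script can also pull in data from Java-side sources before plotting.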



This particular graph shows that on a weekend with almost no traffic to the site (pink line), there were still performance hits on both staging (blue line) and production (red line) around 4 AM and 4 PM. These periods also corresponded to increased CPU load on the server (black line).

Summary

By using several open-source tools and a few snippets of code, I am able to record and graph the performance of our system over time and understand the impact changes to our system and operating environment have on our user experience. I recommend adding performance testing to your standard regression test suite. The immediate feedback is very helpful in heading off design decisions with negative performance implications.

19 October 2007

Open Source Load Testing with WebLOAD

Introduction

For some time now I've been playing with Grinder as a load-testing tool. There was much to like about Grinder - it seemed like one of the more advanced open-source load/capacity testers. The scripting language is Jython, which I love, having long been a fan of Python. Still, it is clearly underpowered compared to commercial tools. Then, one day when I was visiting Open Source Testing, I ran across WebLOAD and knew I was on to something special.

WebLOAD claims to represent over 250 man-years of development. It was converted to open source in April 2007. It was created by Ilan Kinreich, a co-founder of Mercury Interactive. In my opinion, this is a commercial-quality application. One area where that really shows is the documentation. After trying to piece together how some open-source tools work by reading forum posts, having real documentation is a godsend.

Getting Started

WebLOAD has an excellent Quick Start guide. I highly recommend starting there. The basic steps for running a load test are:
  1. Create an Agenda
  2. Configure a load template
  3. Configure session options
  4. Run the test
  5. Analyze the results

Create an Agenda

Creating agendas is easy. WebLOAD comes with a proxy recorder: you simply start the recorder and execute a user scenario in your browser. The proxy is set up automatically - no manual steps in your browser options. The commands are stored as JavaScript. I really appreciate not having to learn yet another proprietary language.



The recorder also has playback and debug capabilities. It is really nice to get your agenda working cleanly before running 1,000 concurrent versions of it!

Configure a Load Template

A load template is where you actually configure the test to be run. What agenda(s) will be used? What statistics will be collected? How many simultaneous users will you simulate? Will they hit the site all at once or ramp up linearly? What host or hosts will provide the virtual clients?

All of this is done from the main WebLOAD console, not the IDE that you used to record the Agenda.

I highly recommend using the wizard that runs through the configuration options to set up the template.

Setting Session Options

Under Tools > Current Session Options, there are a large number of controls you can set, like the maximum time it can take for a page to return before it is considered a failure. In the Quick Start guide, they set a 20-second maximum time and verify 5% of the sessions.

Run the Test

From the Console, you just click the Start Session icon and you are under way.



I love watching the various stats update in real time. Very cool! When running a stress test, monitoring the log view gives a good indication of when things start to break down.

Analyze the Results

WebLOAD has the ability to graph any number of parameters and statistics, including things like CPU and disk utilization on the server(s). It also supports export of data to popular formats such as Excel.

Hurdles

I did have to overcome several problems before WebLOAD would work with the main application I test. The first is that my application uses URL rewriting to track user sessions, so any URL recorded with the IDE will be incorrect at test time, since each session is assigned a new session id that is part of the URL. This is where WebLOAD's professional documentation comes in again. On their forum, a white paper called "Session Management in Performance Testing [How-to]" describes handling URL rewriting along with cookies and hidden form fields. You've got to love professional documentation!
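The general idea behind handling rewritten URLs can be sketched in plain Ruby, purely for illustration (this is not WebLOAD code; the `jsessionid` pattern and the function name are assumptions about a typical Java web app):

```ruby
# Hypothetical sketch: extract the session id from a URL returned by the
# live server and splice it into a URL recorded in an earlier session.
def rewrite_session(recorded_url, live_url)
  sid = live_url[/;jsessionid=([^?;]+)/, 1]
  recorded_url.sub(/;jsessionid=[^?;]+/, ";jsessionid=#{sid}")
end
```

The same substitution idea applies whether the session token travels in the URL, a cookie, or a hidden form field.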

The second issue, which I haven't figured out yet, is why I was not able to record from our test site. I had to build sessions against our production site, then go in and modify the URLs to reference our test server. I'll let you know what I figure out on this one.

Gotchas

I learned a lesson the hard way. Do not run a stress test on your production system during business hours! (Notice issue number two in "Hurdles.") I managed to run an Agenda that hadn't been converted to the testing server's URLs. Ouch. It brought down the server at one point. Not a good thing to be doing at 10 AM! There is one good thing that came from this - it convinced me that WebLOAD was actually doing something. It's not just a program that draws pretty graphs. ;-)

Summary

Indeed, I am impressed with WebLOAD. It comes closer to an "ideal" load-testing solution than anything else I've seen in the open-source world. I especially like the professional-quality documentation. With most open-source load testers, I give up in frustration before I ever get anything working. With WebLOAD, I had things working in a morning.

Perhaps others who have experience with commercial tools such as HP's LoadRunner will chime in on how it compares to the "big boys".