28 August 2008

Criteria for Selecting a Technical Debt Calculator

In "Calculating Technical Debt", I proposed a set of criteria for selecting a technical debt calculator. This list was:
  • Plugin architecture
  • Flexible roll-up rules
  • Current perspective
  • Trend perspective
  • Languages supported
  • Environments supported
  • Supports qualitative data
  • Build environments
  • Custom dashboards
  • Aggregates multiple projects
  • Continuous integration servers
  • User community
To this list I added a few more criteria relevant to using the tools:
  • Tool quality (is the quality calculator buggy?)
  • Latest release (is this project alive?)
  • Documentation (can I figure out how to use it?)
In "Characteristics of Technical Debt", I came to the conclusion that the McConnell internal quality characteristics were well suited to calculating technical debt. His criteria belong on the list then:
  • Calculates maintainability
  • Calculates flexibility
  • Calculates portability
  • Calculates reusability
  • Calculates readability
  • Calculates testability
  • Calculates understandability
I make no claim this list is exhaustive, but it covers enough of the issues important to me that I am comfortable moving forward with it.

In making these evaluations, I need some projects to run through the tools to see how they perform. The obvious choice is some open-source projects. These projects should:
  • Be developed in Java
  • Have their development history captured in a version control system
  • Have high development activity
  • Have JUnit tests
  • Some built with Ant, some with Maven 2
A trip around a few open-source repositories turned up a few likely candidates:

Jena - A framework for building Semantic Web applications. I'll focus on the ARQ component. (Ant)

Jena (ARQ module) has eight releases in its Subversion tags folder:
$> svn ls https://jena.svn.sourceforge.net/svnroot/jena/ARQ/tags

ARQ-2.0/
ARQ-2.0-RC/
ARQ-2.0-beta/
ARQ-2.1/
ARQ-2.1-beta/
ARQ-2.2/
ARQ-2.3/
ARQ-2.4/

Tiles - A Web templating framework.  (Maven 2)

Tiles has eight releases in its Subversion tags folder:
$> svn ls http://svn.apache.org/repos/asf/tiles/framework/tags

tiles-2.0.0/
tiles-2.0.1/
tiles-2.0.2/
tiles-2.0.3/
tiles-2.0.4/
tiles-2.0.5/
tiles-2.0.6/
tiles-2.1.0/
Picking out some projects to use as a basis for the tool evaluation would seem a trivial matter. It turns out it is not. I won't bore you with the reasons, but this took WAY longer than I expected.
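Staging the releases for the tools to chew on is at least mechanical. Here is a hedged sketch of how I might do it; the workarea path is a placeholder, only ARQ's final releases are shown, and Tiles would work the same way:

# Export each tagged release into a local working area for the metric runs.
tags = %w[ARQ-2.0 ARQ-2.1 ARQ-2.2 ARQ-2.3 ARQ-2.4]
base = 'https://jena.svn.sourceforge.net/svnroot/jena/ARQ/tags'

tags.each do |tag|
  system('svn', 'export', "#{base}/#{tag}", "workarea/#{tag}") or warn "export of #{tag} failed"
end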

With criteria and test cases in hand, I'll move on to evaluating the options. Sonar seems like a reasonable place to start.

25 August 2008

Characteristics of Technical Debt

I was planning to launch into an evaluation of the tools identified in "Calculating Technical Debt" (please read that first if you have no idea what I am talking about), but in preparing for the evaluation, I discovered there are large bodies of work addressing the characteristics of interest for determining code quality. There is even an ISO standard (9126) defining one such set of quality characteristics.

I guess it was pretty naive of me to think that quantifying software quality was a new undertaking. There are five models that appear repeatedly in the literature:
  • Boehm
  • McCall
  • FURPS
  • ISO 9126
  • Dromey
Each model has a different focus and thus includes different characteristics to assess quality (see reference page 38). The following table summarizes these top five models:

Reference: Ortega, Maryoly; Pérez, María and Rojas, Teresita. Construction of a Systemic Quality Model for evaluating a Software Product. Software Quality Journal, 11:3, July 2003, pp. 219-242. Kluwer Academic Publishers, 2003

In "Quantifying the cost of expediency", I proposed using McConnell's "internal quality characteristics" from Code Complete. Does this still hold up in light of these large bodies of research? Lets see.

In a perfect world, I'd take the time to research all of these models along with the various hybrids that have been devised. Perhaps I'll formulate my own model one day and become rich and famous. My goal for now is not to devise the optimal model, but rather to create a proof of concept for calculating the cost of expedient design and development choices.

From a cursory literature search, it appears that ISO 9126 is the most widely used model. I'll use that as the basis for comparison.

The ISO 9126 standard is summarized by the following graph:

(from http://www.cse.dcu.ie)

As you can see from the graph, the ISO model defines six major characteristics of quality:
  • Functionality
  • Reliability
  • Efficiency
  • Usability
  • Portability
  • Maintainability
Each is further decomposed. Comparing ISO 9126 with McConnell, you'll notice that McConnell's list is missing characteristics like accuracy and efficiency. He actually presents a separate list of "external quality characteristics", which refer to customer-facing rather than developer-facing issues, and most of the ISO 9126 characteristics missing from his internal list appear there. Intuitively, external characteristics are fundamental to understanding technical debt. They are also more difficult to determine automatically, so I am going to postpone including them in the technical debt evaluation for now - it is the expedient thing to do. ;-)

The characteristics on McConnell's list that don't appear in ISO 9126 are flexibility and reusability. These characteristics seem relevant to the technical debt of a project. Flexibility and reusability do appear in the McCall model (see table). McConnell provides the following definitions (p. 558):
Flexibility - The extent to which you can modify a system for uses or environments other than those for which it was specifically designed.

Reusability - The extent to which and the ease with which you can use parts of a system in other systems.
After all this, I am back to where I started. That is not where I expected to be. The original title of this entry was something like "An ISO model for technical debt". It was not until I started comparing ISO 9126 with McConnell that I came to the conclusion that McConnell better serves my purpose.

In the next entry I'll return to looking at the available tools for calculating technical debt.

20 August 2008

Calculating Technical Debt

In "Quantifying the cost of expediency", I describe the problem IT management face when they trade off shorter development time with robust code. To recap, my conjecture is that a solution to understanding such trade offs can be reached by sub-dividing the problem into two phases and addressing each individually. The phases are:
  1. Evaluate the code base against a set of quality criteria and track trends
  2. Convert these metrics into an easily digestible form
At the AgileNM meeting today, I learned there is a term for the code degradation that often occurs in a software project: technical debt. We had an excellent discussion of qualitative ways attendees assess this cost. These included:
  • Decreases in team velocity over time
  • Developer time to understand unfamiliar code
  • Querying developers
How exciting to have this group interested in the very topic I've been pondering and writing about! In looking for quantitative measures of technical debt, my research turned up several candidates for calculating the current debt and tracking trends. This list is shamelessly lifted from the Sonar Related Tools page:

Open Source
Commercial
The commercial tools don't appear to have evaluation downloads, so I'll focus on open source tools for now. Perhaps I can get evaluation copies in the future.

The open-source tools all leverage other open-source tools that determine one or more pieces of the technical-debt picture. Checkstyle, PMD and CPD are static analysis tools, while JUnit and Cobertura are runtime tools; all contribute to understanding the debt.

The open-source tools are all Java-centric. I'll search for similar .Net tools.

Originally, I had planned to jump into Sonar but, given the alternatives, it seemed prudent to consider what features best support calculating technical debt. I haven't looked at any of them closely yet, so hopefully this list isn't overly biased:
  • Plugin architecture for adding new analysis tools
  • Flexible rules for rolling up results into characteristics
  • Current and trend perspectives
  • Support for multiple languages and environments
  • Able to consider qualitative data in calculating characteristics
  • Support for multiple build tools
  • Flexible dashboard creation to display results
  • Interfaces to continuous integration servers
  • Vigorous user community
In the next installment I'll present my take on how the open source tools stack up against this feature set.

13 August 2008

Quantifying the Cost of Expediency

Software project managers often ask teams to proceed at the greatest possible speed without regard for the long-term consequences to code quality. Let's call such choices "expedient". If expedient choices are made infrequently, the development team can recover by refactoring the tainted code. For many shops, though, expediency becomes the norm rather than the exception. The downside of expedient choices is not initially apparent; there is an uneasy feeling in the stomach, but nothing tangible to explain why. The upside usually is apparent, since it can (with some accounting magic) be measured in dollars. My contention is that to make informed decisions, managers need the ability to measure the "expediency cost" in dollars too.

We have some idea of this cost to the industry as a whole. Capers Jones estimated that almost two-thirds of developer time is spent repairing software. In Code Complete, he states:
Projects that aim from the beginning at achieving the shortest possible schedules regardless of quality considerations tend to have fairly high frequencies of both schedule and cost overruns. Software projects that aim initially at achieving the highest possible levels of quality and reliability tend to have the best schedule adherence records, the highest productivity, and even the best marketplace success.
So evidence suggests we're heading down the wrong path when we attempt to be expedient, but how do we quantify this cost? What should high quality code look like?

In Code Complete McConnell lays out the following categories for internal quality characteristics (descriptions are paraphrased):
  • Maintainability - Can the software be modified?
  • Flexibility - Can the software be repurposed?
  • Portability - Can the software be ported to new environments?
  • Reusability - Can the software be used in other systems?
  • Readability - Can the source code be read?
  • Testability - Can the software be verified correct?
  • Understandability - Can the software be understood at the system-organizational level?
I propose, then, that the task is two-fold. First, evaluate the code base against these criteria, including tracking how the metrics change over time. Second, convert these metrics into information management can use to make informed trade-offs between quality and expediency. If this information is ultimately mapped to dollars, a real apples-to-apples comparison of the cost of expediency can be made.
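To make the two steps concrete, here is a toy Ruby sketch. Every number in it (the characteristic scores, rework-hour estimates and hourly rate) is invented purely for illustration; it is the shape of a costing model, not a real one:

HOURLY_RATE = 100.0   # hypothetical loaded cost of a developer hour

# Step 1 output: per-characteristic scores (0.0 = terrible, 1.0 = perfect)
scores = {
  :maintainability => 0.6, :flexibility => 0.7, :portability => 0.9,
  :reusability     => 0.5, :readability => 0.8, :testability => 0.4,
  :understandability => 0.7
}

# Hypothetical rework hours a perfect score would save, per characteristic
rework_hours = {
  :maintainability => 400, :flexibility => 150, :portability => 50,
  :reusability     => 100, :readability => 200, :testability => 300,
  :understandability => 250
}

# Step 2: convert the shortfall in each characteristic into dollars
debt = scores.inject(0.0) do |sum, (name, score)|
  sum + (1.0 - score) * rework_hours[name] * HOURLY_RATE
end

printf "Estimated technical debt: $%.2f\n", debt

Real scores and weights would have to come from the measurement tools and the team's own history; that is exactly the hard part the rest of this series is about.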

05 July 2008

Practice what I Preach - Code Reviews, TDD and RDoc

The first internal release (0.1) of my ongoing attempt to "practice what I preach" is complete. The program works well, has high coverage and acted as a catalyst for many of the quality tools and practices I've been discussing. Three things stuck out in my mind as I completed the release:

(1) Even on a one-man project, code reviews are beneficial. One night I spent hours in the debugger, plus adding puts statements everywhere, to complete a refactoring that accommodated some changes. It was one of those late nights when I should probably have just walked away. What I realized when it was finished is that about 10 minutes of inspecting the code would have produced the same results as my hours of debugging. Perhaps all of you are smarter than me and wouldn't have this outcome, but give it a try sometime and see if inspection isn't faster. Or better yet, try an informal code review with a co-worker.

(2) I was shocked at how much completing the RDoc improved my code. When I started to write about something and it was complete hogwash, I could either try and explain why I had written complete hogwash or I could clean it up. It was about the same amount of work, so why not clean up the code? An example was the large increase in private methods. If they are private, I don't need to write docs on them. :-)

(3) I wrote most tests and completed the RDoc after the coding was complete. They came out well, so why not keep doing this? Because for my project, these activities are the point. So of course I will finish them. At my day job, the code is (mostly) the point. So while I took the time to do the "right thing" at home, at work I would've been under extreme pressure to move on to the next task. Even if you don't believe TDD will deliver higher quality code, it is worth doing just so testing (and doc) get done at all.

I am researching how to quantify "code rot". My management can quantify the gains of pushing code out the door half-baked if they can attract new customers as a result. But they don't have the foggiest idea what it has cost them in terms of their major asset - the code base that is essential to the viability of the business. If there were some code-rot number(s) convertible into dollars, the number of jackass decisions might well decrease. Anyone have ideas on this?

25 June 2008

Test::Unit, Rake and Hudson status

All the Hudson Test::Unit tests were passing. A sea of success. Then I started looking at the result files. All was not good after all. The unit tests were not really passing, even though Hudson said they were.


I spent a long time trying to figure out who was dropping the ball on the status. Was it Test::Unit? Rake? Windows? Hudson? I never found a definitive answer, though it was starting to look like it might be Test::Unit.

I did find this post and gave it a try. Unfortunately, that solution works when there is a non-zero exit status, which is not the case when running Test::Unit from Rake.

The solution turned out to be much simpler than trying to figure out how to correct Rake or Test::Unit. Hudson to the rescue. Hudson supports a plugin architecture, and the repository has a plugin named Text-finder (download it here). It determines the success or failure of a build by looking for a regular expression in the build artifacts or console output. I added a rule which looks for the expression /\d+ tests, \d+ assertions, 0 failures, 0 errors/ in the console output. If the expression is found, the build passes; otherwise it fails. Once installed, a Text-finder section appears in the job configuration panel.
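To see why this works, here is a quick Ruby check (the test and assertion counts below are made up) showing that the expression only matches a fully clean Test::Unit summary line:

PASS_PATTERN = /\d+ tests, \d+ assertions, 0 failures, 0 errors/

clean  = "15 tests, 42 assertions, 0 failures, 0 errors"   # made-up clean run
broken = "15 tests, 42 assertions, 2 failures, 1 errors"   # made-up failing run

puts(clean  =~ PASS_PATTERN ? "build passes" : "build fails")   # prints "build passes"
puts(broken =~ PASS_PATTERN ? "build passes" : "build fails")   # prints "build fails"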


Yet again Hudson solves the problem where other players are dropping the ball. The Hudson status now accurately conveys the result of running the unit tests.

14 June 2008

Ruby - Making a Mockery of Testing

Having gotten rcov working well, I now had to eat the bitter fruit of my low coverage. The main code talks to a news server, so testing would not be as simple as calling a few methods and checking the results. Perhaps I still need to refactor into smaller, more easily testable methods. I'll visit that another time.

The solution is to "mock" the news server. A mock is code that provides the same interface (or at least enough of it for the tests) as a "real" object or class, yet return expected data each time the test is run. Mocks also can be configured to know how many times a method should be called, in what order they are called, the values that should be passed in and so on. If these conditions are not met, an error is thrown and the test fails.

Ruby has several tools for creating mock objects. I settled on FlexMock after reading a few blogs. Others were passionate about Mocha - I'll take a look at that another time.

For the purposes of my test, the Ruby newsreader API needs to provide the following methods (a rough usage sketch follows the list):

  • Net::NNTP.new(host, port, timeout) - Specify the host, port and timeout values to use when connecting (class method)
  • Net::NNTP.connect() - Connect to the host with parameters set in new.
  • Net::NNTP.xover(groupname, :from=>low, :to=>high) - Retrieve the headers from group groupname in the range low..high
  • Net::NNTP.group(group) - Set the group to fetch articles from
  • Net::NNTP.article(id) - Retrieve the article with the given id
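For context, here is a rough sketch (not my actual reader code) of how the code under test exercises this API. The host, port, group and article id are made up, and it assumes whatever library provides Net::NNTP has already been required:

nntp = Net::NNTP.new('news.example.com', 119, 30)   # hypothetical host, port, timeout
nntp.connect
nntp.group('comp.lang.ruby.test')                   # hypothetical group name
headers = nntp.xover('comp.lang.ruby.test', :from => 1, :to => 2)
article = nntp.article('1')                         # article id format is made up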

Notice there is one class method to mock. After a little looking, I found the answer in the FlexMock README file (see the section "Mocking Class Objects"). The answer is to have the class method return another flexmock object which acts as the instance.

nntp_mock = flexmock                     # the instance-level mock, defined below
# ... more stuff here to define nntp_mock ...
flexmock(Net::NNTP).should_receive(:new).and_return(nntp_mock)   # stub the class method

Now when the Net::NNTP.new method is called, it will return our flexmock instance which handles the rest of the test.

Let's look at the flexmock instance object now:

nntp_mock = flexmock
nntp_mock.should_receive(:connect).once.with_no_args
nntp_mock.should_receive(:group).once.with(String).and_return(group_mock)
nntp_mock.should_receive(:xover).once.with(String, Hash).and_return([summary1, summary2])
nntp_mock.should_receive(:article).twice.with(String).and_return(article1, article2)

One interesting feature is that expectations may be chained together. For example, the last line specifies that:

  • It responds to the method call 'article'
  • It should be called exactly twice during the test (fail otherwise)
  • It must receive exactly one parameter which is a string
  • It returns the specified articles

So with just a few lines of code we've written a stand-in for the news server that is sufficient for our test. Its behavior is deterministic, and the test will fail if any of the expectations set on it are not met. That is a lot of value for a small effort.

The README file is extensive and gave me enough information to write my mock.

The next time you need to write a test for code which references an external resource, mock it instead. You'll be happy you did.

12 June 2008

Walking the Talk, a few days later

The "Walking the Talk" project is well underway. My progress has been brisk and I'm learning quite a bit along the way. The simplicity of Ruby is a gift, allowing me to understand the concepts involved without getting bogged down by endless details.

Several tools were left out of the quality sandbox last time. RDoc (source-embedded documentation à la JavaDoc) and Log4r (logfile support à la Log4j) are both useful tools for the agile developer. I'm still learning some of the finer points of each, but both were up and functioning at a basic level in no time. Expect to hear more about these in a future post.

Hudson
Hudson continues to be awesome. I just read on Kohsuke Kawaguchi's blog that Hudson and the other projects he's been doing in the background are going to become his day job. This is exciting news for Hudson fans! The brisk development pace will likely accelerate further.

TDD

I had an epiphany about TDD several months ago in our AgileNM meeting. A member said that TDD significantly reduced the cyclomatic complexity of the code. Ah, TDD is not about having better tests, it is about writing better code! I'll relate my personal experience with this a little later, but let's just say for now that it has only increased my enthusiasm! Take a look at this paper from David Janzen of the University of Kansas for more background.

rcov
rcov pretty much worked as expected. There were only two ways it fell short of the standard set by Cobertura for me. The first was the lack of hit counts for each line; I find hit counts useful for understanding where the hot spots in the code might be even before firing up a profiler.

The other shortcoming of rcov compared to Cobertura is that it does not include cyclomatic complexity analysis. After a little Googling, it appears that Saikuro fills that gap. I will report on it as soon as I get a chance to play with it.

Trac
Trac was the biggest disappointment of the bunch. One area where Ruby leaves Python in the dust is in package management. I had to install so many different packages and take so many manual steps just to get something that sort of worked. Compared to RubyGems projects, this is an absolute joke. I couldn't get either of the plugins I installed to work.

The good news is there will soon be an excellent issue tracker/wiki available. Confluence, the wiki we use at work, already has a personal edition available. With the 4.0 release of JIRA (date not announced yet) there will be a personal edition of JIRA as well. Both are from Atlassian, who are fantastic to work with. Goodbye Trac, it was fun knowing you, but I want to spend more time developing my app and less time screwing around with plugins.

07 June 2008

Walking the Talk

At work, I am the champion of many technologies which further software quality, including:
Recently I started playing around writing a newsgroup reader in Ruby. So I immediately started using all these tools, right? Wrong. I just started hacking away. I was able to establish a proof of concept pretty quickly and learn about some of the gotchas. But after being at this a while, I started feeling lost. I'd forget how something worked and have to rediscover something I'd figured out previously. I'd have ideas for things that I wanted to do, but no place to write them down and think about priorities. Perhaps it was time for me to start walking my talk.

At home I have several constraints I don't have at work. I am writing in Ruby, not Java. I am not willing to pay the money for commercial tools (JIRA, Confluence, YourKit). I think Ant is a silly way to do builds and automated tests. So I started thinking about what my own tool set might look like.

Continuous integration: Hudson. I have written about this before, but it is a fantastic continuous integration tool. Kohsuke Kawaguchi is constantly improving it, so it is just getting better and better.

Version control: Subversion. I see no reason to switch here either. Version control with only one developer is dead-simple anyway, but why bother learning something new when Subversion works so well?

Unit testing: Test::Unit. It ships with Ruby and is the spiritual equivalent to JUnit. ZenTest also looks very interesting - promising to go far beyond the simple XUnit frameworks. I'll take a look at this as well.

Defect tracking: Trac. There are endless open source defect trackers out there. Maybe some are better than Trac. But I like that it is lightweight, has an integrated wiki and has bazillions of plugins, including Hudson integration. It is also written in Python, my previous favorite language before Ruby.

Wiki: Trac. See Defect Tracking above.

Coverage Analysis: rcov. To the best of my knowledge, this is the only game in town. The outputs look very similar to Cobertura. I haven't played with this yet, but it is on my short list.

Profiling: ruby-prof. This seems to be the clear winner over Profile. I could not find a single reference comparing the two that didn't favor ruby-prof.

Automated builds/testing: Rake. There are quite a few Ruby-based build tools available. Since I am planning to use Rails in my implementation, Rake seems like the obvious choice.
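As a starting point, here is a minimal Rakefile sketch using Rake::TestTask; it assumes the tests live under test/ and follow a test_*.rb naming convention:

require 'rake/testtask'

Rake::TestTask.new(:test) do |t|
  t.libs << 'test'                  # add test/ to the load path
  t.pattern = 'test/**/test_*.rb'   # assumed naming convention
  t.verbose = true
end

task :default => :test              # running plain 'rake' runs the tests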

That is the plan. I'll see how it all works out and report back on my progress. Even if it is a big failure, I will have taken a sip (gulp?) of my own medicine.

08 March 2008

Google calendar plugin for Hudson

For some time now I have been a major fan of Hudson, an open source continuous integration server. I recently gave a presentation to my group on using Hudson not only to run our continuous integration effort, but also to routinely check for anomalies in our database and environment.

One question that my boss posed was how we could monitor results from home, since the server Hudson runs on is not exposed to the outside world. At first I did not have a good answer, but a quick trip to the Hudson plugin wiki turned one up. First of all, I was shocked to find there were nearly 50 plugins available. Hudson is clearly catching on. The one that seemed like the best solution for our team is the plugin that publishes results to Google Calendar. With some simple setup in Hudson, build results are published to a shared Google Calendar. None of the build artifacts are published, so there is no risk of losing important resources. At the same time, we can stay current with the health of our system from home.

We could also use email for this, but this seems like a more elegant solution. If you are looking for a way to access your Hudson results from outside your firewall, this just might be your ticket!

19 January 2008

Using JRuby when package names include capitals

I discovered solutions to a few problems I was having using JRuby with our Ruby libraries at work. The first relates to our domain name and "magic" names in JRuby. For example, you can write something like:

org.junit.Assert.assertEquals("Silly example", 4, 4)

This code will run just fine. But now consider my company - samba.biz. From the example above, I expected to be able to write:

biz.samba.Assert.assert_equals("Silly example", 4, 4)

But this gives me "NameError: undefined local variable or method 'biz' for main:Object". It turns out that 'com' and 'org' (and perhaps others?) are "magic" names that support this syntax, but 'biz' is not. After some fooling around, I was able to get the above example to work like this:

include_class 'biz.samba.Assert'             # import the class explicitly
Assert.assert_equals("Silly example", 4, 4)  # now callable through the Ruby constant

This also solved another problem I was having at work. For some reason, we have some package names that include capital letters (why?). For example

biz.samba.backend.states.CA.utils.ProcessEpnsByList

JRuby wanted to think that CA was a class name, since it starts with a capital letter. Again, the syntax above solves this issue.
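For completeness, the same workaround applied to the capitalized package looks like this (no method call shown, since I would just be guessing at the class's interface):

include_class 'biz.samba.backend.states.CA.utils.ProcessEpnsByList'
# ProcessEpnsByList is now available as a top-level Ruby constant,
# so JRuby no longer tries to treat CA as a class name.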

http://blogs.sun.com/coolstuff/entry/using_java_classes_in_jruby was a big help in figuring this out.