haruki zaemon

I write, therefore I am.

Don't panic, just cluster

by Simon Harris · Nov 29, 2003 · 3 mins

I encountered a rather (un)amusing situation just recently. A project I was casually checking in on (just because I’m a nosey parker) had a unit test suite that took 70 minutes to run. “How many tests are there?” I asked. A couple of thousand, I thought to myself. What if I told you 200?

Ok, so now that you’ve picked yourself up off the floor, your first thought would most likely be similar to mine: Aha! They’re hitting the database in their unit tests. Well, how about this: they were hitting the database not just in every test, but setUp() and tearDown() were rebuilding the database, loading the schema and data from XML and re-populating it.

But not to fear, they’re on to it! They’re doing some work to speed up the tests. They’ve figured that if they create some extra database users they can run the tests in parallel!

It reminded me of another project I saw some time ago. The (web) application services around 2000 users concurrently. The application craps out so often that they decided to have a dozen or more servers all regularly saving session state to a database (read: SLOOOOOW) so that the users would experience no perceptible loss of service.

So what do these have in common? In both cases, instead of looking at the design of the application, instead of cleaning up the code and making it more reliable, instead of reducing dependencies between layers, the answer was to cluster. I mean, why build better software when you can just throw more hardware and software licenses at the problem….KACHING$$$$$.

In the case of the testing, had it occurred to anyone not to hit the database at all? How can a unit test hit a database? Wouldn’t that classify, by definition, as an integration test?

As for the web app, it seems to me that had the code base been even remotely stable, and the claims made by the application server vendor even remotely accurate, then with a UPS and some RAID thrown in, the application would have been more stable than the average user’s internet connection.

Now don’t get me wrong: like all things, there are times when you want to service thousands of customers concurrently, and you may well require lots of hardware. But the overriding factor must be to get the application right first. Then think about clustering. The old saying goes, “A chain is only as strong as its weakest link.” If the application is broken in the first place, you will get ever diminishing returns by simply adding more and more servers.

It seems to me that it’s rare to see a business application that warrants true (real-time session failover) clustering rather than just a simple server farm. I agree that a cluster and/or a server farm gives you the ability to take a machine out for servicing, handle hardware failures, and scale an application to handle more and more concurrent users. But it should not be used as a quick fix for poor software development.

Indulge me as I make a tenuous analogy with my martial arts training. Most martial arts teach that it’s not about strength; it’s about technique. Get your technique right first, without resorting to brute force. Then, when you can make that work, add in your natural strength and you will seriously kick butt.


Servlets are for integration

by Simon Harris · Nov 24, 2003 · 2 mins

It seems to me that most people I work with are fixated on the idea that servlets and JSPs are for presentation, session beans for business logic, message driven beans for integration, JNDI for service lookup, etc. I tend to have a different perspective on this.

JSPs are a templating language, one possible use of which is the creation of HTML, which is likely (though not guaranteed) to be used for presentation. But JSPs may also be used for generating XML (again, possibly for presentation via XSLT), or for SOAP, or even CSV, etc.

JNDI may well be used for service lookup in most J2EE applications, but JNDI is really an interface to hierarchical databases, allowing us to query LDAP, NIS, DNS, file systems, etc. Nothing new here for some people, but I bet if you did a poll at work you’d find that most J2EE developers didn’t know that!
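To make the point concrete, here’s a minimal sketch (the DNS server address is just a placeholder) that uses Sun’s standard JNDI DNS provider to query MX records through the very same Context API:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Querying DNS through JNDI -- no LDAP server or J2EE container in sight.
public class DnsLookup {
    public static void main(String[] args) throws Exception {
        Hashtable env = new Hashtable();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns://192.0.2.1"); // your DNS server here
        DirContext context = new InitialDirContext(env);
        Attributes attributes = context.getAttributes("example.com", new String[] {"MX"});
        System.out.println(attributes.get("MX"));
    }
}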

Message driven beans? Well, there’s no doubting that these are for integration. Just make sure you don’t code any business logic directly into them, please! Aha, you say, of course not; that’s what session beans are for, right? Not in my opinion, no.

Session beans and servlets are also for integration! Now I know that’s not what application server vendors such as IBM, Sun, BEA, etc. would like you to believe, but IMHO that’s all they are.

Session beans provide you with a mechanism for remoting services via RMI, IIOP, CORBA, etc. Servlets provide you with a mechanism for remoting services via HTTP (think SOAPServlet), and message driven beans provide you with a mechanism for remoting services via JMS, albeit an asynchronous-only one.

Again, IMHO, it’s all about abstraction. In my mind, these technologies are all different solutions to the problem of distributing services.

If you code your business logic into POJOs then you have the flexibility to distribute your services to RMI clients via session beans, web browsers via servlets, and SOAP clients via your choice of session beans, servlets and message driven beans. You can even use JavaMail (POP and SMTP) as transports for SOAP messages. For that matter, you can serialize Java objects and send them via HTTP if you want a simple binary protocol for Java clients to communicate with an application server through firewalls.
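To make that concrete, here’s a minimal sketch (all the names are hypothetical) of business logic in a POJO remoted two ways: via a servlet for HTTP clients and via an EJB 2.x session bean for RMI/IIOP clients:

// QuoteService.java -- the business logic lives in a plain old Java object.
public class QuoteService {
    public String quoteFor(String symbol) {
        // ... the real logic goes here, fully testable out of container ...
        return symbol + "=42.00";
    }
}

// QuoteServlet.java -- a thin HTTP adapter over the same POJO.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class QuoteServlet extends HttpServlet {
    private final QuoteService service = new QuoteService();

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.getWriter().print(service.quoteFor(request.getParameter("symbol")));
    }
}

// QuoteBean.java -- a thin RMI/IIOP adapter over the same POJO
// (home and remote interfaces omitted for brevity).
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

public class QuoteBean implements SessionBean {
    private final QuoteService service = new QuoteService();

    public String quoteFor(String symbol) {
        return service.quoteFor(symbol);
    }

    // EJB 2.x lifecycle plumbing -- deliberately empty.
    public void ejbCreate() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext context) {}
}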

Now try telling that to your Sun-certified J2EE architect :-)

Best of all, you can now fully unit test your code, out of container, which means there’s no excuse for not doing TDD.
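For example, a test for the hypothetical QuoteService above needs nothing but JUnit (3.x style, as was current at the time):

import junit.framework.TestCase;

// Runs in plain JUnit -- no container, no deployment descriptors.
public class QuoteServiceTest extends TestCase {
    public void testQuotesBySymbol() {
        assertEquals("IBM=42.00", new QuoteService().quoteFor("IBM"));
    }
}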


Don't mock infrastructure

by Simon Harris · Nov 23, 2003 · 3 mins

While there are a few cases where I feel you may need to, for the most part I feel very strongly that mocking infrastructure is a fundamentally flawed approach to testing.

I have been down the path of mocking JNDI, JDBC, JMS, UserTransaction, etc. and whilst at the time I thought this was all very cool, I did wonder what the hell I was doing writing all this stuff.

Eventually I realised that the problem I was having was one of abstraction, or more properly a lack thereof. I was tackling the wrong problem. Instead of thinking “my class needs to look up JNDI, therefore I need to mock out JNDI”, I started to think about the problem in general: that is, a mechanism for looking up services.

A perfect example is “Using mock naming contexts for testing”. While I think the intention is great, I don’t believe it goes far enough.

JNDI’s Context has a multitude of methods I’ll never use, not to mention the fact that they all throw NamingException, which quite frankly my code has never known how to deal with and usually throws an ImGivingUpException.

What I really need is a ServiceLookup interface with a few very simple methods like, not surprisingly, lookup(Class). Then I can have two implementations: a MockServiceLookup that I can create inside my JUnit test to return whatever I like, and a JndiServiceLookup that actually understands how to go about performing all the nasty JNDI stuff.
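Something along these lines (a sketch only; in particular, the mapping from a Class to a JNDI name is an assumption):

// ServiceLookup.java -- the abstraction the rest of my code depends on.
public interface ServiceLookup {
    Object lookup(Class type);
}

// MockServiceLookup.java -- created directly inside a JUnit test.
import java.util.HashMap;
import java.util.Map;

public class MockServiceLookup implements ServiceLookup {
    private final Map services = new HashMap();

    public void register(Class type, Object service) {
        services.put(type, service);
    }

    public Object lookup(Class type) {
        return services.get(type);
    }
}

// JndiServiceLookup.java -- knows how to do the nasty JNDI stuff.
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class JndiServiceLookup implements ServiceLookup {
    private final Context context;

    public JndiServiceLookup(Context context) {
        this.context = context;
    }

    public JndiServiceLookup() throws NamingException {
        this(new InitialContext());
    }

    public Object lookup(Class type) {
        try {
            return context.lookup(type.getName());
        } catch (NamingException e) {
            // shield callers from an exception they never knew how to handle
            throw new RuntimeException(e);
        }
    }
}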

Not convinced? Ok, well now imagine I have more than one way to obtain a service. Sometimes it’s through JNDI and sometimes I instantiate a local object. If all my code is tied to a JndiServiceLookup, what will I do? Instead, now that my code is tied to an interface, I can create an implementation that, say, first looks in a local configuration and then, if that fails, delegates to the JNDI version. Or maybe a SuperServiceLookup that holds a list of other ServiceLookup implementations and just delegates to them as appropriate. Again, this references only the interface and not any concrete implementation.

Now think JMS. Instead of writing to/reading from queues, why not have a MessageQueue interface with two simple methods, write(Message) and read()? Then I can have a MockMessageQueue that I instantiate in my test and a JmsMessageQueue that talks to the real thing.

Again, we’ve abstracted the problem: sending/receiving messages, not talking to JMS.
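Here’s a sketch (Message is a hypothetical domain type, not javax.jms.Message):

// Message.java -- a hypothetical domain message type.
public interface Message {
}

// MessageQueue.java -- reading and writing messages, not talking to JMS.
public interface MessageQueue {
    void write(Message message);
    Message read();
}

// MockMessageQueue.java -- an in-memory queue for unit tests.
import java.util.LinkedList;

public class MockMessageQueue implements MessageQueue {
    private final LinkedList messages = new LinkedList();

    public void write(Message message) {
        messages.addLast(message);
    }

    public Message read() {
        return (Message) messages.removeFirst();
    }
}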

So now, back to the original example, I can see that a MockInitialJndiContext may well be useful, but probably only for one thing: testing my JndiServiceLookup. And then I can probably just have a constructor in JndiServiceLookup that accepts a Context, and a default constructor that creates an InitialContext. Then use something like EasyMock to fill in the rest for me.

When it comes down to it, InitialContext is really a convenience for calling an InitialContextFactory anyway, which in turn creates a Context, which is, after all, an interface. So why be constrained by someone else’s API?

My rule of thumb is that, in general, I want to be mocking out my own interfaces, not someone else’s. I’ll usually have many places where I need to mock out a given interface of my own and at most one place where I need to mock out someone else’s.

Using Factory and AbstractFactory, etc. for creation will then allow you to use Strategy, Decorator, etc. for different behaviour, i.e. Mock vs JNDI, JMS, etc. Martin Fowler’s Patterns of Enterprise Application Architecture also has some great ways of doing this for enterprise applications.

Using testability to drive my thinking forces me to abstract problems in a way I hadn’t previously, and good design emerges. This is why I’m such a huge TDD bigot. Because, as Dan North points out, TDD is not about testing.


HTTPUnit is NOT!

by Simon Harris · Nov 23, 2003 · 1 min

I hope that I’m not the only one who sees HTTPUnit as a misnomer. Yes, it may well plug easily into the JUnit framework, but by definition any tests involving HTTP and servlets/JSPs are not unit tests. They are more likely functional tests, at best integration tests.

As my good friend James Ross points out, HTTPUnit is really a rich HTTP client library. In fact, I did some work for a client of mine some months ago to screen scrape web pages and, after much searching, I decided that JWebUnit, which is built on HTTPUnit, was the best choice. The only drawback is that it has some very trivial dependencies on JUnit classes.

I would have been happier if HTTPUnit had been called HTTPTest, or, for that matter, HTTPClient, as it doesn’t have any real dependencies on JUnit save a few convenience classes for in-servlet testing.

But I digress. Why does this all bug me so much? Because HTTPUnit is named the way it is, it’s easy for developers to convince their managers that they are writing unit tests when they’re not.


Abstract classes are not types

by Simon Harris · Nov 1, 2003 · 3 mins

I have a rule of thumb that says, in most cases, any class X that is abstract MUST be named AbstractX. Abstract static factories are one instance I can think of where I intentionally break this rule. But the checkstyle check I wrote to enforce this understands that abstract classes are allowed to end in the word Factory.

The next part of the rule says that no variables, fields, parameters, etc. may be of a type whose name begins with Abstract.

Having adhered to the first part of the rule, I found I was passing AbstractShapes and AbstractVehicles around, which didn’t really seem to make sense to me.

Why? Because IMHO abstract classes are not types. Abstract classes in Java are a convenience of the language that, for one thing, facilitates code sharing through inheritance. But inheritance is by no means the only, nor the best, way to share common code. We can use delegation (e.g. composition), decoration (see me in my new AOP colours), etc.

So what do I do instead? In C++ I would have used a pure-virtual class. In Java I use interfaces.

Have you ever seen code that declared a variable of type java.util.AbstractList? No? Why not? It’s there, along with HashMap and TreeSet, etc. Because AbstractList is not a type. List is. AbstractList provides a convenient base from which to implement custom Lists. But I still have the option of implementing my own from scratch if I so choose. Maybe because I need some behaviour for my list such as lazy loading or dynamic expansion, etc. that wouldn’t be satisfied by the default implementation.

Classes are a combination of data and behaviour. Common practice is to hide data in such a way that access to it is encapsulated within methods. This makes good sense for a number of reasons, including the fact that a particular piece of data may actually be a calculated field and not a stored value. Just because I choose to implement the age of a Person as a stored value shouldn’t mean that every Mammal must be implemented that way. What if I wanted to calculate the age of a Dog based on its date of birth and the notion that one dog year is approximately seven human years? That’s easy, you say: make Mammal an abstract class and the getAge() method abstract. Well, yes, that would work, but now let’s extend that reasoning out to other attributes and behaviour of Mammals and soon we end up with a pure-abstract class. In other words, an interface.
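Here’s a sketch of that Mammal example (all hypothetical types), with the interface as the type and the stored-versus-calculated decision left to each implementation:

// Mammal.java -- the type says nothing about how age is obtained.
public interface Mammal {
    int getAge();
}

// Person.java -- age as a stored value.
public class Person implements Mammal {
    private final int age;

    public Person(int age) {
        this.age = age;
    }

    public int getAge() {
        return age;
    }
}

// Dog.java -- age calculated from date of birth at roughly seven dog
// years per human year.
import java.util.Date;

public class Dog implements Mammal {
    private static final long MILLIS_PER_YEAR = 365L * 24 * 60 * 60 * 1000;

    private final Date dateOfBirth;

    public Dog(Date dateOfBirth) {
        this.dateOfBirth = dateOfBirth;
    }

    public int getAge() {
        long elapsed = System.currentTimeMillis() - dateOfBirth.getTime();
        return (int) (elapsed / MILLIS_PER_YEAR * 7);
    }
}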

Attempting to use abstract classes as types also makes it difficult to implement decorators (yes, I know we have AOP frameworks), especially if we accept the idea that methods not intended to be overridden should be declared final!
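With the interface as the type, though, a decorator is trivial, final methods or not (again a sketch, reusing the Mammal interface from above):

// A decorator that works no matter what the decorated class declares
// final, because it implements the interface directly and delegates.
public class LoggingMammal implements Mammal {
    private final Mammal delegate;

    public LoggingMammal(Mammal delegate) {
        this.delegate = delegate;
    }

    public int getAge() {
        int age = delegate.getAge();
        System.out.println("getAge() returned " + age);
        return age;
    }
}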

All of this leads inexorably (gotta love the Wachowski brothers for bringing this word back into the English language) to the conclusion that abstract classes are not types. They are someone’s idea of the way in which a convenient base class implementation might behave.

IMHO interfaces are good. TDD almost always leads to the definition of more interfaces. Use them, and leave abstract classes to reduce duplication of common code among related implementations of a given type.


Novel use of Hibernate?

by Simon Harris · Jul 16, 2003 · 1 min

I’ve been working on and off on a project that uses Hibernate persistent classes from Rhino JavaScript.

For various reasons, the database tables (and therefore the persistent classes) aren’t known until runtime at which point the meta-data is read from a database.

I used the ASM byte-code library to dynamically generate classes at runtime, map them using Hibernate, and expose them with some "fancy" wrappers in Rhino. So any time they update the database schema, all they need to do is either restart the server or send it a "re-load signal".
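The heart of it looks something like this sketch (the names are hypothetical and the per-column field generation is elided): generate an empty JavaBean-style class with a default constructor, then define it in a class loader so Hibernate can map it.

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

// Generates an empty class with a public no-arg constructor at runtime.
public class EntityGenerator extends ClassLoader {
    public Class generate(String className) {
        ClassWriter writer = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        writer.visit(V1_4, ACC_PUBLIC, className.replace('.', '/'), null,
                "java/lang/Object", null);

        // public <ClassName>() { super(); }
        MethodVisitor constructor = writer.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
        constructor.visitCode();
        constructor.visitVarInsn(ALOAD, 0);
        constructor.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false);
        constructor.visitInsn(RETURN);
        constructor.visitMaxs(0, 0); // recomputed thanks to COMPUTE_MAXS
        constructor.visitEnd();

        // ... one visitField() call (plus getter/setter) per column from the meta-data ...

        writer.visitEnd();
        byte[] bytes = writer.toByteArray();
        return defineClass(className, bytes, 0, bytes.length);
    }
}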

All in all a very simple solution that was very easy to get up and running.

Interestingly, the fact that Hibernate doesn’t require code generation was my saving grace. I’m not suggesting that code generation is bad, but in this particular case it wouldn’t have worked as well, IMHO.

I’m now working on doing some file import using the same “framework”. Basically, a legacy system spits out EDI-ish data with record types, etc. that needs to be imported. The database meta-data contains a mapping between record types/layouts and tables/columns respectively. Again, this will no doubt involve some dynamic class generation to perform the import. But maybe not. I’ll start with some tests and see what happens ;-) Gotta love TDD.


FIRST POST!

by Simon Harris · Jul 11, 2003 · 1 min

Ok…So this was automatically generated! I didn’t have anything to say then for my first post and I still don’t :-)


Enumerating results from an asynchronous network call in Objective-C

by Simon Harris · 2 mins

I have a facade over an asynchronous network call (one that also performs pagination, i.e. multiple network calls in order to handle very large result sets) along the lines of this, but I now want to use it in a context where knowing when it has finished iterating is important (e.g. in an NSOperation):

[myClient enumerateFoosAtURL:URL usingBlock:^(id foo, BOOL *stop) {
  ...
} failure:^(NSError *error) {
  ...
}];

The simplest thing might be to add an extra block but that is so sucky I nearly vomited in my mouth just typing out the example below:

[myClient enumerateFoosAtURL:URL usingBlock:^(id foo, BOOL *stop) {
  ...
} success:^{
  // we're done
} failure:^(NSError *error) {
  ...
}];

A less sucky option might be to indicate if there are more to come:

[myClient enumerateFoosAtURL:URL usingBlock:^(id foo, BOOL more, BOOL *stop) {
  ...
  if (!more) {
    // we're done
  }
} failure:^(NSError *error) {
  ...
}];

Yet another option that I think I like the most might be to simply transform the semantics of the original into this:

[myClient enumerateFoosAtURL:URL usingBlock:^(id foo, BOOL *stop) {
  ...
} finished:^(NSError *error) {
  if (error) {
    // an error occurred
  } else {
    // we're done
    ...
  }
}];

Update: Tony Wallace suggested that he preferred the “less sucky option because it makes it more obvious that the method returns paginated results.” And I have to say I agree with him on that. The only fly in the ointment, however, turns out to be some filtering performed within the API methods, which meant that determining whether there were more results would have required some fancy look-ahead code. In the end I went with my last option. It seems to have worked out rather painlessly :)


Generate a site44.com redirects file for your Octopress blog

by Simon Harris · 2 mins

Over the weekend I converted my, albeit languishing, blog from straight Jekyll on GitHub Pages to Octopress served via Site44.

In the process I also took the opportunity to switch the URLs I was using for blog entries to the Octopress default. This in turn left me needing a bunch of redirects.

Site44 supports redirects via a well-known text file in the root of your website. The text file provides a mapping between source and destination paths. That’s all very well and good but I didn’t much feel like creating 300+ redirect mappings by hand. Besides the tedious nature of the task, the chances I was going to screw one or more of them up in the process were fairly high.

Thankfully, due to a previous blog move, I happened to have a bunch of permalink definitions in the front-matter of most of my blog entries. All I really needed then was a way to turn those into said text file.

A quick Google turned up the Alias Generator plugin for Octopress by Thomas Mango, which was very close to, but not quite, what I needed.

Hack hack hack on the plugin, a quick rename of all the permalink attributes in my posts to alias, and voila! A new plugin that generates a redirects.site44.txt with all my redirects:

# Site44 Redirects Text Generator for Posts and Pages.
#
# Generates a www.site44.com-compatible redirects file for posts and pages with aliases set in the YAML Front Matter.
#
# Place the full path of the alias (place to redirect from) inside the
# destination post's YAML Front Matter. One or more aliases may be given.
#
# Example Post Configuration:
#
#   ---
#     title: "How I Keep Limited Pressing Running"
#     alias: /post/6301645915/how-i-keep-limited-pressing-running/index.html
#   ---
#
# Example Post Configuration:
#
#   ---
#     title: "How I Keep Limited Pressing Running"
#     alias: [/first-alias/index.html, /second-alias/index.html]
#   ---
#
# Author: Simon Harris
# Site: http://harukizaemon.com

module Jekyll
  REDIRECTS_SITE44_TXT_FILE_NAME = "redirects.site44.txt"

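  # Registered as a StaticFile so that Jekyll's cleanup phase doesn't
  # delete the generated file from the destination; write rescues the
  # (expected) failure to copy from a non-existent source and reports
  # success, since the generator below has already written the content.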
  class RedirectsSite44TxtFile < StaticFile
    def write(dest)
      begin
        super(dest)
      rescue
      end

      true
    end
  end

  class RedirectsSite44TxtGenerator < Generator
    def generate(site)
      unless File.exists?(site.dest)
        FileUtils.mkdir_p(site.dest)
      end

      File.open(File.join(site.dest, REDIRECTS_SITE44_TXT_FILE_NAME), "w") do |file|
        process_posts(site, file)
        process_pages(site, file)
      end

      site.static_files << Jekyll::RedirectsSite44TxtFile.new(site, site.dest, "/", REDIRECTS_SITE44_TXT_FILE_NAME)
    end

    def process_posts(site, file)
      site.posts.each do |post|
        generate_aliases(file, post.url, post.data['alias'])
      end
    end

    def process_pages(site, file)
      site.pages.each do |page|
        generate_aliases(file, page.destination('').gsub(/index\.(html|htm)$/, ''), page.data['alias'])
      end
    end

    def generate_aliases(file, destination_path, aliases)
      Array(aliases).compact.each do |alias_path|
        file.puts("#{alias_path} #{destination_path}")
      end
    end
  end
end
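For reference, each line the plugin writes is simply an alias (source) path followed by the destination path, separated by a space, so the generated redirects.site44.txt looks something like this (paths hypothetical):

/post/6301645915/how-i-keep-limited-pressing-running/index.html /2011/06/how-i-keep-limited-pressing-running/
/first-alias/index.html /blog/2013/01/some-post/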

Site44.com deploy task for Octopress

by Simon Harris · 1 min

Further to my previous post, I also added a rake task to my Rakefile that uses rsync to deploy:

rsync_delete   = false
deploy_default = "local"

# snip

desc "Deploy website via local rsync"
task :local do
  exclude = ""
  if File.exists?('./rsync-exclude')
    exclude = "--exclude-from '#{File.expand_path('./rsync-exclude')}'"
  end
  puts "## Deploying website via local rsync"
  ok_failed system("rsync -avz #{exclude} #{"--delete" unless rsync_delete == false} #{public_dir}/ #{deploy_dir}")
end

I initially thought it would be pretty neat to work out the local website location in Dropbox, but in the end I decided it was simpler to just leave the deploy_dir variable set to the default "_deploy" and have that directory symlinked to the appropriate Dropbox folder.