Tuesday, 31 July 2007

Advanced Open Source Collage

Using Esper, Quartz, and Ganglia with Synapse

Here is the scenario. You have a bunch of machines that you want to monitor, and you want to kick off some process when the average CPU utilization stays at more than 90% for 4 hours - you are worried the machine might fry.

I recently built a similar scenario using 4 key open source components:
  • Apache Synapse which is the ESB or glue that holds it together
  • Ganglia which is the hardware monitoring toolkit
  • Esper which can monitor the events and look for patterns
  • Quartz which is a job scheduler

I've tried to capture this in a picture

So how does it work?
Let's start with gmond. Gmond is a neat little process from Ganglia that runs and captures information. It can send that information via XDR over multicast or as XML over TCP. Rather than create a new transport for Synapse to accept incoming Ganglia XML, it seemed easier to me to poll for it. Basically, if you open a TCP socket to gmond, it will pipe you the state of the machine as an XML document.
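
As a rough sketch of that polling step (this is not the actual GMondPoller code), the following plain-Java helper opens a socket and reads until gmond closes the connection. The class and method names are made up for illustration, and it assumes gmond writes its whole XML dump and then closes the socket, and that the dump is ISO-8859-1 encoded.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Synapse or Ganglia.
final class GmondClient {

    // Reads characters until end of stream and returns them as one string.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Connects to gmond and returns whatever it writes before closing.
    // Assumption: the stream is ISO-8859-1 encoded XML.
    static String fetchXml(String host, int port) {
        try (Socket socket = new Socket(host, port);
             Reader in = new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.ISO_8859_1)) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call would look something like `GmondClient.fetchXml("localhost", 8649)`, where the host and port are assumptions about your own gmond setup.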

This is where Quartz comes in. We've just added Quartz support to Synapse (we've been meaning to for ages). Quartz lets you run a job every n milliseconds, or based on a calendar, or all sorts of other settings.

The Quartz plugin for Synapse is really simple. You just have to drop the right JARs into the Synapse lib directory, and then add your Quartz XML config into the right place in the synapse.xml:
<startup>
<quartz:quartz >
<quartz:job>
<quartz:job-detail>
....
</quartz:quartz>
</startup>

So I wrote a really simple job that opens a socket, grabs the XML and "injects" this into Synapse. It implements the Quartz Job interface (basically execute()).
The code for the GMondPoller is here.

When you inject a message, you need to address it. Everything in Synapse is addressed by a URI, so I simply created a new "virtual" URI to indicate this is Ganglia XML - urn:gmond.
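
To illustrate the idea of a virtual URI as a routing point, here is a toy dispatch table in plain Java. It is only an analogy: Synapse's real routing is driven by its configuration, and the `UriRouter` class and its methods are invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of "address a message by URI, let the URI pick the handler".
final class UriRouter {
    private final Map<String, Consumer<String>> routes = new HashMap<>();

    // Registers a handler for messages addressed to the given URI.
    void route(String toAddress, Consumer<String> handler) {
        routes.put(toAddress, handler);
    }

    // Delivers the payload to the handler registered for toAddress.
    // Returns false when nothing is registered for that address.
    boolean inject(String toAddress, String payload) {
        Consumer<String> handler = routes.get(toAddress);
        if (handler == null) {
            return false;
        }
        handler.accept(payload);
        return true;
    }
}
```

The point is that "urn:gmond" does not have to resolve to anything on the network; it only has to be a key that the sender and the routing rules agree on.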

I then configured any message targeted at "urn:gmond" to be picked up by Esper. Last week I wrote an EsperMediator for Synapse, so that was already usable. I made a couple of improvements to it at the same time. The latest version is here.

Using the Esper configuration model, I told Esper about the Ganglia XML format, and then set up an EQL statement to grab the average cpu_user value across a batch of 100 messages. Of course, in real life this could have been the 4-hour sliding average from the scenario above.
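
To make the batch semantics concrete, here is a minimal plain-Java sketch of what averaging over a length batch amounts to: collect N readings, emit their mean, then start over. It is not Esper code, the class name `LengthBatchAverage` is invented for illustration, and Esper's own window handling is more general than this.

```java
import java.util.OptionalDouble;

// Emits one average per full batch of readings.
final class LengthBatchAverage {
    private final int batchSize;
    private double sum;
    private int count;

    LengthBatchAverage(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    // Adds one reading. Returns the batch average once batchSize readings
    // have arrived (and resets the batch), otherwise returns empty.
    OptionalDouble add(double value) {
        sum += value;
        count++;
        if (count < batchSize) {
            return OptionalDouble.empty();
        }
        double average = sum / count;
        sum = 0;
        count = 0;
        return OptionalDouble.of(average);
    }
}
```

With a batch size of 100 this produces one averaged value per 100 polled readings, which is the shape of output the demo logs. A time-based sliding window would need timestamps and eviction of old readings, which this sketch leaves out.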

Here is the Synapse config for Esper:
<class name="org.fremantle.esper.EsperMediator">
<property name="Configuration">
<esper-configuration xmlns="">
<event-type alias="Ganglia">
<xml-dom root-element-name="GANGLIA_XML">
<xpath-property property-name="cpu_user"
xpath="//GANGLIA_XML/CLUSTER/HOST/METRIC[@NAME='cpu_user']/@VAL" type="number"/>
</xml-dom>
</event-type>
</esper-configuration>
</property>
<property name="statement"
value="select avg(cpu_user) from Ganglia.win:length_batch(100) "/>
<property name="EventToAddress" value="urn:cpu"/>
</class>
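
To see what the xpath-property in that config selects, here is a small sketch that evaluates the same XPath expression with the JDK's built-in XPath API instead of Esper. The `GangliaXPath` class is hypothetical, and the sample XML in the usage note below is a cut-down guess at the shape of gmond output, not a real capture.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Evaluates the cpu_user XPath from the Esper config against an XML string.
final class GangliaXPath {
    static final String CPU_USER =
            "//GANGLIA_XML/CLUSTER/HOST/METRIC[@NAME='cpu_user']/@VAL";

    // Returns the selected value as a number, or NaN when nothing matches.
    static double cpuUser(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(CPU_USER, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }
}
```

For an input such as `<GANGLIA_XML><CLUSTER><HOST><METRIC NAME='cpu_user' VAL='93.5'/></HOST></CLUSTER></GANGLIA_XML>` this returns 93.5. When the document reports several hosts, XPath number conversion uses only the first matching node, so a multi-host setup would probably want a more specific expression.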

After that, there wasn't much else to do. It would have been simple to send the resulting messages off to a JMS queue or other endpoint, but for the demo it was easier just to log them.

You can see the complete synapse.xml configuration here.

This is a great scenario. For me it shows some key SOA benefits:
  • The fact that Ganglia and Esper both support XML meant it was simple to use Synapse to glue them together.
  • It also meant the GMond poller job was really easy to write.
  • Using virtual URIs as routing points within Synapse is a really nice model - it's lightweight, simple and easy to understand.
  • I also really appreciate how flexible Esper is. It really exemplifies loose-coupling. With minor changes to the Esper statements and config, you could use this to fire an event based on almost any possible condition happening on the machine.
  • I also like Quartz a lot, though I'm tempted to define a slightly simpler config model. The Quartz Java properties are nice, but the XML is a bit heavyweight, and I think the model we use in Synapse, where we inject properties, is a bit easier.
Anyway, I hope you enjoy the combination, and my thanks to Dave C. for suggesting it.

Wednesday, 25 July 2007

Event Stream Processing in Synapse

There's a lot of buzz around Esper at the moment. It seems pretty cool to me. I hear that BEA has dual-licensed it and is shipping it as part of (all of?) their new Event Server, so I took a quick look at making it work together with Synapse.

So what is Esper and how can it work with Synapse?
Basically, Esper will look for patterns in a set of messages. When it spots a pattern, it creates an event. Because Esper supports XML and XPath through DOM, it fits perfectly with Synapse. Without writing any code you can configure it to look at SOAP or XML messages as they flow through Synapse. When a query "hits", it sends a new message into Synapse. That message can then be sent out or logged. The message can either be an exact copy of one of the messages that came into Synapse or it can be a set of tag/values extracted by Esper.

Esper uses a query language called EQL that looks a lot like SQL. Because there aren't any tables, you instead define the root element of the XML and give it a name. You can also pass in a schema. You can then do selects on that name just as if it were a table.

It took me less than a day to get it working. In fact, the first message worked pretty soon. It took me longer to get XPath working. I suspect there is a bug in Esper and I've raised a JIRA. I wasted a bit of time on that. I also had to learn Esper, but that turned out pretty simple. I certainly haven't got all the features working, and I only tried a really simple example.

Basically what I did was create a Synapse class mediator. You can add it to your synapse.xml like this:

<class name="org.fremantle.esper.EsperMediator">

<property name="Configuration"
value="c:/synapse-1.0/repository/conf/esper.conf.xml"/>
<property name="statement"
value="select symbol from StockQuoteEvent"/>
<property name="EventToAddress"
value="http://localhost:9999/soap/EventListener"/>
</class>

When an event matches the statement, it will be re-injected into Synapse with the To address as specified. You can then either mediate it or have a rule to send it out. If you specify "select *" then the whole message will be sent on.

If you specify parts using an XPath, then I simply create a little XML that captures those:

e.g.
<event xmlns="http://fremantle.org">
<entry key="symbol">IBM</entry>
</event>
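
A minimal sketch of how such an event document could be assembled from the selected key/value pairs is shown below. This is not the EsperMediator's actual code; the `EventXml` class is invented for illustration, and only the element names and namespace are taken from the example above.

```java
import java.util.Map;

// Builds the small <event> document from selected properties.
final class EventXml {

    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Writes one <entry key="..."> element per map entry, in iteration order.
    static String toXml(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder("<event xmlns=\"http://fremantle.org\">");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append("<entry key=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</entry>");
        }
        return sb.append("</event>").toString();
    }
}
```

Passing a single entry of symbol and IBM gives the same document as the example, just without the line breaks.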


The Esper configuration that we pointed to above needs to define the XML objects that you are going to see. That allows you to query on them.

<esper-configuration>
<event-type alias="StockQuoteEvent">
<xml-dom root-element-name="getQuote"
default-namespace="http://services.samples/xsd"/>
</event-type>
</esper-configuration>

The code is a bit of a hack at the moment. Ideally we will make a couple of changes to Synapse to make it cleaner. These were already in discussion, but this has pushed them up my priority list.
I'd also like to create an XMLMediatorFactory so this can be plugged in a little more neatly - for example, having the Esper config inlined in synapse.xml. And I'd like to make it more robust and support things like patterns and some of the more advanced Esper features.

Despite that, I think this is enough to get going and play with it. I'm really pleased. Synapse's performance together with Esper's flexibility makes a cool combination.

Monday, 23 July 2007

WSAS 2.0 Released!

WSO2 WSAS 2.0 is released!

Highlights include Data Services (exposing DBs as Services), EJB Services, Axis1 services (deploying Axis1 services), full support for WS-Trust, WS-SecureConversation, XKMS, and Eclipse IDE integration.

You can download it here!

Friday, 13 July 2007

Setting the cat amongst the pigeons

UPDATE: we have posted updated performance data. See this blog entry.

Ok.... so we're doing it again. We have done some benchmarking, and once again we've come out ahead of the competition. This time it's ESB performance.

A few weeks ago, we published some data on how the WSO2 ESB compares to a proprietary ESB. Because of what are known as Freedom of Use restrictions, we can't name that ESB - their license prohibits publishing any benchmark data.

But of course, with Open Source ESBs there aren't any such restrictions, so we have now benchmarked against Mule and Apache ServiceMix as well.

We chose three scenarios, all XML based, where we did the following things:
  • Virtualization (routing)
  • Content-Based Routing with an XPath
  • XSLT-based transformation
The results were very interesting. Firstly, we had some significant issues getting Mule to do these scenarios. When we did, the results were disappointing - for Mule. We came out around 4x faster. Now, I'm hoping the Mule guys will get their act together and provide a decent contest, because we couldn't get Mule to do HTTP KeepAlive without failing, and it also kept sending unnecessary HEAD requests. You can read more about the Mule comparison on Asankha's blog.

ServiceMix did better. On one test - XSLT - they beat us. Of course we didn't take it sitting down - we made a fix to Apache Synapse and WSO2 ESB that gave us a 2x improvement on XSLT, and now we can do 1800 tps including transformation.

We also found a significant ServiceMix bug that means you couldn't use it in a real production environment with HTTP - if the number of incoming connections exceeds the thread pool then it crashes and you have to restart it. Would make for a nice DoS attack I guess. This is one area we've worked really hard on and we've tested our non-blocking transport with more than 2000 concurrent connections.

Anyway, enjoy. I'm expecting some fireworks!!