Changing the Endpoint Address of an RHQ Storage Node
There is very limited support for changing the endpoint address of a storage node. In fact the only way to do so is by undeploying and redeploying the node with the new address. And in some cases, like when there is only a single storage node, this is not even an option. BZ 1103841 was opened to address this, and the changes will go into RHQ 4.13.

Changing the endpoint address of a Cassandra node is a routine maintenance operation. I am referring specifically to the address that Cassandra uses for gossip. This address is specified by the listen_address property in cassandra.yaml. The key thing when changing the address is to ensure that the node's token assignments do not change. Rob Coli's post on changing a node's address provides a nice summary of the configuration changes involved.

With CASSANDRA-7356 however, things are even easier. Change the value of listen_address and restart Cassandra with the following system properties defined in cassandra-env.sh,

  • -Dcassandra.replace_address=<new_address>
  • -Dcassandra.replace_address_first_boot=true 
The seeds property in cassandra.yaml might need to be updated as well. Note that there is no need to worry about the auto_bootstrap, initial_token, or num_tokens properties.

For the RHQ Storage Node, these system properties will be set in cassandra-jvm.properties. Users will be able to update a node's address either through the storage node admin UI or through the RHQ CLI. One interesting thing to note is that the RHQ Storage Node resource type uses the node's endpoint address as its resource key. This is not good. When the address changes, the agent will think it has discovered a new Storage Node resource. To prevent this we can add resource upgrade support in the rhq-storage plugin and change the resource key to use the node's host ID, which is a UUID that does not change. The host ID is exposed through the StorageServiceMBean.getLocalHostId JMX attribute.
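
To illustrate, the host ID can be read over JMX roughly like this; this is just a minimal sketch (not the actual plugin code - host, port and error handling are placeholders):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HostIdReader {

    public static void main(String[] args) throws Exception {
        // Cassandra's JMX port is 7199 by default; host and port here are placeholders.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // LocalHostId is a UUID that stays stable across address changes,
            // which makes it a better resource key than the endpoint address.
            String hostId = (String) connection.getAttribute(storageService, "LocalHostId");
            System.out.println("Host ID: " + hostId);
        }
    }
}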

If you are interested in learning more about the work involved in adding support for changing a storage node's endpoint address, check out the wiki design doc that I will be updating over the next several days.
RHQ-Metrics and Grafana (updated)

As you may know I am currently working on RHQ-Metrics, a time series database with some charting extensions that will be embeddable into Angular apps.

One of the goals of the project is also to make the database available for other consumers, which may have other input sources deployed as well as their own graphing.

Over the weekend I looked a bit at Grafana which looks quite nice, so I started to combine RHQ-metrics with Grafana.

I did not (immediately) find a description of the data format between Grafana and Graphite, but saw that InfluxDB is supported, so I fired up Wireshark and was able to write an endpoint for RHQ-Metrics that supports listing metrics and retrieving data.

The overall setup looks like this:

(Diagram: the RHQ-Metrics and Grafana setup used)

A Ganglia gmond emits data over IP multicast, which is received with the protocol translator client, ptrans. This translates the data into requests that are pushed into the RHQ-metrics system and there stored in the Cassandra backend.

Grafana then talks to the RHQ-Metrics server and fetches the data to be displayed.
For this to work, I had to comment out the graphiteUrl in conf.js and use the influx backend like this:

datasources: {
    influx: {
        default: true,
        type: 'influxdb',
        username: 'test',
        password: 'test',
        url: 'http://localhost:8080/rhq-metrics/influx',
    }
},

As you can see, the URL points to the RHQ-Metrics server.

The code for the new handler is still "very rough" and thus not yet in the normal source repository. To use it, you can clone my version of the RHQ-Metrics repository and then run start.sh in the root directory to get the RHQ-Metrics server (with the built-in memory backend) running.

Update

I have now added code (still in my repository) to support some of the aggregating functions of Influx like min, max, mean, count and sum. The following shows a chart that uses those functions:

RHQ 4.11 released


I am proud to announce the immediate availability of RHQ 4.11.

As always this release contains new features and bug fixes.

Notable changes are:

  • Plugins can now define plugin specific DynaGroup expressions, which can even be auto-activated
  • Further agent footprint reduction for jmx-clients (especially if the agent runs on jdk 1.6)
  • Much improved agent install via ssh from the RHQ UI.
  • Support for Oracle 12 as backend database
  • New login screen that follows patternfly.org

As always, RHQ is available for download in the form of a zip archive. If you want to try out RHQ without too much setup, you can also use a pre-created Docker image
from https://index.docker.io/u/gkhachik/rhq-fedora.20/ (the link contains setup information).

Please consult the release notes for further details and a download link.

If you only want to have a quick look, you can also consult a
running RHQ 4.11 instance that we have set up. User/Pass are guest / rhqguest

Special thanks goes to

  • Elias Ross
  • Michael Burman

for their code contributions for this release.

My first Netty-based server

For our rhq-metrics project (and actually for RHQ proper) I wanted to be able to receive syslog messages, parse them and in case of content in the form of


type=metric thread.count=11 thread.active=1 heap.permgen.size=20040000

forward the individual key-value pairs (e.g. thread.count=11) to a rest endpoint for further processing.
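
Parsing those key-value pairs out of the message body is straightforward; here is a minimal sketch (not the actual rhq-metrics code) of what that split could look like:

import java.util.LinkedHashMap;
import java.util.Map;

public class MetricLineParser {

    /**
     * Splits a line like "type=metric thread.count=11 thread.active=1"
     * into its individual key-value pairs.
     */
    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                pairs.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints {type=metric, thread.count=11, thread.active=1}
        System.out.println(parse("type=metric thread.count=11 thread.active=1"));
    }
}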

After I started with the usual plain Java ServerSocket code, I realized that this may become cumbersome and that this would be the perfect opportunity to check out Netty. Luckily I already had an Early Access copy of Netty in Action lying around and was ready to start.

Writing the first server that listens on TCP port 5140 to receive messages was easy, as was modifying /etc/syslog.conf to forward messages to that port.

Unfortunately I did not get any message.

It turned out that on OS X, syslog message forwarding is UDP only. So I sat down, converted the code to use UDP and was sort of happy. I was also able to create the backend connection to the REST server, but here I was missing the piece on how to "forward" the incoming data to the rest-client connection.

Luckily my colleague Norman Maurer, one of the Netty gurus, helped me a lot and so I got that going.

After this worked, I continued and added a TCP server too, so that it is now possible to receive messages over TCP and UDP.

The (current) final setup looks like this:

Overview of my Netty app

I need two different decoders here, as for TCP the (payload) data received is directly contained in a ByteBuf, while for UDP it is inside a DatagramPacket and needs to be pulled out into a ByteBuf for further decoding.
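
To give an idea of what such an unwrapping decoder can look like, here is a minimal sketch (not the actual code from the project); note that the DatagramPacket used must be Netty's io.netty.channel.socket.DatagramPacket:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.socket.DatagramPacket;
import io.netty.handler.codec.MessageToMessageDecoder;

import java.util.List;

public class DatagramUnwrapDecoder extends MessageToMessageDecoder<DatagramPacket> {

    @Override
    protected void decode(ChannelHandlerContext ctx, DatagramPacket msg, List<Object> out) {
        // Pass the payload ByteBuf downstream; retain() it because the
        // DatagramPacket will be released once decode() returns.
        out.add(msg.content().retain());
    }
}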

One larger issue I had during development (aside from wrapping my head around "The Netty Way" of doing things) was that I wrote a Decoder like


public class UdpSyslogEventDecoder extends MessageToMessageDecoder<DatagramPacket> {

    protected void decode(ChannelHandlerContext ctx, DatagramPacket msg, List<Object> out) throws Exception {

and it never got called by Netty. I could see that it was instantiated when the pipeline was set up, but for incoming messages it was just bypassed.

After I added a method


public boolean acceptInboundMessage(Object msg) throws Exception {
    return true;
}

my decoder was called, but Netty threw ClassCastExceptions. Those were actually pretty helpful, as it turned out that the IDE had automatically imported java.net.DatagramPacket, which is not what Netty wants here. As Netty checks the signature of decode() and this did not match, it just ignored the decoder.

You can find the source of this server on GitHub.

Aerogear UPS alert plugin for RHQ

 

As already indicated, Matze Wessendorf and I sat together at Red Hat Summit and created an alert sender plugin for RHQ that uses the Unified Push Server from Aerogear to send alerts as push notifications to administrators' phones.

For this to work you not only need to install the alert-sender plugin on RHQ side, but also have an appropriate application installed on your handset.

After you have installed the plugin in RHQ you need to go to Administration->Server Plugins and select the Alert:UPS entry. In the configuration settings you need to provide the host name where the push server lives, the secret to talk to it and also the id of your deployed handset app:

(Screenshot of the plugin configuration settings)

After this is done, you can select this alert sender when you set up new notifications as with other senders:

(Screenshot of selecting the alert sender for a notification)

Right now this does not have any alert-specific settings, but I guess that in the future one could perhaps set the alert sound here for iOS devices or similar.

And after all this is done, the app is deployed, and your handset is on, you can receive a push message just like this:

(Screenshot of our first alert sent)

If you want to have a look at the sender plugin, you can go to the RHQ repository on GitHub.

RHQ and JBoss ON at Summit 2014

I was very fortunate to be able to attend this year's Red Hat Summit in San Francisco.

Right before Summit, the accompanying DevNation conference already started on Sunday with some pretty interesting talks - among others about Fabric8 and Hawt.io.

As both Summit and DevNation happened at Moscone Center, it was pretty easy to mix and match sessions. DevNation also featured a "hack area": some tables with huge power strips in the middle to easily connect laptop and smartphone chargers. And usually the tables were not as empty as in the next image:

At one of those tables I also met with Matze Wessendorf from the Aerogear team, and together we developed an Alert Sender plugin for RHQ that uses the Unified Push Server to bring alerts from RHQ directly to the lock screen of the admins' phones. The plugin only exists in private git at the moment. We need to do some polishing and remove some passwords before pushing to the rhq git.

Summit presentation
Thomas Segismont and I did a presentation "What is new with JBoss ON" and we also had a BOF session later on.
The presentation went well with around 60 attendees. Most of the audience already knew JBoss ON.
Of those, around 2/3 were running 3.1, a few 3.2, and one person was even on 3.0 (unfortunately I was not able to find him later to find out why).

The slides of the presentation are already available online; unfortunately they do not cover the two live demos from Thomas, showing how to add a new Storage Node and how to make use of the new bundle permissions that were introduced in JBoss ON 3.2 (and also RHQ).


Thomas at work and me doing a bit of talking.

BOF

The BOF/Meet and Greet session later was also quite well attended.
People had good questions and ideas, and we talked a bit about the roadmap and also about some vision wrt. JBoss ON 4. Unfortunately we got kicked out of the room much too early.

OCSystems
Thomas and I also had the luck to be invited by OCSystems, the makers of RTI (which also runs on top of JBoss ON), for a dinner at the Waterfront restaurant, from where we had an excellent view of the Bay Bridge:

Front row: Alan Santos, myself, Thomas Segismont

Middle: Tobias Hartwig, Bill Critch, all Red Hat

Behind: Steve North, Georgia Ferretti, both OCSystems

Hackathon

And last but not least, at the DevNation Hackathon, Team "RHQ" won the 2nd prize with some "home automation":
a Raspberry Pi running the agent had an LED blinking, and we also had one of the mbed boards (those are developer boards a little like an Arduino, but with an ARM CPU, sensors, and an LCD on board)
connected to read in sensor data. Unfortunately I lost some time at the start of the hackathon,
so we could not really use that info in the plugin; half an hour more time and ... :-)
For the second half of this, Thomas built a Cordova application for his Android phone and showed how to receive push messages that were sent when alerts fired, using the brand new aerogear-ups alert sender plugin mentioned above.

I should have taken a photo of all the wiring we had set up, but totally forgot
about it.

I will re-create / finish this demo and blog about it.


Completed Remote Agent Install
My previous blog post talked about work being done on implementing an enhancement request which asked for the ability to remotely install an RHQ Agent. That feature has been finished and checked into the master branch and will be in the next release.

I created a quick 11-minute demo showing the UI (which is slightly different from what the prototype looked like) and demonstrating the install, start, stop, and uninstall capabilities of this new feature.

I can already think of at least two more enhancements that can be added to this in the future. One would be to support SSH keys rather than passwords (so you don't have to require passwords to make the remote SSH connection) and the other would be to allow the user to upload a custom rhq-agent-env.sh file so that file can be used to override the default agent environment (in other words, it would be used instead of the default rhq-agent-env.sh that comes with the agent distribution).
Running the RHQ-agent on a Raspberry Pi [updated]

I finally got a Raspberry Pi too. After the hurdles of the initial installation, I got it hooked up to my LAN, and of course I had to install an RHQ agent on it.

And it turned out that this was dead simple, as the Pi already has Java 1.7 installed (in the Raspbian Wheezy distro that I am using). Thus it was only a matter of laying down the rhq-agent, and starting it the usual way.

Now there was one caveat: the agent did not find any file systems, network interfaces, etc. This is because no native Sigar library for ARM v6 CPUs is supplied with the agent.

I cloned Sigar from its git repository, changed into the 1.6 branch and built that library myself.

Now after dropping libsigar-arm-linux.so into the agent's lib/ directory, the native library is available and on agent restart all the native stuff could be found.

Screenshot of platform details in RHQ
(Platform details in RHQ)

If you don't want to compile that library yourself, you can take my version from https://sourceforge.net/projects/rhq/files/rhq/misc/.

I will try to get that library into the upcoming RHQ 4.10 release, but can't promise anything.

Update

If you run the agent from current master (or upcoming RHQ 4.10), you can configure a list of plugins to be enabled, so that the agent only uses these plugins (and thus uses less memory and starts faster).
This property can be found in the file conf/agent-configuration.xml:


<entry key="rhq.agent.plugins.enabled" value="Platforms,JMX,RHQAgent"/>

The entries are a comma separated list of plugin (short) names. To determine those, you can run
plugins info at the agent command prompt:


> plugins info
Details of the plugins that are currently installed:

rhq-agent-plugin-4.10.0-SNAPSHOT.jar
Plugin Name: RHQAgent
Display Name: RHQ Agent
Last Updated: 14. Februar 2014 11:03:35 MEZ
File Size: 51.558 bytes
MD5 Hashcode: f7eb7577af667ee4883437230e4b2d8c
[...]

Summary of installed plugins:
[RHQAgent, Platforms, JMX]

The short names are the ones given as "Plugin Name", which are also shown on the summary line. There has actually been a property to disable unwanted plugins for a long time, but just enabling the ones needed is probably easier.


The other thing you should do is to remove the -Xms setting in the rhq-agent.sh script - the default of a 64MB minimum heap is just too large here.

With those 3 plugins above and the removed Xms setting, my agent has a committed heap of ~14MB and a used heap of ~11MB. A dump is/was 4MB in size.

P.S.: "Raspberry Pi" is a trademark of the Raspberry Pi Foundation

Back from OneDayTalk

[ I should have already written this a bit earlier, but I had some trouble with my left knee and had to go through some surgery (which went well). ]

As in the previous years, I was invited to give a talk at the OneDayTalk conference in Munich. This year it was in a new location in a suburb of Munich called Germering. Getting there was easy for me, as there is an S-Bahn stop almost in front of the conference location.

The new location featured more and larger rooms and especially an area to sit down between talks or during lunch time. As in the last years the conference featured three parallel tracks.

As I said before, I like that conference because everything is like a big family event with the organizers and also the presenters, which included many JBoss colleagues; while I wrote that Ray would be there, Andrew Rubinger replaced him. The only talk that I really attended was the Wildfly one from Harald Pehl, which had a full house. In the remaining part of the conference I talked to various attendees and colleagues, from Jan Wildeboer to Gavin King and Heiko Braun. Heiko gave me an introduction to his (and Harald's) work on generating UIs from descriptors (which they use in the Wildfly console); this looks very interesting, and I think we could use some of it inside RHQ to create "wizards" for several types of target resources.

In my talk, which was in the last slot, I had around 30 attendees (which was around 1/3 of the attendees still present). To my surprise I found out that the large majority did not yet know or use RHQ, so I had to switch from my original agenda and give a brief introduction to RHQ first. Next I talked about the recent changes in RHQ and tried to gather feedback for future use cases, but that was of course harder with attendees not knowing too much about RHQ. So much for "know your audience".

How do others try to find out their audience when the only thing they know is "This conference is all about JBoss projects"?

You can find my slides in AsciiDoc format on GitHub; you can render them via AsciiDoctor into an HTML presentation.

Beware of the empty tag

I started playing with AngularJS lately and made some progress with RHQ as backend data source (via the REST api). Now I wanted to write a custom directive and use it like this:


<rhq-resource-picker/>
<div id="graph" style="padding: 10px"></div>

Now the very odd thing (for me) was that in the DOM this looked like this:


<rhq-resource-picker>
<div id="graph" style="padding: 10px"></div>
</rhq-resource-picker>

My custom directive wrapped around the <div> tag.

To fix this I had to turn my tag into a pair of opening and closing tags instead of the empty (self-closing) one:


<rhq-resource-picker></rhq-resource-picker>
<div id="graph" style="padding: 10px"></div>

And in fact it turns out that a few more tags, like the popular <div> tag, show the same behavior of not allowing an empty (self-closing) tag, but requiring a tag pair.

OneDayTalk conference in Munich

As in the previous years, I will be at this year's JBoss OneDayTalk conference in Munich and talk about recent and future developments in RHQ.

OneDayTalk is a pretty nice little conference organized by the JBoss User Group Munich, offering three tracks with 6 sessions each, where I usually have the problem that I can't divide myself into three to visit them all. And for 99 Euro you also get some good food and many opportunities to meet me and other JBossians like Eric Schabell, Gavin King, Emanuel Muckenhuber, Heiko Braun or Ray Ploski. The conference web site has the full listing of speakers as well as the scheduled program.

More clever builds and tests ?

Hey,

when you have already tried to build RHQ and run all the tests, you have probably seen that this may take a huge amount of time. That is fine for the first build, but later, when you e.g. fix a typo in the as4-plugin, you don't want GWT to be compiled again.
Of course as a human developer it is relatively easy to just go into the right folder and only run the build there.

Now on build automation like Jenkins this is less easy, which is why I am writing
this post.

What I want to have is something like the following:

  • if a class in GWT has changed, only re-build GWT

  • if a class in a plugin has changed, only rebuild and test that plugin
    (and perhaps dependent plugins like hibernate on top of jmx)

  • if a class in enterprise changes, only build and test enterprise

  • if a class in server/jar changes, only rebuild that and run server-itests

  • if a class in core changes, rebuild everything (there may be more fine grained rules as e.g. a change in core/plugin-container does not require compiling GWT again)

This is probably a bit abbreviated, but you get the idea.

What I can imagine is that we compile the whole project (actually we may even do incremental compiles to get build times further down and may also only go into a specific module (and deps) and just build those).
And then instead of running mvn install we run mvn install -DskipTests
and afterwards analyze what has changed, throw the above rules
at it and only run the tests in the respective module(s).

We could perhaps have a little DSL like the following (which would live in the root of the project tree and be in git):

    rules: {
        rule1: {
            if: modules/enterprise/,
            then: {
                compile: modules/enterprise,
                skip: modules/enterprise/gui/coregui,
                test: [ modules/enterprise ]
            }
        },
        rule2: {
            if: modules/plugins/as7-plugin,
            then: {
                compile: [ modules/plugins/as7-plugin,
                           modules/integration-tests/as7-plugin ],
                test: [ modules/plugins/as7-plugin,
                        modules/integration-tests/as7-plugin ]
            }
        }
    }

And have them evaluated in a clever way by some maven build helper
that parses that file and also the list of changes since the last build to
figure out what needs testing.

We can still run a full build with everything to make sure that we don't lose coverage
through those abbreviations.

There may be build systems like Gradle that have this capability built in; I think for Maven
this requires some additional tooling.

QUESTIONS:

  • Are there any "canned" solutions available?

  • Has anyone already done something like "partial tests" (and how)?

  • Anyone knows of maven plugins that can help here?

  • Anyone interested in getting that going? This is for sure not only interesting for RHQ but for a lot of projects

Having such an infrastructure would also help us in the future to better integrate
external patches, as faster builds / tests would allow us to automatically test those and
"pre-approve" them.




Availability Updates in RHQ GUI
In older versions, the RHQ GUI showed you the availability status of resources, but if you were viewing a resource in the GUI, it did not update the icons unless you manually refreshed the screen.

In RHQ 4.9, this has changed. If you are currently viewing a resource and its availability status changes (say, it goes down, or it comes back up), the screen will quickly reflect the new availability status by changing the availability icon and by changing the tree node icons.

To see what I mean, take a look at this quick 3-minute demo to see the feature in action (view this in full-screen mode if you want to get a better look at the icons and tree node badges):


RHQ 4.9 released

It is a pleasure for me to announce on behalf of the RHQ development team the immediate availability of RHQ 4.9

Features

Some of the new features since RHQ 4.8 are:

Be sure to read the release notes and the installation documents.

Note

For security reasons we have made changes to the installation in the sense that there is no longer a default password for the rhqadmin superuser. Also, the default bind address of 0.0.0.0 has been removed. You need to set both before starting the installation.

If you are upgrading from RHQ 4.8 you need to run a script to remove the native components from Cassandra before the upgrade. Otherwise RHQ 4.9 will fail to start.

Thanks

Special thanks goes to

  • Elias Ross

  • Jérémie Lagarde

  • Michael Burman

for their code contributions for this release and to Stian Lund for his repeated testing of the new graphs implementation.

Downloads

As usual you can download the release from the RHQ-project downloads on SourceForge




Fine-Grained Security Permissions In Bundle Provisioning
RHQ allows one to bundle up content and provision that bundle to remote machines managed by RHQ Agents. This is what we call the "Bundle" subsystem; the documentation actually titles it the "Provisioning" subsystem. I've blogged about it here and here if you want to read more about it.

RHQ 4.9 has just been released and with it comes a new feature in the Bundle subsystem. RHQ can now allow your admins to give users fine-grained security constraints around the Bundle subsystem.

In the older RHQ versions, it was an all-or-nothing prospect - a user either could do nothing with respect to bundles or could do everything.

Now, users can be granted certain permissions surrounding bundle functionality. For example, a user could be given the permission to create and delete bundles, but that user could be denied permission to deploy those bundles anywhere. A user could be restricted in such a way as to allow him to deploy bundles only to a certain group of resources but not others.

Along with the new permissions, RHQ has now introduced the concept of "bundle groups." Now you can organize your bundles into separate groups, while providing security constraints around those bundles so only a select set of users can access, manipulate, and deploy bundles in certain bundle groups.

If you want all the gory details, you can read the wiki documentation on this new security model for bundles.

I put together a quick, 15-minute demo that illustrates this fine-grained security model. It demonstrates the use of the bundle permissions to implement a typical use-case that demarcates workflows to provision different applications to different environments:

Watch the demo to see how this can be done. The demo will illustrate how the user "HR Developer" will only be allowed to create bundles and put them in the "HR Applications" bundle group and the user "HR Deployer" will only be allowed to deploy those "HR Applications" bundles to the "HR Environment" resource group.

Again, read the wiki for more information. The RHQ 4.9 release notes also has information you'll want to read about this.
Upgrading to RHQ 4.9
RHQ 4.8 introduced the new Cassandra backend for metrics. There has been a tremendous amount of work since then focused on the management of the new RHQ Storage Node. We do not want to impose on users the burden of managing a second database. One of our key goals is to provide robust management such that Cassandra is nothing more than an implementation detail for users.

The version of Cassandra shipped in RHQ 4.8 includes some native libraries. One of the main uses for those native libraries is compression.  If the platform on which Cassandra is running has support for the native libraries, table compression will be enabled. Data files written to disk will be compressed.

All of the native libraries have been removed from the version of Cassandra shipped in RHQ 4.9. The reason for this change is to ensure RHQ continues to provide solid cross-platform support. The development and testing teams simply do not have the bandwidth right now to maintain native libraries for all of the supported platforms in RHQ and JON.

The following information applies only to RHQ 4.8 installs.

Since RHQ 4.9 does not ship with those native compression libraries, Cassandra will not be able to decompress the data files on disk.

Compression has to be disabled in your RHQ 4.8 installation before upgrading to 4.9. There is a patch which you will need to run prior to upgrading. Download rhq48-storage-patch.zip and follow the instructions provided in rhq48-storage-patch.sh|bat.

I do want to mention that we will likely re-enable compression using a pure Java compression library in a future RHQ release.
Moving from Eclipse to IntelliJ
Well, the second shoe dropped. The final straw was placed on the camel's back and the camel's back broke. I tried one more time and, once again, Eclipse still doesn't have a good Maven integration - at least for such a large project as RHQ.

Now, for some history, I've been using Eclipse for at least a decade. I like it. I know it. I'm comfortable with it. While I can't claim to know how to use everything in it, I can navigate around it pretty well and can pump out some code using it.

However, the Maven integration is just really bad from my experience. I've tried, I really have. In fact, it has been an annual ritual of mine to install the latest Maven plugin and see if it finally "just works" for me. I've done this for at least the last three years if not longer. So it is not without a lack of trying. Every year I keep hearing "try it again, it got better." (I really have heard this over the span of years). But every time I install it and load in the RHQ project, it doesn't "just work". I tried it again a few weeks ago and nothing has changed. What I expect is to import my root Maven module and have Eclipse load it in and let me just go back to doing my work. Alas, it has never worked.

I hate to leave Eclipse because, like I said, I have at least a decade invested in using it. But I need a good Maven integration. I don't want to have tons of Eclipse projects in my workspace - but then again, if the Eclipse Maven plugin needs to create one project per Maven module so it "just works", so be it. I can deal with it (after all, IntelliJ has tons of modules, even if it places them under one main project). But I can't even get that far.

So, after hearing all the IntelliJ fanboys denigrate Eclipse and tell me that I should move to IntelliJ because "it's better", I finally decided to at least try it.

Well, I can at least report that IntelliJ's Maven integration actually does seem to "just work" - but that isn't to say I didn't have to spend 15 minutes or so figuring out some things to get it to work (I had to make sure I imported it properly and I had to make sure to set some options). But spending 15 minutes and getting it to work is by far better than what I've gone through with Eclipse (which is, spending lots more time and never getting it to work over the years). So, yes, I can confirm that the IntelliJ folks are correct that Maven integration "just works" - with that small caveat. It actually is very nice.

In addition, I really like IntelliJ's git integration - it works out of box and has some really nice features.

I also found that IntelliJ provides an Eclipse keymap - so, while I may not like all the keystrokes required to unlock all the features in IntelliJ (more on that below), I do like how I can use many of the Eclipse keystrokes I know and have it work in IntelliJ.

As I was typing up this blog, I was about to rail on IntelliJ about its "auto-save" feature. Reading their Migration FAQ they make it sound like you can't turn off that auto-save feature (where, as soon as you type, it saves the file). I really hate that feature. But, I just found out, to my surprise, you can kinda turn that off. It still maintains the changes though, in what I suppose is a cache of changed files. So if I close the editor with the changed file, and open it back up again, my changes are still there. That's kinda annoying (but yet, I can see this might be useful, too!). But at least it doesn't change the source file. I'll presume there is a way to throw away these cached changes - at least I can do a git revert and that appears to do it.


However, with all that said, as I use IntelliJ (and really, it's only been about week), I'm seeing on the edges of it things that I do not like where Eclipse is better. If you are an IntelliJ user and know how to do the following, feel free to point out my errors. Note: I'm using the community version of  IntelliJ v12.14.

For one thing, where's the Problems View that Eclipse has? I mean, in Eclipse, I have a single view with all the compile errors within my project. I do not see anywhere in IntelliJ a single view that tells me about problems project-wise. Now, I was told that this is because Eclipse has its own compiler and IntelliJ does not. That's an issue for me. I like being able to change some code in a class, and watch the Problems View report all the breakages that that change causes. I see in the Project view, you can limit the scope to problem files. That gets you kinda there - but I want to see it as a list (not a tree) and I want to see the error messages themselves,  not just what files have errors in them.

Second, the Run/Debug Configuration feature doesn't appear to be as nice as Eclipse. For example, I have some tool configurations in Eclipse that, when selected, prompt the user for parameter values, but apparently, IntelliJ doesn't support this. In fact, Eclipse supports lots of parameter replacement variables (${x}) whereas it doesn't look like IntelliJ supports any.

Third, one nice feature in Eclipse is the ability to have the source code for a particular method pop up in a small window when you hover over a method call while holding down, say, the ALT key (this is configurable in Eclipse). But, I can't see how this is done in IntelliJ. I can see that View->QuickDefinition does what I want, but I just want to hold down, say, ALT or SHIFT and have the quick definition pop up where I hover. I have a feeling you can tell IntelliJ to do this, I just don't know how.

Another thing I am missing is an equivalent to Eclipse's "scrapbook" feature. This was something I use(d) all the time. In any scrapbook page, you can add and highlight any Java snippet and execute it. The Console View shows the output of the Java snippet. This is an excellent way to quickly run some small code snippet you want to try out to make sure you got it right (I can't tell you how many times I've used it to test regexes). The only way it appears you can do this in IntelliJ is if you are debugging something and you are at a breakpoint. From there, you can execute random code snippets. But Eclipse has this too (the Display view). I want a way to run a Java snippet right from my editor without setting up a debug session.

I also don't want to see this "TODO" or "JetGradle" or other views that IntelliJ seems to insist I want. You can't remove them from the UI entirely.

Finally, IntelliJ seems to be really keen on keyboard control. I am one of those developers that hates relying on keystrokes to do things. I am using a GUI IDE, I want to use the GUI :-) I like mouse/menu control over keystrokes. I just can't remember all the many different key combinations to do things, plus my fingers can't consistently reach all the F# function keys, but I can usually remember where in the menu structure a feature is. I'm sure as I use IntelliJ more that I'll remember more. And most everything does seem to have a main menu or popup-menu equivalent. So, this is probably just a gripe that I have to spend time on a learning curve to learn a new tool - can't really blame IntelliJ for that (and with the Eclipse keymap, lots of Eclipse keystrokes now map in IntelliJ). I guess I have to blame Eclipse for that since it's forcing me to make this move in the first place.

Some of those are nit-picky, others not. And I'm sure I'll run into more things that either IntelliJ doesn't have or is hiding from me. Maybe as I use IntelliJ more, and my ignorance of it recedes a bit, I'll post another blog entry to indicate my progress.

Custom Deserializer in Jackson and validation

tl;dr: it is important to add input validation to custom json deserializers in Jackson.

In RHQ we make use of Json parsing in a few places - be it directly in the as7/Wildfly plugin, or indirectly in the REST-api via RESTEasy 2.3.5, which already does the heavy lifting.

Now we have a bean Link that looks like

public class Link {
    String rel;
    String href;
}

The standard way for serializing this is

{ "rel":"edit", "href":"http://acme.org" }

As we need a different format, I have written a custom serializer and attached it to the class.

@JsonSerialize(using = LinkSerializer.class)
@JsonDeserialize(using = LinkDeserializer.class)
@Produces({"application/json","application/xml"})
public class Link {

    private String rel;
    private String href;

This custom format looks like:

{
    "edit": {
        "href": "http://acme.org"
    }
}
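
The serializer itself is not shown here; a minimal sketch of what a LinkSerializer producing this format could look like (assuming the Jackson 1.x API that RESTEasy 2.3.5 uses, and assuming getters on the Link bean) is this:

import java.io.IOException;

import org.codehaus.jackson.JsonGenerator;
import org.codehaus.jackson.map.JsonSerializer;
import org.codehaus.jackson.map.SerializerProvider;

public class LinkSerializer extends JsonSerializer<Link> {

    @Override
    public void serialize(Link link, JsonGenerator jgen, SerializerProvider provider)
            throws IOException {
        // Writes { "<rel>": { "href": "<href>" } }
        jgen.writeStartObject();
        jgen.writeObjectFieldStart(link.getRel());
        jgen.writeStringField("href", link.getHref());
        jgen.writeEndObject();
        jgen.writeEndObject();
    }
}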

As a client can also send links, some custom deserialization needs to happen. A first cut of the deserializer looked like this and worked well:

public class LinkDeserializer extends JsonDeserializer<Link> {

    @Override
    public Link deserialize(JsonParser jp,
                            DeserializationContext ctxt) throws IOException
    {
        String tmp = jp.getText(); // {
        jp.nextToken(); // skip over { to the rel
        String rel = jp.getText();
        jp.nextToken(); // skip over {
        […]

        Link link = new Link(rel, href);

        return link;
    }

Now what happened the other day was that in some tests I was sending data and our server blew up horribly. Memory usage grew, the garbage collector took huge amounts of cpu time and the call eventually terminated with an OutOfMemoryException.

After some investigation I found that the client did not send the Link object in our special format, but in the original format that I showed first. Further investigation showed that in fact the LinkDeserializer was consuming the tokens from the stream as seen above and then also swallowing subsequent tokens from the input. So when it returned, the whole parser was in a bad shape and was then trying to copy large arrays around until we saw the OOME.

After I got this, I changed the implementation to add validation and to bail out early on invalid input, so that the parser won’t get into bad shape on invalid input:

    public Link deserialize(JsonParser jp,
                            DeserializationContext ctxt) throws IOException
    {

        String tmp = jp.getText(); // {
        validate(jp, tmp, "{");
        jp.nextToken(); // skip over { to the rel
        String rel = jp.getText();
        validateText(jp, rel);
        jp.nextToken(); // skip over {
        tmp = jp.getText();
        validate(jp, tmp, "{");
        […]

Those validate*() methods then simply compare the token with the passed expected value and throw an exception on unexpected input:

    private void validate(JsonParser jsonParser, String input,
                          String expected) throws JsonProcessingException {
        if (!input.equals(expected)) {
            throw new JsonParseException("Unexpected token: " + input,
                jsonParser.getTokenLocation());
        }
    }

The validation can perhaps be improved more, but you get the idea.
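
One way to tighten this further would be to check the parser's current JsonToken instead of comparing token text. A small sketch of such a hypothetical expect() helper (again assuming the Jackson 1.x API) could look like this:

import java.io.IOException;

import org.codehaus.jackson.JsonParseException;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

public class TokenValidation {

    /** Throws early if the parser is not positioned on the expected token type. */
    static void expect(JsonParser jp, JsonToken expected) throws IOException {
        if (jp.getCurrentToken() != expected) {
            throw new JsonParseException("Expected " + expected + " but got "
                + jp.getCurrentToken(), jp.getTokenLocation());
        }
    }
}

For example, expect(jp, JsonToken.START_OBJECT) would be called before descending into the wrapped object.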




JavaFX UI for the RHQ plugin generator (updated)

In RHQ we have a tool to generate initial plugin skeletons, the plugin generator. This is a standalone tool that you can use to get going with plugin development.
This is for example described in the "How to write a plugin" article.

Now my colleague Simeon always wanted a graphical frontend (he is very much in favor of graphical frontends, while I am one of those old Unix-neckbeards :). Inspired by this year's Java Forum Stuttgart, I sat down on the weekend and played a bit with JavaFX. The result is a first cut of a UI frontend for the generator:

UI for the plugin generator

At the top you see a message bar that shows field validation issues, in the middle is the part where you enter the settings, and at the bottom you get a short description for each property.

I've pushed a first cut that is not yet complete, but you will get the idea. Help is always appreciated.
The code is in RHQ git under modules/helpers/pluginGen.

[update]

I continued working a bit on the generator and have added the possibility to point it at a directory that contains classes annotated with the annotations from the org.rhq.helpers:rhq-pluginAnnotations module.

Screenshot scan-for-annotations

A class could look like this


public class FooBean {

    @Metric(description = "How often was this bean invoked",
            displayType = DisplayType.SUMMARY,
            measurementType = MeasurementType.DYNAMIC,
            units = Units.SECONDS)
    int invocationCount;

    @Metric(description = "Just a foo", dataType = DataType.TRAIT)
    String lastCommand;

    @Operation(description = "Increase the invocation count")
    public int increaseCounter() {
        invocationCount++;
        return invocationCount;
    }

    @Operation(description = "Decrease the counter")
    public void decreaseCounter(@Parameter(description = "How much to decrease?",
                                           name = "by") int by) {
        invocationCount -= by;
    }
}

And the generator would then create the respective <metric> and <operation> elements in the plugin descriptor - in this case you don't need to select the two "Has Metrics/Operations" flags above.

Now this work is still not finished. And in fact it would be good to find a common set of annotations for a broader scope of projects.

New Android tooling with gradle: Order matters
I am trying to convert RHQPocket over to use the new Android build system with Gradle.
The documentation is comprehensive, but as always does not exactly match what I have in front of me.

After moving the sources around I started generating a build.gradle file. I added my external libs to the dependencies, but even after a lot of trial and error the build did not succeed.

In the end, with the help of googling and StackOverflow, I found out that order matters a lot in the build file:

First comes the section about plugins for the build system itself:

buildscript {
    repositories {
        mavenCentral()
    }

    dependencies {
        classpath 'com.android.tools.build:gradle:0.4.2'
    }
}

Settings herein only apply to the build system, but not to the project to be built.

After this we can use those plugins (the android one)

apply plugin: 'android'
apply plugin: 'idea'


And only when those plugins are loaded can we start talking about the project itself and add dependencies:

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.codehaus.jackson:jackson-core-asl:1.9.12'
    compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.12'
}

android {
    compileSdkVersion 17
    buildToolsVersion "17.0"
}


I guess for someone experienced with Gradle this is no news, but it took me quite some time to figure out, especially as the documentation mentions all the parts, but not in the context of a larger example.

The file layout now looks like this:

(Screenshot of the resulting file layout)


One caveat is that the external libraries are not yet automagically loaded into the IDE - one still has to add them manually in the project structure wizard.

The last thing is then to mark build/source/r/debug as a source directory to also get completion for the R.* entries.
RHQ 4.8 released


RHQ 4.8 has been released with another batch of new features and bug fixes.
Major changes since RHQ 4.7 are
  • New storage backend for metrics based on Cassandra:

    Numerical metric data is now stored in a Cassandra cluster to improve scalability for larger deployments with many metrics.
    (Architecture overview diagram)

    The intention is to move more mass data (events, calltime data etc.) into Cassandra in the future. We just had to start somewhere. If you are upgrading from an earlier version of RHQ, a utility to migrate the already gathered metrics is provided.
  • More improvements to the new charts
    (Screenshot of the new charts)

    The selection of the time period has been changed:
    • There is now a button bar at the top to quickly select the timeframe to display
    • If you hover with the mouse between two "bars", you will see a little crosshair that starts the
      selection mode - drag it over a series of bars to zoom in. Mike Thompson has created a YouTube video that explains this.
  • New installation process:
    With the above change of the backend storage, we have changed the installation process again to reduce the complexity to install RHQ. The new process basically goes:
    • unzip rhq-server-4.8.0.zip
    • cd rhq-server-4.8.0
    • edit bin/rhq-server.properties
    • bin/rhqctl install (optionally with --storage-data-root-dir=<some directory>)

    For upgrades the process is very similar: you unzip the new binaries and then run bin/rhqctl upgrade --from-server-dir=<old server>
    Please consult the installation and upgrade guides before starting.
  • REST-api is now (mostly) stable and supports paging on (some) resources. The documentation is also up on SourceForge and the rhq-samples project has a special section on examples for the REST-api.


As always there have been many more smaller improvements and bug fixes - many thanks to everyone who has contributed via bug report, comments, discussions on #rhq and/or via code contributions.

Please check the full release notes for details. They also contain a list of commits.

RHQ is an extensible tool to monitor your infrastructure of machines and applications, alert operators on user defined conditions, configure resources and run operations on them from a central web-based UI. Other ways of communicating with RHQ include a command line interface and a REST-api.

You can download the release from SourceForge.


As mentioned above, the old installer is gone, so make sure to read
the wiki document describing how to use the new installer.

Maven artifacts are available from the JBoss Nexus repository and should also be available from Central.


Please give us feedback, be it in Bugzilla, Mailing lists or the forum. Or just join us on IRC at irc.freenode.net/#rhq.

And as I have been asked recently: yes we are happy to accept code contributions -- be it for the RHQ core, as well as plugins or for the samples project. Also if you e.g. have written a plugin, share a pointer to it, which we can then share on the wiki etc.

Support for paging in the RHQ REST-api (updated)
I have just pushed some changes to RHQ master that implement paging for (most) of the REST-endpoints that return collections of data.



If you remember, I was asking the other day if there is a "standard" way to do this. Basically there are two
"big options":
  1. Put paging info in the Http-Header
  2. Put paging info in the body of the message


While I think paging information is metadata that should not "pollute" the body, I understand the argument from the JavaScript side that they don't easily have access to the HTTP headers
within AJAX requests. So what I have now done is to implement both:
  1. Paging in the http header: this is the standard way that you get if you just request the "normal" media types of application/xml or application/json (output from running RestAssured):

    [update]
    My colleague Libor pointed out that the links did not match the format from RFC 5988 (Web Linking), which is now fixed.
    [/update]

    Request method: GET
    Request path: http://localhost:7080/rest/resource
    Query params: page=1
    ps=2
    category=service
    Headers: Content-Type=*/*
    Accept=application/json
    HTTP/1.1 200 OK
    Server=Apache-Coyote/1.1
    Pragma=No-cache
    Cache-Control=no-cache
    Expires=Thu, 01 Jan 1970 01:00:00 CET
    Link=<http://localhost:7080/rest/resource?ps=2&category=service&page=2>; rel="next"
    Link=<http://localhost:7080/rest/resource?ps=2&category=service&page=0>; rel="prev"
    Link=<http://localhost:7080/rest/resource?ps=2&category=service&page=152>; rel="last"
    Link=<http://localhost:7080/rest/resource?page=1&ps=2&category=service>; rel="current"
    Content-Encoding=gzip
    X-collection-size=305
    Content-Type=application/json
    Transfer-Encoding=chunked
    Date=Thu, 23 May 2013 07:57:38 GMT

    [
    {
    "resourceName": "/",
    "resourceId": 10041,
    "typeName": "File System",
    "typeId": 10013,
    …..
  2. Paging as part of the body - there the "real collection" is wrapped inside an object that also contains paging meta data as well as the paging links. To request this representation, a media type of application/vnd.rhq.wrapped+json needs to be used (and this is only available with JSON at the moment):

    Request method: GET
    Request path: http://localhost:7080/rest/resource
    Query params: page=1
    ps=2
    category=service
    Headers: Content-Type=*/*
    Accept=application/vnd.rhq.wrapped+json
    Cookies:
    Body:
    HTTP/1.1 200 OK
    Server=Apache-Coyote/1.1
    Pragma=No-cache
    Cache-Control=no-cache
    Expires=Thu, 01 Jan 1970 01:00:00 CET
    Content-Encoding=gzip
    Content-Type=application/vnd.rhq.wrapped+json
    Transfer-Encoding=chunked
    Date=Thu, 23 May 2013 07:57:40 GMT

    {
    "data": [
    {
    "resourceName": "/",
    "resourceId": 10041,
    "typeName": "File System",
    "typeId": 10013,

    ],
    "pageSize": 2,
    "currentPage": 1,
    "lastPage": 152,
    "totalSize": 305,
    "links": [
    {
    "next": {
    "href": "http://localhost:7080/rest/resource?ps=2&category=service&page=2"
    }
    },

    }

Please try this if you can. We want to get that into a "finished" state (as for the whole REST-api) for RHQ 4.8.
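
For example, a client could page through the results by following the rel="next" links of the header variant. This is a rough standalone sketch (authentication omitted, header lookup simplified), not part of the RHQ code base:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class PagingClient {

    /** Returns the URL tagged rel="next", or null when there is no further page. */
    static String nextLink(Map<String, List<String>> headers) {
        List<String> links = headers.get("Link"); // simplified: assumes this exact header name
        if (links == null) {
            return null;
        }
        for (String link : links) {
            if (link.contains("rel=\"next\"")) {
                return link.substring(link.indexOf('<') + 1, link.indexOf('>'));
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String page = "http://localhost:7080/rest/resource?page=0&ps=2&category=service";
        while (page != null) {
            HttpURLConnection conn = (HttpURLConnection) new URL(page).openConnection();
            conn.setRequestProperty("Accept", "application/json");
            System.out.println(page + " -> HTTP " + conn.getResponseCode());
            page = nextLink(conn.getHeaderFields());
            conn.disconnect();
        }
    }
}
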
Creating a delegating login module (for JBoss EAP 6.1 )


[ If you only want to see code, just scroll down ]

Motivation


In RHQ we had a need for a security domain that can be used to secure the REST-api and its web-app via container managed security. In the past I had just used the classical DatabaseServerLoginModule to authenticate against the database.

Now, RHQ also allows users to come from LDAP directories, which was not covered by the above module. I had two options to start with:
  • Copy the LDAP login modules into the security domain for REST
  • Use the security domain for the REST-api that is already used for UI and CLI


The latter option was of course favorable to prevent code duplication, so I went that route. And failed.

I failed because RHQ was on startup dropping and re-creating the security domain and the server was detecting this and complaining that the security domain referenced from the rhq-rest.war was all of a sudden gone.

So next try: don't re-create the domain on startup and only add/remove the ldap-login modules (I am talking about modules, because we actually have two that we need).

This also did not work as expected:
  • The underlying AS sometimes went into reload needed mode and did not apply the changes
  • When the ldap modules were removed, the principals from them were still cached
  • Flushing the cache did not work and the server went into reload-needed mode


So what I did now is to implement a login module for the rest-security-domain that just delegates to another one for authentication and then adds roles on success.

This way the rhq-rest.war has a fixed reference to that rest-security-domain and the other security domain could just be handled as before.

Implementation



Let's start with the snippet from the standalone.xml file describing the security domain and parameterizing the module:


<security-domain name="RHQRESTSecurityDomain" cache-type="default">
    <authentication>
        <login-module code="org.rhq.enterprise.server.core.jaas.DelegatingLoginModule" flag="sufficient">
            <module-option name="delegateTo" value="RHQUserSecurityDomain"/>
            <module-option name="roles" value="rest-user"/>
        </login-module>
    </authentication>
</security-domain>


So this definition sets up a security domain RHQRESTSecurityDomain which uses the DelegatingLoginModule that I will describe in a moment. There are two parameters passed:
  • delegateTo: name of the other domain to authenticate the user
  • roles: a comma separated list of roles to add to the principal (and which are needed in the security-constraint section of web.xml)


For the code I don't show the full listing; you can find it in git.

To make our lives easier we don't implement all functionality on our own, but extend
the already existing UsernamePasswordLoginModule and only override
certain methods.


public class DelegatingLoginModule extends UsernamePasswordLoginModule {


First we initialize the module with the passed options and create a new LoginContext with
the domain we delegate to:

@Override
public void initialize(Subject subject, CallbackHandler callbackHandler,
                       Map<String, ?> sharedState,
                       Map<String, ?> options)
{
    super.initialize(subject, callbackHandler, sharedState, options);

    /* This is the login context (=security domain) we want to delegate to */
    String delegateTo = (String) options.get("delegateTo");

    /* Now create the context for later use */
    try {
        loginContext = new LoginContext(delegateTo, new DelegateCallbackHandler());
    } catch (LoginException e) {
        log.warn("Initialize failed : " + e.getMessage());
    }



The interesting part is the login() method, where we get the username / password and store it for later use. Then we try to log into the delegate domain, and if that succeeds, we tell super that we had success so that it can do its magic.


@Override
public boolean login() throws LoginException {
    try {
        // Get the username / password the user entered and save it for later use
        usernamePassword = super.getUsernameAndPassword();

        // Try to log in via the delegate
        loginContext.login();

        // Login was a success, so we can continue
        identity = createIdentity(usernamePassword[0]);
        useFirstPass = true;

        // This next flag is important. Without it the principal will not be
        // propagated
        loginOk = true;

the loginOk flag is needed here so that the superclass will call LoginModule.commit() and pick up the principal along with the roles.
Not setting this to true will result in a successful login() but no principal
is attached.


        if (debugEnabled) {
            log.debug("Login ok for " + usernamePassword[0]);
        }

        return true;
    } catch (Exception e) {
        if (debugEnabled) {
            LOG.debug("Login failed for : " + usernamePassword[0] + ": " + e.getMessage());
        }
        loginOk = false;
        return false;
    }
}


After success, super will call into the next two methods to obtain the principal and its roles:

@Override
protected Principal getIdentity() {
    return identity;
}

@Override
protected Group[] getRoleSets() throws LoginException {

    SimpleGroup roles = new SimpleGroup("Roles");

    for (String role : rolesList) {
        roles.addMember(new SimplePrincipal(role));
    }
    Group[] roleSets = { roles };
    return roleSets;
}


And now the last part is the callback handler that the domain we delegate to will use to obtain the credentials from us. It is the classical JAAS login callback handler. One thing that at first totally confused me was that this handler was called several times during login, and I thought it was buggy. But in fact the number of times it is called corresponds to the number of login modules configured in the RHQUserSecurityDomain.


private class DelegateCallbackHandler implements CallbackHandler {
    @Override
    public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {

        for (Callback cb : callbacks) {
            if (cb instanceof NameCallback) {
                NameCallback nc = (NameCallback) cb;
                nc.setName(usernamePassword[0]);
            }
            else if (cb instanceof PasswordCallback) {
                PasswordCallback pc = (PasswordCallback) cb;
                pc.setPassword(usernamePassword[1].toCharArray());
            }
            else {
                throw new UnsupportedCallbackException(cb, "Callback " + cb + " not supported");
            }
        }
    }
}



Again, the full code is available in the RHQ git repository.

Debugging (in EAP 6.1 alpha or later )



If you write such a login module and it does not work, you want to debug it. I started with the usual means and found out that my login() method was working as expected, but the login still failed. I added print statements etc. and found that the getRoleSets() method was never called. But still everything looked ok. I did some googling and found this good wiki page. It is possible to tell a web app to do audit logging:


<jboss-web>
    <context-root>rest</context-root>
    <security-domain>RHQRESTSecurityDomain</security-domain>
    <disable-audit>false</disable-audit>
</jboss-web>


This flag alone is not enough, as you also need an appropriate logger set up, which is explained on
the wiki page. After enabling this, I saw entries like


16:33:33,918 TRACE [org.jboss.security.audit] (http-/0.0.0.0:7080-1) [Failure]Source=org.jboss.as.web.security.JBossWebRealm;
principal=null;request=[/rest:….

So it became obvious that the login module did not set the principal. Looking at the code in the superclasses then led me to the loginOk flag mentioned above.

Now, with everything correctly set up, the audit log looks like:


22:48:16,889 TRACE [org.jboss.security.audit] (http-/0.0.0.0:7080-1)
[Success]Source=org.jboss.as.web.security.JBossWebRealm;Step=hasRole;
principal=GenericPrincipal[rhqadmin(rest-user,)];
request=[/rest:cookies=null:headers=authorization=user-agent=curl/7.29.0,host=localhost:7080,accept=*/*,][parameters=][attributes=];

So here you see that the principal rhqadmin has logged in and got the role rest-user assigned, which is the one matching in the security-constraint element in web.xml.

Further viewing



I've presented the above as a Hangout on Air. Unfortunately G+ muted me from time to time when I was typing while explaining :-(

After the video was done I got a few more questions that in the end made me rethink the startup phase for the case where the user has a previous version of RHQ installed with LDAP enabled. In this case, the installer will still install the initial DB-only RHQUserSecurityDomain, and then in the startup bean we check a) if LDAP is enabled in the system settings and b) if the login modules are actually present. If a) matches and they are not present, we install them.

This Bugzilla entry also contains more information about this whole story.
Creating Https Connection Without javax.net.ssl.trustStore Property
Question: How can you use the simple Java API call java.net.URL.openConnection() to obtain a secure HTTP connection without having to set or use the global system property "javax.net.ssl.trustStore"? How can you make a secure HTTP connection and not even need a truststore?

I will show you how you can do both below.

First, some background. Java has a basic API to make a simple HTTP connection to any URL via URL.openConnection(). If your URL uses the "http" protocol, it is very simple to use this to make basic HTTP connections.

Problems creep in when you want a secure connection over SSL (via the "https" protocol). You can still use that API - URL.openConnection() will return a HttpsURLConnection if the URL uses the https protocol - however, you must ensure your JVM can find and access your truststore in order to authenticate the remote server's certificate.

[note: I won't discuss how you get your trusted certificates and how you put them in your truststore - I'll assume you know, or can find out, how to do this.]

You tell your JVM where your truststore is by setting the system property "javax.net.ssl.trustStore" and you tell your JVM how to access your truststore by giving your JVM the password via the system property "javax.net.ssl.trustStorePassword".

The problem is these are global settings (you often see instructions telling you to set these values via the -D command line arguments when starting your Java process) so everything running in your JVM must use that truststore. And you can't alter those system properties during runtime and expect those changes to take effect. Once you ask the JVM to make a secure connection, those system property values appear to be cached in the JVM and are used thereafter for the life of the JVM (I don't know exactly where in the JRE code these values are cached, but my experience shows me that they are). Changing those system properties later on in the lifetime of the JVM has no effect; the original values are forever used.

Another problem that some people run into is having the need for a truststore in the first place. Sometimes you don't have a requirement to authenticate the server endpoint; however, you would still like to send your data encrypted over the wire. You can't do this readily since the connection you obtain from URL.openConnection() will, by default, expect to use your truststore located at the path pointed to by the system property javax.net.ssl.trustStore.

To allow me to use different truststores for different connections, or to allow me to encrypt a connection but not authenticate the endpoint, I wrote a Java utility object that allows you to do just this.

The main constructor is this:

public SecureConnector(String secureSocketProtocol,
                       File   truststoreFile,
                       String truststorePassword,
                       String truststoreType,
                       String truststoreAlgorithm)

You pass it a secure socket protocol (such as "TLS") and your truststore file location. If the truststore file is null, the SecureConnector object will assume you do not want to authenticate the remote server endpoint and you only want to encrypt your over-the-wire traffic. If you do provide a truststore file, you need to provide its password, its type (e.g. "JKS"), and its algorithm (e.g. "SunX509") - if you pass in null for type and/or algorithm, the JVM defaults are used.

Once you create the object, just obtain a secure connection to any URL via a call to SecureConnector.openSecureConnection(URL). This expects your URL to have a protocol of "https". If successful, an HttpsURLConnection object is returned and you can use it like any other connection object. You do not need to set javax.net.ssl.trustStore (or any other javax.net.ssl system property) and, as explained above, you don't even need to provide a truststore at all (assuming you don't need to do any authentication).
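To make this concrete, here is a minimal usage sketch (not taken from the RHQ sources); the URLs, truststore path, and password are made-up values, and the "encrypt only" variant passes null for all truststore parameters:

import java.io.File;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class SecureConnectorExample {
    public static void main(String[] args) throws Exception {
        // Variant 1: encrypt only, no server authentication (no truststore given)
        SecureConnector encryptOnly = new SecureConnector("TLS", null, null, null, null);
        HttpsURLConnection conn = encryptOnly.openSecureConnection(new URL("https://localhost:7443/"));
        System.out.println("HTTP status: " + conn.getResponseCode());

        // Variant 2: authenticate the server against a specific truststore
        SecureConnector withTruststore = new SecureConnector("TLS",
            new File("/path/to/my.truststore"), // hypothetical truststore location
            "changeit",                         // hypothetical truststore password
            "JKS", "SunX509");
        HttpsURLConnection conn2 = withTruststore.openSecureConnection(new URL("https://example.com/"));
        System.out.println("HTTP status: " + conn2.getResponseCode());
    }
}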

The code for this is found inside of RHQ's agent - you can read its javadoc and look through SecureConnector code here.

The core code is found in openSecureConnection and looks like this; I'll break it down:

First, it simply obtains the HTTPS connection object from the URL itself:
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
Then it prepares a custom SSLContext object using the given secure socket protocol:
TrustManager[] trustManagers;
SSLContext sslContext = SSLContext.getInstance(getSecureSocketProtocol());
If no truststore file was provided, it will build its own "no-op" trust manager and "no-op" hostname verifier. What these "no-op" objects do is always accept all certificates and hostnames; thus they will always allow the SSL communications to flow. This is how the authentication is bypassed:
if (getTruststoreFile() == null) {
    // configured to not care about authenticating server, encrypt but don't worry about certificates
    trustManagers = new TrustManager[] { NO_OP_TRUST_MANAGER };
    connection.setHostnameVerifier(NO_OP_HOSTNAME_VERIFIER);
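The NO_OP_TRUST_MANAGER and NO_OP_HOSTNAME_VERIFIER constants are not shown in this walkthrough; a minimal sketch of what such accept-all fields typically look like (the names follow the code above, the bodies are my assumption, not necessarily RHQ's exact code):

import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;
import javax.net.ssl.X509TrustManager;

// Accepts every certificate chain - traffic is encrypted but the server is not authenticated
static final X509TrustManager NO_OP_TRUST_MANAGER = new X509TrustManager() {
    public void checkClientTrusted(X509Certificate[] chain, String authType) { }
    public void checkServerTrusted(X509Certificate[] chain, String authType) { }
    public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
};

// Accepts every hostname, regardless of what the server certificate says
static final HostnameVerifier NO_OP_HOSTNAME_VERIFIER = new HostnameVerifier() {
    public boolean verify(String hostname, SSLSession session) { return true; }
};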
If a truststore file was provided, then it will be loaded in memory and stored in a KeyStore instance:
} else {
    // need to configure SSL connection with truststore so we can authenticate the server.
    // First, create a KeyStore, but load it with our truststore entries.
    KeyStore keyStore = KeyStore.getInstance(getTruststoreType());
    keyStore.load(new FileInputStream(getTruststoreFile()), getTruststorePassword().toCharArray());
The truststore file's content (now stored in a KeyStore object) is used to initialize a trust manager. Unlike the "no-op" trust manager that was created above (if a truststore file was not provided), this trust manager really does perform authentication and it uses the provided truststore's certificates to authorize the server being communicated with. This is why we no longer need to worry about the system properties "javax.net.ssl.trustStore" and "javax.net.ssl.trustStorePassword" - this builds its own trust manager using the data provided by the caller:
    // create truststore manager and initialize it with KeyStore we created with all truststore entries
    TrustManagerFactory tmf = TrustManagerFactory.getInstance(getTruststoreAlgorithm());
    tmf.init(keyStore);
    trustManagers = tmf.getTrustManagers();
}
Finally, the SSL context is initialized with the trust manager that was created earlier (either the "no-op" trust manager, or the trust manager that was initialized with the truststore's certificates). That SSL context is handed off to the SSL connection so the connection can use the context when it needs to perform authentication:
sslContext.init(null, trustManagers, null);
connection.setSSLSocketFactory(sslContext.getSocketFactory());
The connection is finally returned to the caller, fully configured and ready to be used.
return connection;
This is helpful for certain use cases. First, it is helpful when you have multiple truststores that you need to choose from when connecting to different servers, as well as when you need to switch truststores at runtime (remember, the system property values of javax.net.ssl.trustStore, et al. are fixed for the lifetime of the JVM - this helps bypass that restriction). This is also helpful in local testing, debugging and demo scenarios when you don't really need or care about setting up truststores and certificates but you do want to connect over https.

RHQ 4.7 released


RHQ 4.7 has been released, and one of the two major features in this release is the new charts, which finally replace the age-old charts we have had since the start of the RHQ project:


Screenshot with new charts


The new charts are implemented with the awesome D3.js toolkit, as I've written before.

The other big change is an upgrade of the underlying app server to JBoss EAP 6.1 alpha 1.

As always there have been many more smaller improvements and bug fixes.

Please check the full release notes for details. They also contain a list of commits.

RHQ is an extensible tool to monitor your infrastructure of machines and applications, alert operators on user-defined conditions, configure resources and run operations on them from a central web-based UI. Other ways of communicating with RHQ include a command line interface and a REST API.

You can download the release from source forge.


Note that the old installer is gone since RHQ 4.6, so make sure to read
the wiki document describing how to use the new installer.

Maven artifacts are available from the JBoss Nexus repository and should also be available from Central.



Please give us feedback, be it in Bugzilla, Mailing lists or the forum. Or just join us on IRC at irc.freenode.net/#rhq.

Deleting RHQ Agent Made Easier
In the past, if you wanted to remove an RHQ Agent from your RHQ environment, the simple answer was "just uninventory the platform" which would, under the covers, also remove the agent record completely.

However, in some cases, users found it difficult to remove their agent. Usually, what happens is they try to install the RHQ Agent, run into problems, and then get their RHQ system in a state that causes their RHQ Agent to not be able to register with the RHQ Server. For example, this could happen when a person runs the agent as a different user from before or with the -L command line option - both of which essentially purge the agent's security token and can cause the RHQ Server to reject future registration requests from the agent.

If a person does not understand the linkage between platform and agent, or if the agent's platform was never committed to inventory, it becomes difficult to understand how to get out of the quagmire.

This has now been addressed as an enhancement as requested by BZ 849711. Now, the answer is simple - regardless of whether the platform is in inventory or not, and even if the agent's resources do not yet show up in the discovery queue - you have a way to quickly purge your agent from the system. This will allow you to get back to a clean slate and attempt to re-install your agent.

You do this by going to the top Administration page and selecting the "Agents" item. From here, you see the list of all the agents currently registered in your RHQ environment. If you select one or more of them, you now have the option of pressing the new "Delete" button at the bottom. This will do a few things. First, if the agent's platform is already in inventory, it will uninventory that platform. This means that platform and all its child servers and services will be removed (so be careful and make sure you really want to do this - you will lose all manageability and all audit history for all resources previously managed by that agent). Once that is done, all resources will disappear from the inventory and you won't even see any resources for that agent in the Discovery Queue. Finally, the agent's record itself is removed - so the Administration>Agents page will show that the agents have disappeared.

 
With the agent and its resources completely removed, you have the option to attempt to re-install the agent if you wish to bring it back.

You can also use this feature if your managed infrastructure has changed and you no longer want to manage a machine. Just select the agent that was responsible for managing that machine and delete it.

Note that if your agent is still running, it will attempt to re-register itself! So if you no longer wish to manage a machine, make sure you shutdown the agent as well (you'll obviously want to do this anyway, since you won't want an RHQ Agent consuming resources on a machine that you no longer want managed by RHQ).
Awesome new graphs in RHQ - based on d3.js


Mike Thompson yesterday presented the latest and greatest version of the new graphs for RHQ in a video on YouTube. Shortly afterwards he committed the results of his huge work into the RHQ master branch.


While this work is not yet finished, it is the result of the work started by Denis Krusko in last year's Google Summer of Code. At the moment both the old and the new graphs can be looked at in the UI, so that you can compare them and potentially report non-matches.


Here are some screenshots to whet your appetite:


Popup chart for a single metric
Popup chart for a single metric

Individual metric
Individual metric on the monitoring tab


As the subject already says, those graphs are made with the help of the awesome D3.js framework - I'll let Mike chime in to describe in more detail what he and Denis had to do to get this to work inside GWT+SmartGWT.

I've uploaded a snapshot of master as of this morning (my time) from our CI server onto SourceForge for you to try. THIS IS NOT FOR PRODUCTION.


There is a known issue where the red bar shows "..global exception.."; this is harmless and we will fix it anyway. Also the graph portlets in the dashboard don't honor the column width yet.


Please do not forget to report bugs (if there are any :-)

Sorry Miss Jackson -- or how I loved to do custom Json serialization in AS7 with RestEasy
Who doesn't remember the awesome "Sorry Miss Jackson" video ?

Actually that doesn't have anything to do with what I am talking about here -- except that RestEasy inside JBoss AS7 internally uses the Jackson JSON processing library. But let me start from the beginning.

Inside the RHQ REST api we have exposed links (in json representation) like this:

{
  "rel": "self",
  "href": "http://..."
}

which is the natural format when a Pojo

class Link {
    String rel;
    String href;
}

is serialized (with the Jackson provider in RestEasy).

Now while this is pretty cool, there is the disadvantage that if you need the link for a rel of 'edit', you have to load the list of links, iterate over them, check for each link whether its rel is 'edit', and then take the value of the href.

And there is this Bugzilla entry.

I started investigating this and thought, I will use a MessageBodyWriter and all is well. Unfortunately MessageBodyWriters do not work recursively and thus cannot format Link objects as part of another object in a special way.

Luckily I have done custom serialization with Jackson in the as7-plugin before, so I tried this, but the Serializer was not even instantiated. More trying and fiddling and a question on StackOverflow led me to a solution by copying the jackson libraries (core, mapper and jaxrs) into the lib directory of the RHQ ear and then all of a sudden the new serialization worked. The output is now

{ "self":
{ "href" : "http://..." }
}
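The serializer code itself is only referenced in the post; a minimal sketch of a Jackson (1.x, org.codehaus.jackson) serializer that would produce this nesting could look like the following - the accessor names on Link are my assumption:

import java.io.IOException;
import org.codehaus.jackson.JsonGenerator;
import org.codehaus.jackson.map.JsonSerializer;
import org.codehaus.jackson.map.SerializerProvider;

// Writes a Link as { "<rel>": { "href": "<href>" } } instead of the default { "rel": ..., "href": ... }
public class LinkSerializer extends JsonSerializer<Link> {
    @Override
    public void serialize(Link link, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartObject();
        jgen.writeFieldName(link.getRel());
        jgen.writeStartObject();
        jgen.writeStringField("href", link.getHref());
        jgen.writeEndObject();
        jgen.writeEndObject();
    }
}

Such a serializer would then be registered, for example with @JsonSerialize(using = LinkSerializer.class) on the Link class.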


So far so nice.

Now the final riddle was how to use the jackson libraries that are already in use by the resteasy-jackson-provider. And this was solved by adding respective entries to jboss-deployment-structure.xml, which ends up in the META-INF directory of the rhq.ear directory.

The content in this jboss-deployment-structure.xml then looks like this (abbreviated):


<sub-deployment name="rhq-enterprise-server-ejb3.jar">
<dependencies>
....
<module name="org.codehaus.jackson.jackson-core-asl" export="true"/>
<module name="org.codehaus.jackson.jackson-jaxrs" export="true"/>
<module name="org.codehaus.jackson.jackson-mapper-asl" export="true"/>
</dependencies>
</sub-deployment>


AS7 is still complaining a bit at startup:


16:48:58,375 WARN [org.jboss.as.dependency.private] (MSC service thread 1-3) JBAS018567: Deployment "deployment.rhq.ear.rhq-enterprise-server-ejb3.jar" is using a private module ("org.codehaus.jackson.jackson-core-asl:main") which may be changed or removed in future versions without notice.


but for the moment the result is good enough - and as we do not regularly change the underlying container for RHQ, this is tolerable.

And of course I would be interested in an "official" solution to that -- other than just duplicating those jars into my application (and actually I think this may even complicate the situation, as (re-)using the jackson jars that RestEasy inside as7 uses guarantees that the version is compatible).

RHQ 4.6 released

As I've mentioned before, the RHQ team has been very busy since RHQ 4.5.1 (and actually already before that) and has switched the application server it uses to JBoss AS 7.1. Directly after the switchover we have posted a first alpha version.


Now after more work and fixes, we are happy to provide release 4.6 of RHQ, which has all the issues resolved that arose from the switch. Features of this release are:

  • The internal app server is now JBossAS 7.1.1
  • GWT has been upgraded to version 2.5
  • There is a new installer (this has also changed since the 4.6 alpha release)
  • The REST-Api has been enhanced
  • Korean translations have been added (contributed by SungUk Jeon)
  • Webservices have been removed
  • Building RHQ now requires Java7, but it will still run on Java6

See the full release notes for details. They also contain a list of commits.

You can download the release from source forge.


As mentioned above, the old installer is gone, so make sure to read
the wiki document describing how to use the new installer.

Maven artifacts are available from the JBoss Nexus repository and should soon also be available from Central.

We would also like to say thank you to our contributors for this release:

  • Jürgen Hoffmann
  • Richard Hensman
  • SungUk Jeon


Please give us feedback, be it in Bugzilla, Mailing lists or the forum. Or just join us on IRC at irc.freenode.net/#rhq.
Best practice for paging in RESTful APIs (updated)?
In the RHQ-REST api, we return individual objects and also collections of objects. Some of those collections are rather small (number of platforms), while others can grow a lot (number of resources in total or number of alerts fired). For the latter it is advised to do some paging and not return the full result set in one go. Some of the reasons are:
  • Memory consumption in server and client
  • Bandwidth needed to transfer the data.
  • Latency to transfer huge amounts of data over slow networks


Inside RHQ we have the concept of a PageList<?> where an internal PageControl object defines the page size and other criteria like sorting. The PageList then only contains the objects from a certain page. I think this is a pretty common setup.

And here is where my question comes:

What is the "best-practice" to represent such a PageList in a RESTful api? So far I have seen two major ways:
  1. Add a Link: header that contains the prev and next relations. This is what RFC 5988 suggests and what projects like AeroGear use. The advantage here is that the body still contains the "raw" data and not meta data. And for both cases of 'single' object and 'collection' the data is at the 'root' of the body. Also paging is available for HEAD requests.

    On the other hand, it may get a bit harder for some client code (JavaScript, jQuery) to access the header and make use of the paging links
  2. Put the prev and next relations in the body of the request next to the collection. This has the advantage that there is no need to parse the http header. Disadvantage is that the real payload is now shifted "one level down" for collections.


    I sort of see the paging links as meta-data and think that this should not be mixed with the payload. Now a colleague of mine said: "Isn't that a state change link for the collection like the 'rel=edit' for a single object?". This sounds odd, but can't be denied.

Actually I have also seen mentions of using cookies to send the paging information, but that looks very non-transparent to me, so I am not considering this at all.

Just to be clear: I am explicitly talking about paging of collections and not about affordances of individual objects.

So are there established best practices? How do others do it?

If going for the Link: header: would people rather like to see multiple Link headers (see RFC 2616), one for each relation:

Link: <http://foo/?page=3>; rel='next'
Link: <http://foo/?page=1>; rel='prev'

or rather the combined way:

Link: <http://foo/?page=3>; rel='next', Link: <http://foo/?page=1>; rel='prev'

that is listed in RFC 5988?

[update]

I just saw that URLConnection.getHeaderField(name) does not support the multiple Link: headers as it only returns the last occurrence:

If called on a connection that sets the same header multiple times with possibly different values, only the last value is returned.


While there may be other ways to access all the Link: headers, this is too obvious a pitfall, which can be prevented by not using that style.
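For what it's worth, one such other way in the plain java.net API is URLConnection.getHeaderFields(), which returns every occurrence of a header; a minimal sketch (the URL is made up):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class LinkHeaderExample {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://foo/?page=2").openConnection();
        // Unlike getHeaderField(name), this map keeps all values of a repeated header
        Map<String, List<String>> headers = conn.getHeaderFields();
        List<String> links = headers.get("Link");
        if (links != null) {
            for (String link : links) {
                System.out.println(link); // e.g. <http://foo/?page=3>; rel='next'
            }
        }
    }
}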

"Nested Transactions" and Timeouts
While coding up some EJB3 SLSB methods, the following question came up:

If a thread is already associated with a transaction, and that thread calls another EJB3 SLSB method annotated with "REQUIRES_NEW", the thread gets a new (or "pseudo-nested" or for ease-of-writing what I'll simply call the "nested" transaction, even though it is not really "nested" in the true sense of the term) transaction, but what happens to the parent transaction while the "nested" transaction is active? Does the parent transaction's timeout keep counting down or is it suspended and will start back up counting down only when the "nested" transaction completes?

For example, suppose I enter a transaction context by calling method A and this method has a transaction timeout of 1 minute. Method A then immediately calls a REQUIRES_NEW method B, which itself has a 5 minute timeout. Now suppose method B takes 3 minutes to complete. That is within B's allotted 5 minute timeout so it returns normally to A. A then immediately returns.

But A's timeout is 1 minute! B took 3 minutes on its own. Even though the amount of time A took within itself was well below its allotted 1 minute timeout, its call to B took 3 minutes.

What happens? Does A's timer "suspend" while its "nested" transaction (created from B) is still active?  Or does A's timer keep counting down, regardless of whether or not B's "nested" transaction is being counted down at the same time (and hence A will abort with a timeout)?

Here's some code to illustrate the use-case (this is what I actually used to test this):

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
@TransactionTimeout(value = 5, unit = TimeUnit.SECONDS)
public void testNewTransaction() throws InterruptedException {
   log.warn("~~~~~ Starting new transaction with 5s timeout...");
   LookupUtil.getTest().testNewTransaction2();
   log.warn("~~~~~ Finishing new transaction with 5s timeout...");
}
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
@TransactionTimeout(value = 10, unit = TimeUnit.SECONDS)
public void testNewTransaction2() throws InterruptedException {
   log.warn("~~~~~ Starting new transaction with 10s timeout...sleeping for 8s");
   Thread.sleep(8000);
   log.warn("~~~~~ Finishing new transaction with 10s timeout...");
}
I don't know what any of the EE specs say about this, but it doesn't matter - all I need to know is how JBossAS7 behaves :) So I ran this test on JBossAS 7.1.1.Final and here's what the log messages say:
17:51:22,935 ~~~~~ Starting new transaction with 5s timeout...
17:51:22,947 ~~~~~ Starting new transaction with 10s timeout...sleeping for 8s
17:51:27,932 WARN  ARJUNA012117: TransactionReaper::check timeout for TX 0:ffffc0a80102:-751071d8:5115811c:449 in state  RUN
17:51:27,936 WARN  ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffffc0a80102:-751071d8:5115811c:449
17:51:30,948 ~~~~~ Finishing new transaction with 10s timeout...
17:51:30,949 ~~~~~ Finishing new transaction with 5s timeout...
17:51:30,950 WARN  ARJUNA012077: Abort called on already aborted atomic action 0:ffffc0a80102:-751071d8:5115811c:449
17:51:30,951 ERROR JBAS014134: EJB Invocation failed on component TestBean for method public abstract void org.rhq.enterprise.server.test.TestLocal.testNewTransaction() throws java.lang.InterruptedException: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back
...
Caused by: javax.transaction.RollbackException: ARJUNA016063: The transaction is not active!
   at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1155) [jbossjts-4.16.2.Final.jar:]
   at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:117) [jbossjts-4.16.2.Final.jar:]
   at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
   at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:92) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
So it is clear that, at least for JBossAS7's transaction manager, the parent transaction's timer is not suspended, even if a "nested" transaction is activated. You see I enter the first method (which activates my first transaction) at 17:51:22, and immediately enter the second method (which activates my second, "nested", transaction at the same time of 17:51:22). My first transaction has a timeout of 5 seconds, my second "nested" transaction has a timeout of 10 seconds. My second method sleeps for 8 seconds, so it should finish at 17:51:30 (and it does if you look at the log messages at that time). BUT! Prior to that, my first transaction is aborted by the transaction manager at 17:51:27 - exactly 5 seconds after my first transaction was started. So, clearly my first transaction's timer was not suspended and was continually counting down even as my "nested" transaction was active.

So, in short, the answer is (for JBossAS7 at least) - a transaction timeout is always counting down and starts as soon as the transaction activates. It never pauses nor suspends due to "nested" transactions.

Monitoring IP Endpoints Via RHQ
RHQ has a little known plugin called the "netservices" plugin. Someone was asking about how RHQ can monitor HTTP endpoints on the #rhq freenode chat room - you use this netservices plugin to do this.

There are actually two different resource types that the plugin provides - one allows you to monitor an HTTP URL endpoint (e.g. so you can monitor for different HTTP status codes). The other is a very basic resource type that just verifies that you can ping a specific IP or hostname.

Here's the RHQ 4.5.1 version of that plugin and its source:

http://mirrors.ibiblio.org/maven2/org/rhq/rhq-netservices-plugin/4.5.1/

(note: if anyone in the community is interested in doing some work on RHQ and looking for something simple to do - here's a perfect opportunity. It would be nice to have some documentation written for that plugin, but more importantly, I think we could beef up this plugin. For example, it would be nice to have another resource type (similar to that basic PingService) that is able to confirm that a particular port on a particular IP/host can be connected to. We also could use some testing done on this plugin to make sure it still works as expected.)
Why I am ready to move to CQL for Cassandra application development
Earlier this year, I started learning about Cassandra as it seemed like it might be a good fit as a replacement data store for metrics and other time series data in RHQ. I developed a prototype for RHQ. I used the client library Hector for accessing Cassandra from within RHQ. I defined my schema using a Cassandra CLI script. I recall when I first read about CQL. I spent some time deliberating over whether to define the schema using a CLI script or using a CQL script. Although I was intrigued, I ultimately decided against using CQL. As the CLI and the Thrift interface were more mature, it seemed like the safer bet. While I decided not to invest any time in CQL, I did make a mental note to revisit it at a later point since there was clearly a big emphasis within the Cassandra community for improving CQL. That later point is now, and I have decided to start making extensive use of CQL.

After a thorough comparative analysis, the RHQ team decided to move forward with using Cassandra for metric data storage. We are making heavy use of dynamic column families and wide rows. Consider for example the raw_metrics column family in figure 1,

Figure 1. raw_metrics column family

The metrics schedule id is the row key. Each data point is stored in a separate column where the metric timestamp is the column name and the metric value is the column value. This design supports fast writes as well as fast reads and works particularly well for the various date range queries in RHQ. This is considered a dynamic column family because the number of columns per row will vary and because column names are not defined up front. I was quick to rule out using CQL due to a couple misconceptions about CQL's support for dynamic column families and wide rows. First, I did not think it was possible to define a dynamic table with wide rows using CQL. Secondly, I did not think it was possible to execute range queries on wide rows.

A couple weeks ago I came across this thread on the cassandra-users mailing list which points out that you can in fact create dynamic tables/column families with wide rows. And conveniently after coming across this thread, I happened to stumble on the same information in the docs. Specifically the DataStax docs state that wide rows are supported using composite column names. The primary key can have multiple components, but there must be at least one column that is not part of the primary key. Using CQL I would then define the raw_metrics column family as follows,

This CREATE TABLE statement is straightforward, and it does allow for wide rows with dynamic columns. The underlying column family representation of the data is slightly different from the one in figure 1 though.

Figure 2. CQL version of raw_metrics column family
Each column name is now a composite that consists of the metric timestamp along with the string literal 'value'. There is additional overhead on reads and writes as the column comparator now has to compare the string in addition to the timestamp. Although I have yet to do any of my own benchmarking, I am not overly concerned by the additional string comparison. I was, however, concerned about the additional overhead in terms of disk space. I have done some preliminary analysis and concluded that the difference compared with storing just the timestamp in the column name is negligible due to compression of SSTables, which is enabled by default.

My second misconception about executing range queries is really predicated on the first misconception. It is true that you can only query named columns in CQL; consequently, it is not possible to perform a date range query against the column family in figure 1. It is possible though to execute a date range query against the column family in figure 2.
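The CREATE TABLE statement and queries were embedded in the original post and are not reproduced here. Purely to illustrate the idea, here is a minimal sketch using the CQL JDBC driver mentioned below; the driver class, connection URL, and column names (schedule_id, time, value) are my assumptions, not RHQ's actual schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RawMetricsCqlExample {
    public static void main(String[] args) throws Exception {
        // Driver class and URL format of the cassandra-jdbc project; treat these as assumptions
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection conn = DriverManager.getConnection("jdbc:cassandra://localhost:9160/rhq");
        Statement stmt = conn.createStatement();

        // A dynamic, wide-row table: schedule_id is the partition key, time is the clustering column
        stmt.execute("CREATE TABLE raw_metrics (" +
                     "schedule_id int, time timestamp, value double, " +
                     "PRIMARY KEY (schedule_id, time))");

        // Each write lands as a new column in the schedule's row
        stmt.execute("INSERT INTO raw_metrics (schedule_id, time, value) " +
                     "VALUES (123, '2012-10-01 09:15:00', 2.54)");

        // A date range query against a single wide row
        ResultSet rs = stmt.executeQuery(
            "SELECT time, value FROM raw_metrics WHERE schedule_id = 123 " +
            "AND time >= '2012-10-01 00:00:00' AND time < '2012-10-02 00:00:00'");
        while (rs.next()) {
            System.out.println(rs.getTimestamp("time") + " -> " + rs.getDouble("value"));
        }
        conn.close();
    }
}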

RHQ supports multiple upgrade paths. This means that in order to upgrade to the latest release (which happens to be 4.5.0 at the time of this writing), I do not have to first upgrade to the previous release (which would be 4.4.0). I can upgrade from 4.2.0 for instance. Supporting multiple upgrade paths requires a tool for managing schema changes. There are plenty of such tools for relational databases, but I am not aware of any for Cassandra. But because we can leverage CQL and because there is a JDBC driver, we can look at using an existing tool instead of writing something from scratch. I have done just that and am working on adding support for Cassandra to Liquibase. I will have more on that in a future post. Using CQL allows us to reuse existing solutions, which in turn is going to save a lot of development and testing effort.

The most compelling reason to use CQL is the familiar, easy to use syntax. I have been nothing short of pleased with Hector. It is well designed, the online documentation is solid, and the community is great. Whenever I post a question on the mailing list, I get responses very quickly. With all that said, contrast the following two, equivalent queries against the raw_metrics column family.
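Those two equivalent queries were embedded in the original post and are not reproduced here. To give a flavor of the contrast, here is a rough sketch assuming Hector's slice query API and the figure 1 layout (schedule id row key, timestamp column names, double values); it is illustrative, not RHQ's actual code:

import java.util.List;

import me.prettyprint.cassandra.serializers.DoubleSerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class RawMetricsQueries {
    // Hector version: slice the wide row for one schedule between two timestamps
    public List<HColumn<Long, Double>> findRawDataWithHector(Keyspace keyspace, int scheduleId,
                                                             long startTime, long endTime) {
        SliceQuery<Integer, Long, Double> query = HFactory.createSliceQuery(keyspace,
            IntegerSerializer.get(), LongSerializer.get(), DoubleSerializer.get());
        query.setColumnFamily("raw_metrics");
        query.setKey(scheduleId);
        query.setRange(startTime, endTime, false, Integer.MAX_VALUE);
        ColumnSlice<Long, Double> slice = query.execute().get();
        return slice.getColumns();
    }

    // CQL version: the same intent expressed declaratively
    public static final String RAW_DATA_QUERY =
        "SELECT time, value FROM raw_metrics WHERE schedule_id = ? AND time >= ? AND time < ?";
}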

RHQ developers can look at the CQL version and immediately understand it. Using CQL will result in less code that is easier to maintain. We can also leverage ad hoc queries with cqlsh during development and testing. The JDBC driver also lends itself nicely to applications that run in an application server, as RHQ does.

Things are still evolving both with CQL and with the JDBC driver. Collections support is coming in Cassandra 1.2. The JDBC driver does not yet support batch statements. This is due to the lack of support for it on the server side. The functionality is there in the Cassandra trunk/master branch, and I expect to see it in the 1.2 release. The driver also currently lacks support for connection pooling. These and other critical features will surely make their way into the driver. With the enhancements and improvements to CQL and to the JDBC driver, adding Cassandra support to Hibernate OGM becomes that much more feasible.

The flexibility, tooling, and ease of use make CQL a very attractive option for working with Cassandra. I doubt the Thrift API is going away any time soon, and we will continue to leverage the Thrift API through Hector in RHQ in various places. But I am ready to make CQL a first class citizen in RHQ and look forward to watching it continue to mature into a great technology.
Fedorahosted GIT links have changed (again)
Well, fedorahosted did it again.

All of my links to RHQ source code in all of my blog entries (and everywhere else I used them - bugzilla entries, wiki pages, etc) will be invalid now, because fedorahosted once again changed their git URLs and AFAICS the old URLs no longer work and do not redirect to the new URLs.

This is at least the second time that I know of that they did this - it may have happened more. Previously, I went through all of my blog entries and updated my source links, but I am not going to do it this time. It is too time consuming and I am not convinced this won't happen again.

So, if you click a link to source code in my blogs and get an error message saying the link is invalid, you know why. :-/

(Heiko is right, we need to move to github :-)
RHQ's New Availability Checking
The latest RHQ release has introduced a few new things related to its availability scheduling and checking. (side note: you'll notice that link points to RHQ's new documentation home! The wiki moved to jboss.org)

It used to be that RHQ's availability scanning was pretty rigid. RHQ scanned every resource on a fixed 5 minute schedule. Though this was configurable, it was only configurable on an agent-wide scale (meaning, you could not say that you want to check resource A every 5 minutes, but resource B every 1 minute).

In addition, when the agent would go down (for whatever reason), it took a long time (around 15 minutes by default) for the RHQ Server to consider the agent to be "missing in action" and mark all of the resources that agent was managing as "down". Prior to those 15 minutes (what RHQ calls the "Agent Max Quiet Time Allowed"), you really had no indication there was a problem and some users found this to be too long before alerts started firing. And even though this Agent Max Quiet Time Allowed setting is configurable, under many situations, setting it to something appreciably lower than 15 minutes would cause the RHQ Server to think agents were down when they weren't, triggering false alerts.

RHQ has since addressed these issues. Now, you will notice all server and service resources have a common "Availability" metric in their Monitoring>Schedules subtab in the GUI. This works just like other metric schedules - you can individually configure these as you wish just like normal metrics. By default, all server resources will have their availability checked every minute (for example, see the server resource snapshot in figure 1). By default, all service resources will have their availability checked every 10 minutes (for example, see the service resource snapshot in figure 2).

Figure 1 - A server resource and its Availability metric, with default interval of 1 minute
Figure 2 - A service resource and its Availability metric, with default interval of 10 minutes
The reason for the difference in defaults is this - we felt most times it didn't pay to scan all services at the same rate as servers. That just added more load on the agent and more load on the managed resources for no appreciable gain in the default case. Because normally, if a service is down, we could have already detected that by seeing that its server is down. And if a server is up, generally speaking, all of its services are normally up as well. So we found checking the server resources more frequently, combined with checking all of their services less frequently, helps detect the common failures (those being crashed servers) more quickly while lowering the total amount of work performed. Obviously, these defaults are geared toward the general case; if these assumptions don't match your expectations, you are free to alter your own resources' Availability schedules to fit your own needs. But we felt that, out-of-box, we wanted to reduce the load the agent puts on the machine and the resources it is managing while at the same time we wanted to be able to detect downed servers more quickly. And this does a good job of that.

Another change that was made is that when an agent gracefully shuts down, it sends one final message to the RHQ Server to indicate it is shutting down and at that point the RHQ Server immediately marks that agent's platform as "down" rather than wait for the agent to go quiet for a period of 15 minutes (the Agent Max Quiet Time Allowed period) before being marked "down". Additionally, although the platform is marked "down", that platform's child resources (all of its servers and services) are now marked as "unknown". This is because, technically, RHQ doesn't really know what's happening on that box now that the agent is no longer reporting in. Those managed resources that the agent was monitoring (the servers and services) could be down, but RHQ can't make that determination - they could very well still be up. Therefore, the RHQ Server places all the servers and services in the new availability state known as "unknown" as indicated by the question-mark-within-a-gray-circle icon. See figure 3 as an example.

Figure 3 - Availability data for a service that has been flagged as being in an "unknown" state
As part of these infrastructure changes, the Agent Max Quiet Time Allowed default setting has been lowered to 5 minutes. This means that, in the rare case that the agent actually crashes and was unable to gracefully notify the RHQ Server, the RHQ Server will consider the agent "missing in action" faster than before (now 5 minutes instead of 15 minutes). Thus, alerts can be fired in a more timely manner when agents go down unexpectedly.

Setting up a local Cassandra cluster using RHQ
As part of my ongoing research into using Cassandra with RHQ, I did some work to automate setting up a Cassandra cluster (for RHQ) on a single machine for development and testing. I put together a short demo showing what is involved. Check it out at http://bit.ly/N3jbT8.
EJB Calltime Monitoring
I wanted to give a brief illustration of RHQ's calltime monitoring feature by showing how it is used to collect EJB calltime statistics. The idea here is that I have an EJB application (in my example, I am using the RHQ Server itself as the monitored application!) and I want to see calltime statistics for the EJB method calls being made. For example, in my EJB application being monitored, I have many EJB Stateless Session beans (SLSBs). I want to see how my EJB SLSBs are behaving by looking at how many times my EJB SLSB methods are called and how efficient they are (that is, I want to see the maximum, minimum and average time each EJB SLSB method took).

So, first what I do is go to my EJB SLSB resources that are in inventory and I enable the "Method Invocation Time" calltime metric. You can do this on an individual EJB resource or you can do a bulk change of all your EJB's schedules by navigating to your EJB autogroup and changing the schedule in the autogroup's Schedules subtab:



At this point, the RHQ Agent will collect the information and send it up to the RHQ Server just like any other metric collection. Over time, I can now examine my EJB SLSB calltime statistics by traversing to my EJB resource's Calltime subtab:



I can even look at an aggregate view of all my EJBs via the EJB autogroup. In the RHQ GUI, you will see in the left hand tree all of my EJB SLSBs are children of the autogroup node "EJB3 Session Beans". If I select that autogroup node and navigate to its Calltime subtab, I can see all of the measurements for all of my EJB SLSBs:



Note that this calltime measurement collection feature is not specific to EJBs, or the JBossAS 4 plugin. This is a generic subsystem supported by any plugin that wants to enable this feature. If you want to write your own custom plugin that wants to monitor calltime-like statistics (say, for HTTP URL endpoints or any other type of infrastructure that has a "duration" type metric that can be collected) you can utilize the calltime subsystem to collect your information and let the RHQ Server store it and display it in this Calltime subtab for your resource.
Aggregating Metric Data with Cassandra

Introduction

I successfully performed metric data aggregation in RHQ using a Cassandra back end for the first time recently. Data roll up, or aggregation, is done by the data purge job, which is a Quartz job that runs hourly. This job is also responsible for purging old metric data as well as data from other parts of the system. The data purge job invokes a number of different stateless session EJBs (SLSBs) that do all the heavy lifting. While there is still a lot of work that lies ahead, this is a big first step forward that is ripe for discussion.

Integration

JPA and EJB are the predominant technologies used to implement and manage persistence and business logic. Those technologies however, are not really applicable to Cassandra. JPA is for relational databases and one of the central features of EJB is declarative, container-managed transactions. Cassandra is neither a relational nor a transactional data store. For the prototype, I am using server plugins to integrate Cassandra with RHQ.

Server plugins are used in a number of areas in RHQ already. Pluggable alert notification senders is one of the best examples. A key feature of server plugins is the encapsulation made possible by the class loader isolation that is also present with agent plugins. So let's say that Hector, the Cassandra client library, requires a different version of a library that is already used by RHQ. I can safely use the version required by Hector in my plugin without compromising the RHQ server. In addition to the encapsulation, I can dynamically reload my plugin without having to restart the whole server. This can help speed up iterative development.

Cassandra Server Plugin Configuration
You can define a configuration in the plugin descriptor of a server plugin. The above screenshot shows the configuration of the Cassandra plugin. The nice thing about this is that it provides a consistent, familiar interface in the form of the configuration editor that is used extensively throughout RHQ. There is one more screenshot that I want to share.

System Settings
This is a screenshot of the system settings view. It provides details about the RHQ server itself like the database used, the RHQ version, and build number. There are several configurable settings, like the retention period for alerts and drift files and settings for integrating with an LDAP server for authentication. At the bottom there is a property named Active Metrics Server Plugin. There are currently two values from which to choose. The first is the default, which uses the existing RHQ database. The second is for the new Cassandra back end. The server plugin approach affords us a pluggable persistence solution that can be really useful for prototyping among other things. Pluggable persistence with server plugins is a really interesting topic in and of itself. I will have more to say on that in a future post.

Implementation

The Cassandra implementation thus far uses the same buckets and time slices as the existing implementation. The buckets and retention periods are as follows:

Metrics Data Bucket    Data Retention Period
raw data               7 days
one hour data          2 weeks
6 hour data            1 month
1 day data             1 year

Unlike the existing implementation, purging old data is accomplished simply by setting the TTL (time to live) on each column. Cassandra takes care of purging expired columns. The schema is pretty straightforward. Here is the column family definition for raw data specified as a CLI script:


The row key is the metric schedule id. The column names are timestamps and column values are doubles. And here is the column family definition for one hour data:


As with the raw data, the schedule id is the row key. Unlike the raw data however, we use composite columns here. All the buckets with the exception of the raw data, store computed aggregates. RHQ calculates and stores the min, max, and average for each (numeric) metric schedule. The column name consists of a timestamp and an integer. The integer identifies whether the value is the max, min, or average. Here is some sample (Cassandra) CLI output for one hour data:


Each row in the output reads like a tuple. The first entry is the column name with a colon delimiter. The timestamp is listed first, followed by the integer code to identify the aggregate type. Next is the column value, which is the value of the aggregate calculation. Then we have a timestamp; every column in Cassandra has a timestamp, and it is used for conflict resolution on writes. Lastly, we have the TTL. The schema for the remaining buckets is similar to the one_hour_metric_data column family, so I will not list them here.

The last implementation detail I want to discuss is querying. When the data purge job runs, it has to determine what data is ready to be aggregated. With the existing implementation that uses the RHQ database, queries are fast and efficient using indexes. The following column family definition serves as an index to make queries fast for the Cassandra implementation as well:


The row key is the metric data column family name, e.g., one_hour_metric_data. The column name is a composite that consists of a timestamp and a schedule id. Currently the column value is an integer that is always set to zero because only the column name is needed. At some point I will likely refactor the data type of the column  value to something that occupies less space. Here is a brief explanation of how the index is used. Let's start with writes. Whenever data for a schedule is written into one bucket, we update the index for the next bucket. For example, suppose data for schedule id 123 is written into the raw_metrics column family at 09:15. We will write into the "one_hour_metric_data" row of the index with a column name of 09:00:123. The timestamp in which the write occurred is rounded down to the start of the time slice of the next bucket. Further suppose that additional data for schedule 123 is written into the raw_metrics column family at times 09:20, 09:25, and 09:30. Because each of those timestamps gets rounded down to 09:00 when writing to the index, we do not wind up with any additional columns for that schedule id. This means that the index will contain at most one column per schedule for a given time slice in each row.
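As a small illustration of the index write just described, here is a minimal sketch of the "round the raw timestamp down to the start of the next bucket's time slice" step; the method and names are mine, not RHQ's:

import java.util.concurrent.TimeUnit;

public class TimeSliceExample {
    // Round a raw-data timestamp down to the start of its one-hour time slice
    static long oneHourTimeSlice(long timestampMillis) {
        long hour = TimeUnit.HOURS.toMillis(1);
        return (timestampMillis / hour) * hour;
    }

    public static void main(String[] args) {
        // Raw writes at 09:15 and 09:20 map to the same 09:00 slice, so the index row for
        // one_hour_metric_data ends up with at most one column per schedule per time slice.
        long nineFifteen = 1349082900000L; // 2012-10-01 09:15:00 UTC
        long nineTwenty  = 1349083200000L; // 2012-10-01 09:20:00 UTC
        System.out.println(oneHourTimeSlice(nineFifteen) == oneHourTimeSlice(nineTwenty)); // true
    }
}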

Reads occur to determine what data, if any, needs to be aggregated. Each row in the index is queried. After a column is read and the data for the corresponding schedule is aggregated into the next bucket, that column is then deleted. This index is a lot like a job queue. Reads in the existing implementation that uses a relational database should be fast; however, there is still work that has to be done to determine what data, if any, needs to be aggregated when the data purge job runs. With the Cassandra implementation, the presence of a column in a row of the metrics_aggregates_index column family indicates that data for the corresponding schedule needs to be aggregated.

Testing

I have pretty good unit test coverage, but I have only done some preliminary integration testing. So far it has been limited to manual testing. This includes inspecting values in the database via the CLI or with CQL and setting break points to inspect values. As I look to automate the integration testing, I have been giving some thought to how metric data is pushed to the server. Relying on the agent to push data to the server is suboptimal for a couple of reasons. First, the agent sends measurement reports to the server once a minute. I need better control of how frequently and when data is pushed to the server.

The other issue with using the agent is that it gets difficult to simulate older metric data that has been reported over a specified duration, be it an hour, a day, or a week. Simulating older data is needed for testing that data is aggregated into 6 hour and 24 hour buckets and that data is purged at appropriate times.

RHQ's REST interface is a better fit for the integration testing I want to do. It already provides the ability to push metric data to the server. I may wind up extending the API, even if just for testing, to allow for kicking off the aggregation that runs during the data purge job. I can then use the REST API to query the server and verify that it returns the expected values.

Next Steps

There is still plenty of work ahead. I have to investigate what consistency levels are most appropriate for different operations. There is still a large portion of the metrics APIs that needs to be implemented, some of the more important ones being query operations used to render metrics graphs and tables. The data purge job is not the best approach going forward for doing the aggregation. Only a single instance of the job runs each hour, and it does not exploit any of the opportunities that exist for parallelism. Lastly, and maybe most importantly, I have yet to start thinking about how to effectively manage the Cassandra cluster with RHQ. As I delve into these other areas I will continue sharing my thoughts and experiences.
Modeling Metric Data in Cassandra
RHQ supports three types of metric data - numeric, traits, and call time. Numeric metrics include things like the amount of free memory on a system or the number of transactions per minute. Traits are strings that track information about a resource and typically change in value much less frequently than numeric metrics. Some examples of traits include server start time and server version. Call time metrics capture the execution time of requests against a resource. An example of call time metrics is EJB method execution time.

I have read several times that with Cassandra it is best to let your queries dictate your schema design. I recently  spent some time thinking about RHQ's data model for metrics and how it might look in Cassandra. I decided to focus only on traits for the time being, but much of what I discuss applies to the other metrics types as well.

I will provide a little background on the existing data model to make it easier to understand some of the things I touch on. All metric data in RHQ belongs to resources. A particular resource might support metrics like those in the examples above, or it might support something entirely different. A resource has a type, and the resource type defines which types of metrics it supports. We refer to these as measurement definitions. These measurement definitions, along with other meta data associated with the resource type, are defined in the plugin descriptor of the plugin that is responsible for managing the resource. You can think of a resource type as an abstraction and a resource as a realization of that abstraction. Similarly, a measurement definition is an abstraction, and a measurement schedule is a realization of a measurement definition. A resource can have multiple measurement schedules, and each schedule is associated with a measurement definition. The schedule has a number of attributes like the collection interval, an enabled flag, and the value. When the agent reports metric data to the RHQ server, the data is associated with a particular schedule. To tie it all together, here is a snippet of some of the relevant parts of the measurement classes:

To review, for a given measurement schedule, we can potentially add an increasing number of rows in the RHQ_MEASUREMENT_DATA_TRAIT table over time. There are a lot of fields included in the snippet for MeasurementDefinition. I chose to include most of them because they are pertinent to the discussion.

For the Cassandra integration, I am interested primarily in the MeasurementDataTrait class. All of the other types are managed by the RHQ database. Initially when I started thinking about what column families I would need, I felt overcome with writer's block. Then I reminded myself to think about trait queries and try to let those guide my design. I decided to focus on some resource-level queries and leave others like group-level queries for a later exercise. Here is a screenshot of one of the resource-level views where the queries are used:


Let me talk a little about this view. There are a few things to point out in order to understand the approach I took with the Cassandra schema. First, this is a list view of all the resource's traits. Secondly, the view shows only the latest value for each trait. Finally, the fields required by this query span across multiple tables and include resource id, schedule id, definition id, display name, value, and time stamp. Because the fields span across multiple tables, one or more joins is required for this query. There are two things I want to accomplish with the column family design in Cassandra. I want to be able to fetch all of the data with a single read, and I want to be able to fetch all of the traits for a resource in that read. Cassandra of course does not support joins; so, some denormalization is needed to meet my requirements. I have two column families for storing trait data. Here is the first one that supports the above list view as a Cassandra CLI script:
create column family resource_traits
with comparator = 'CompositeType(DateType, Int32Type, Int32Type, BooleanType, UTF8Type, UTF8Type)' and
default_validation_class = UTF8Type and
key_validation_class = Int32Type;
The row key is the resource id. The column names are a composite type that consist of the time stamp, schedule id, definition id, enabled flag, display type, and display name. The column value is a string and is the latest known value of the trait. This design allows for the latest values of all traits to be fetched in a single read. It also gives me the flexibility to perform additional filtering. For example, I can query for all traits that are enabled or disabled. Or I can query for all traits whose values last changed after a certain date/time. Before I talk about the ramifications of the denormalization I want to introduce the other column family that tracks the historical data. Here is the CLI script for it:
create column family traits
with comparator = DateType and
default_validation_class = UTF8Type and
key_validation_class = Int32Type;
This column family is pretty straightforward. The row key is the schedule id. The column name is the time stamp, and the column value is the trait value. In the relational design, we only store a new row in the trait table if the value has changed. I have only done some preliminary investigation, and I am not yet sure how to replicate that behavior with a single write. I may need to use a custom comparator. It is something I have to revisit.
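Just to make the read path concrete, here is a minimal sketch of how a date range read of the traits column family defined above might look with Hector; the method name and details are my assumptions:

import java.util.Date;
import java.util.List;

import me.prettyprint.cassandra.serializers.DateSerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class TraitQueries {
    // Fetch the historical values of one trait schedule between two points in time
    public List<HColumn<Date, String>> findTraitHistory(Keyspace keyspace, int scheduleId,
                                                        Date begin, Date end) {
        SliceQuery<Integer, Date, String> query = HFactory.createSliceQuery(keyspace,
            IntegerSerializer.get(), DateSerializer.get(), StringSerializer.get());
        query.setColumnFamily("traits"); // row key: schedule id, column name: time stamp (DateType)
        query.setKey(scheduleId);
        query.setRange(begin, end, false, 1000); // oldest to newest, capped at 1000 columns
        ColumnSlice<Date, String> slice = query.execute().get();
        return slice.getColumns();
    }
}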

I want to talk a little bit about the denormalization. As far this example goes, the system of record for everything except the trait data is the RHQ database. Suppose a schedule is disabled. That will now require a write to both the RHQ database as well as to Cassandra. When a new trait value is persisted, two writes have to be made to Cassandra - one to add a column to the traits column family and one to update the resource_traits column family.

The last thing I will mention about the design is that I could have opted for a more row-based approach where each column in resource_traits is stored in a separate row. With that approach, I would use statically named columns like scheduleId, and the corresponding value would be something like 1234. The primary reason I decided against this is that the RandomPartitioner is used for the partitioning strategy, which happens to be the default. RandomPartitioner is strongly recommended for most cases to allow for even key distribution across nodes. Without going into detail, range scans, i.e., row-based scans, are not possible when using the RandomPartitioner. Additionally, Cassandra is designed to perform better with slice queries, i.e., column-based queries, than with range queries.

The design may change as I get further along in the implementation, but it is a good starting point. The denormalization allows for efficient querying of a resource's traits and offers the flexibility for additional filtering. There are some trade-offs that have to be made, but at this point, I feel that they are worthwhile. One thing is for certain. Studying the existing (SQL/JPA) queries and understanding what data is involved and how it is used helped flesh out the column family design.
Working with Cassandra
RHQ provides a rich feature set in terms of its monitoring capabilities. In addition to collecting and storing metric data, RHQ automatically generates baselines, allows you to view graphs of data points over different time intervals, and gives you the ability to alert on metric data. RHQ uses a single database for storing all of its data. This includes everything from the inventory, to plugin meta data, to metric data. This presents an architectural challenge for the measurement subsystem particularly in terms of scale. As the number of managed resources grows and the volume of metrics being collected increases, database performance starts to degrade. Despite various optimizations that have been made, the database remains a performance bottleneck. The reality is that the relational database simply is not the best tool for write-intensive applications like time-series data.

This architectural challenge has in large part motivated me to start learning about Cassandra. There are plenty of other, non-relational database systems that I think could address the performance problems with our measurement subsystem. There are a couple of things about Cassandra that intrigued me enough to invest the time to learn it.

The first point of intrigue is that Cassandra is a distributed, peer-to-peer system with no single point of failure. Any node in the cluster can serve read and write requests. Nodes can be added to and removed from the cluster at any point, making it easier to meet demands around scalability. This design is largely inspired by Amazon's Dynamo.

The second point of intrigue for me is that running a node involves running only a single Java process. For the purposes of RHQ and JBoss Operations Network (JON), this is much more important to me than the first point about single points of failure. The fewer the moving parts, the better. It simplifies management, which goes a long way toward the goal of having a self-contained solution.

Cassandra could be a great fit for RHQ, and the time I have spent thus far learning it is definitely time well spent. There are some learning curves and hurdles one has to overcome though. I find the project documentation to be lacking. For example, it took some time to wrap my head around super columns. It was only after I started understanding super columns to the point where I could begin thinking about how to leverage them with RHQ's data model that I then discovered that composite columns should be favored over super columns. Apparently composite columns do not have the performance and memory overhead inherent to super columns. And composite columns allow for an arbitrary level of nesting whereas super columns do not. Fortunately DataStax's docs help fill in a lot of the gaps.

One thing that was somewhat counter-intuitive initially is how the sorting works. With a relational database, you first define the schema, and then queries are defined later on. Sorting is done on column values and is specified at query time. With Cassandra, sorting is based on column names and is specified at the time of schema creation. This might seem really strange if you are thinking in terms of a relational database, but Cassandra is a distributed key-value store. If you think about it more along the lines of say, java.util.TreeMap, then it makes a lot more sense. With a TreeMap, sorting is done on keys. When I want to use a TreeMap or another ordered collection, I have to decide in advance how the elements of the collection should be ordered. This aspect of Cassandra is a good thing. It contributes to the high performance read/writes for which Cassandra is known. It also lends itself very well to working with time-series data.
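To make the analogy concrete, here is a small, self-contained Java example. Just as Cassandra keeps the columns within a row sorted by column name, a TreeMap keeps its entries sorted by key, and the ordering is decided when the map is created rather than at query time:

import java.util.Date;
import java.util.TreeMap;

public class SortedDataPoints {
    public static void main(String[] args) {
        // Keys (collection timestamps) stay sorted no matter what order they are inserted in,
        // much like column names within a Cassandra row.
        TreeMap<Date, Double> dataPoints = new TreeMap<Date, Double>();
        dataPoints.put(new Date(3000L), 40.5);
        dataPoints.put(new Date(1000L), 42.0);
        dataPoints.put(new Date(2000L), 41.2);

        System.out.println(dataPoints.firstKey());   // earliest data point
        System.out.println(dataPoints.lastEntry());  // latest data point
        // A contiguous slice of the data, analogous to a column slice query
        System.out.println(dataPoints.subMap(new Date(1000L), new Date(2500L)));
    }
}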

DataStax posted a great blog the other day about how they use Cassandra as a metrics database. The algorithm described sounds similar to what we do in RHQ; however, there are a few differences (aside from the obvious one of using different database systems). One difference is in bucket sizes. They use bucket sizes of one minute, five minutes, two hours, and twenty-four hours. RHQ uses bucket sizes of one hour, six hours, and twenty-four hours. I will briefly explain what this means. RHQ writes raw data points into a set of round-robin tables. Every hour a job runs to perform aggregation. The latest hour of data points is aggregated into the one hour table or bucket. RHQ calculates the max, min, and average for each metric collection schedule. When the one hour table has six hours' worth of data, it is aggregated and written into the six hour table.
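As a rough illustration of that roll-up step (this is a simplified sketch, not RHQ's actual implementation), each schedule's raw values for the hour are reduced to a min, max, and average before being written to the one hour bucket:

import java.util.List;

public class MetricRollUp {

    public static class Aggregate {
        public final double min, max, avg;

        public Aggregate(double min, double max, double avg) {
            this.min = min;
            this.max = max;
            this.avg = avg;
        }
    }

    // Reduce the raw data points collected for one schedule during the hour
    // to the aggregate that gets stored in the one hour bucket.
    public static Aggregate rollUp(List<Double> rawValues) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        for (double value : rawValues) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
        }
        return new Aggregate(min, max, sum / rawValues.size());
    }
}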

Disk space is cheap, but it is not infinite. There needs to be a purge mechanism in place to prevent unbounded growth. For RHQ, the hourly job that does the aggregation also handles the purging. Data in the six hour bucket for instance, is kept for 31 days. With Cassandra, DataStax simply relies on Cassandra's built-in TTL (time to live) feature. When data is written into a column, the TTL is set on it so that it will expire after the specified duration.

So far it has been a good learning experience. Cassandra is clearly an excellent fit for storing RHQ's metric data, but I am starting to see how it could also be a good fit for other parts of the data model as well.
Monitoring Custom Data from DB Queries
There is an interesting little feature in RHQ that I thought I would quickly mention here. Specifically, it's a feature of the Postgres plugin that lets you track metrics from any generic query you specify.

Suppose you have data in your database and you want to expose that data as a metric. For example, suppose you want to track the total number of users that are currently logged into your application and that information is tucked away in some database table that you can query.

Import your Postgres database into RHQ and manually add a "Query" resource under your Postgres Database Server resource (see the image below where the "Import" menu provides you with the names of the resource types you can manually add as a child to the database server resource - in this case, the only option is the Query resource type).



When you "import" this Query resource through the manual add feature, you will be asked for, among other things, the query that you want to execute that extracts your metric data.



Once you do, you'll have a new Query resource in your RHQ inventory that is now tracking your metric value like any other metric (e.g. you will be able to see the historical values of your data in the graph on the Monitoring tab; you'll be able to alert on those values; etc.)



The one quirky thing about this is that the query needs to return a single row with two columns - the first column must have a value of "metricColumn" (literally) and the second column must be a numeric value. To follow the earlier example (tracking the number of users currently logged in), it could be something like:

SELECT 'metricColumn', count(id) FROM my_application_user WHERE is_logged_in = true

That's it. A pretty simple feature, but it seems like this could have a wide range of uses. Hopefully, this little tidbit can spark an idea in your head about how you can use this feature while monitoring your systems.
Sending XMPP Notifications Through RHQ
This post is for those who work day to day with server and platform monitoring tools such as RHQ. These tools usually provide the ability to send notifications when some kind of alert occurs: insufficient memory, processing overload, connection limits, etc. The notification, in most cases, is performed […]
Sending RHQ Alerts over XMPP
Rafael has created a very cool server plugin to allow RHQ to send alerts to his Google account over XMPP. Not only that, but he was able to use that same XMPP channel to send commands to the RHQ Server, like another kind of CLI.

Watch his demo to see how it works. Very awesome!

This is exactly the kind of innovation we envisioned the community being able to do via the server plugins, as I mentioned in my earlier blog entry titled "RHQ Server Plugins - Innovation Made Easy".

Nice job, Rafael!
Managing Compliance Across Multiple Resources
In the latest RHQ project release, and Red Hat's JBoss Operations Network product, a new feature called "drift monitoring" has been introduced.

If you've seen my previous blog, it (along with its demo) described how you can monitor changes in files belonging to a single resource. If a resource's files deviate from a trusted snapshot, that resource will be considered out of compliance and the UI will indicate when this happens.

This pinning/compliance feature within the drift subsystem can be combined with drift templating to allow you to pin a single snapshot of content to multiple resources. This allows you to have a single snapshot which all resources can share. In other words, if you have a cluster of app servers and they all have the same web application deployed, you can pin a snapshot of the in-compliant web application to a drift template that all of your app servers use when they scan for drift. Thus, for an entire cluster, if one of the servers in that cluster drifts away from that shared, in-compliant snapshot, that server will be flagged as "out of compliance".

To see how this works, see my "Managing Compliance Across Multiple Resources" demo.
Managing Compliance with Drift Pinning
In the latest RHQ project release, and Red Hat's JBoss Operations Network product, a new feature called "drift monitoring" has been introduced.

Drift monitoring provides the ability to monitor changes in files and to determine if those files are in or out of compliance with a desired state. In other words, if I installed an application and someone changes files in that installation, I can be told when those changes occurred and I can analyze those changes.

I put together another demo for my "drift demo series" that illustrates this concept:

How it works is this - suppose you have a set of files on your machine and you don't want those files changed. In other words, the files you have now are "in compliance" with what you want and this set of in compliance files should not be touched. In RHQ/JON, you would create a new drift definition and pin that definition to your current snapshot of files. This pinning effectively marks that snapshot of files as the "in compliant" version. Any changes now made to those files being tracked will be considered drift and out of compliance. In the graphical user interface, you can see what has gone out of compliance and you can drill down to see what files drifted and even what parts of those files drifted.

This pinning/compliance feature within the drift subsystem can be combined with drift templating to allow you to pin a single snapshot of content to multiple resources, giving you a single snapshot which all resources can share. In other words, if I have a cluster of app servers and they all have the same web application deployed, I can pin a snapshot of the in-compliant web application to a drift template that all my app servers use when they scan for drift. Thus, for my entire cluster, if one of my servers in that cluster drifts away from that shared, in-compliant snapshot, that server will be flagged as "out of compliance". I will be posting another demo to illustrate this concept in the near future.
JBoss ON 3.0 Released and a Drift Demo
Red Hat has released the next version of JBoss Operations Network. This is version 3.0 and incorporates the new look and feel of the GWT user interface that RHQ introduced recently. The bundle UI has been enhanced a bit as well (it was the only GWT-based portion of the UI that was in JBoss ON 2.4.1).

A new feature introduced in this JBoss ON 3.0 release is drift monitoring. If you have been following the progress of the RHQ upstream project, you'll know that this drift monitoring feature provides administrators with the ability to keep on the lookout for unintended changes (either accidental or malicious) to their installed software and other file content.

I posted a demo showing the basics of the new drift feature:
I plan on posting some more demos on drift in the near future.
RHQ Sending Notifications to Nagios
Hey folks. Many users have adopted RHQ (JON/JOPR) as a monitoring tool, not only for JBoss instances but also for other resources such as operating systems, databases, web servers, etc. However, many want to centralize visibility of the alerts generated by RHQ alongside the alerts from other monitoring tools. Normally, this centralization happens through […]
Drift Management Coming to RHQ
Introduction
I am excited to share that we are very close to releasing a beta of RHQ 4.1.0. I have been working on Drift Management, one of the new features going into the release. I have been meaning to write a little bit about what this new feature is all about, and now is as good a time as any. I will try to provide a high level overview and save getting into more specific, detailed topics for future posts.

What is Drift?
The first thing we need to do is define what exactly is meant by the term Drift Management. Let's start with the first part. Conceptually, we can define drift as an unplanned or unintended change to a managed resource. Let's consider a couple of examples to illustrate the concept.

We have an EAP server that is configured for production use. That is, things like the JVM heap size, data source definitions, etc. are configured with production values. At some point suppose the heap settings for the EAP server are changed such that they are no longer consistent with what is expected for production use. This constitutes drift.

Now let's consider another example involving application deployment. Suppose we have a cluster of EAP servers that is running our business application. We deploy an updated version of the application. For some reason, one of the cluster nodes did not get updated with the newer version of the application while the others did. We now have a cluster node that does not have the content that is expected to be deployed on it. This constitutes drift.

Why Do We Care about Drift?
Now that we have looked at some examples to illustrate the concept of drift, there is a perfectly reasonable question to ask. Why should we care? Unplanned or unintended changes frequently lead to problems. Those problems can manifest themselves as production failures, defects, outages, etc. Even with planned, intended changes, problems arise. It is not a question of if but rather when. A production server going down can result in a significant loss of time and money among other things. Anything you can do to be proactive in handling issues when they occur could help save your organization time, money, and resources.

How Will RHQ Manage Drift?
What can RHQ do to deal with drift? First and foremost, it can monitor resources for unintended or unplanned changes. RHQ allows you to specify which resources or which parts of resources you want to monitor for drift. The agent can periodically scan the file system looking for changes. When the agent detects a change, it notifies the server with the details of what has changed.

The server maintains a history of the changes it receives from the agent. This makes it possible for example to compare the state of a resource today versus its state two weeks ago. One of the many interesting and challenging problems we are tackling is how to present that history in meaningful ways so that users can quickly and easily identify changes of interest.

An integral aspect of RHQ's monitoring capabilities is its alerting system. RHQ allows you to define different rules which can result in alerts being triggered. For example, we can create a rule that will trigger an alert whenever an EAP server goes down. Similarly, RHQ could (and will) give you the ability to have alerts triggered whenever drift is detected on any of your managed EAP servers.

Another key aspect of RHQ's drift management functionality is remediation. Some platforms and products provide automatic remediation. Consider the earlier example of the changed heap settings on the EAP server. With automatic remediation, those settings might be reverted back to their original values as soon as the change is detected.

Then there is also manual remediation. Think merge conflicts in a version control system. There are lots of visual editors for viewing diffs and resolving conflicts. A couple that I use are diffmerge and meld. RHQ will provide interfaces and tools for generating and viewing diffs and for performing remediation much in the same way you might with a visual diff editor.

What's Next?
Here is a quick run down of drift management features that will be in the beta:

  • Enable drift management for individual resources
    • This involves defining the drift configuration or rules which specify what files to monitor for drift and how often monitoring should be done
  • Perform drift monitoring (done by the agent)
  • View change history in the UI
  • Execute commands from the CLI to:
    • Query for change history
    • Generate snapshots
      • A snapshot provides a point in time view of a resource for a specified set of changes
    • Diff snapshots (This is not a file diff)

Here are some notable features that will not be available in the beta:
  • Define filters that specify which files to include/exclude in drift monitoring (Note that you actually can define the filters; they just are not handled by the agent yet)
  • Perform manual remediation (i.e., visual diff editor)
  • Support for golden images (more on this in a future post)
  • Generate/view snapshots in the UI
  • Alerts integration
It goes without saying that there will be bugs, some of which are known, and that functionality in the beta is subject to change in ways that will likely break compatibility with future releases. More information will be provided in the release notes as soon as they are available. Stay tuned!
Telling GWT To Ignore Certain Classes
We recently hit a problem in RHQ that required us to learn about a feature in GWT (specifically the GWT compiler) that was very useful and I thought I would blog about it.

First, some background. In RHQ, we have split up the source code into several "modules". Each module is built by Maven and artifacts are produced (for example, some modules will output a jar file after the build completes). Dependent artifacts are built first, then modules that depend on other modules' artifacts are built afterwards - Maven knows how to maintain the proper dependent hierarchy so it can build modules in the proper order. After our entire suite of modules are built, the build system assembles all the module artifacts into the RHQ distribution. As you can see, there is nothing special here, any complex application needs a build system that does the same thing.

One of RHQ's modules is what we call the "domain" module. This is simply the module that contains the source code for all of our domain objects that map to our data model (in other words, the domain objects simply represent the set of all of our database entities, such as Users, or Roles, or Resources, or Alerts, etc.)

These domain objects are essentially the basic building blocks on which all of RHQ is built. These domain objects are used all throughout the RHQ codebase - in the agent, in the server, in the remote CLI and in the GWT client. All of their modules depend on the domain module - the domain module is one of the first ones built.

Since our GWT client needs these domain objects, the domain module not only needs to be passed through the Java compiler, it also needs a second pass through the GWT compiler.

Here is where the problem comes in. Some of our domain objects require that they import certain Java classes that GWT doesn't support. Because the domain objects are used everywhere, they end up needing to provide functionality that is required by the server and agent, not just the GWT client. But this functionality sometimes requires the use of certain JRE features that are not emulated, and thus not available, in the GWT runtime (things such as java.io or java.sql classes).

But how can we do this? If we import these GWT-unsupported JRE classes into some of the domain classes, the GWT client cannot load the domain jar and stops working. But without those JRE classes, the functionality that the domain module needs to provide to the other modules (like the server or agent) becomes broken. It is a catch-22. So, the question becomes, how can we provide all of the domain module functionality to all dependent modules, but remove only the portions of it that are not GWT compatible from the GWT client module?

One idea was to create another module - a second domain module - that was compatible with the GWT client (essentially it would be the original domain module, minus the GWT-incompatible code). We could possibly do this by refactoring the original domain objects and creatively using inheritance. But it would give us two domain jars. Adding yet another module to the build isn't something we wanted to do.

As it turns out, after being perplexed at how best to design around this problem, we found out that GWT provides a very, very easy solution to this. You can tell the GWT compiler to filter out and exclude certain classes when it compiles the Java source into GWT Javascript. Since those classes don't get compiled into GWT Javascript, the GWT client never sees them or tries to load them. Without having to split our domain module into two separate modules, we get the effect of having two different domain libraries - one for the GWT client and one for the rest of the RHQ system.

Now we can use normal Java inheritance to share common code while at the same time we ensure that only certain subclasses will use the non-GWT-supported features of the JRE. We can then exclude those subclasses from the GWT compilation thus allowing the GWT client to load the domain module, minus those classes it couldn't use anyway.
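As a hypothetical sketch of that pattern (the real DriftFileBits class looks different; the fields and methods here are made up for illustration), the GWT-safe base class stays in the compile while the subclass that needs unsupported JRE classes is the one we exclude:

// Safe for the GWT client - no unsupported JRE classes, so it stays in the GWT compile.
public class DriftFile implements java.io.Serializable {
    private String hashId;

    public String getHashId() { return hashId; }
    public void setHashId(String hashId) { this.hashId = hashId; }
}

// Server-only subclass (in its own source file). java.sql.Blob and java.io streams are not
// emulated by GWT, so this is the class the <exclude> element filters out of the GWT compile.
public class DriftFileBits extends DriftFile {
    private java.sql.Blob data;

    public java.io.InputStream getData() throws java.sql.SQLException {
        return data.getBinaryStream();
    }
}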

To use this feature, we had to add the <exclude> XML element to our GWT module's XML file in order to tell the GWT compiler which Java classes to ignore:


<entry-point class="org.rhq.core.client.RHQDomain" />
<source path="domain">
<exclude name="**/DriftFileBits.*" />
</source>


This essentially tells the GWT compiler to compile all of our domain module's classes except for the DriftFileBits class. If, in the future, we have to introduce other classes that are not supported by GWT, we can add more <exclude> elements to filter out those additional Java packages or individual classes.

One negative aspect of this is that we have to be careful to not "leak" references to those excluded classes into our GWT API or GWT implementation classes. But even if we do, it is caught rather quickly when the GWT client attempts to load the application.
Deploying RHQ Bundles To Non-Platform Resources
I recently completed the initial implementation of a feature that folks have been asking for lately, so I decided to put together a quick demo to show it.

First, some background. The initial RHQ Bundle Provisioning feature (which I have blogged about in the past) supported the ability to deploy bundle distributions (which are, essentially, just files of generic content) to a group of platforms. That is, given a set of files bundled together, RHQ can ship out those files to a group of platform resources where the agents on those platforms will unbundle the files to the root file system.

This new feature (submitted as enhancement request BZ 644328) now allows you to target your bundle deployments to a group of non-platform resources if you wish. For example (as the demo shows), we can now target bundle deployments to JBossAS Server resources. This means you can bundle up a WAR or an EAR, and push that bundle to a group of JBoss AS Servers such that the WAR or EAR gets deployed directly to the deploy/ directory (and hence gets deployed into your application servers).

The demo is 10 minutes long and shows what a simple workflow looks like to deploy a WAR bundle to a group of JBoss EAP application servers.

View the demo here.

[Note: unlike my previous demos which were built on a Windows laptop using Wink and formatted as Flash, I decided to try Istanbul on my Fedora desktop. So the demo is formatted in .ogg format, as opposed to Flash. Hopefully, this doesn't limit the audience that is able to view the demo.]
Manually Add Resources to Inventory from CLI
Resources in RHQ are typically added to the inventory through discovery scans that run on the agent. The plugin container (running inside the agent) invokes plugin components to discover resources. RHQ also allows you to manually add resources into inventory. There may be times when discovery scans fail to find a resource you want to manage. The other day I was asked whether or not you can manually add a resource to inventory via the CLI. Here is a small CLI script that demonstrates manually adding a Tomcat server into inventory.


The findResourceType and findPlatform functions are pretty straightforward. The interesting work happens in createTomcatConnectionProps and in manuallyAddTomcat. The key to it all though is on line 44. DiscoveryBoss provides methods for importing resources from the discovery queue as well as for manually adding resources. manuallyAddResources expects as arguments a resource type id, a parent resource id, and the plugin configuration (i.e., connection properties).

Determining the connection properties that you need to specify might not be entirely intuitive. I looked at the plugin descriptor as well as the TomcatDiscoveryComponent class from the tomcat plugin to determine the minimum, required connection properties that need to be included.

Here is how the script could be used from the CLI shell:

rhqadmin@localhost:7080$ login rhqadmin rhqadmin
rhqadmin@localhost:7080$ exec -f manual_add.js
rhqadmin@localhost:7080$ hostname = '127.0.0.1'
rhqadmin@localhost:7080$ tomcatDir = '/home/jsanda/Development/tomcat6'
rhqadmin@localhost:7080$ manuallyAddTomcat(hostname, tomcatDir)
Resource:
id: 12071
name: 127.0.0.1:8080
version: 6.0.24.0
resourceType: Tomcat Server

rhqadmin@localhost:7080$

This effectively adds the Tomcat server to the inventory of managed resources. This same approach can be used with other resource types. The key is knowing what connection properties you need to specify so that the plugin (in which the resource type is defined) knows how to connect to and manage the resource.
Making Links Work Right in a SmartGWT App on IE
The GUI of RHQ 4.0 and later is built upon SmartGWT. In many places in the SmartGWT app, we embedded raw HTML to render fragment links (e.g. #Inventory/Servers) to other places in the app. We used raw HTML, rather than widgets, for a few reasons:

  1. SmartGWT does not provide a link widget. GWT provides the HyperLink and InlineHyperLink widgets, but we try to avoid using non-SmartGWT widgets when possible to prevent layout issues or CSS issues caused by straying from the SmartGWT framework. A SmartGWT Label or HTMLFlow can be extended to simulate a link using a ClickHandler but it will not be rendered as an 'a' tag and so will not inherit the CSS styles used for 'a' tags and will not display the link's URL in the browser status bar when the user hovers over the link.
  2. Many of our links are inside ListGrid cells. There is no straightforward, reliable way to embed arbitrary widgets in ListGrid cells. I tried using the mechanism described here (http://www.smartclient.com/smartgwt/showcase/#grid_cell_widgets) and encountered overflow and wrapping issues, which I was unable to overcome. The CellFormatter interface only supports returning a String, but that String can include HTML, so that's what we ended up using for cells that need to contain a link.
  3. For FormItems that need to contain links, CanvasItem can be extended in order to embed GWT HyperLink widgets, but using a StaticTextItem with HTML embedded in its value is more straightforward.
Unfortunately, we noticed that clicking on any of our raw HTML fragment links in IE caused a full page refresh, which is not at all desirable in a GWT app, which is intended to be pure AJAX. Further investigation revealed that this is a longstanding quirk (aka bug) in IE; rather than simply generating a history event for the URL with the updated fragment, it sends an unnecessary request for the URL to the server. If you use the GWT HyperLink widget, GWT uses some JavaScript fanciness involving iframes to circumvent the IE bug and make fragment links work properly. However, since we were using raw HTML for all the links in the RHQ GUI, this magic was not there for us. Converting all our HTML links would be a ton of work and was simply not a viable option for links in ListGrid cells for the reasons described above, so we needed to find a way to execute GWT's magic when any of our raw HTML links were clicked. The answer ended up being to add a native preview event handler that intercepts browser click events and executes the magic if the click was on one of our 'a' tags. We did this by making our EntryPoint class implement the GWT Event.NativePreviewHandler interface as follows:
    public void onPreviewNativeEvent(Event.NativePreviewEvent event) {
        if (SC.isIE() && event.getTypeInt() == Event.ONCLICK) {
            NativeEvent nativeEvent = event.getNativeEvent();
            EventTarget target = nativeEvent.getEventTarget();
            if (Element.is(target)) {
                Element element = Element.as(target);
                if ("a".equalsIgnoreCase(element.getTagName())) {
                    // make sure it's not a hyperlink that GWT already handles
                    if (element.getPropertyString("__listener") == null) {
                        String url = element.getAttribute("href");
                        String historyToken = getHistoryToken(url);
                        if (historyToken != null) {
                            GWT.log("Forcing History.newItem(\"" + historyToken + "\")...");
                            History.newItem(historyToken);
                            nativeEvent.preventDefault();
                        }
                    }
                }
            }
        }
    }

    private static String getHistoryToken(String url) {
        String token;
        if (url.startsWith("#")) {
            token = url.substring(1);
        } else if (url.startsWith("/#")) {
            token = url.substring(2);
        } else if (url.contains(Location.getHost()) && url.indexOf('#') > 0) {
            token = url.substring(url.indexOf('#') + 1);
        } else {
            token = null;
        }
        return token;
    }
We then add the native preview handler at app load time by adding the following line to our EntryPoint class:
Event.addNativePreviewHandler(this);
This solution is working great. However, we still might eventually go back and switch over to using GWT HyperLinks, rather than raw HTML, in places where it is feasible, such as FormItems, since it is generally better to use widgets rather than raw HTML to keep things object-oriented and leave the generation of HTML, CSS, and JavaScript to the framework.
Detaching Hibernate Objects to pass to GWT
One thing we encountered pretty quickly when we started our GWT work was the fact that you can't serialize objects over the wire from server to GWT client if those objects were obtained via a Hibernate/JPA entity manager.

If you've ever worked with Hibernate/JPA, you'll know that when you get back entity POJOs whose fields are not loaded (i.e. marked for lazy loading and you didn't ask for the data to be loaded), your entity POJO instance will have Hibernate proxies where you would expect a "null" object to be (this is to allow you to load the data later, if your object is still attached to the entity manager session).

Having these proxies even after leaving a JPA entity manager session is a problem in the GWT world because the GWT client sitting in your browser doesn't have Hibernate classes available to it! Trying to send these entity POJO instances that have references to Hibernate proxies causes serialization errors and your GWT client will fail to operate properly.

This is a known issue and is discussed here.

We pretty quickly decided against using DTOs. As that page above mentioned, "if you have many Hibernate objects that need to be translated, the DTO / copy method creation process can be quite a hassle". We have a lot of domain objects that are used server side in RHQ. There was no reason why we shouldn't be able to reuse our domain objects both server side and client side - introducing DTOs just so we could workaround this serialization issue seemed ill-advised. It would have just added bloat and unnecessary complexity.

I can't remember how mature the Gilead project was at the time we started our GWT work, or maybe we just didn't realize it existed. Gilead does require you to have your domain objects and server side impl classes extend certain Java classes (LightEntity for example), so it has a slight downside that it requires you to modify all your domain objects. In any event, we do not use Gilead to do this detaching of hibernate proxies.

RHQ's solution was to write our own "Hibernate Detach Utility". This is a single static utility that you use to process your objects just prior to sending them over the wire to your GWT client. Essentially it scrubs your object of all Hibernate proxies, cleaning it such that it can be serialized over the wire successfully.
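Usage is essentially a one-liner right before the object graph goes back to the client. The sketch below assumes the method and enum names found in the linked source (double check against the actual class); the generic wrapper method is just illustrative:

// Scrub Hibernate proxies from an entity (or collection of entities) so it can be
// serialized over GWT RPC. Assumes HibernateDetachUtility's nullOutUninitializedFields
// method and SerializationType enum as found in the RHQ source linked below.
private static <T> T detach(T entity) {
    try {
        HibernateDetachUtility.nullOutUninitializedFields(entity,
            HibernateDetachUtility.SerializationType.SERIALIZATION);
        return entity;
    } catch (Exception e) {
        throw new RuntimeException("Failed to scrub Hibernate proxies", e);
    }
}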

We also used this when we originally developed a web services interface to the RHQ remote API.

Here is the HibernateDetachUtility source code in case you are interested in seeing how we do it - maybe you could use this in your own GWT/Hibernate application. I think it is reusable - not much custom RHQ stuff is going on in here.
RHQ 4 Has Been Released
It has been a very long road for this one, but we managed to release RHQ 4.

We standardized on GWT as our user interface framework. Here's an example of what the new GWT-based UI looks like:


There are still a few JSP pages around, but for the most part, the RHQ GUI is now a GWT application with SmartGWT components. One of the fun parts of this job is to learn new and exciting technologies (though I guess "new" is relative - but I still consider GWT a "new" GUI technology, compared to things like Struts and JSP).

Take a look, give it a test drive and see what you think.
A REPL for the RHQ Plugin Container
Overview
RHQ plugins run inside of a plugin container that provides different services and manages the life cycles of plugins. The plugin container in turn runs inside of the RHQ agent. If you are not familiar with the agent, it is deployed to each machine you want RHQ to manage. You can read more about it here and here. While the plugin container runs inside of the agent, it is not coupled to the agent. In fact it is used quite a bit outside of the agent. It is used in Embedded Jopr which was intended to be a replacement for the JMX web console in JBoss AS. The plugin container is also used a lot during development in automated tests.

My teammate, Heiko Rupp, has developed a cool wrapper application for the plugin container. It defines a handful of commands for working with the plugin container interactively. What is nice about this is that it can really speed up plugin development. Heiko has written several articles about the standalone container including Working on a standalone PluginContainer wrapper and I love the Standalone container in Jopr (updated). After reading some of his posts I got to thinking that a REPL for the plugin container would be really nice - but not just any REPL. I was thinking specifically about Clojure's REPL.

I have spent some time exploring the different ways Clojure could be effectively integrated with RHQ. There is little doubt in my mind that this is one of them. I recently started working on some Clojure functions to make working with the plugin container easier. I am utilizing Clojure's immutable and persistent data structures as well as some of the other great language features such as first class functions and multimethods. I am trying to make these functions easy enough to use so that someone who might not be a very experienced Clojure programmer might still find them useful during plugin development and testing.

Getting the Code
The project is available on github at https://github.com/jsanda/clj-rhq. It is built with leiningen, so you will want to have it installed. I typically run a swank server and connect from Emacs, but you can also start a REPL session directly if you are not an Emacs user. The project pulls in the necessary dependencies so that you can work with plugin container-related classes as you will see in the following sections.

Running the Code
These steps assume that you already have leiningen installed. First, clone the project:

git clone https://jsanda@github.com/jsanda/clj-rhq.git

Next, download project dependencies with:

lein deps

Some plugins rely on native code provided by the Sigar library, which you should find at clj-rhq/lib/sigar-dist-1.6.5.132.zip. Create a directory in lib named native and unzip sigar-dist-1.6.5.132.zip there. The project is configured to look for native libraries in lib/native.

Finally, if you are using Emacs run lein swank to start a swank server; otherwise, run lein repl to start a REPL session on the command line.

Starting/Stopping the Plugin Container

The first thing I do is call require to load the rhq.plugin-container namespace. Then I call the start function. The plugin container emits a line of output, and then the function returns nil. Next I verify that the plugin container has started by calling running?. Then I call the stop function to shut down the plugin container and finally call running? again to verify that the plugin container has indeed shut down.

Executing Discovery Scans
So far we have looked at starting and stopping the PC. One of the nice things about working interactively in the REPL is that you are not limited to a pre-defined set of functions. If rhq.plugin-container did not offer any functions for executing a discovery scan, you could write something like the following:


The pc function simply returns the plugin container singleton which gives us access to the InventoryManager. We call InventoryManager's executeServerScanImmediately method and store the InventoryReport object that it returns in a variable named inventory-report. Alternatively you can use the discover function.


On the first call to discover we pass the keyword :SERVER as an argument. This results in a server scan being run. On the second call, we pass :SERVICE which results in a service scan being run. If you invoke discover with no arguments, a server scan is executed followed by a service scan. The two inventory reports from those scans are returned in a vector. The use of the count function to see how many resources were discovered is a good example that demonstrates how you can easily use functions defined outside of the rhq.plugin-container namespace to provide additional capabilities and functionality.

Searching the Local Inventory
Once you have the plugin container running and are able to execute discovery scans, you need a way query the inventory for resources with which you want to work. The inventory function does just that. It can be invoked in one of two ways. In its simpler form which takes no arguments, it returns the platform Resource object. In its more complex form, it takes a map of pre-defined filters and returns a lazy sequence of those resources that match the filters.


inventory is invoked on line 1 without any arguments, and then a string version of it is returned with a call to str. The type is Mac OS X indicating that the object is in fact the platform resource. On line 5 we invoke inventory with a single filter to include resources that are available.  That call shows that there are 62 resources in inventory that are up.  On line 7 we query for resources that are a service and see that there are 60 in inventory. On line 9 we specify multiple filters that will return down services. When multiple filters are specified, a resource must match each one in order to be included in the results. On line 10 we query for webapps from the JBossAS plugin. On line 13 we specify a custom filter in the form of an anonymous function with the :fn key. This filter finds resources that define at least two metrics.

Conclusion
We have looked at a number of functions to make working with the plugin container from the REPL a bit easier. Each function should also include a useful docstring as in the following example,


We have only scratched the surface with the functions in the rhq.plugin-container namespace. In some future posts we will explore invoking resource operations, updating resource configurations, and deploying resources like EARs and WARs.
Remote Streams in RHQ
The agent/server communication layer in RHQ provides rich, bi-directional communication that is highly configurable, performant, and fault tolerant. And as a developer it has another feature of which I am quite fond - I rarely have to think about it. It just works.

Recently I started working on a new feature that involves streaming potentially large files from agent to server. This work has led me to look under the hood of the comm layer to an extent. The comm layer allows for high-level APIs between server and agent. Consider the following example:

public interface ConfigurationServerService {
    ...
    @Asynchronous(guaranteedDelivery = true)
    void persistUpdatedResourceConfiguration(int resourceId, Configuration resourceConfiguration);
}

The agent calls persistUpdatedResourceConfiguration when it has detected a resource configuration change that has occurred outside of RHQ. The @Asynchronous annotation tells the communication layer that the remote method call from agent to server can be performed asynchronously. There are no special stubs or proxies that I have to worry about to use this remote API. It is all nicely tucked away in the communication layer.

Several posts could be devoted to discussing RHQ's communication layer but back to my current work of streaming large files. I needed to put in place a remote API on the server so that the agent can upload files. You might consider something like the following as an initial approach:

// Remote API exposed by RHQ server to stream files from agent to server
void uploadFile(byte[] data);

The problem with this approach is that it involves loading the file contents into memory. File sizes could easily exceed several hundred megabytes, resulting in substantial memory usage that would be impractical. The RHQ agent is finely tuned to keep a low footprint in terms of memory usage as well as CPU utilization. When reading the contents of a large file that is too big to fit into memory, java.io.InputStream is commonly used. With the RHQ communication layer, I am able to expose an API like the following,

// Remote API exposed by RHQ server to stream files from agent to server
void uploadFile(InputStream stream);

With this API, the agent passes an InputStream object to the server. Keep in mind though that none of Java's standard InputStream classes implement Serializable which is a requirement for using objects with a remote invocation framework like RMI or JBoss Remoting. Fortunately for me RHQ provides the RemoteInputStream class which extends java.io.InputStream. The Javadocs from that class state,

This is an input stream that actually pulls down the stream data from a remote server. Note that this extends InputStream so it can be used as any normal stream object; however, all methods are overridden to actually delegate the methods to the remote stream.

When the agent wants to upload a file, it calls uploadFile passing a RemoteInputStream object. The server can then read from the input stream just as it would any other input stream unbeknownst to it that the bytes are being streamed over the wire.
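For example, the server-side implementation can simply consume the stream in fixed-size chunks, so the file never needs to fit in memory. The sketch below is illustrative (the target file path, buffer size, and class name are made up), but the point is that nothing in it knows or cares that the bytes are coming from the agent:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileUploadService {

    // The stream passed in by the agent is a RemoteInputStream, but it is read
    // exactly like any other InputStream.
    public void uploadFile(InputStream stream) throws IOException {
        OutputStream out = new BufferedOutputStream(new FileOutputStream("/tmp/uploaded.bin"));
        try {
            byte[] buffer = new byte[32 * 1024];
            int bytesRead;
            while ((bytesRead = stream.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        } finally {
            out.close();
            stream.close();
        }
    }
}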

While I find myself impressed with RemoteInputStream, it gets even better. I wanted to read from the stream asynchronously. When the agent calls uploadFile, instead of reading from the stream in the thread handling the request, I fire off a message to a JMS queue to free up the thread to service other agent requests. I am able to pass the RemoteInputStream object in a JMS message and have a Message Driven Bean then read from the stream to upload the file from the agent.
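Here is a rough sketch of what that asynchronous consumer could look like (queue and class names are made up, and this is not RHQ's actual code): the JMS message carries the serializable stream object, and the message-driven bean does the reading on its own thread.

import java.io.InputStream;

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/FileUploads")
})
public class FileUploadConsumer implements MessageListener {

    public void onMessage(Message message) {
        try {
            // The RemoteInputStream was placed in the message as a Serializable payload.
            InputStream stream = (InputStream) ((ObjectMessage) message).getObject();
            // Read from the stream (which transparently pulls bytes from the agent)
            // and persist the file contents, e.g. with the chunked loop shown earlier.
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}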

This level of abstraction along with the performance, fault tolerance, and stability characteristics of the agent/server communication layer makes it one of those hidden gems you do not really appreciate until you have to look under the hood so to speak. And rarely if ever do I find myself having to look under the hood because... it just works. Lastly, I should point out that there is a RemoteOutputStream class that complements the RemoteInputStream class.
RHQ Bundle Recipe for Deploying JBoss Server
My colleague mazz wrote an excellent blog post that describes in detail the provisioning feature of RHQ. The post links to a nice Flash demo he put together to illustrate the various things he discusses in his article. Taking what I learned from his post, I put together a simple recipe to deploy a JBoss EAP server and then start the server after it has been laid down on the destination file system. Here is the recipe:


The bundle declaration itself on lines 4 - 11 is pretty straightforward. If this part is not clear, read through the docs on Ant bundles. Where things became a little less than straightforward is with the <exec> task starting on line 18. The first problem I encountered was Ant saying that it could not find run.sh. I think this is because it was looking for it on my PATH. Adding resolveexecutable="true" on line 21 took care of this problem. This tells Ant to look for the executable in the specified execution directory.

On line 22 I specify arguments to run.sh. -b 0.0.0.0 tells JBoss to bind to all available addresses. Initially I had line 22 written as:

<arg value="-b 0.0.0.0"/>

That did not get parsed correctly and resulted in JBoss throwing an exception with an error message saying that an invalid bind address was specified. Specifying the line attribute instead of the value attribute fixed the problem.

The last problem I encountered was Ant complaining that it did not have the necessary permissions to execute run.sh. It turned out that when the EAP distro was unpacked, the scripts in the bin directory were not executable. This is why I added the <chmod> call on line 17. It seems that the executable file mode bits are getting lost somewhere along the way in the deployment process. I went ahead and filed a bug for this. You can view the ticket here.

After working through these issues, I was able to successfully deploy my JBoss server and have it start up without error. Now I can easily deploy my bundle to a single machine, a cluster of RHEL servers that might serve as a QA or staging environment, or even a group of heterogeneous machines that could consist of Windows, Fedora (or other Linux distros), and Mac OS X. Very cool! Provisioning is still a relatively new feature in RHQ. It adds tremendous value to the platform, and I think it can add even more. One of the things I would like to see is more support for common tasks like starting/stopping a server, whether it is in the form of custom Ant tasks or something else.
Writing an RHQ Plugin in Clojure
Clojure is a new, exciting language. My biggest problem with it is that I do not find enough time to work with it. One of the ways I am trying to increase my exposure to Clojure is by exploring ways of integrating it into RHQ. RHQ is well-suited for integrating non-Java, JVM languages because it was designed and built to be extended. In previous posts I have talked about various extension points including agent plugins, server plugins, and remote clients.

I decided to write an agent plugin in Clojure. If you are not familiar with RHQ plugins or what is involved with implementing one, check out this excellent tutorial from my colleague Heiko. Right now, I am just doing exploratory work. I have a few goals in mind though as I go down this path.

First, I have no desire to wind up writing Java in Clojure. By that I mean that I do not want to get bogged down dealing with mutable objects. One of the big draws to Clojure for me is that it is a purely functional language with immutable data structures; so, as I continue my exploration of integrating Clojure with RHQ, I want to write idiomatic Clojure code to the greatest extent possible.

Secondly, I want to preserve what I like to think of as the Clojure development experience. Clojure is a very dynamic language in which functions and name spaces can be loaded and reloaded on the fly. The REPL is an invaluable tool. It provides instant feedback. In my experience Test-Driven Development usually results in short, quick development iterations. TDD + REPL produces extremely fast development iterations.

Lastly, I want to build on the aforementioned goals in order to create a framework for writing RHQ plugins in Clojure. For instance, I want to be able to run and test my plugin in a running plugin container directly from the REPL. And then when I make a change to some function in my plugin, I want to be able to just reload that code without having to rebuild or redeploy the plugin.

Now that I have provided a little background on where I hope to go, let's take a look at where I am currently. Here is the first cut at my Clojure plugin.


I am using gen-class to generate the plugin component classes. As you can see, this is just a skeleton implementation. Here is the plugin descriptor.


I have run into some problems though when I deploy the plugin. When the plugin container attempts to instantiate the plugin component to perform a discovery scan, the following error is thrown:


I was not entirely surprised to see such an error because I have heard about some of the complexities involved with trying to run Clojure in an OSGi container, and the RHQ plugin container shares some similarities with OSGi. There is a lot of class loader magic that goes on with the plugin container. For instance, each plugin has its own class loader, plugins can be reloaded at runtime, and the container limits the visibility of certain classes and packages. I came across this Clojure Jira ticket, CLJ-260, which talks about setting the context class loader. Unfortunately this did not help my situation because the context class loader is already set to the plugin class loader.

After spinning my wheels a bit, I decided to try a different approach. I implemented my plugin component class in Java, and it delegates to a Clojure script. Here is the code for it.


And here is the Clojure script,


This version deploys without error. I have not fully grokked the class loading issues, but at least for now, I am going to stick with a thin Java layer that delegates to my Clojure code. Up until now, I have been using leiningen to build my code, but now that I am looking at a mixed code base, I may consider switching over to Maven. I use Emacs and Par Edit for Clojure, but I use IntelliJ for Java. The IDE support for Maven will come in handy when I am working on the Java side of things.
Server-Side Scripting in RHQ
Introduction
The RHQ platform can be extended in several ways, most notably through plugins that run on agents. There also exists the capability to extend the platform's functionality with scripting via the CLI. The CLI is a remote client, and even scripts that are run on the same machine on which the RHQ server resides are still a form of client-side scripting because they run in a separate process and operate on a set of remote APIs exposed by the server.

In this post I am going to introduce a way to do server-side scripting. That is, the scripts are run in the same JVM in which the RHQ server is running. This form of scripting is in no way mutually exclusive to writing CLI scripts; rather, it is complementary. While a large number of remote APIs are exposed through the CLI, they do not encompass all of the functionality internal to the RHQ server. Server-side scripts however, have full and complete access to the internal APIs of the RHQ server.


Server Plugins
RHQ 3.0.0 introduced server plugins which are distinct from agent plugins. Server plugins run directly in the RHQ server inside a server plugin container. Unlike agent plugins, they do not perform any resource discovery. The article, RHQ Server Plugins - Innovation Made Easy, provides a great introduction to server plugins. Similar to agent plugins, server plugins can expose operations which can be invoked from the UI. They can also be configured to run as scheduled jobs. Server plugins have full access to the internal APIs of the RHQ server. Reference documentation for server plugins can be found here. The server-side scripting capability we are going to look at is provided by a server plugin.

Groovy Script Server
Groovy Script Server is a plugin that allows you to dynamically execute Groovy scripts directly on the RHQ server. Documentation for the plugin can be found here. The plugin currently provides a handful of features including,
  • Customizable classpath per script
  • Easy access to RHQ EJBs through dynamic properties
  • An expressive DSL for generating criteria queries

An Example
Now that we have introduced server plugins and the Groovy Script Server, it is time for an example. A while back, I wrote a post on a way to auto-import resources into inventory using the CLI. We will revisit that script, written as a server-side script.

resourceIds = []

criteria(Resource) {
    filters = [inventoryStatus: InventoryStatus.NEW]
}.exec(SubjectManager.overlord) { resourceIds << it.id }

DiscoveryBoss.importResources(SubjectManager.overlord, (resourceIds as int[]))

On line three we call the criteria method which is available to all scripts. This method provides our criteria query DSL. Notice that the method takes a single parameter - the class for which the criteria query is being generated. Filters are specified as a map of property names to property values.
Property names are derived from the various addFilterXXX methods exposed by the criteria object being built. In this instance, the filter corresponds to the method ResourceCriteria.addFilterInventoryStatus.

The criteria method returns a criteria object that corresponds to the class argument. In this example, a ResourceCriteria object is returned. Notice that exec is called on the generated ResourceCriteria object. This method is dynamically added to each generated criteria object. It takes care of calling the appropriate manager which in this case is ResourceManager. exec takes two arguments - a Subject and a closure. Most stateless session bean methods in RHQ go through a security layer to ensure that the user specified by the Subject has the necessary permissions to perform the requested operation. In the CLI, you may have noticed that you do not have to pass a Subject to the various manager methods. This is because the CLI implicitly passes the Subject corresponding to the logged in user. The second argument, a closure, is called once for each entity in the results returned from exec.
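For comparison, the query that the DSL generates is roughly what you would write by hand against the server's criteria API. The following Java sketch assumes the standard manager lookups and criteria method names (LookupUtil, getOverlord, findResourcesByCriteria); treat it as an illustration of what exec is doing on your behalf rather than code lifted from the plugin.

ResourceCriteria criteria = new ResourceCriteria();
criteria.addFilterInventoryStatus(InventoryStatus.NEW);

// exec supplies the Subject and invokes the appropriate manager for you
Subject overlord = LookupUtil.getSubjectManager().getOverlord();
List<Integer> resourceIds = new ArrayList<Integer>();
for (Resource resource : LookupUtil.getResourceManager().findResourcesByCriteria(overlord, criteria)) {
    resourceIds.add(resource.getId());
}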

Let's look at a second example that builds off of the previous one. Instead of auto-importing everything in the discovery queue, suppose we only want to import JBoss AS 5 or AS 6 instances.

resourceIds = []

criteria(Resource) {
    filters = [
        inventoryStatus: InventoryStatus.NEW,
        resourceTypeName: 'JBossAS Server',
        pluginName: 'JBossAS5'
    ]
}.exec(SubjectManager.overlord) { resourceIds << it.id }

DiscoveryBoss.importResources(SubjectManager.overlord, (resourceIds as int[]))

Here we add two additional filters for the resource type name and the plugin. If we did not filter on the plugin name in addition to the resource type name, then our results could include JBoss AS 4 instances which we do not want.

Future Work
The Groovy Script Server, as well as server plugins in general, are relatively new to RHQ. There are some enhancements that I already have planned. First, is adding support for running scripts as scheduled jobs. This is one of the big features of server plugins. With support for scheduled jobs, we could configure the auto-inventory script to run periodically freeing us from manually having to log into the server to execute the script. The CLI version of the script could be wrapped in a cron job. If we did that with the CLI script though, we might want to include some error handling logic in case the server is down or otherwise unavailable. With the server-side scheduled job, we do not need that kind of error handling logic.

The second thing I have planned is to put together additional documentation and examples. With the work that has already been done, the server-side scripting capability opens up a lot of interesting possibilities. I would love to hear feedback on how you might utilize the script server as well as any enhancements that you might like to see.
RHQ: Deleting Agent Plugins
Introduction
RHQ is an extensible management platform; however, the platform itself does not provide the management capabilities. For example, there is nothing built into the platform for managing a JBoss AS cluster. The platform is actually agnostic of the actual resource and types of resources it manages, like the JBoss AS cluster. The management capabilities for resources like JBoss AS are provided through plugins. RHQ's plugin architecture allows the platform to be extended in ways such that it can manage virtually any type of resource.

Plugin JAR files can be deployed and installed on an RHQ server (or cluster of servers), they can be upgraded, and they can even be disabled. They cannot however be deleted. In this post, we spend a little bit of time exploring plugin management, from the perspectives of installing and upgrading to disabling them. Then we consider my recent work for deleting plugins.

Installing Plugins
Plugins can be installed in one of two ways. The first involves copying the plugin JAR file to /jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins. And starting with RHQ 3.0.0, you can alternatively copy the plugin JAR file to /plugins, which is arguably easier given the much shorter path. The RHQ server will periodically scan these directories for new plugin files. When a new or updated plugin is detected, the server will deploy the plugin. This approach is particularly convenient during development when the RHQ server is running on the same machine on which I am developing. In fact, RHQ's Maven build is set up to copy plugins to a development server as part of the build process.

The second approach to installing a plugin involves uploading the plugin file through the web UI. The screenshot below shows the UI for plugin file upload.



Deploying plugins through the web UI is particularly useful when the plugin is on a different file system than the one on which the RHQ server is running. It is worth noting that there currently is no API exposed for installing plugins through the CLI.

Upgrading Plugins
The platform not only supports deploying new plugins that previously have not been installed in the system, but it also supports upgrading existing plugins. From a user's perspective there really is no difference in upgrading a plugin versus installing one for the first time. The steps are the same. And the RHQ server, for the most part, treats both scenarios the same as well.

Installing a new or upgraded plugin does not affect any agents that are currently running. Agents have to be explicitly updated in one of a number of ways including,
  • Restarting the agent
  • Restarting the plugin container
  • Issuing the plugins update command from the agent prompt
  • Issuing a resource operation for one of the above. This can be done from the UI or from the CLI
  • Issuing a resource operation for one of the above from a server script.
Disabling Plugins
Installed plugins can be disabled. Disabling a plugin results in agents ignoring that plugin once the agent is restarted (or more precisely, when the plugin container running inside the agent is restarted). The plugin container will not load that plugin, which means resource components, discovery components, and plugin classloaders are not loaded. This results in a reduced memory footprint of the agent. It also reduces overall CPU utilization since the agent's plugin container is performing fewer discovery and availability scans.

Plugins can be disabled on a per-agent basis allowing for a more heterogeneous deployment of agents. For instance, I might have a web server that is only running Apache and the agent that is monitoring it, while on another machine I have a JBoss AS instance running.  I could disable the JBoss-related plugins on the Apache box freeing up memory and CPU cycles. Likewise, I can disable the Apache plugins on the box running JBoss AS.

When a plugin is disabled, nothing is removed from the database. Any resources already in inventory from the disabled plugin remain in inventory. Type definitions from the disabled plugin also remain in the system.

Deleting Plugins
Recently I have been working on adding support for deleting plugins. Deleting a plugin not only deletes the actual plugin from the system, but also everything associated with it including all type definitions and all resources of the types defined in the plugin. When disabling a plugin, the plugin container has to be explicitly restarted in order for it to pick up the changes. This is not the case though with deleting plugins. Agents periodically send inventory reports to the server. If the report contains a resource of a type that has been deleted, the server rejects the report and tells the agent that it contains stale resource types. The agent in turn recycles its plugin container, purging its local inventory of any stale types and updating its plugins to match what is on the server. No type definitions, discovery components, or resource components from the plugin will be loaded.

Use Cases for Plugin Deletion
There are a number of motivating use cases for supporting plugin deletion. The most important of these might be the added ability to downgrade a plugin. But we will also see the benefits plugin deletion brings to the plugin developer.

Downgrading Plugins
We have already mentioned that RHQ supports upgrading plugins. It does not however support downgrading a plugin. Deleting a plugin effectively provides a way to roll back to a previous version of a plugin. There may be times, in a production deployment for example, when a plugin does not behave as expected or as desired. Users currently do not have the capability to downgrade to a previous version of that plugin. Plugin deletion now makes this possible.

Working with Experimental Plugins
Working with an experimental plugin or one that might not be ready for production use carries with it certain risks. Some of those risks can be mitigated with the ability to disable a plugin; however, the plugin still exists in the system. Resources remain in inventory. Granted those resources can be deleted easily enough, but there is still some margin for error in so far as failing to delete all of the resources from the plugin or accidentally deleting the wrong resources. And there exists no way to remove type definitions such as metric definitions and operation definitions without direct database access. Having the ability to delete a plugin along with all of its type definitions and all instances of those type definitions completely eliminates these risks.

Simplifying Plugin Development
A typical work flow during plugin development includes incremental deployments to an RHQ server as changes are introduced to the plugin. Many if not all plugin developers have run into situations in which they have to blow away their database due to changes made in the plugin (this normally involves changes to type definitions in the plugin descriptor). This slows down development, sometimes considerably. Deleting a plugin should prove much less disruptive to a developer's work flow than having to start with a fresh database installation, particularly when a substantial amount of test data has been built up in the database. To that end, I can really see the utility in a Maven plugin for RHQ plugin development that deploys the RHQ plugin to a development server. The Maven plugin could provide the option to delete the RHQ plugin if it already exists in the system before deploying the new version.

Conclusion
Development for the plugin deletion functionality is still ongoing, but I am confident that it will make it into the next major RHQ release. If you are interested in tracking the progress or experimenting with this new functionality, take a look at the delete-agent-plugin branch in the RHQ Git repo. This is where all of the work is currently being done. You can also check out this design document which provides a high level overview of the work involved.
Dealing with Asynchronous Workflows in the CLI
Introduction
There is constant, ongoing communication between agents and servers in RHQ. Agents, for example, send inventory and availability reports up to the server at regularly scheduled intervals. The server sends down resource-related requests such as updating a configuration or executing a resource operation. Examples of these include updating the connection pool setting for a JDBC data source and starting a JBoss AS server. Some of these work flows are performed in a synchronous manner while others are carried out in an asynchronous fashion. A really good example of an asynchronous work flow is scheduling a resource operation to execute at some point in the future. There is a common pattern used in implementing these asynchronous work flows. We will explore this pattern in some detail and then consider the impacts on remote clients like the CLI.

The Pattern
The asynchronous work flows are most prevalent in requests that produce mutative actions against resources. Let's go through the pattern.
  • A request is made on the server to take some action against a resource (e.g., invoke an operation, update connection properties, update configuration, deploy content, etc.)
  • The server logs the request on the audit trail
  • The server sends the request to the agent
    • Note that control returns to the server immediately after the request is sent to the agent. This means that the call to the agent will likely return before the requested action has actually been carried out.
  • The plugin container (running in the agent) invokes the appropriate resource component
  • The resource component carries out the request and reports the results back to the plugin container
  • The agent sends the response back to the server. The response will indicate success or failure.
  • The server updates the audit trail indicating that the request has completed and also whether it succeeded or failed.
    • Note that it is the same request that was originally logged on the original audit trail that is updated
Let's revisit the earlier example of scheduling an operation to start a JBoss server. Suppose I schedule the operation to execute immediately. Then I navigate to the operation history page for the JBoss server. I will see the operation request listed in the history. The history page is a view of the audit trail. The operation shows a status of In Progress. We could continually refresh the page until we see the status change. Eventually it will change to Success or Failure. The status does not necessarily change immediately after the operation completes. It changes after the agent reports the results back to the server and the audit trail is updated.

As previously stated, this pattern is very common throughout RHQ. Consider making a resource configuration update, which is performed asynchronously as well. Once I submit the configuration update request, I can navigate to the configuration history page to check the status of the request. The status of the update request will show in progress until the agent reports back to the server that the update has completed. When the agent reports back to the server, the corresponding audit trail entry is updated with the results. The same pattern can also be observed when manually adding a new resource into the inventory.

Understanding the Impact to the CLI
So what does this asynchronous work flow mean for remote clients, notably CLI scripts? First and foremost, you need to understand when and where requests are carried out asynchronously to avoid unpredictable, unexpected results. We will discuss a number of things that can potentially impact how you think about and how you write CLI scripts.

A method that returns without error does not necessarily mean that the operation succeeded
Let's say we have a requirement to write a script that performs a couple of resource configuration updates, but we only want to perform the second update if the first one succeeds. We might be inclined to implement this as follows,

ConfigurationManager.updateResourceConfiguration(resourceId, firstConfig);
ConfigurationManager.updateResourceConfiguration(resourceId, secondConfig);

Provided we are logged in as a user having the necessary permissions to update the resource configuration and provided the agent is online and available, the first call to updateResourceConfiguration will return without error. We proceed to submit the second configuration change, but the first update might have actually failed. With the code as is we could easily wind up violating the requirement of applying the second update only if the first succeeds. What we need to do here essentially is to block until the first configuration update finishes so that we can verify that it did in fact succeed. This can be  implemented by polling the ResourceConfigurationUpdate object that is returned from the call to updateResourceConfiguration.

ConfigurationManager.updateResourceConfiguration(resourceId, firstConfig);
var update = ConfigurationManager.getLatestResourceConfiguration(resourceId);
while (update.status == ConfigurationUpdateStatus.INPROGRESS) {
    java.lang.Thread.sleep(2000); // sleep for 2 seconds
    update = ConfigurationManager.getLatestResourceConfiguration(resourceId);
}
if (update.status == ConfigurationUpdateStatus.SUCCESS) {
    ConfigurationManager.updateResourceConfiguration(resourceId, secondConfig);
}

The ResourceConfigurationUpdate object is our audit trail entry. The object's status will change once the resource component (running in the plugin container) finishes applying the update and the agent sends the response back to the server.

Resource proxies offer some polling support
Resource proxies greatly simplify working with a number of the RHQ APIs. Invoking resource operations is one of those enhanced areas. With a resource proxy, operations defined in the plugin descriptor appear as first class methods on the proxy object. This allows us to invoke a resource operation in a much more concise and intuitive fashion. Here is a brief example.

var jbossServerId1 = // look up resource id of JBoss server 1
var jbossServerId2 = // look up resource of JBoss server 2
server1 = ProxyFactory.getResource(jbossServerId1);
server2 = ProxyFactory.getResource(jbossServerId2);
server1.start();
server2.start();

The call to server1.start() does not immediately return. It polls the status of the operation, waiting for it to complete. The proxy sleeps for a short delay and then fetches the ResourceOperationHistory object that was logged for the request. If a history object is found and its status is something other than in progress, the proxy returns the operation's results. If the history object indicates that the operation has not yet completed, the proxy will continue polling.

Resource proxies provide some great abstractions that simplify working in the CLI. The polling that is done behind the scenes for resource operations is yet another useful abstraction in that it makes a resource operation request look like a regular, synchronous method call. The polling however, is somewhat limited. We will take a closer look at some of the implementation details to better understand how it all works.

The delay or sleep interval is fixed
The thread in which the proxy is running sleeps for one second before it polls the history object. There is currently no way to specify a different delay or sleep interval. In many cases the one second delay should be suitable, but there might be situations in which a shorter or longer delay is preferred.

The number of polling intervals is fixed
The proxy will poll the ResourceOperationHistory at most ten times. There is currently no way to specify a different number of intervals. If after ten times, the history still has a status of in progress, the proxy simply returns the incomplete results. Or if no history is available, null is returned. In many cases the polling delays and intervals may be sufficient for operations to complete, but there is no guarantee.

The proxy will not poll indefinitely
This is really an extension of the last point about not being able to specify the number of polling intervals. There may be times when you want to block indefinitely until the operation completes. Resource proxies currently do not offer this behavior.

Polling cannot be performed asynchronously
Let's say we want to start ten JBoss servers in succession. We want to know whether or not they start up successfully, but we are not concerned with the order in which they start. In this example some form of asynchronous polling would be appropriate. Let's further assume that each proxy winds up polling the maximum of ten intervals. Each call to server.start() will take a minimum of ten seconds plus whatever time it takes to retrieve and check the status of the ResourceOperationHistory. We can then conclude that it will take over 90 seconds to invoke the start operation on all of the JBoss servers. This could turn out to be very inefficient. In all likelihood, it would be faster to schedule the start operation, have control return back to the script immediately, and then schedule each subsequent operation. Then the script could block until all of the operations have completed.
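To make the cost concrete, the naive sequential version looks roughly like the sketch below (the resource ids are made up; in practice they would come from a criteria query).

// hypothetical ids of the ten JBoss servers we want to start
var jbossServerIds = [10001, 10002, 10003]; // ...and so on up to ten

for (i in jbossServerIds) {
    var server = ProxyFactory.getResource(jbossServerIds[i]);
    // each call blocks while the proxy polls the operation history,
    // potentially ten seconds or more per server
    server.start();
}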

As an aside, the previous example might better be solved by creating a resource group for the JBoss servers and then invoking the operation once on the entire group. The problems however, still manifest themselves with resource groups. Suppose we want to call operation O1 on resource group G1, followed by a call to operation O2 on group G2, followed by O3 on G3, etc. We are essentially faced with the same problems but now on a larger scale.

There is no uniform Audit Trail API
Scheduling a resource operation, submitting a resource configuration update, deploying content, etc. are generically speaking all operations that involve submitting a request to an agent (or multiple agents in the case of a group operation) for some mutative change to be applied to one or more resources.  In each of the different scenarios, an entry is persisted on the respective audit trails. For example, with a resource operation, a ResourceOperationHistory object is persisted. When deploying a new resource (i.e., a WAR file), a CreateResourceHistory object is persisted. With a resource configuration change, a ResourceConfigurationUpdate is persisted. Each of these objects exposes a status property that indicates whether the request is in progress, has succeeded, or has failed. Each of them also exposes an error message property that is populated if the request fails or an unexpected error occurs.

Unfortunately, there is no common base class shared among these audit trail classes in which the status and error message properties are defined. This makes writing a generic polling solution more challenging, at least if the solution is to be implemented in Java. A solution in a dynamic language, like JavaScript, might prove easier since we can rely on duck typing. We could implement a generic solution that works with a status property, without regard to an object's type.
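As a rough illustration, a duck-typed polling helper might look like the following sketch. The waitForCompletion function and its parameter names are hypothetical; the usage at the end simply reuses the ConfigurationManager calls from the earlier example and assumes resourceId is already defined.

// Polls any audit trail object that exposes a 'status' property.
// fetchLatest is a function that returns the current audit trail entry;
// inProgressStatus is the status value that means "still running".
function waitForCompletion(fetchLatest, inProgressStatus, maxAttempts, delayMillis) {
    var entry = fetchLatest();
    var attempts = 0;
    while (attempts < maxAttempts && entry.status == inProgressStatus) {
        java.lang.Thread.sleep(delayMillis);
        entry = fetchLatest();
        attempts++;
    }
    return entry;
}

// Waiting on a resource configuration update:
var update = waitForCompletion(
    function() { return ConfigurationManager.getLatestResourceConfiguration(resourceId); },
    ConfigurationUpdateStatus.INPROGRESS,
    10,     // give up after ten polls
    2000);  // sleep two seconds between polls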

Conclusion
It is important to understand the work flows and communication patterns described here as well as the current limitations in resource proxies in order to write effective CLI scripts that have consistent behavior and predictable results. Consistent behavior and predictable results certainly do not mean that the same results are produced every time a script is run. It does mean though that given certain conditions, we can make valid assumptions that hold to be true. For example, if we execute a resource operation and then block until the ResourceOperationHistory status has changed to SUCCESS, then we can reasonably assume that the operation did in fact complete successfully.

Many of the work flows in RHQ are necessarily asynchronous, and this has to be taken into account when working with a remote client like the CLI. Fortunately, there are many ways we can look to encapsulate much of this, shielding developers from the underlying complexities while at the same time not limiting developers in how they choose to deal with these issues.
Updating Metric Collection Schedules
One of the primary features of RHQ is monitoring. Metrics can be collected for resources across the inventory; that metric data is aggregated and then made available for viewing in graphs and tables. Measurements are collected at scheduled intervals. These collection schedules are configurable on a per-resource or per-group basis. Collection intervals can be increased or decreased, and metric collections can be turned on or off as well. There are any number of reasons why you might want to adjust metric collection schedules. One reason might be that collections are occurring too frequently and in turn causing performance degradation on the host machine.

RHQ exposes APIs for updating measurement schedules through the CLI. I find that some of the APIs exposed through the CLI are not the most intuitive or require a thorough understanding of the RHQ domain classes and APIs. Some of the APIs for dealing with measurements fall into this category. I have started working on putting together some utility scripts that can serve as higher-level building blocks for CLI scripts. My aim is to simplify various, common tasks when and where possible. I put together a JavaScript class that offers a more data-driven approach for updating measurement schedules. Let's look at an example.
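The sketch below is approximate: the resource id and measurement names are made up, and the exact shape of the interval calls is a best guess based on the description that follows.

// measurement_update.js - a sketch of driving MeasurementModule
var measurementModule = new MeasurementModule();
var interval = measurementModule.interval;
var time = MeasurementModule.time;

measurementModule.updateSchedules({
    context: 'Resource',                      // 'Resource' or 'Group'
    id: 10001,                                // resource id (hypothetical)
    schedules: {
        'Measurement A': 'enabled',           // turn collection on
        'Measurement B': 'disabled',          // turn collection off
        'Measurement C': interval(30, time.minutes),
        'Measurement D': interval(1, time.hours)
    }
});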


The first thing we do is create an instance of MeasurementModule. This class exposes properties and methods for working with measurements, most notably the method updateSchedules, which we call at the end of the example. updateSchedules takes a single object which specifies the measurement schedule changes. That object, which makes up the bulk of the example, has to define three properties - context, id, and schedules.

context accepts only two values, 'Resource' or 'Group'. This property declares whether the update is for an individual resource or for a resource group.

The value of context determines how id is interpreted. It refers either to a resource id or to a resource group id.

schedules is essentially a map that declares which schedules are to be updated. The keys are the measurement display names as you see them in the RHQ UI. Each value can be one of three things. It can be the string 'enabled' or 'disabled', indicating that collection for that measurement should be enabled or disabled respectively. Or the value can be the collection interval, specified as an integer. The collection interval is stored in milliseconds. Most collection intervals are on the order of minutes though.

It is a lot easier to read and write 30 minutes than 1800000 milliseconds. To address this, MeasurementModule exposes the interval method, to which we declare a reference near the top of the example, to calculate the interval in a more readable way. We use MeasurementModule.time in conjunction with this method. time has three properties - seconds, minutes, hours. We see these in use in the schedules map.

I think (hope) MeasurementModule offers a fairly straightforward approach for updating measurement schedules. It should allow you to make programmatic updates without having an in-depth understanding of the underlying APIs.

There is at least one additional enhancement that I already intend to make. I want to provide a way to specify an update that applies to multiple schedules. Maybe something along these lines,
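(A sketch only; the group id is made up, and the catch-all 'defaults' key is purely hypothetical since the syntax has not been settled.)

measurementModule.updateSchedules({
    context: 'Group',
    id: 10501,                                 // compatible group id (hypothetical)
    schedules: {
        'Measurement A': interval(1, time.hours),
        'Measurement B': 'disabled',
        'defaults': interval(30, time.minutes) // hypothetical catch-all for the remaining measurements
    }
});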


In this example we are updating schedules for a compatible group. We specify a collection interval of one hour for Measurement A, disable Measurement B, and all of the rest of the measurements are set to a collection interval of 30 minutes. I see this as a useful feature and will write another post when I have it implemented.

There is one last annoying detail that needs to be discussed before you start using MeasurementModule. The class uses some of the functions described in Utility Functions for RHQ CLI. Those functions are defined in another source file. When using the CLI in interactive mode, you can use the exec command to execute a script. That command (or an equivalent method/function) however is not available in non-interactive mode. I filed a bug for this a little while back. You can track the progress here if you are interested. This means that for now you need to run in interactive mode. Let's walk through a final example tying all of this together. Assume we have already logged in through the CLI.

rhqadmin@localhost:7080$ exec -f path/to/util.js
rhqadmin@localhost:7080$ exec -f path/to/measurement_utils.js
rhqadmin@localhost:7080$ measurementModule = new MeasurementModule()
rhqadmin@localhost:7080$ // do some stuff with measurementModule

MeasurementModule is defined in the file measurement_utils.js. A few scripts, including measurement_utils.js, are shipped in the latest release of RHQ, which can be downloaded from here. They are packaged in the CLI in the samples directory.
Building an RHQ Client
RHQ exposes a set of APIs that can be used to build a remote client. The CLI is one such consumer of those APIs. The documentation for building your own remote client says to get the needed libraries from the CLI distribution. If you are using a build tool such as Maven, Gradle, Buildr, etc. that provides dependency handling, it would be easier to have the dependency information needed to integrate RHQ into your existing build.

Last week I started building my own client and hit a couple snags. The CLI pulls in dependencies from the module rhq-remoting-client-api; so, I hoped that declaring a dependency on that module was all that I needed.



<dependency>
    <groupId>org.rhq</groupId>
    <artifactId>rhq-remoting-client-api</artifactId>
    <version>3.0.0</version>
</dependency>



This pulls in a number of libraries. When I started my simple client though, I got the following error,

java.lang.NoClassDefFoundError: EDU/oswego/cs/dl/util/concurrent/SynchronizedLong (NO_SOURCE_FILE:3)

After a little digging I realized that I needed to add the following dependency.


<dependency>
    <groupId>oswego-concurrent</groupId>
    <artifactId>concurrent</artifactId>
    <version>1.3.4-jboss-update1</version>
</dependency>


After rebuilding and restarting my client, I got a different exception.

org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Response: OK/200.
at org.jboss.remoting.transport.http.HTTPClientInvoker.useHttpURLConnection(HTTPClientInvoker.java:348)
at org.jboss.remoting.transport.http.HTTPClientInvoker.transport(HTTPClientInvoker.java:137)
at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:122)
at org.jboss.remoting.Client.invoke(Client.java:1634)
at org.jboss.remoting.Client.invoke(Client.java:548)
at org.jboss.remoting.Client.invoke(Client.java:536)
at org.rhq.enterprise.client.RemoteClientProxy.invoke(RemoteClientProxy.java:201)
at $Proxy0.login(Unknown Source)

I started comparing the dependencies that I was pulling in versus what the CLI used. I was definitely pulling in some additional libraries that are not included in the CLI. While the above exception does not convey a whole lot of information, I knew that it was occurring because I had classes on my local classpath that I should be getting from the server. When I managed to get my dependencies to match up with the CLI, I got past the exceptions.

For other folks who want to build their own RHQ client, this dependency situation can be a bit of a mess. Last night I pushed a new module into the RHQ git repo that alleviates this problem. Now you only need the following dependency,


<dependency>
    <groupId>org.rhq</groupId>
    <artifactId>remote-client-deps</artifactId>
    <version>4.0.0-SNAPSHOT</version>
    <type>pom</type>
</dependency>


With this, you will pull in only those dependencies that are needed to build your own RHQ client. You do not need to declare any additional RHQ dependencies. You can view the source for the remote-client-deps module here.
Utility Functions for RHQ CLI
I have done a fair amount of programming in Groovy, and one of the things that I have quickly gotten accustomed to is closures. Among other things, closures provide an elegant solution for things like iteration by encapsulating control flow. The following example illustrates this.

[1, 2, 3].each { println it }

When working in the CLI however, I have to fall back to a for loop. In the case of a JavaScript array, I do not have to worry about a loop counter.

var array = [1, 2, 3];
for (i in array) println(array[i]);

A lot of the remote APIs, particularly query methods, return a java.util.List. In these situations, I have to use a loop counter.

var list = new java.util.ArrayList();
list.add(1);
list.add(2);
list.add(3);
for (i = 0; i < list.size(); i++) println(list.get(i));
Recently I had to write a script in which I was executing a number of criteria queries and then iterating over the results. Needless to say, I quickly found myself missing the methods in languages like Groovy and Ruby that provide control flow with closures; so, I wrote a few utility functions to make things a bit easier.
// Iterate over a JS array
var array = [1, 2, 3];
foreach(array, function(number) { println(number); });

// Iterate over a java.util.Collection
var list = new java.util.ArrayList();
list.add(1);
list.add(2);
list.add(3);
foreach(list, function(number) { println(number); });

// Iterate over a java.util.Map
var map = new java.util.HashMap();
map.put(1, "ONE");
map.put(2, "TWO");
map.put(3, "THREE");
foreach(map, function(key, value) { println(key + " --> " + value); });

// Iterate over an object
var obj = {x: "123", y: "456", z: "678"};
foreach(obj, function(name, value) { println(name + ": " + value); });
The foreach function is fairly robust in that it provides iteration over several different types including JavaScript arrays, Java collections and maps, and generic objects. In the case of an array or collection, the callback function takes a single argument. That argument will be each of the elements in the array or collection. In the case of a map or object, the callback function is passed two arguments. For the map the callback is passed the key and the value of each entry. For a generic object, the callback is passed each of the object's properties' names and values.

The find function iterates over an array, collection, map, or object in the same way that foreach does. The callback function though is a predicate that should evaluate to true or false. Iteration will stop when the first match is found, that is when the predicate returns true, and that value will be returned. Here are a couple examples to illustrate its usage.
// Find first number less than 3
var array = [1, 2, 3]
// prints "found: 1"
println("found: " + find(array, function(number) { return number < 3; }));

// Find map entry with a value of 'TWO'
var map = new java.util.HashMap();
map.put(1, "ONE");
map.put(2, "TWO");
map.put(3, "THREE");
var match = find(map, function(key, value) { return value == 'TWO'; });
// prints "found: 2.0 --> TWO"
println("found: " + match.key + " --> " + match.value);

When find is iterating over a generic object or map, the first match will be returned as an object with two properties - key and value. Lastly, there is a findAll function that is similar to find except that it returns all matches in a java.util.List.
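findAll is called the same way as find; a small example, assuming the same kind of array as above:

// Find all numbers less than 3
var array = [1, 2, 3];
var matches = findAll(array, function(number) { return number < 3; });
// matches is a java.util.List containing 1 and 2
println("found " + matches.size() + " matches");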

These functions can be found in the RHQ git repo at rhq/etc/cli-scripts/util.js. These functions are neither part of nor distributed with the CLI; however, they may be included in a future release so that the functions would be available to any CLI script.
Auto Import Resources into Inventory
There is no way currently in RHQ through the UI to auto-import resources. You have to go to the discovery queue to explicitly select resources to import. Here is a short CLI script for auto-importing resources.

// auto_import.js
rhq.login('rhqadmin', 'rhqadmin');

var resources = findUncommittedResources();
var resourceIds = getIds(resources);
DiscoveryBoss.importResources(resourceIds);

rhq.logout();

// returns a java.util.List of Resource objects
// that have not yet been committed into inventory
function findUncommittedResources() {
    var criteria = ResourceCriteria();
    criteria.addFilterInventoryStatus(InventoryStatus.NEW);

    return ResourceManager.findResourcesByCriteria(criteria);
}

// returns an array of ids for a given list
// of Resource objects. Note the resources argument
// can actually be any Collection that contains
// elements having an id property.
function getIds(resources) {
    var ids = [];
    for (i = 0; i < resources.size(); i++) {
        ids[i] = resources.get(i).id;
    }
    return ids;
}

In the function findUncommittedResources() we query for Resource objects having an inventory status of NEW. This results in a query that retrieves discovered resources that have been "registered" with the RHQ server (i.e., stored in the database) but not yet committed into inventory.

DiscoveryBoss is one of the remote EJBs exposed by the RHQ server to the CLI. It provides a handful of inventory-related operations. On line six we call DiscoveryBoss.importResources() which takes an array of resource ids.

In a follow-up post we will use some additional CLI features to parameterize this script so that we have more control over what gets auto-imported.
RHQ’s Powerful New Search Facility

Search was developed to enable users to gain deeper insight, more quickly, into their enterprise by supporting a sophisticated method of querying system state. Some of the notable features that this powerful facility brings are:

  • Arbitrarily Complex Search Expressions
  • Search Suggestions / Auto-Completion / Search Assist
  • Results Matching / Highlighting
  • User Saved Searches

Take a look at the end-user docs (with screen shots) here. Or, if you want to interact with the Search facilities up close & personal, you can download the latest RHQ binaries here.

Considering the SearchBar was written with the primary purpose of being extensible, it will surely become a much more pervasive concept across RHQ in the future. So, let me know what you liked and/or what you would improve, especially if you can think of other features you’d like to see added. You can either post back here or subscribe to the RHQ developer mailing list.


Tagged: rhq
GWT Compilation Performance

A few weeks ago I noticed that the coregui module of RHQ, which is the next generation web interface written in GWT (SmartGWT to be precise), started to have its compilation slow down…noticeably. Not only did the overall time to compile the module increase, but during the build my box seemed to more or less lock up. Even mouse gestures were choppy. So…I decided to investigate.

I started by writing a script that would compile the coregui module for me. It was important to gauge how the different arguments to the maven gwt:compile goal (http://mojo.codehaus.org/gwt-maven-plugin/compile-mojo.html) would affect the system during the build. After reading through the documentation, I decided that ‘localWorkers’ and ‘extraJvmArgs’ were the two most important parameters to test different permutations for. Here is the script I came up with:


#!/bin/sh

# INTERVAL - the minimum amount of memory to use for gwt:compile
# STEPS    - the number of increments of interval
#            e.g. 4 steps of 256 interval = tests of 256, 512, 768, 1024
#            e.g. 5 steps of 200 interval = tests of 200, 400, 600, 800, 1000
# RUNS     - number of iterations at each (INTERVAL,STEP)

INTERVAL=256
STEPS=4
RUNS=10

declare -a AVERAGES
declare -a VARIANCES

function build
{
    workers=$1
    memory=$2

    dirName="workers-${workers}-memory-${memory}"
    vmArgs="-Xms${memory}M -Xmx${memory}M"

    mvnArgs="-Pdev -DskipTests -o"
    gwtWorkers="-Dgwt-plugin.localWorkers=\"$workers\"" 
    gwtJvmArgs="-Dgwt-plugin.extraJvmArgs=\"$vmArgs\""
    gwtArgs="$gwtWorkers $gwtJvmArgs"

    cmd="mvn $mvnArgs $gwtArgs gwt:clean install"
    echo "Executing $cmd"

    total_seconds=0
    rm -rf "target/runs/$dirName"
    mkdir -p "target/runs/$dirName"
    declare -a raw
    for run in `seq $RUNS`
    do
        outputFile="target/runs/$dirName/output.${run}.log"

        before=$(date +%s)
        eval "$cmd" > $outputFile
        after=$(date +%s)

        elapsed_seconds=$(($after - $before))
        raw[$run]=$elapsed_seconds
        total_seconds=$(($total_seconds + $elapsed_seconds))

        echo "Run $run Took $elapsed_seconds seconds"
        echo "Execution details written to $outputFile"
    done 

    average_seconds=$(expr $total_seconds / $RUNS)

    let "index = (($workers - 1) * 4) + ($memory / $INTERVAL) - 1"
    AVERAGES[$index]=$average_seconds

    sum_of_square_deltas=0
    for run in `seq $RUNS`
    do
       let "sum_of_square_deltas += (${raw[$run]} - $average_seconds)**2"
    done
    let "variance = $sum_of_square_deltas / ($RUNS - 1)"
    VARIANCES[$index]=$variance

    echo "Run Total: $total_seconds seconds"
    echo "Run Average: $average_seconds seconds"
    echo "Run Variance: $variance"
}

function run
{
    for workers in `seq 4`
    do
        for offset in `seq $STEPS`
        do
            memory=$(($INTERVAL * $offset))
            build $workers $memory
        done
    done
}

function print
{
    echo "Results"
    printf "          "
    for headerIndex in `seq 4`
    do 
        printf "% 12d" $headerIndex
    done
    echo ""

    for workers in `seq 4`
    do
        let "memory = $workers * $INTERVAL"
        printf "% 10d" $memory
        for offset in `seq $STEPS`
        do
            let "index = ($workers - 1) * 4 + ($offset - 1)"
            printf "% 6d" ${AVERAGES[index]}
            printf "% s" "("
            printf "% 4d" ${VARIANCES[index]}
            printf "% s" ")"
        done
        echo ""
    done
}

if [[ $# -ne 1 ]]
then
    echo "Usage: $0 <rhq-repository-path>"
    exit 127
fi

rhq_repository_path=$1
cd $rhq_repository_path/modules/enterprise/gui/coregui

echo "Performing timings..."
run
print
echo "Performance capture complete"


In short, this will build the coregui module over and over again, passing different parameters to it for memory (JVM arguments Xms/Xmx) and threads (GWT compiler argument ‘localWorkers’). And here are the results:

memory (MB)   1 worker      2 workers    3 workers    4 workers
256           229 (6008)    124 (4)      124 (27)     124 (12)
512           141 (193)     111 (76)     113 (82)     114 (25)
768           201 (5154)    115 (98)     123 (57)     195 (2317)
1024          200 (2352)    125 (83)     199 (499)    270 (298)

The columns refer to the number of ‘localWorkers’, effectively the number of concurrent compilation jobs (one for each browser/locale). The rows refer to the amount of Xms and Xmx given to each job. Each row represents statistics for the build being executed ten times using the corresponding number of localWorkers and memory parameters. The primary number represents the average time in seconds it took to compile the entire maven module. The parenthetical element represents the variance (the square of the standard deviation).

So what does this grid tell us? Actually, a lot:

  • End-to-end compilation time suffers when using a single localWorker (i.e., only one permutation of browser/locale is compiled at a time). Not only does it suffer, but it has a high variance, meaning that sometimes the build is fast, and sometimes it isn’t. This makes sense because the compilation processes are relatively independent and generally don’t contend for shared resources. This implies the compilation is a natural candidate for a concurrent/threaded solution, and thus forcing serialized semantics can only slow it down.
  • The variance for 2 or more localWorkers is generally low, except for the lower right-hand portion of the grid which starts to increase again. This also makes sense because, for the case of 4 threads and 1GB memory each, these concurrent jobs are exhausting the available system memory. This box only had 4GB ram, and so all of the physical memory was used, which starts to spill over to swap, which in turn makes the disk thrash (this was verified using ‘free’ and ‘iostat’). Regardless of what other activities were occurring on the box at the time as well as how the 4 threads are interleaved, it has an effect on the build because approximately ½ GB was being swapped to my 7200rpm disk during the compilation. (Granted, I could mitigate this variance by replacing my spindles with an SSD drive (which I may end up doing) but it is nonetheless important to be mindful of the amount of memory you’re giving to the entire compilation process (all worker threads) relative to the amount of available physical RAM.)
  • The 512MB row has values which are relative minimums with respect to the other timings for the same number of localWorkers given different memory parameters. This indicates that 512 is the sweet spot in terms of how much memory is needed to compile the modules. I would surmise that the cost of allocating more memory (256MB or 512MB for each of 2, 3, or 4 worker threads) and/or the cost of GC largely accounts for the slightly decreased performance with other memory settings. And then, as mentioned above, with lots of threads and very high memory, swapping to disk starts to dominate the performance bottleneck.
  • Aside from end-to-end running time, it’s also important on a developer system to be able to continue working while things are building in the background. It should be noted that each worker thread was pegging the core it ran on. For that reason, I’m going to avoid using 3 or 4 workers on my quad-core box because my entire system crawls due to all cores being pegged simultaneously…which is exacerbated in the case of high total memory with 3 or 4 workers causing swapping and disk thrashing alongside all my cores being pegged.

Conclusions:

On my quad-core box with 4 gigs of RAM, the ideal combination is 2 localWorkers with 512MB. The end-to-end build will consistently (recall the low variance) complete just as quickly as any other combination (recall the low average running time), and it won’t peg my box because only half of my cores will be used for the GWT compilation, leaving the other half for other processes i have running…eclipse, thunderbird, chrome, etc.

So what happens now if I take and move this script over to a larger box? This new box is also a quad core, but has a faster processor, and more than twice the amount of RAM (9 gigs). From everything deduced thus far, can you make an intelligent guess as to what might happen?

….

I anticipated that with more than twice the amount of RAM, the worst-case permutation (4 localWorkers with 1GB of Xms/Xmx each) would cause little to no swapping. And with a more powerful processor, that the average running times would come down. And that is exactly what happened:

memory (MB)   1 worker     2 workers   3 workers   4 workers
256           153 (24)     98 (0)      96 (1)      95 (1)
512           150 (23)     96 (1)      96 (2)      95 (1)
768           149 (32)     97 (1)      96 (1)      95 (0)
1024          149 (24)     96 (0)      96 (0)      95 (1)

With nearly non-existent variance, it’s easy to see the net effect of having plenty of physical resources. When the processes never have to go to swap, they can execute everything within physical RAM, and they have much more consistent performance profiles.

As you can see, adding more cores does not necessarily yield large decreases for the end-to-end compilation time. This can be explained because (at the time of this writing) RHQ is only compiling 6 different browser/locale permutations. As more localizations are added in the future, the number of permutations will naturally increase, which will make the effect of using more cores that much more dramatic (in terms of decreasing end-to-end compilation times).

Unfortunately, the faster processor on the larger box still got pegged when compiling coregui. So since 3 or 4 localWorkers doesn’t result in dramatically improved end-to-end compilation times, it’s still best to use 2 localWorkers on this larger box. I could, however, get away with dropping the memory requirements to 256MB for each worker, since the faster hardware seems to even out the performance profile of workers given different memory parameters, holding localWorkers constant.

Lessons learned:

  • In any moderately sized GWT-based web application, the compilation may take a considerable portion of each core it executes on, perhaps even pegging them at 100% for a portion of the compile. Thus, if you want your development box to remain interactive and responsive during the compilation, make sure to set the localWorkers parameter to something less than the number of cores in your system.
  • Don’t throw oodles of memory at the GWT compilation process and expect to get a speedup. Too much Xms/Xmx will cause the worker threads to spill out of main memory and into swap, which can have a detrimental effect on the compilation process itself as well as other tasks on the box since I/O requests explode during that time. Modify the script presented here to work with your build, and obtain reasonable timings for your GWT-based web application at different memory levels. Then, only use as much RAM as is required to make the end-to-end time reasonable while avoiding swap.
  • Provide reasonably conservative default values for localWorkers and Xms/Xmx so as not to give a negative impression to your project’s contributors. Err on the side of a slow build rather than a build that crushes the machine it runs on.
  • Parameterize your builds so that each developer can easily override (in a local configuration file) his or her preferred values for localWorkers and Xms/Xmx. Provide a modified version of the script outlined here, to allow developers to determine the ideal parameter values on each of their development boxes.

Other tips / tricks to keep in mind:

  • Only compile your GWT application for the browser/locale permutation you’re currently developing and testing on. There’s no need to compile to other languages/locales if you’re only going to be testing one of them at a time. And there’s no need to compile the applications for all browser flavors if the majority of early testing is only going to be performed against one of them.

Recall at the start of this article I was investigating why my box seemed to be non-responsive during the compilation of coregui. Well it turns out that one developer inadvertently committed very high memory settings, which as you’ve now seen can cause large amounts of swapping when localWorkers is also high. The fix was to lower the default localWorkers to 2, and reduce the memory defaults to 512MB. As suggested above, both values are parameterized in the current RHQ build, and so developers can easily override either or both to their liking.


Tagged: gwt, java
RHQ and Customizable Dashboards

For the second in the series I've uploaded a demo of the latest prototype dashboard system for the next generation RHQ user interface. This new UI is being built upon the GWT and SmartGWT projects to provide a richer experience and quicker access to the vast amount of information gathered by RHQ. While we are currently finalizing the RHQ 3.0 release this new user interface will make its debut in the 4.0 series that will start into milestone releases shortly after 3.0.

Exposing domain objects to GWT

To continue the discussion on our use of GWT for RHQ I wanted to explain our success in directly utilizing the domain objects of RHQ in our client side code. With GWT allowing us to write our UI view code directly in Java, there's even more advantage in direct access than with some of our other client technologies. We were able to reasonably easily get our domain module to GWT-compile, making it possible to use these objects in the generated client-side JavaScript and to serialize them between the tiers. A little bit of Maven setup and we're off developing code that runs in the browser and is as direct and rich as Swing development.

Because we're using JPA and Hibernate, we still have the need to prepare our entities for serialization. We've been doing this with a custom-built utility for a while now; I've also more recently discovered the Gilead project's more robust solution to this. We also have JAXB bindings for our domain objects allowing JAX-WS and JAX-RS style integration. This lets our server use domain objects to interact with 1) our agent, 2) our CLI / Java client, 3) web services clients over JAXB / JAX-WS, 4) RESTful clients with JAX-RS, and now 5) the web browser with GWT.

Additionally, we've built a pattern for data queries that delivers a domain-specific solution similar to Hibernate's criteria query concept and that layers in our domain data security model. This greatly simplifies our client programming model by reducing the number of one-off APIs developed across the UI. In porting our UI to GWT we're often able to change the data access model from our older custom UI APIs over to criteria queries and simplify the code at the same time, all the while giving us new opportunities to expose our data in new ways, such as a prototype configuration comparison tool.

New RHQ UI Technology
For the next version of RHQ we have been prototyping with the GWT UI technologies that I hope will improve the capability and usability of the RHQ project. The development is still in early stages and we're planning to roll out the new UI technology over the course of several releases along with the other major feature development. New features will likely be the first to be released in the new technology.

I've put together a short screencast of this new UI and I plan to show some more work as it happens and share some experiences with the technology. My impression so far is that this will open up some new possibilities in our designs for complex workflows and information presentation.

While nothing is set in stone, I'm seeing good results with the use of SmartGWT for a component library. Being able to directly utilize and manipulate our domain objects in the browser makes for some nice reusability and quick development.

Git – Local Tracking Branches

Whenever I clone a new copy of the RHQ repository from git, I’m placed in a local tracking branch of master. However, I rarely (if ever) want to work directly in master. Instead, I want to work in one of the numerous features-branches we have for the project.

I could create local tracking branches for the lines of development I’m interested in following, but that would be tedious considering there are nearly 3 dozen to choose from these days. Not only that, but I would need to repeat those commands each time I clone the repository (which I sometimes do so I can work in two different branches simultaneously as opposed to using ‘git stash’ and flipping back and forth between them), not to mention that I’d need to repeat those commands on all machines I work with (my home desktop, my work desktop, and my work laptop).

So I figured I would automate this by writing a quick shell script that would interrogate git, gather the lists of remote branches, and create local tracking branches automatically. I wanted to write the script such that it would set up local tracking branches for any newly created remote branches since the last time the script was run. Also, to keep things simple, I wanted the name of my local tracking branches to mirror the names of the remote branches. With those things in mind, I came up with:

#!/bin/sh

if [[ $# -ne 1 ]]
then
   echo "Usage: $0 <repository-directory>"
   exit 127
fi

cd $1

# get all remotes except HEAD
remotes=`git branch -r | grep -v HEAD`
locals=`git branch -l`

locals_tracked=0;
total_remotes=0;
for remote_branch in $remotes 
do
   let "total_remotes++"

   # strip 'origin/' off of the remote_branch URL form
   local_branch=${remote_branch##origin/}
   
   already_had_local_branch=0
   for existing_local_branch in $locals 
   do
      if  [[ $existing_local_branch == $local_branch ]] 
      then
         already_had_local_branch=1
         break
      fi
   done

   if [[ $already_had_local_branch == 0 ]] 
   then
      git branch --track $local_branch $remote_branch
      let "locals_tracked++"
   else
      echo "Already had local branch '$local_branch'"
   fi
done

echo "Found $total_remotes remote branches"
echo "Created $locals_tracked local tracking branches"

And here is how I use it. First, I clone the RHQ repository:

[joseph@marques-redhat] git clone ssh://git.fedorahosted.org/git/rhq/rhq.git rhq-repo

Next I run my script, passing as an argument the name of the directory I just created to store my local copy of the repository:

[joseph@marques-redhat] ./git-setup-locals.sh rhq-repo

The script will show you output along the lines of:

Branch agentPlugin set up to track remote branch agentPlugin from origin.
...
Already had local branch 'master'
...

And finally print some summary information at the bottom:

Found 34 remote branches
Created 33 local tracking branches

So what is this good for? Well, it enables me to quickly bounce around multiple lines of development with ease (using ‘git checkout <branch>’) and without having to worry about whether or not I have a local tracking branch setup for that feature-branch yet.

—–

In the grand scheme of things, this single issue isn’t a huge win. However, if you’re not mindful of the little things, enough of them can add up over time and start to have a real effect on productivity. So I tend to automate these things sooner rather than later so that I can forget about them and more easily focus on the code itself or the business problem at hand.


Tagged: scm
Web Development Tips – automate the little things

I recall a colleague of mine mentioning several weeks ago that it’s annoying to have to log into RHQ every time you redeploy UI code that causes portal-war’s web context to reload. I completely agreed at the time, but it wasn’t until today that I finally got annoyed enough to look for a workaround myself. Here’s the solution I ended up with:

1) Use FireFox
2) Download and install GreaseMonkey
3) Install the AutoLogin script
4) Log into “http://localhost:7080/Login.do” and make sure to tell FF to remember your password
5) Test that auto-login is working properly by logging out of the application…you should be forwarded to the login page, which FF will automatically fill in with your saved credentials, and the grease monkey script will perform the login for you

This should also work when you get logged out due to session expiry. The expiry handler will redirect you back to /Login.do, which will now automatically log you back in and – on a best effort basis – redirect you back to the last “valid” page you were on. RHQ has a mechanism for recording the last couple of pages you visited (see WebUserTrackingFilter) and will try them in most-recently-visited order until it finds a page that doesn’t blow up with JSF’s “classic” ViewExpiredException. I discuss the details of how this mechanism works in my other post.

Note: if you ever want to log into localhost with a different user, all you have to do is click the GreaseMonkey icon (on the far right-hand side of the status bar at the bottom of your browser) and you’ll temporarily disable the AutoLogin script from executing.

How would you solve this? How have you solved this? I’m eager to read your post backs.


Tagged: webdev
Autocomplete and the RHQ CLI

Autocomplete is a feature that I can not live without. It must be the curse of using better and better IDEs and phones. Not that I'm too lazy to type out the rest of a command, but something drives me nuts about the inefficiency of having to do it when I could use that time to do something more useful. So when it came time to write the command line interface for RHQ last year I knew it was a feature we had to get in there somehow.

The general concept of the RHQ CLI is that we're exposing the remote APIs for the server directly into a Java 6 Scripting Engine context. This was the best way I found to provide full interactivity on the command line. It allows us to import classes, put utility classes in the context and then let the script writer build and keep Java objects as state during a session, all while providing the dynamic, weakly-typed environment of a scripting language. In our case, we went with JavaScript because a JavaScript engine ships as the default JSR 223 implementation in JDK 6+ and it has a good interactivity model that makes it suitable for line-by-line execution or executing as a full script. We also support executing individual commands or expressions through the shell script so it can be called from bash, etc.
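To make that idea concrete, here is a minimal, hypothetical sketch (not the actual CLI source) of what binding a service proxy into a JSR 223 JavaScript engine looks like; the ResourceManager name mirrors the CLI convention, but the lookup below is just a stand-in.

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class CliContextSketch {
    public static void main(String[] args) throws ScriptException {
        // The JavaScript engine that ships with JDK 6+ via JSR 223.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");

        // Put a remote service proxy into the script context under a friendly name.
        // In the real CLI this would be the remote ResourceManager proxy; here it is a stand-in.
        engine.put("ResourceManager", lookupResourceManagerProxy());

        // Each line the user types can now be evaluated directly against that context.
        Object result = engine.eval("ResourceManager.toString()");
        System.out.println(result);
    }

    private static Object lookupResourceManagerProxy() {
        return new Object(); // placeholder for the real remote lookup
    }
}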

We'll start by logging in to the CLI, which opens an interactive session:

bin/rhq-cli.sh -u rhqadmin -p rhqadmin
RHQ - RHQ Enterprise Remote CLI 1.4.0-SNAPSHOT
Remote server version is: 1.4.0-SNAPSHOT(0)
Login successful
rhqadmin@localhost:7080$

The first thing we'll do is hit tab to see the default completions. This will display all the objects in the scripting context. To start with, the RHQ CLI puts the service proxies for each of the RHQ remote interfaces into the context. This makes it easy to see the available services. You'll notice a few other objects in the context that will help in your use of the CLI: there's a pretty printer, some CSV export utilities, etc. Another thing you'll notice is that the tab completions show up in multiple columns that properly utilize the width of your terminal window. This is thanks to the excellent JLine library, which also provides the infrastructure on which I built the autocompletion support.
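For the curious, the JLine side of this is roughly the following; this is a minimal sketch against the JLine 1.x API as I recall it, with the candidate strings obviously simplified compared to what the CLI builds via reflection.

import java.io.IOException;
import jline.ConsoleReader;
import jline.SimpleCompletor;

public class CompletionSketch {
    public static void main(String[] args) throws IOException {
        ConsoleReader reader = new ConsoleReader();

        // Register a completor; the real CLI installs a custom Completor implementation
        // that inspects the objects bound in the script context via reflection.
        reader.addCompletor(new SimpleCompletor(
            new String[] { "ResourceManager", "OperationManager", "pretty", "exporter" }));

        // Read a line with tab completion enabled; <tab> lists or cycles the candidates.
        String line = reader.readLine("rhqadmin@localhost:7080$ ");
        System.out.println("You typed: " + line);
    }
}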

Next we'll take a look at one of the services. We'll start with the ResourceManager, which helps us look up and find Resources that are being managed and monitored by RHQ. Typing "ResourceM" <tab> will complete the service name, another <tab> will add a period for calling a method on that interface, and one final <tab> will list the methods that are available on that service.

 findChildResources            findResourceComposites        findResourceLineage
findResourcesByCriteria       getLiveResourceAvailability   getParentResource
getResource                   liveResourceAvailability      parentResource
resource                      toString                      uninventoryResources

If we hit <tab> again it will go to an alternative completion mode and display full method syntaxes.

       ResourceAvailability getLiveResourceAvailability(int resourceId)
             List findResourceLineage(int resourceId)
         PageList findResourcesByCriteria(ResourceCriteria criteria)
                       void uninventoryResources(int[] resourceIds)
         PageList findChildResources(int resourceId, PageControl pageControl)
                     String toString()
                   Resource getResource(int resourceId)
                   Resource getParentResource(int resourceId)
PageList findResourceComposites(ResourceCategory category, String typeName, int parentResourceId, String searchString, PageControl pageControl)

All of this is accomplished through some basic syntax parsing and the use of reflection. Since we have the objects in the context, we can directly introspect them and build up syntax and signatures quite easily. Next we'll try out a criteria-based search. These criteria searches are a pattern we've built up that provides capabilities similar to Hibernate criteria lookups, but in a way that is more focused on the problem domain and that layers in and enforces the RHQ security model. We build the criteria and set some options.

 
rhqadmin@localhost:7080$ var criteria = new ResourceCriteria();
rhqadmin@localhost:7080$ criteria.addFilterName('ghinkle')
rhqadmin@localhost:7080$ criteria.fetchParentResource(true);

Now when we go to call the service interface's findResourcesByCriteria method, you'll notice that you can tab complete the criteria parameter to the method without typing anything. Since we have the criteria object in the script context and can introspect the method's signature, we can do the fancy bit of telling you which variables fit the parameters of a method, a feature that first got me hooked on the IntelliJ IDEA IDE.

Now on to the advanced bits. The service interfaces are powerful, but also somewhat low level. We want to make the CLI easy to use for interacting with the management inventory. To do so I decided to build a proxy object for a resource that exposes the management capabilities specific to that Resource. This lets us request a proxy object and then directly look up statistics or run operations while, behind the scenes, the proper service interfaces are being used and security checks are being properly applied. First we build the resource proxy given a resource's ID, which we can find from the previous search.

rhqadmin@localhost:7080$ var platform = ProxyFactory.getResource(10001);

Now that we have the proxy we can <tab> autocomplete against it to see what features it has including metrics and operations.

rhqadmin@localhost:7080$ platform.

OSName                OSVersion             architecture          children              contentTypes
createdDate           description           freeMemory            freeSwapSpace         getChild
getMeasurement        hostname              id                    idle                  manualAutodiscovery
measurements          modifiedDate          name                  operations            resourceType
systemLoad            toString              totalMemory           totalSwapSpace        usedMemory
usedSwapSpace         userLoad              version               viewProcessList       waitLoad

To make the autocomplete work we haven't hacked it to read the resource metadata; instead, the proxy instance's class is dynamically constructed using Javassist. This system generates a real implementation of the dynamically built proxy class that can be interacted with in the script environment like any other Java object. This means no hacking of the script execution engine to fake these calls, and no falling back to map lookup methods when what you really want is a proper interface. It also lets the autocomplete operate using reflection like everywhere else. The other feature we have in the script environment is an automatic pretty printer that will pretty print the results of any line. Just typing "platform" here will print out a nice property-style listing of the object and its values.
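If you haven't used Javassist before, the technique is roughly this (a simplified sketch, not the RHQ generator itself): build a CtClass at runtime, add methods whose bodies do whatever you need, and then turn it into a real Class that reflection sees like any other.

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.CtNewMethod;

public class ProxyGenSketch {
    public static void main(String[] args) throws Exception {
        ClassPool pool = ClassPool.getDefault();

        // Create a brand new class; in RHQ one of these would be built per resource type,
        // with one getter per metric and one method per operation defined in the metadata.
        CtClass cc = pool.makeClass("example.GeneratedResourceProxy");

        CtMethod hostname = CtNewMethod.make(
            "public String getHostname() { return \"demo-host\"; }", cc);
        cc.addMethod(hostname);

        // Turn the CtClass into a real java.lang.Class and instantiate it.
        Class<?> proxyClass = cc.toClass();
        Object proxy = proxyClass.newInstance();

        // Plain reflection now works against the generated class, which is exactly
        // what lets the autocompletor treat it like any other object in the context.
        System.out.println(proxyClass.getMethod("getHostname").invoke(proxy));
    }
}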

The final bit of autocomplete fanciness is some specific support for certain classes that let us take advantage of the fact that this is a live environment. For example when you have a java.util.Map and want to lookup a specific key, the autocompletion system can offer the valid map keys as inputs to the lookup as the autocomplete below demonstrates.

rhqadmin@localhost:7080$ var map = new java.util.HashMap()

rhqadmin@localhost:7080$ map.put("a","alpha")

rhqadmin@localhost:7080$ map.put("b","beta");

rhqadmin@localhost:7080$ map.get('

'a'   'b'

The combination of an interactive script execution environment, autocompletion and simple remote API calling makes the RHQ command line interface a powerful means for automation, integration and custom features. You can take a look at our autocompletor class to see how it does the job, along with the custom class generator and the proxy implementation class.

JON 2.2 Installation Hints

I thought it would be a good idea to explain the new distribution model for the JBoss Operations Network. With the 2.2 version we have shifted to a base JON server distribution that includes the agent, plus separate plugin packs that enable support for our platform products such as EWS (Tomcat), EAP (JBoss AS) and SOA-P. The old model just had separate Server and Agent installs with the plugins included, but with the addition of more products supported through JON we want to give users the ability to download just the desired plugin support and install that alone.

One key to this model is that those expecting an agent download will no longer see one available. Once the server is installed the agent can be downloaded from the administration menu within the JON web ui. For those looking to use wget to download via CLI you can use something like "wget http://<server-host-name>:7080/agentupdate/download".

Plugin packs should be installed according to the readme that accompanies them. This amounts to copying the downloaded plugins to <server-installation>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins. From there the plugins will be downloaded by agents the next time they start up, by running the "plugins update" command from the agent prompt, or via the "Update Plugins" operation from within the web UI. That operation can also be run against groups of agents, allowing you to update all your agents at once.

New Beta of Jopr Management Platform with Tomcat management

Earlier this week, the Jopr/RHQ team put out a new beta of our upcoming 2.2 release with some substantial new features. A quick visual tour of the upcoming changes is available. The release includes bug fixes, performance enhancements, many new features and UI improvements. It can be downloaded from our Release Page.

A list of some of the major items follow.

  • Standalone Apache Tomcat Management
  • New user interface with rich application menus
  • New Tree oriented navigation of managed resources
  • Cluster oriented tree views for managing groups of resources
  • Group configuration editing
  • Subsystem views for cross-system visibility of critical information
  • New metrics analysis system for quick discovery of abnormal conditions
  • DynaGroups query enhancements
  • Configuration change detection for auditing external edits
  • Availability history and metrics display
  • Agent auto-upgrades will allow future upgrades to be automatic for agents

Try it and give us feedback on our forums.

Jopr and the angry demo gods

To those who were disappointed that the Jopr plugin development demonstration was cut short, I wanted to apologize. It seems my laptop might have overheated (it wouldn't reboot until it had cooled for a few minutes)... but at least Jopr recorded the temp climbing and climbing during the presentation until it gave out. Not sure if Fedora 10 and this video driver are getting along all that well.


One note, there's a bug in this experimental plugin that seems to be reading out Fahrenheit when it should be Celsius... gotta fix that.

For those of you interested in learning more about plugin development for Jopr and RHQ I'm going to list out some useful links and sources of information. Also, I'm going to try to put together a video demonstration of what I was going to cover since it has some useful visual demonstrations of how plugin code and configuration fit together.

Webinar on Management with the Jopr project

I'll be delivering a webinar tomorrow at 2PM EST on the use and extension of the RHQ / Jopr projects. You can register at: http://inquiries.redhat.com/go/redhat/JOPRWebinar. I'll be presenting on using these technologies to monitor your environment and applications, and on how easy it is to get started with your first custom plugin.

Visualization Prototype

This rather low-quality Flash recording shows some of the features of the prototype I've been experimenting with for richer browsing of monitoring data exposed through RHQ and Jopr. Just a start.

Jopr: Open source JBoss management

I am pleased to announce the release of a new open source enterprise management solution for JBoss Application Server called Jopr. This project is the open source form of our JBoss Operations Network product built on the RHQ platform that I've written about several times. Jopr will act as the upstream, open source project to the Operations Network and provide a place for the community to collaborate on the use and development of this technology for implementing middleware management.

The RHQ project infrastructure provides a base management platform for discovering and tracking managed enterprise elements in a shared inventory. The project provides a base for the development of different aspects of management such as monitoring, configuration, content deployment, alerting and operational control. By building these solutions on a shared infrastructure, you can correlate performance and problems across the varying solutions in an environment. Jopr and the JBoss Operations Network build on this base to provide JBoss Application Server management on top of the basic OS and process monitoring provided in the base platform, adding in elements like app server discovery, Hibernate query monitoring, EJB method timings and JBoss Messaging statistics. They also benefit from the platform's centralized inventory model, fine-grained security, advanced event and auditing models and extensibility.

Jopr provides JBoss AS auto-discovery, monitoring, configuration, log tracking and other elements of Operations Network, in an unsupported form for the community. This will allow JBoss users to work in that community to build administrative solutions for the JBoss product suite while the Operations Network continues to provide a supported, certified solution for management in critical environments, as well as the added value of the managed deployment of cumulative patches. Best of all, this is all built on an open source platform that can be fully extended to other types of services, down to application- and business-specific monitoring and management, or to entirely new resources to be managed. Plugins written to the RHQ platform specification will work against Jopr and ON.

Jopr is being released initially as version 2.1 to reflect the fact that it is based on technology that has been under development and release for many years as a commercial product. It also currently maps to the recently released JBoss Operations Network 2.1 and RHQ 1.1 releases, all executed wonderfully by a dedicated team of developers. In the future I'll write more about the roadmap and direction for these three projects, but these releases are a great milestone in providing the necessary tools for administrators of open source middleware.

Hardware Monitoring with RHQ

In the realm of extremely early prototypes, I've started an RHQ plugin that is meant to provide monitoring for some useful aspects of hardware. It only does anything useful on Linux with IBM hardware or if you have smartmontools installed. The idea is to collect some standard hardware info in a common model that isn't specific to a particular vendor, hopefully letting you compare CPU temperatures across your entire data center. It also utilizes smolt to gather a bit of other hardware info.

Chassis monitoring

SMART-based disk monitoring

Hudson Monitoring Plugin for RHQ

I've gone ahead and committed a new plugin for RHQ that provides monitoring of a Hudson continuous integration server. It's not a particularly complex plugin, but it gives a decent example of using JSON within a plugin. You manually inventory the server by pointing at its install URI, and it'll discover and track the build statuses of all the projects running on that server. There are more metrics it could be exposing, but it's a start.
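If you want a feel for what the plugin is doing under the covers, the remote side is just Hudson's JSON API. Here's a tiny, hypothetical sketch of pulling the server's JSON view; the host and port are placeholders, and the real plugin parses the payload with a JSON library rather than just printing it.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class HudsonApiSketch {
    public static void main(String[] args) throws Exception {
        // Hudson exposes a JSON view of the server (and of each job) under /api/json.
        // Replace the host/port with wherever your Hudson instance lives.
        URL api = new URL("http://localhost:8080/api/json");

        BufferedReader in = new BufferedReader(new InputStreamReader(api.openStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // The payload includes the jobs and their current status color;
                // the plugin turns those into discovered resources and build-state metrics.
                System.out.println(line);
            }
        } finally {
            in.close();
        }
    }
}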

Viewing lots of monitoring data

In trying to find better ways to view large amounts of data I stumbled across a project called TreeViz that has some nice ways to render hierarchical data. I hooked a few of these up to RHQ to help show the availability of thousands of resources in a single snapshot. One I really like is the sunburst display.

With this view you can click on and drill into sub-trees that then show on the outer rim while still showing the selected context across the entire inventory in the inner rim. The image below shows the same information in a hyperbolic tree view that can be investigated by dragging.

I'll be experimenting with some of these alternative views in my spare time to see if we can include a few out of the box in a useful form.

A couple more new RHQ plugins

I've checked in two more new RHQ plugins. These are also in the experimental stage right now. There's an Oracle plugin with some basic monitoring and an OpenSSH plugin that also experiments with Augeas for configuration management.

Also, I'll be at Red Hat Summit next week talking about RHQ and the JBoss Operations Network and some of the cool stuff we are working on.

A couple new RHQ plugins

I finally got a chance to check in a couple more plugins that I had been working on. The virt plugin uses libvirt to do a bit of Xen monitoring and management. Just the basics so far... inventory and monitoring for the host and guests, the ability to create and configure certain things for existing guests, plus virtual device and network monitoring. It uses the very cool JNA project to make calls from the Java plugin to libvirt.
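As a rough illustration of the JNA approach (this is a sketch from memory, not the plugin's actual binding), you declare a Java interface whose methods mirror the libvirt C functions and let JNA handle the native calls.

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class LibVirtSketch {

    // Each method maps 1:1 to a libvirt C function; JNA does the marshalling.
    public interface LibVirt extends Library {
        Pointer virConnectOpenReadOnly(String uri);
        String virConnectGetHostname(Pointer conn);   // a real binding would free the returned string
        int virConnectNumOfDomains(Pointer conn);
        int virConnectClose(Pointer conn);
    }

    public static void main(String[] args) {
        // "virt" resolves to libvirt.so on Linux.
        LibVirt libvirt = (LibVirt) Native.loadLibrary("virt", LibVirt.class);

        // null asks libvirt for the default hypervisor connection (e.g. the local Xen host).
        Pointer conn = libvirt.virConnectOpenReadOnly(null);
        System.out.println("Host: " + libvirt.virConnectGetHostname(conn));
        System.out.println("Running domains: " + libvirt.virConnectNumOfDomains(conn));
        libvirt.virConnectClose(conn);
    }
}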

I also committed a simple Jira monitoring plugin, though it doesn't do a whole lot as the Jira SOAP API is a bit weak.

Any ideas for other useful plugins? We're taking requests.

JON at JavaOne

JavaOne was a pretty good time this year. I got to demo the newly released versions of JON (2.0) and RHQ (1.0) to nice crowds and met plenty of people interested in the platform. The JBoss party was decent too even if the marketing guys did put our demo on a huge screen. Somehow things get funny on a 12 foot screen. We did manage to turn it into a drinking game at least. We also have JON pint glasses now! Time to recover, get back to the east coast and get back to work.

Advanced Plugins?

The RHQ plugin system is called AMPS, or Advanced Management Plugin System. Don't blame me for the name... Mazz came up with it. Anyway, why is it advanced? On the outside it seems pretty plain vanilla. You've got an XML file to define the metadata and you write some code to do the work. Pretty typical. And it may not seem all that sexy, but I do think it demonstrates some real enhancements in a field that's had some pretty tough extension models to date. It's not perfect and we have plans to enhance, tune, simplify, etc., but I'll use a couple of posts to cover some of the high-level details, and hopefully some of you will find them useful in building your own RHQ plugins.

To start with, the plugin system removes the need for many types of boilerplate code. A very basic example is that the user can completely customize the interval at which a metric is collected. For example, there are a couple dozen metrics that can be collected on PostgreSQL tables, and the user could have hundreds or thousands of these tables across many databases. The centralized server system lets the user choose how often to collect each of these metrics, down to a specific resource... but it also simplifies setup by shipping with defaults and letting the admin specify a global template interval or an interval on an arbitrary group of tables. Combined with the advanced rules-based grouping system, we could easily turn on table-size monitoring every 5 minutes for all tables that contain the string "measurement" in their name and that live in production databases.

So why is this good for the plugin system? Well, the point here is the integration. We don't make users set up cron, maintain a collection configuration file, or run their own threads for collection... but beyond that, we can automatically group the collection of metrics to allow the plugin to optimize retrieval. Instead of making a round trip to the database for every metric, we use the API to pass the plugin a list of requested metrics. Through the infrastructure, we're able to do automated grouping. Let's say you've got a metric collected every minute, one collected every two and then five collected every five. We can group those into patterns where each minute only one round-trip collection is made to the database for just the right set of metrics, all without the plugin having to worry about caching, stale data or other miscellaneous junk. It's part of our model to make writing plugins fun and enjoyable.
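To give a feel for the plugin side of that contract, here is a rough sketch of a resource component's measurement facet. The class, package and method names below are recalled from the RHQ plugin API, so treat the exact signatures as approximate; the query helper and its stats are obviously made up.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Package names below are approximate; they come from memory of the RHQ plugin API.
import org.rhq.core.domain.measurement.MeasurementDataNumeric;
import org.rhq.core.domain.measurement.MeasurementReport;
import org.rhq.core.domain.measurement.MeasurementScheduleRequest;
import org.rhq.core.pluginapi.measurement.MeasurementFacet;

public class TableComponentSketch implements MeasurementFacet {

    // The container hands us every metric that is due right now, already grouped,
    // so one database round trip can satisfy all of them at once.
    public void getValues(MeasurementReport report, Set<MeasurementScheduleRequest> requests)
        throws Exception {
        Map<String, Double> stats = queryAllTableStats();
        for (MeasurementScheduleRequest request : requests) {
            Double value = stats.get(request.getName());
            if (value != null) {
                report.addData(new MeasurementDataNumeric(request, value));
            }
        }
    }

    private Map<String, Double> queryAllTableStats() {
        // Placeholder: a real plugin would issue a single query for all table statistics.
        Map<String, Double> stats = new HashMap<String, Double>();
        stats.put("table_size", 1048576d);
        stats.put("row_count", 4200d);
        return stats;
    }
}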

Our plugin system covers metadata definition of arbitrarily deep trees of managed resources. For those resources it covers cross-platform auto-discovery, metric measurement, configuration reading and editing, operations with structured parameters and return values, arbitrary events and log watching, and software/content discovery, installation and deployment.

This is just a very small example, but I'll get into more in future posts and introduce some of the plugins I'm working on in spare time.

Performance modeling

How to test a framework whose usage varies so widely is something we've hit with the RHQ project. Plugins to the platform can expose many types of data at varying frequencies and scales, so we've developed a flexible framework for performance testing. The nice thing about this system is its relative simplicity. We were able to create a standard RHQ plugin that can simulate large, complex inventories of resources with just a few settings: how many server resources, how many child resources for each of those, and what kinds of metrics and other data they expose.

Additionally, we've got a system for launching copies of our agents that each look like entirely separate systems to the server. In this way we can simulate the data flow of hundreds or thousands of boxes with a relatively small setup. To the server each has separately managed connections, but we can run many more agents this way than by starting separate VMs. The performance and agent-launching systems are completely configurable, allowing us to dial the load up and down, and then we can use the instrumentation built into RHQ itself to see how critical subsystems are performing. How long is it taking to persist large data sets? How long is it taking to check alert conditions and dampening filters?

With a relatively small level of effort we've been able to increase our confidence in scalability by orders of magnitude. The instrumentation also helps us quickly identify conditions that could become a problem. All good practices if you're building systems that need high performance.

More EJB3/Hibernate Utilities

Following on from the previous entry on browsing around your EJB3 entity model I've put together a little utility JSP page that will help you test and understand your JPQL queries and how they translate into database interaction. A common issue on larger EJB3 projects is losing track of what is lazy vs. eager loaded and what the real database impact is of any given query. It can be difficult to map these back to the actual SQL that runs against the database and cumbersome to utilize debug logging to find what you are looking for.

The hibernate.jsp is a standalone page that will look up the entity manager from JNDI and use some Hibernate APIs to translate the query into SQL and execute it. Along the way it will give you entry boxes for any query parameters, count and track the total number of database roundtrips involved in a particular query, and time the whole thing. You can also execute named queries.
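The core of the page is small. A stripped-down sketch of the execute-and-time part looks something like this; the JNDI name, entity name and query are placeholders, and the SQL-translation piece that uses Hibernate internals is left out.

import java.util.List;
import javax.naming.InitialContext;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Query;

public class QueryTestSketch {
    public static void main(String[] args) throws Exception {
        // Look up the application's EntityManagerFactory from JNDI; the name is a placeholder.
        EntityManagerFactory emf =
            (EntityManagerFactory) new InitialContext().lookup("java:/RHQEntityManagerFactory");
        EntityManager em = emf.createEntityManager();
        try {
            // Any JPQL the user enters can be run this way; parameters get entry boxes in the JSP.
            Query query = em.createQuery("select r from Resource r where r.name like :name");
            query.setParameter("name", "%postgres%");

            long start = System.currentTimeMillis();
            List<?> results = query.getResultList();
            long elapsed = System.currentTimeMillis() - start;

            System.out.println(results.size() + " rows in " + elapsed + " ms");
        } finally {
            em.close();
        }
    }
}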

EJB3 Drive Through

The RHQ Project utilizes EJB3 for persistence of around 100 different entities in our model. As a generic management solution, it stores information about the things you're managing {Resources, Users, Groups, Configurations and Measurement data} as well as the large amount of metadata it takes to provide an extensible platform. Our model centers around these two areas: a hierarchical resource model and a hierarchical type model that is delivered by plugins. Some of our objects have many connected pieces of information, which makes finding data quickly during development with normal database queries rather difficult. Particularly at times when we were still completing the user interfaces to some of these systems, it was useful to have an alternate way to browse around the theoretical JPA object graph.

To that end I put together a JSP and a stateless session bean that gives you that view. This code is written to access our beans, but it should be fairly easy to construct a more generic version.

There are two real views in the browser. First, the list view shows you all the instances of a specific entity type. The landing page has quick links for the most important entity types that will display one of these lists. I haven't gotten around to pagination support, so you'll only see the first 100.

The other view shows a specific entity instance, with a top table showing all the simple attributes as their toString values and each *-to-one reference as a link to that entity's detail view.

The bottom of the detail page shows the list of *-to-many references, with each showing the related entities on the other side as links. Through these links it is possible to browse all the way around an EJB3 domain model, seeing the field values on specific entities or getting an idea of how they are linked. It can also be useful as a tool for learning the model. It required no changes to the entities and merely uses JPQL queries, reflection and a quick Hibernate call to ensure the data is loaded.
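If you want to build something similar for your own model, the list view boils down to a query like the one below plus a bit of getter reflection for the detail view. This is a simplified sketch, with the entity class passed in directly rather than resolved from a request parameter as the JSP does.

import java.lang.reflect.Method;
import java.util.List;
import javax.persistence.EntityManager;

public class EntityBrowserSketch {

    // List view: the first 100 instances of the given entity type.
    public static List<?> list(EntityManager em, Class<?> entityClass) {
        return em.createQuery("select e from " + entityClass.getSimpleName() + " e")
                 .setMaxResults(100)
                 .getResultList();
    }

    // Detail view: dump simple attributes via their getters. A real version would
    // render *-to-one getters as links and list *-to-many collections separately.
    public static void dump(Object entity) throws Exception {
        for (Method m : entity.getClass().getMethods()) {
            if (m.getName().startsWith("get") && m.getParameterTypes().length == 0
                && !m.getName().equals("getClass")) {
                System.out.println(m.getName().substring(3) + " = " + m.invoke(entity));
            }
        }
    }
}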

So what's next? Well, it was a quick tool hacked together for a project several years ago that turned out to be useful to me. It is very minimalist (ugly), doesn't support pagination or querying and isn't particularly smart about data display, but it may be useful to you on your JPA project as a development helper.

Information Visualization

I stumbled across this timeline visualization project from MIT recently and thought it would make for a pretty neat way to combine the many types of events collected by RHQ. Managed resources can have configuration change events, operations that are executed, events that are collected and alerts against those and collected monitoring data.

This example shows a Windows box with some alerts being fired at different severities, some operations being executed successfully and some events collected from the Windows Event Log. The Simile timeline allows the user to see a timeline in several bands of different granularity and then to click and drag across them to see different time periods. Dragging across the coarser bands gets you there faster. It also lets me set up different icons for the events and severities so that you can see the information at a glance and can even filter and highlight events by content.

Being open

Oddly enough, this blog went quiet when I went to work for an open source company (JBoss). In my prior life in the consulting world, I worked on open source tools that were generally orthogonal to the business goals, so it was pretty easy to work on and talk publicly about MC4J or other minor projects. At JBoss, I would say "I used to be the only open source guy at closed-source companies, and now I'm the only closed-source guy at an open source company". It's not that I couldn't blog, but I wouldn't be able to blog about what interested me, as what interested me was the closed source project I was working on.

I'm happy to say that I'm finally being paid to work on open source. Red Hat announced its cooperation on the RHQ project, which constitutes much of what I've been involved with for the past three years. RHQ is an infrastructure project for building management technology, and a pretty good one at that. This is just the first step now that we're finally out in the open, but I'll save some for future posts. For now, RHQ is our shot at Management Middleware. (Note: that's not middleware management, though that of course is a big part of what we do with this.)