
Read about it here, or go directly to RHQ on GitHub

RHQ is an enterprise management solution for JBoss middleware projects, Tomcat, Apache Web Server, and numerous other server-side applications.

RHQ provides administration, monitoring, alerting, operational control and configuration in an enterprise setting with fine-grained security and an advanced extension model.

 

About the Project

Getting Involved: If you wish to get involved as a contributor to RHQ, please visit the #rhq channel on Freenode IRC and get to know people.
Developers: Our developers are always looking for the community to get involved, whether through ideas for improvement, documentation, contributed plugins, or core development. Check the Contributions page on the RHQ wiki.
Community: Our user mailing list and our developer mailing list are the main channels of communication between all community members. You can also join the team on IRC (#rhq on irc.freenode.net).
Knowledge: User docs and developer resources can be found on the RHQ wiki.
Project Status: RHQ uses the Red Hat Bugzilla issue tracker to organize and prioritize tasks. Development is tracked under the RHQ Project, which includes the Jopr Project that is specific to JBoss technology management.
RHQ Project | Open Issues | Source code Git repository
Professional Support: Red Hat delivers the enterprise support, consulting, and training that you need, whether you are testing a proof of concept, deploying a mission-critical application, or rolling out JBoss Middleware products across your enterprise. JBoss Operations Network is a fully supported enterprise product for monitoring and managing JBoss middleware products, and it is based on RHQ.

RHQ

Blogs about the RHQ project
Changing the Endpoint Address of an RHQ Storage Node
Jul 20, 2014 11:18 AM by John Sanda
There is very limited support for changing the endpoint address of a storage node. In fact the only way to do so is by undeploying and redeploying the node with the new address. And in some cases, like when there is only a single storage node, this is not even an option. BZ 1103841 was opened to address this, and the changes will go into RHQ 4.13.

Changing the endpoint address of a Cassandra node is a routine maintenance operation. I am referring specifically to the address that Cassandra uses for gossip. This address is specified by the listen_address property in cassandra.yaml. The key thing when changing the address is to ensure that the node's token assignments do not change. Rob Coli's post on changing a node's address provides a nice summary of the configuration changes involved.

With CASSANDRA-7356, however, things are even easier. Change the value of listen_address and restart Cassandra with the following system properties defined in cassandra-env.sh:

  • -Dcassandra.replace_address=<new_address>
  • -Dcassandra.replace_address_first_boot=true 
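In cassandra-env.sh these properties are typically appended to JVM_OPTS. A minimal sketch, where the address is a placeholder for the node's new address, looks like this:

    # cassandra-env.sh -- sketch only; substitute the node's new address
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<new_address>"
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=true"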
The seeds property in cassandra.yaml might need to be updated as well. Note that there is no need to worry about the auto_bootstrap, initial_token, or num_tokens properties.

For the RHQ Storage Node, these system properties will be set in cassandra-jvm.properties. Users will be able to update a node's address either through the storage node admin UI or through the RHQ CLI. One interesting thing to note is that the RHQ Storage Node resource type uses the node's endpoint address as its resource key. This is not good. When the address changes, the agent will think it has discovered a new Storage Node resource. To prevent this, we can add resource upgrade support in the rhq-storage plugin and change the resource key to use the node's host ID, which is a UUID that does not change. The host ID is exposed through the StorageServiceMBean.getLocalHostId JMX attribute.
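As an aside, the same host ID can also be seen with a quick CQL query against the system.local table (assuming a Cassandra version that exposes it, 1.2 or later); the plugin itself would read it through the JMX attribute mentioned above:

    -- illustrative only; the rhq-storage plugin would read this via JMX
    SELECT host_id FROM system.local;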

If you are interested in learning more about the work involved with adding support for changing a storage node's endpoint address, check out the wiki design doc that I will be updating over the next several days.
Upgrading to RHQ 4.9
Sep 9, 2013 8:30 AM by John Sanda
RHQ 4.8 introduced the new Cassandra backend for metrics. There has been a tremendous amount of work since then focused on the management of the new RHQ Storage Node. We do not want to impose on users the burden of managing a second database. One of our key goals is to provide robust management such that Cassandra is nothing more than an implementation detail for users.

The version of Cassandra shipped in RHQ 4.8 includes some native libraries, one of the main uses of which is compression. If the platform on which Cassandra is running supports those native libraries, table compression is enabled and data files written to disk are compressed.

All of the native libraries have been removed from the version of Cassandra shipped in RHQ 4.9. The reason for this change is to ensure RHQ continues to provide solid cross-platform support. The development and testing teams simply do not have the bandwidth right now to maintain native libraries for all of the supported platforms in RHQ and JON.

The following information applies only to RHQ 4.8 installs.

Since RHQ 4.9 does not ship with those native compression libraries, Cassandra will not be able to decompress the data files on disk.

Compression has to be disabled in your RHQ 4.8 installation before upgrading to 4.9. There is a patch which you will need to run prior to upgrading. Download rhq48-storage-patch.zip and follow the instructions provided in rhq48-storage-patch.sh|bat.
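To give a sense of what the patch does, disabling compression on a column family comes down to an ALTER statement along these lines (a sketch only; the keyspace and table names here are illustrative, and the patch script takes care of the real ones):

    -- sketch; the patch script covers the actual RHQ keyspace and tables
    ALTER TABLE rhq.raw_metrics
        WITH compression = {'sstable_compression': ''};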

I do want to mention that we will likely re-enable compression using a pure Java compression library in a future RHQ release.
Why I am ready to move to CQL for Cassandra application development
Oct 17, 2012 8:22 AM by John Sanda
Earlier this year, I started learning about Cassandra as it seemed like it might be a good fit as a replacement data store for metrics and other time series data in RHQ. I developed a prototype for RHQ, using the Hector client library to access Cassandra from within RHQ. I defined my schema using a Cassandra CLI script. I recall when I first read about CQL. I spent some time deliberating over whether to define the schema using a CLI script or a CQL script. I was intrigued but ultimately decided against using CQL. As the CLI and the Thrift interface were more mature, they seemed like the safer bet. While I decided not to invest any time in CQL, I did make a mental note to revisit it at a later point since there was clearly a big emphasis within the Cassandra community on improving CQL. That later point is now, and I have decided to start making extensive use of CQL.

After a thorough comparative analysis, the RHQ team decided to move forward with using Cassandra for metric data storage. We are making heavy use of dynamic column families and wide rows. Consider for example the raw_metrics column family in figure 1,

Figure 1. raw_metrics column family

The metrics schedule id is the row key. Each data point is stored in a separate column where the metric timestamp is the column name and the metric value is the column value. This design supports fast writes as well as fast reads and works particularly well for the various date range queries in RHQ. This is considered a dynamic column family because the number of columns per row will vary and because column names are not defined up front. I was quick to rule out using CQL due to a couple misconceptions about CQL's support for dynamic column families and wide rows. First, I did not think it was possible to define a dynamic table with wide rows using CQL. Secondly, I did not think it was possible to execute range queries on wide rows.

A couple weeks ago I came across this thread on the cassandra-users mailing list which points out that you can in fact create dynamic tables/column families with wide rows. And conveniently after coming across this thread, I happened to stumble on the same information in the docs. Specifically the DataStax docs state that wide rows are supported using composite column names. The primary key can have multiple components, but there must be at least one column that is not part of the primary key. Using CQL I would then define the raw_metrics column family as follows,
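Something along these lines, where the column names are illustrative:

    -- a sketch; column names are illustrative
    CREATE TABLE raw_metrics (
        schedule_id int,
        time timestamp,
        value double,
        PRIMARY KEY (schedule_id, time)
    );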

This CREATE TABLE statement is straightforward, and it does allow for wide rows with dynamic columns. The underlying column family representation of the data is slightly different from the one in figure 1 though.

Figure 2. CQL version of raw_metrics column family
Each column name is now a composite that consists of the metric timestamp along with the string literal "value". There is additional overhead on reads and writes, as the column comparator now has to compare the string in addition to the timestamp. Although I have yet to do any of my own benchmarking, I am not overly concerned by the additional string comparison. I was, however, concerned about the additional overhead in terms of disk space. I have done some preliminary analysis and concluded that the difference compared with storing just the timestamp in the column name is negligible, due to SSTable compression, which is enabled by default.

My second misconception about executing range queries is really predicated on the first misconception. It is true that you can only query named columns in CQL; consequently, it is not possible to perform a date range query against the column family in figure 1. It is possible though to execute a date range query against the column family in figure 2.

RHQ supports multiple upgrade paths. This means that in order to upgrade to the latest release (which happens to be 4.5.0 at the time of this writing), I do not have to first upgrade to the previous release (which would be 4.4.0). I can upgrade from 4.2.0 for instance. Supporting multiple upgrade paths requires a tool for managing schema changes. There are plenty of such tools for relational databases, but I am not aware of any for Cassandra. But because we can leverage CQL and because there is a JDBC driver, we can look at using an existing tool instead of writing something from scratch. I have done just that and am working on adding support for Cassandra to Liquibase. I will have more on that in a future post. Using CQL allows us to reuse existing solutions, which in turn is going to save a lot of development and testing effort.

The most compelling reason to use CQL is the familiar, easy-to-use syntax. I have been nothing short of pleased with Hector. It is well designed, the online documentation is solid, and the community is great. Whenever I post a question on the mailing list, I get responses very quickly. With all that said, contrast a typical Hector slice query, which takes a dozen or so lines of Java to set up serializers, keys, and column ranges, with its equivalent CQL query against the raw_metrics column family.
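The CQL version is a single statement along these lines (the schedule id and date range are made up for illustration):

    -- illustrative date range query; schedule id and dates are placeholders
    SELECT time, value FROM raw_metrics
    WHERE schedule_id = 123
      AND time >= '2012-10-01' AND time < '2012-10-08';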

RHQ developers can look at the CQL version and immediately understand it. Using CQL will result in less code that is easier to maintain. We can also leverage ad hoc queries with cqlsh during development and testing. The JDBC driver also lends itself nicely to applications that run in an application server, as RHQ does.

Things are still evolving, both with CQL and with the JDBC driver. Collections support is coming in Cassandra 1.2. The JDBC driver does not yet support batch statements. This is due to the lack of support for it on the server side. The functionality is there in the Cassandra trunk/master branch, and I expect to see it in the 1.2 release. The driver also currently lacks support for connection pooling. These and other critical features will surely make their way into the driver. With the enhancements and improvements to CQL and to the JDBC driver, adding Cassandra support to Hibernate OGM becomes that much more feasible.

The flexibility, tooling, and ease of use make CQL a very attractive option for working with Cassandra. I doubt the Thrift API is going away any time soon, and we will continue to leverage the Thrift API through Hector in RHQ in various places. But I am ready to make CQL a first class citizen in RHQ and look forward to watching it continue to mature into a great technology.
Setting up a local Cassandra cluster using RHQ
Jul 1, 2012 9:50 PM by John Sanda
As part of my ongoing research into using Cassandra with RHQ, I did some work to automate setting up a Cassandra cluster (for RHQ) on a single machine for development and testing. I put together a short demo showing what is involved. Check it out at http://bit.ly/N3jbT8.
Aggregating Metric Data with Cassandra
Jun 16, 2012 7:21 AM by John Sanda

Introduction

I successfully performed metric data aggregation in RHQ using a Cassandra back end for the first time recently. Data roll up, or aggregation, is done by the data purge job, which is a Quartz job that runs hourly. This job is also responsible for purging old metric data as well as data from other parts of the system. The data purge job invokes a number of different stateless session EJBs (SLSBs) that do all the heavy lifting. While there is still a lot of work that lies ahead, this is a big first step forward that is ripe for discussion.

Integration

JPA and EJB are the predominant technologies used in RHQ to implement and manage persistence and business logic. Those technologies, however, are not really applicable to Cassandra. JPA is for relational databases, and one of the central features of EJB is declarative, container-managed transactions. Cassandra is neither a relational nor a transactional data store. For the prototype, I am using server plugins to integrate Cassandra with RHQ.

Server plugins are used in a number of areas in RHQ already. Pluggable alert notification senders are one of the best examples. A key feature of server plugins is the encapsulation made possible by the class loader isolation that is also present with agent plugins. So let's say that Hector, the Cassandra client library, requires a different version of a library that is already used by RHQ. I can safely use the version required by Hector in my plugin without compromising the RHQ server. In addition to the encapsulation, I can dynamically reload my plugin without having to restart the whole server. This can help speed up iterative development.

Cassandra Server Plugin Configuration
You can define a configuration in the plugin descriptor of a server plugin. The above screenshot shows the configuration of the Cassandra plugin. The nice thing about this is that it provides a consistent, familiar interface in the form of the configuration editor that is used extensively throughout RHQ. There is one more screenshot that I want to share.

System Settings
This is a screenshot of the system settings view. It provides details about the RHQ server itself, like the database used, the RHQ version, and the build number. There are several configurable settings, like the retention period for alerts and drift files and settings for integrating with an LDAP server for authentication. At the bottom there is a property named Active Metrics Server Plugin. There are currently two values from which to choose. The first is the default, which uses the existing RHQ database. The second is for the new Cassandra back end. The server plugin approach affords us a pluggable persistence solution that can be really useful for prototyping, among other things. Pluggable persistence with server plugins is a really interesting topic in and of itself. I will have more to say on that in a future post.

Implementation

The Cassandra implementation thus far uses the same buckets and time slices as the existing implementation. The buckets and retention periods are as follows:

Metrics Data Bucket    Data Retention Period
raw data               7 days
one hour data          2 weeks
6 hour data            1 month
1 day data             1 year

Unlike the existing implementation, purging old data is accomplished simply by setting the TTL (time to live) on each column. Cassandra takes care of purging expired columns. The schema is pretty straightforward. Here is the column family definition for raw data specified as a CLI script:
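A sketch of what that CLI script might look like, with validator classes assumed from the description below:

    create column family raw_metrics
        with comparator = LongType
        and key_validation_class = Int32Type
        and default_validation_class = DoubleType;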


The row key is the metric schedule id. The column names are timestamps and column values are doubles. And here is the column family definition for one hour data:
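Again a sketch, this time with a composite comparator (the exact types are assumptions):

    create column family one_hour_metric_data
        with comparator = 'CompositeType(LongType,Int32Type)'
        and key_validation_class = Int32Type
        and default_validation_class = DoubleType;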


As with the raw data, the schedule id is the row key. Unlike the raw data, however, we use composite columns here. All of the buckets, with the exception of the raw data, store computed aggregates. RHQ calculates and stores the min, max, and average for each (numeric) metric schedule. The column name consists of a timestamp and an integer. The integer identifies whether the value is the max, min, or average. Here is some sample (Cassandra) CLI output for one hour data:
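Schematically, each column in that output looks something like the following, where the numbers are made up purely to show the shape:

    (column=1339801200000:1, value=4.2, timestamp=1339804800123000, ttl=1209600)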


Each row in the output reads like a tuple. The first entry is the column name, with a colon delimiter. The timestamp is listed first, followed by the integer code that identifies the aggregate type. Next is the column value, which is the value of the aggregate calculation. Then we have a timestamp. Every column in Cassandra has a timestamp; it is used for conflict resolution on writes. Lastly, we have the ttl. The schema for the remaining buckets is similar to the one_hour_metric_data column family, so I will not list it here.

The last implementation detail I want to discuss is querying. When the data purge job runs, it has to determine what data is ready to be aggregated. With the existing implementation that uses the RHQ database, queries are fast and efficient using indexes. The following column family definition serves as an index to make queries fast for the Cassandra implementation as well:
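A sketch of that definition, with validator types assumed from the description that follows:

    create column family metrics_aggregates_index
        with comparator = 'CompositeType(LongType,Int32Type)'
        and key_validation_class = UTF8Type
        and default_validation_class = Int32Type;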


The row key is the metric data column family name, e.g., one_hour_metric_data. The column name is a composite that consists of a timestamp and a schedule id. Currently the column value is an integer that is always set to zero because only the column name is needed. At some point I will likely refactor the data type of the column value to something that occupies less space.

Here is a brief explanation of how the index is used. Let's start with writes. Whenever data for a schedule is written into one bucket, we update the index for the next bucket. For example, suppose data for schedule id 123 is written into the raw_metrics column family at 09:15. We will write into the "one_hour_metric_data" row of the index with a column name of 09:00:123. The timestamp at which the write occurred is rounded down to the start of the time slice of the next bucket. Further suppose that additional data for schedule 123 is written into the raw_metrics column family at times 09:20, 09:25, and 09:30. Because each of those timestamps gets rounded down to 09:00 when writing to the index, we do not wind up with any additional columns for that schedule id. This means that the index will contain at most one column per schedule for a given time slice in each row.

Reads occur to determine what data, if any, needs to be aggregated. Each row in the index is queried. After a column is read and the data for the corresponding schedule is aggregated into the next bucket, that column is then deleted. This index is a lot like a job queue. Reads in the existing implementation that uses a relational database should be fast; however, there is still work that has to be done to determine what data, if any, needs to be aggregated when the data purge job runs. With the Cassandra implementation, the presence of a column in a row of the metrics_aggregates_index column family indicates that data for the corresponding schedule needs to be aggregated.

Testing

I have pretty good unit test coverage, but I have only done some preliminary integration testing. So far it has been limited to manual testing. This includes inspecting values in the database via the CLI or with CQL and setting breakpoints to inspect values. As I look to automate the integration testing, I have been giving some thought to how metric data is pushed to the server. Relying on the agent to push data to the server is suboptimal for a couple of reasons. First, the agent sends measurement reports to the server once a minute. I need better control of how frequently and when data is pushed to the server.

The other issue with using the agent is that it gets difficult to simulate older metric data that has been reported over a specified duration, be it an hour, a day, or a week. Simulating older data is needed for testing that data is aggregated into 6 hour and 24 hour buckets and that data is purged at appropriate times.

RHQ's REST interface is a better fit for the integration testing I want to do. It already provides the ability to push metric data to the server. I may wind up extending the API, even if just for testing, to allow for kicking off the aggregation that runs during the data purge job. I can then use the REST API to query the server and verify that it returns the expected values.

Next Steps

There is still plenty of work ahead. I have to investigate what consistency levels are most appropriate for different operations. There is still a large portion of the metrics APIs that needs to be implemented, some of the more important ones being the query operations used to render metrics graphs and tables. The data purge job is not the best approach going forward for doing the aggregation. Only a single instance of the job runs each hour, and it does not exploit any of the opportunities that exist for parallelism. Lastly, and maybe most importantly, I have yet to start thinking about how to effectively manage the Cassandra cluster with RHQ. As I delve into these other areas I will continue sharing my thoughts and experiences.
