Bob Pasker


Multicore Systems

Ready or not, here they come

Every major chip manufacturer has delivered or announced a roadmap for multicore chips that put multiple CPUs on the same piece of silicon. Systems developers are now designing these chips into their entire product lines. For Java platform developers, symmetric multiprocessing (SMP) should be hidden well below the hardware abstraction layer, but not all applications will benefit equally from SMP without some understanding of what's going on under the hood.

This article discusses strategies for achieving the best bang for the buck out of SMP systems. We'll look at design patterns for parallel programming and locking, debugging and profiling, large memory footprints and the effects on garbage collection, as well as tuning and capacity planning.

Java was originally designed as a language for set-top box applications and later became a vehicle for the HotJava browser. Even though Java had first-class support for multithreaded programming (GUI applications have long been multithreaded), application and app server developers continued, with good reason, to treat threads as scarce and expensive resources, because the overhead of context switches, per-thread memory, and synchronization was quite noticeable in the sweet spot of server technology: one to four CPUs with up to eight gigabytes of memory. Applications that could not fit into a single instance were deployed as clusters of such instances. Even larger SMP systems, such as Sun Fire and Power series boxes, are typically divided into four-CPU partitions because of their NUMA architectures and the GC problems associated with heaps greater than 4 gig. The literature on scalable Java applications is therefore filled with design patterns and sample implementations of worker thread pools that reduce the number of actual threads in a system, culminating in the addition of Java 5's java.util.concurrent.ThreadPoolExecutor (www.onjava.com/pub/a/onjava/2004/09/01/nio.html, http://gee.cs.oswego.edu/dl/cpjslides/nio.pdf, www-128.ibm.com/developerworks/library/j-jtp0730.html). Rather than being hidden well below the JVM abstraction layer, threads have become an integral and ongoing design point and a tuning headache for developers.
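
As a minimal sketch of the worker-pool pattern with java.util.concurrent (the pool size and task body here are illustrative, not taken from any particular app server):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class WorkerPoolSketch {
        public static void main(String[] args) throws InterruptedException {
            // A fixed-size pool caps the number of real threads no matter
            // how many tasks are submitted; excess tasks queue and wait.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 100; i++) {
                final int taskId = i;
                pool.execute(new Runnable() {
                    public void run() {
                        // A server would handle one request here.
                        System.out.println("task " + taskId + " ran on "
                                + Thread.currentThread().getName());
                    }
                });
            }
            pool.shutdown();                            // stop accepting new work
            pool.awaitTermination(1, TimeUnit.MINUTES); // drain the queue
        }
    }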

With the advent of better hardware interconnects and multicore chip technologies, much larger flat-memory SMP machines are becoming available, and the debate over threads and threading for scalable Java applications is being reopened. Our experience at Azul with hundreds of different end-user applications and ISV products running on our Java Compute Appliances has shown that although these applications work perfectly on our Java-licensed platform, many of them have inherent scalability problems that prevent a single instance from fully utilizing an entire system. Notwithstanding the ability to host up to 120 JVMs concurrently on the largest appliance, we have developed a number of strategies to scale a single instance of a Java server application.

These strategies run the gamut from configuration changes to devising entirely new algorithms, and I will even propose some additional changes to the Java class libraries to make them scale better.

Since Azul appliances have two orders of magnitude more CPUs and memory, the first order of business is to increase the amount of hardware resources available to the application instance. This consists of nothing more than modifying the app server configuration to increase the number of thread-constrained resources (worker threads in the thread pools, MDB listeners, servlet pool instances, and so on) and restarting the server. Applications react favorably to this kind of app server tuning, and the changes take only minutes to implement. There may also be application-defined resources that can be expanded, so take a look at how you use threads and memory to see whether similar configuration changes can be made.
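
For application-defined pools, one common approach (a sketch, not an Azul-specific API) is to derive pool sizes from the processor count the JVM reports, so the same configuration scales up on a larger box without hand-editing:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static ExecutorService newScaledPool() {
            // On a 1-4 CPU server this yields a small pool; on a large SMP
            // appliance it grows automatically with the hardware. The 2x
            // multiplier is a placeholder for a workload-specific factor.
            int cpus = Runtime.getRuntime().availableProcessors();
            return Executors.newFixedThreadPool(cpus * 2);
        }
    }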

Another easy way to take better advantage of a large SMP system is to increase the heap size of the application well beyond the rule-of-thumb numbers (www-128.ibm.com/developerworks/eserver/library/es-was-zseriesfaq.html#ques5, http://java.sun.com/docs/hotspot/gc1.4.2/faq.html, http://dev2dev.bea.com/pub/a/2004/01/chow_deisher.html). Azul appliances have such a large memory footprint that allocating dozens of gigabytes of memory doesn't seem unreasonable. The caveat with large heap sizes for Java applications has been that garbage collection traditionally subjects applications to infamously long pauses, resulting in unacceptable application response time. Over 90 percent of the large applications we see have heap sizes of less than two gigabytes. The way large applications get away with such small heap sizes is by running multiple copies of the application in a cluster, either on the same machine or on different ones. Such "heap partitioning" has the effect of splitting the heap into small pieces, so that a garbage collection (GC) pause in any one server doesn't affect the users on other members of the cluster.

We have only seen a handful of systems with heaps that exceed five gigabytes. These large-memory applications must have their GC parameters tuned to the nth degree, and often a small software or configuration change can require a lengthy retuning cycle. Interestingly, we have seen two applications with almost 100 gigabyte heaps, running on large NUMA SMP systems with very expensive 4 gig DIMMs. They can run with such large heaps because, over the course of a workweek, the application never uses enough heap to provoke a full GC cycle, and the application must be restarted every Saturday! On the Azul platform, such shenanigans are unnecessary because our Pauseless Garbage Collection (PGC) technology eliminates response time-busting GC pauses (www.azulsystems.com/products/whitepaper_abstract.html). Because of PGC, the max heap size for an Azul JVM is 96 gigabytes, and applications can fully utilize the entire heap. Since Azul appliances also have an abundance of processor cores, GC runs concurrently with the application on a set of parallel GC threads and doesn't take valuable processing cycles away from the application. Increasing the maximum heap size is also a simple configuration fix that can be made on the command line.
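
As an illustration, here is what such a command-line change might look like with standard HotSpot-style flags (the option names and the 32 gigabyte figure are generic examples; an Azul JVM's exact options may differ):

    # Size the heap once at startup by setting minimum and maximum equal,
    # well beyond the one-to-two gigabyte rule of thumb.
    java -Xms32g -Xmx32g -jar appserver.jar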

"Add more memory" (or in Java's case "add more heap") has long been a maxim of smart performance engineers, but what to do with it? For J2EE applications, the additional memory can be used to increase the number of threads to handle more work, and to increase HttpSession and ResultSet caches to reduce the amount of time waiting for the database to respond with a corresponding reduction in database load (www.javaperformancetuning.com/tips/jdbc_caching.shtml). Most systems also have application-specific caches or off-the-shelf caching products that can be enlarged to improve cache hit rates and lower database I/O rates.

Once the configuration has been adjusted to increase the amount of resources available to the application, the instance may still suffer from internal resource constraints. Often this turns out to be one or more synchronized methods that become a point of contention, because JVM implementations use pessimistic locking to implement synchronized methods. The Azul platform can also help here because it supports Optimistic Thread Concurrency (OTC), which permits multiple threads to enter a single synchronized method (www.azulsystems.com/products/whitepaper_abstract.html). Rather than trying to prevent conflicts with pessimistic locking, OTC detects and repairs actual memory conflicts between threads. Wherever there are no memory conflicts, OTC permits all of the threads to be in the synchronized method simultaneously. The result is that applications see more concurrency, and the need for fancy fine-grained, multilevel, or reader/writer locking strategies is reduced, resulting in faster development time and less application tuning. There are some pitfalls to watch out for, however. In particular, be careful about performing I/O or other long-running tasks in a synchronized method, because this reduces concurrency by making waiters wait longer for the holder to release the lock. Doing I/O inside a synchronized method also defeats OTC (www-128.ibm.com/developerworks/java/library/j-threads2.html). The Azul product also includes a JVM dashboard that displays the wait time and queue depth for the locks in the system. By looking at the most contended locks, you can zoom in on which parts of your application need to be examined and improved to get more concurrency and therefore more work done. A sketch of the I/O pitfall, and one way around it, follows.
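
The class and method names below are invented for illustration. The first method holds the object's lock for the duration of a disk write, so every other caller queues behind the I/O (and the blocking call defeats optimistic concurrency); the second trims the critical section down to the shared-state update:

    import java.io.IOException;
    import java.io.Writer;

    public class AuditLog {
        private final Writer out;
        private long entries;

        public AuditLog(Writer out) {
            this.out = out;
        }

        // Anti-pattern: the lock is held across the disk write, so every
        // thread logging an entry waits behind the I/O of the lock holder.
        public synchronized void logSlow(String message) throws IOException {
            entries++;
            out.write(message + "\n");
            out.flush();
        }

        // Better: build the line outside any lock, update shared state in
        // a short critical section, and serialize only the write itself.
        public void logFast(String message) throws IOException {
            String line = message + "\n";   // no lock needed here
            synchronized (this) {
                entries++;                  // brief critical section
            }
            synchronized (out) {
                out.write(line);            // I/O serialized separately
                out.flush();
            }
        }
    }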

More Stories By Bob Pasker

Bob Pasker is the deputy CTO of Azul Systems. He has been designing and developing networking, communications, transaction processing, and database products for 25 years. As one of the founders of WebLogic, the first independent Java company (acquired by BEA Systems in 1998), he was the chief architect of the WebLogic Application Server, which today still dominates the market. Bob has provided technical leadership and management for numerous award-winning technologies, including the TribeLink series of routers and remote access devices, and the TMX transaction processing system.
