Java开发网 » 技术文章库
Java 理论与实践:并发在一定程度上使一切变得简单
palatum(CJSDN高级会员)
发贴: 451 | 积分: 80
于 2003-04-14 15:28
Java 理论与实践:并发在一定程度上使一切变得简单
级别:中级

Brian Goetz(brian@quiotix.com)
首席顾问,Quiotix Corp
2003 年 4 月

象许多其它应用程序基础结构服务一样,并发实用程序类(如工作队列和线程池)常常在每个项目中被不必要地从头重写。这个月,Brian Goetz 将介绍 Doug Lea 的 util.concurrent 包,这是一个高质量的、广泛使用的、并发实用程序的开放源码包。您可以在本文的讨论论坛中与作者和其他读者分享您对本文的想法。(也可以单击本文顶部或底部的"讨论"参加论坛。)
当项目中需要 XML 解析器、文本索引程序和搜索引擎、正则表达式编译器、XSL 处理器或 PDF 生成器时,我们中大多数人从不会考虑自己去编写这些实用程序。每当需要这些设施时,我们会使用商业实现或开放源码实现来执行这些任务,原因很简单 — 现有实现工作得很好,而且易于获得,自己编写这些实用程序费力不讨好,甚至可能一无所获。作为软件工程师,我们愿意相信自己和艾萨克·牛顿一样热衷于"站在巨人的肩膀之上",有时确实如此,但并不总是这样。(Richard Hamming 在他的图灵奖讲座中打趣说,计算机科学家反而更喜欢"站在彼此的脚上"。)

探究重复发明“车轮”之原因
对于一些几乎每个服务器应用程序都需要的低级应用程序框架服务(如日志记录、数据库连接池、高速缓存和任务调度等),我们看到这些基本的基础结构服务被一遍又一遍地重写。为什么会这样?这未必是因为现有的选择不够好,也未必是因为定制版本更好或更适合手边的应用程序。事实上,专为某个应用程序开发的定制版本常常并不比广泛可用的通用实现更适合该应用程序,甚至可能更差。例如,尽管您不喜欢 log4j,但它可以完成任务。尽管自己开发的日志记录系统也许有一些 log4j 所缺乏的特定特性,但对于大多数应用程序,您很难证明一个完善的定制日志记录包值得付出从头编写的代价,而不是使用现有的通用实现。可是,许多项目团队最终还是一遍又一遍地编写自己的日志记录、连接池或线程调度包。

表面上看起来简单
我们不会考虑自己编写 XSL 处理器的原因之一是,这将花费大量的工作。但这些低级的框架服务表面上看起来简单,所以自己编写它们似乎并不困难。然而,它们比乍看上去更难以正确实现。这些特殊的"轮子"一直被重复发明的主要原因是:在给定的应用程序中,对这些设施的需求往往一开始非常小,但当您遇到无数其它项目也遇到过的同样问题时,需求便会逐渐变大。理由通常是这样的:"我们不需要完善的日志记录/调度/高速缓存包,只需要一些简单的东西,所以我们就写点能达到目的的代码,并针对自己的特定需求来裁剪它。"但情况往往是,您很快就超出了所编写的这个简单设施的能力范围,于是忍不住再添加一些特性,然后再添加一些,直到写出了一个完善的基础结构服务。到了这个地步,您通常会执著于自己已经写出的东西,无论它是好是坏。您已经为构建自己的版本付出了全部代价,所以除了迁移到通用实现的实际成本之外,还必须克服"沉没成本"的心理障碍。

并发构件的价值所在
编写调度和并发基础结构类的确要比看上去难。Java 语言提供了一组有用的低级同步原语:wait()、notify() 和 synchronized,但使用这些原语的细节需要技巧,还要避开性能、死锁、公平性、资源管理以及线程安全性等诸多方面的隐患。并发代码难以编写,更难以测试 — 即使是专家,有时第一次也会出错。Concurrent Programming in Java(请参阅参考资料)的作者 Doug Lea 编写了一个极其优秀的、免费的并发实用程序包,其中包括锁、互斥体、队列、线程池、轻量级任务、高效的并发集合、原子算术操作以及并发应用程序的其它基本构件。这个包一般被称为 util.concurrent(因为它实际的包名很长),它将构成 JDK 1.5 中 java.util.concurrent 包的基础,后者正在 Java Community Process JSR 166 之下进行标准化。同时,util.concurrent 经过了良好的测试,许多服务器应用程序(包括 JBoss J2EE 应用程序服务器)都在使用这个包。

填补空白
核心 Java 类库明显缺少一组有用的高级同步工具,譬如互斥锁、信号量,以及阻塞的、线程安全的集合类。Java 语言的并发原语 — synchronized、wait() 和 notify() — 对于大多数服务器应用程序的需求而言过于低级。如果要尝试获取一个锁,但在给定的时间段内超时了还没有获得它,会发生什么情况?如果线程被中断,则放弃获取锁的尝试?创建一个至多可由 N 个线程持有的锁?支持多种模式的锁定(譬如并发读加独占写)?或者在一个方法中获取锁,而在另一个方法中释放它?内置的锁定机制不直接支持上述任何一种情形,但它们都可以在 Java 语言所提供的基本并发原语之上构建出来。只是这样做需要技巧,而且容易出错。
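
作为对照,下面用 JDK 1.5 中由 util.concurrent 演化而来的 java.util.concurrent.Semaphore 勾勒上文提到的"至多可由 N 个线程持有的锁"。这只是一个示意性草图,类名 BoundedLockDemo 和方法名均为本例虚构:

```java
import java.util.concurrent.Semaphore;

// 草图:用信号量实现"至多可由 N 个线程持有的锁"。
// 这里用的是 JDK 1.5 中标准化的 java.util.concurrent.Semaphore;
// util.concurrent 中也有对应的 Semaphore 类。
public class BoundedLockDemo {
    // 对一个许可数为 permits 的信号量尝试 attempts 次非阻塞获取,
    // 返回实际成功的次数
    public static int acquireUpTo(int attempts, int permits) {
        Semaphore lock = new Semaphore(permits);
        int acquired = 0;
        for (int i = 0; i < attempts; i++) {
            if (lock.tryAcquire()) { // 非阻塞获取,拿不到许可立即返回 false
                acquired++;
            }
        }
        return acquired;
    }

    public static void main(String[] args) {
        // 许可数为 2 的"锁",第三次获取会失败
        System.out.println(acquireUpTo(3, 2)); // 打印 2
    }
}
```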

服务器应用程序开发人员需要一些简单的设施来实现互斥、同步对事件的响应、跨活动的数据通信,以及异步地调度任务。对于这些任务,Java 语言所提供的低级原语既难用又容易出错。util.concurrent 包的目的就在于填补这项空白:它提供了一组用于锁定、阻塞队列和任务调度的类,这些类能够处理常见的错误情况,并能限制任务队列和处理中的任务所消耗的资源。

调度异步任务
util.concurrent 中使用最广泛的类是那些处理异步事件调度的类。在本专栏七月份的文章中,我们研究了线程池和工作队列,以及许多 Java 应用程序如何使用"Runnable 队列"模式来调度小的工作单元。

通过简单地为某个任务创建一个新线程,派生出一个后台线程来执行该任务,这种做法很有诱惑力:

new Thread(new Runnable() { ... } ).start();



虽然这种写法漂亮而简洁,但它有两个重大缺陷。首先,创建新线程需要耗费一定的资源,因此产生出许许多多线程,每个线程执行一个简短的任务然后退出,意味着 JVM 在创建和销毁线程上做的工作和消耗的资源,可能比实际执行有用工作还要多。即使创建和销毁线程的开销为零,这种执行模式仍然有第二个更隐蔽的缺陷 — 在执行某类任务时,如何限制所使用的资源?如果突然涌入大量请求,如何防止同时产生出上千个线程?现实世界中的服务器应用程序需要比这更小心地管理资源:您需要限制同时执行的异步任务的数目。

线程池同时解决了这两个问题 — 它既提高了调度效率,又限制了资源的使用。虽然编写一个在池线程中执行 Runnable 的工作队列和线程池并不难(七月份那篇专栏文章中的示例代码正是做这件事的),但编写有效的任务调度程序,需要做的远不止同步对共享队列的访问这么简单。现实世界中的任务调度程序应该能处理死掉的线程,杀死多余的池线程使其不再消耗不必要的资源,根据负载动态地管理池的大小,并限制排队任务的数目。最后一项(即限制排队的任务数目)非常重要,它可以防止服务器应用程序在过载时因内存不足错误而崩溃。

限制任务队列需要做出策略决策 — 如果工作队列溢出,如何处理溢出的任务?抛弃最新的任务?抛弃最老的任务?阻塞提交任务的线程,直到队列有可用空间?还是在提交任务的线程内直接执行新任务?切实可行的溢出管理策略有多种,每一种都适合某些情形,而不适合另一些情形。
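
下面用 JDK 1.5 的 java.util.concurrent.ArrayBlockingQueue 勾勒其中两种溢出策略("抛弃最新的"与"抛弃最老的")。这只是一个示意性草图,OverflowDemo 及其方法名均为本例虚构:

```java
import java.util.concurrent.ArrayBlockingQueue;

// 草图:有界工作队列溢出时的两种简单策略。
public class OverflowDemo {
    // 抛弃最新的:队列满时 offer 直接失败,新任务被丢弃
    public static boolean offerDiscardNewest(ArrayBlockingQueue<String> q, String task) {
        return q.offer(task);
    }

    // 抛弃最老的:队列满时先移除队首最老的任务,再放入新任务
    public static boolean offerDiscardOldest(ArrayBlockingQueue<String> q, String task) {
        if (!q.offer(task)) {
            q.poll();           // 丢掉最老的任务
            return q.offer(task);
        }
        return true;
    }

    // 在容量为 2 的队列上演示两种策略,返回结果摘要
    public static String demo() {
        ArrayBlockingQueue<String> q = new ArrayBlockingQueue<String>(2);
        q.offer("t1");
        q.offer("t2");                               // 队列已满
        boolean a = offerDiscardNewest(q, "t3");     // false,t3 被抛弃
        boolean b = offerDiscardOldest(q, "t3");     // true,t1 被抛弃
        return a + "," + b + "," + q.peek();         // "false,true,t2"
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```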

Executor
util.concurrent 定义了一个 Executor 接口,用于异步地执行 Runnable,另外还定义了 Executor 的几个实现,它们具有不同的调度特征。将一个任务排入 executor 的队列非常简单:

Executor executor = new QueuedExecutor();
...
Runnable runnable = ... ;
executor.execute(runnable);



最简单的实现 ThreadedExecutor 为每个 Runnable 创建一个新线程,不提供任何资源管理 — 很象常用的 new Thread(new Runnable() {}).start() 惯用法。但 ThreadedExecutor 有一个重要的好处:只需改变 executor 的构造方式,就可以转移到其它执行模型,而不必在整个应用程序源码中逐一查找所有创建新线程的地方。QueuedExecutor 使用一个后台线程来处理所有任务,这非常类似于 AWT 和 Swing 中的事件线程。QueuedExecutor 有一个很好的特性:任务按照排队的顺序执行,而且由于所有任务都在同一个线程内执行,任务不一定需要同步对共享数据的所有访问。
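
QueuedExecutor 在 JDK 1.5 中的对应物是 Executors.newSingleThreadExecutor()。下面的草图演示单个后台线程按排队顺序执行任务的特性(类名 OrderingDemo 为本例虚构):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// 草图:单后台线程的 executor 按排队顺序执行任务。
public class OrderingDemo {
    public static String runInOrder() {
        final StringBuffer log = new StringBuffer(); // StringBuffer 本身线程安全
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 1; i <= 3; i++) {
            final int n = i;
            executor.execute(new Runnable() {
                public void run() { log.append(n); }
            });
        }
        executor.shutdown(); // 不再接受新任务,已排队的任务仍会执行
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS); // 等待全部任务完成
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.toString(); // 单线程 + FIFO 队列保证顺序:"123"
    }

    public static void main(String[] args) {
        System.out.println(runInOrder());
    }
}
```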

PooledExecutor 是一个复杂的线程池实现。它不但提供在工作线程(worker thread)池中调度任务的功能,而且还可以灵活地调整池的大小,并提供线程生命周期管理。这个实现可以限制工作队列中任务的数目,以防止排队的任务耗尽所有可用内存,另外还提供了多种可选的关闭和饱和策略(阻塞、废弃、抛出异常、废弃最老的、在调用者中运行等)。所有的 Executor 实现都会为您管理线程的创建和销毁,包括在关闭 executor 时关闭所有线程;它们还为线程创建过程提供了挂钩(hook),以便应用程序在需要时可以自行管理线程的实例化。例如,这使您可以把所有工作线程放进特定的 ThreadGroup,或者赋予它们描述性的名称。

FutureResult
有时您希望异步地启动某个计算,并希望在以后需要时可以使用该计算的结果。FutureResult 实用程序类使这变得很容易。FutureResult 表示一个可能要花一段时间执行、并且可以在另一个线程中执行的任务,FutureResult 对象则充当这一执行过程的句柄。通过它,您可以查明任务是否已经完成,可以等待任务完成,并检索其结果。可以将 FutureResult 与 Executor 组合起来:创建一个 FutureResult 并将其排入 executor 的队列,同时保留对该 FutureResult 的引用。清单 1 显示了一个将 FutureResult 和 Executor 一起使用的简单示例,它异步地启动图像渲染,并继续进行其它处理:

清单 1. 运作中的 FutureResult 和 Executor

Executor executor = ...
ImageRenderer renderer = ...

FutureResult futureImage = new FutureResult();
Runnable command = futureImage.setter(new Callable() {
public Object call() { return renderer.render(rawImage); }
});

// start the rendering process
executor.execute(command);

// do other things while executing
drawBorders();
drawCaption();

// retrieve the future result, blocking if necessary
drawImage((Image)(futureImage.get())); // use future



FutureResult 和高速缓存
还可以使用 FutureResult 来提高按需装入(load-on-demand)的高速缓存的并发性。通过将 FutureResult(而不是计算本身的结果)放入高速缓存,可以减少持有高速缓存写锁的时间。虽然这种做法不能让第一个线程更快地把某一项放入高速缓存,但它将减少第一个线程阻塞其它线程访问高速缓存的时间。它还使其它线程可以更早地使用计算结果,因为它们可以从高速缓存中检索到 FutureResult。清单 2 显示了将 FutureResult 用于高速缓存的示例:

清单 2. 使用 FutureResult 来改善高速缓存

public class FileCache {
    private Map cache = new HashMap();
    private Executor executor = new PooledExecutor();

    public Object get(final String name)
            throws InterruptedException, InvocationTargetException {
        FutureResult result;

        synchronized (cache) {
            result = (FutureResult) cache.get(name);
            if (result == null) {
                result = new FutureResult();
                executor.execute(result.setter(new Callable() {
                    public Object call() { return loadFile(name); }
                }));
                cache.put(name, result);
            }
        }
        return result.get();
    }
}



这种方法使第一个线程可以快速地进出同步块,使其它线程几乎与第一个线程同样快地得到第一个线程计算的结果,并且不会出现两个线程都去计算同一个对象的情况。

结束语
util.concurrent 包包含许多有用的类,您也许会发现其中一些正是您自己已经编写过(也许还不止一次)的那些类的更好版本。它们是多线程应用程序许多基本构件的高性能实现,并且久经考验。util.concurrent 是 JSR 166 的起点,JSR 166 将产生一组并发实用程序,并成为 JDK 1.5 中的 java.util.concurrent 包,但您不必等到那时候才使用它。在以后的文章中,我将讨论 util.concurrent 中的一些定制同步类,并研究 util.concurrent 和 java.util.concurrent API 之间的一些不同之处。



English Version [Re:palatum]
palatum(CJSDN高级会员)
发贴: 451 | 积分: 80
于 2003-04-14 15:29
Java theory and practice: Concurrency made simple (sort of)
An introduction to the util.concurrent package
Level: Intermediate


Brian Goetz (brian@quiotix.com)
Principal Consultant, Quiotix Corp
November 2002

Like many other application infrastructure services, concurrency utility classes such as work queues and thread pools are often needlessly rewritten from scratch for every project. This month, Brian Goetz offers an introduction to Doug Lea's util.concurrent package, a high-quality, widely used, open-source package of concurrency utilities. Share your thoughts on this article with the author and other readers in the discussion forum on this article. (You can also click Discuss at the top or bottom of the article to access the forum.)
Most of us would never think of writing our own XML parser, text indexing and search engine, regular expression compiler, XSL processor, or PDF generator as part of a project that needs one of these utilities. When we need these facilities, we use a commercial or open source implementation to perform these tasks for us, and with good reasons -- the existing implementations do a good job, are easily available, and writing our own would be a lot of work for relatively little (or no) gain. As software engineers, we like to believe that we share Isaac Newton's enthusiasm for standing on the shoulders of giants, and this is sometimes, but not always, the case. (In his Turing Award lecture, Richard Hamming suggested that computer scientists instead prefer to "stand on each other's feet.")

Wheels reinvented, inquire within
When it comes to low-level application framework services such as logging, database connection pooling, caching, and task scheduling, which nearly every server application requires, we see these basic infrastructure services rewritten over and over again. Why is this? It's not necessarily because the existing options were inadequate or because the custom versions are better or more well-suited to the application at hand. In fact, the custom versions are often not any better suited to the application for which they are developed than the widely available, general-purpose implementations, and may well be inferior. For example, while you might not like log4j, it gets the job done. And while homegrown logging systems may have specific features log4j lacks, for most applications you'd be hard-pressed to argue that a full-blown custom logging package would be worth the cost of writing it from scratch instead of using an existing, general-purpose implementation. And yet, many project teams end up writing their own logging, connection pooling, or thread scheduling packages, over and over again.

Deceptively simple
One of the reasons we wouldn't consider writing our own XSL processor is that it would be a tremendous amount of work. But these low-level framework services are deceptively simple, and so writing our own doesn't seem that hard. But they are harder to do correctly than it might first appear. The primary reason these particular wheels keep getting reinvented is that the need for these facilities in a given application often starts small, but grows as you run into the same issues that countless other projects have. The argument usually goes like this: "We don't need a full-blown logging/scheduling/caching package, we just need something simple, so we'll just write something to do that, and it will be tailored for our specific needs." But often, you quickly outgrow the simple facility you've written, and are tempted to add a few more features, and a few more, until you've written a full-blown infrastructure service. And at that point, you're usually wedded to what you've already written, whether it's better or not. You've already paid the full cost of building your own, so in addition to the actual migration cost of moving to a general-purpose implementation, you'd have to overcome the "sunk cost" barrier as well.

A treasure trove of concurrency building blocks
Scheduling and concurrency infrastructure classes are definitely harder to write than they look. The Java language provides a useful set of low-level synchronization primitives -- wait(), notify(), and synchronized -- but the details of using these primitives are tricky and there are many performance, deadlock, fairness, resource management, and thread-safety hazards to avoid. Concurrent code is hard to write and harder to test -- and even the experts sometimes get it wrong the first time. Doug Lea, author of Concurrent Programming in Java (see Resources), has written an excellent free package of concurrency utilities, including locks, mutexes, queues, thread pools, lightweight tasks, efficient concurrent collections, atomic arithmetic operations, and other basic building blocks of concurrent applications. This package, generally referred to as util.concurrent (because the real package name is too long), will form the basis of the java.util.concurrent package in JDK 1.5, being standardized under Java Community Process JSR 166. In the meantime, util.concurrent is well-tested and is used in many server applications, including the JBoss J2EE application server.

Filling a void
A useful set of high-level synchronization tools, such as mutexes, semaphores, and blocking, thread-safe collection classes, was a glaring omission from the core Java class libraries. The Java language's concurrency primitives -- synchronization, wait(), and notify() -- are too low-level for the needs of most server applications. What happens if you need to try to acquire a lock, but time out if you don't get it within a certain period of time? Abort an attempt to acquire a lock if a thread is interrupted? Create a lock that at most N threads can hold? Support multi-mode locking, such as concurrent-read with exclusive-write? Or acquire a lock in one method and release it in another? The built-in locking supports none of these directly, but all of them can be built on the basic concurrency primitives that the Java language provides. But doing so is tricky and easy to get wrong.
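
As a point of comparison, here is a sketch of the first of those idioms, timed lock acquisition, using the java.util.concurrent.locks classes that grew out of util.concurrent (util.concurrent's own Sync interface offers a similar attempt(msecs) method; the class and method names here are invented for the example):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: timed lock acquisition with JDK 1.5's ReentrantLock.
// A background thread holds the lock; the main thread tries for 50ms,
// then gives up instead of blocking forever.
public class TimedLockDemo {
    public static boolean tryWhileHeld() {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(new Runnable() {
            public void run() {
                lock.lock();              // grab the lock...
                held.countDown();         // ...and tell the main thread we have it
                try {
                    release.await();      // hold it until told to let go
                } catch (InterruptedException ignored) {
                } finally {
                    lock.unlock();
                }
            }
        });
        holder.start();
        boolean got = false;
        try {
            held.await();                 // wait until the lock is actually held
            got = lock.tryLock(50, TimeUnit.MILLISECONDS); // times out: false
            if (got) lock.unlock();       // defensive: release if we did get it
        } catch (InterruptedException ignored) {
        } finally {
            release.countDown();          // let the holder finish
        }
        try { holder.join(); } catch (InterruptedException ignored) {}
        return got;
    }

    public static void main(String[] args) {
        System.out.println("acquired within timeout: " + tryWhileHeld());
    }
}
```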

Server application developers need simple facilities to enforce mutual exclusion, synchronize responses to events, communicate data across activities, and asynchronously schedule tasks. The low-level primitives that the Java language provides for this are difficult to use and error-prone. The util.concurrent package aims to fill this void by providing a set of classes for locking, blocking queues, and task scheduling, which provide the ability to deal with common error cases or bound the resources consumed by task queues and work-in-process.

Scheduling asynchronous tasks
The most widely used classes in util.concurrent are those that deal with scheduling of asynchronous events. In the July installment of this column, we looked at thread pools and work queues, and how the pattern of "queue of Runnable" is used by many Java applications to schedule small units of work.

It is very tempting to fork a background thread to execute a task by simply creating a new thread for the task:

new Thread(new Runnable() { ... } ).start();



While this notation is nice and compact, it has two significant disadvantages. First, creating a new thread has a certain resource cost, and so spawning many threads, each of which will perform a short task and then exit, means that the JVM may be doing more work and consuming more resources creating and destroying threads than actually doing useful work. Even if the creation and teardown overhead were zero, there is still a second, more subtle disadvantage to this execution pattern -- how do you bound the resources used in executing tasks of a certain type? What would prevent you from spawning a thousand threads at once, if a flood of requests came in all of a sudden? Real-world server applications need to manage their resources more carefully than this. You need to limit the number of asynchronous tasks executing at once.

Thread pools solve both of these problems -- they offer the advantages of improved scheduling efficiency and bounding resource utilization at the same time. While one can easily write a work queue and thread pool that executes Runnables in pool threads (the example code in July's column will do the job), there is a lot more to writing an effective task scheduler than simply synchronizing access to a shared queue. A real-world task scheduler should deal with threads that die, kill excess pool threads so they don't consume resources unnecessarily, manage the pool size dynamically based on load, and bound the number of tasks queued. The last item, bounding the number of tasks queued, is important for keeping server applications from crashing due to out-of-memory errors when they become overloaded.

Bounding the task queue requires a policy decision -- if the work queue overflows, what do you do with the overflow? Throw away the newest item? Throw away the oldest item? Block the submitting thread until space is available on the queue? Execute the new item in the submitting thread? There are a variety of viable overflow-management policies, each of which is appropriate in some situations and inappropriate in others.

Executor
Util.concurrent defines an interface, Executor, to execute Runnables asynchronously, and defines several implementations of Executor that offer different scheduling characteristics. Queuing a task to an executor is quite simple:

Executor executor = new QueuedExecutor();
...
Runnable runnable = ... ;
executor.execute(runnable);



The simplest implementation, ThreadedExecutor, creates a new thread for each Runnable, and provides no resource management -- much like the new Thread(new Runnable() {}).start() idiom. But ThreadedExecutor has one significant advantage: by changing only the construction of your executor, you can move to a different execution model without having to crawl through your entire application source to find all the places where you create new threads. QueuedExecutor uses a single background thread to process all tasks, much like the event thread in AWT and Swing. QueuedExecutor has the nice property that tasks are executed in the order they were queued, and because they are all executed within a single thread, tasks don't necessarily need to synchronize all accesses to shared data.

PooledExecutor is a sophisticated thread pool implementation, which not only provides scheduling of tasks in a pool of worker threads, but also provides flexible pool-size tuning and thread life-cycle management, can bound the number of items on the work queue to prevent queued tasks from consuming all available memory, and offers a variety of available shutdown and saturation policies (block, discard, throw, discard-oldest, run-in-caller, and so on). All the Executor implementations manage thread creation and teardown for you, including shutting down all threads when the executor is shut down, and they also provide hooks into the thread creation process so that your application can manage thread instantiation if it wants to. This allows you to, for example, place all worker threads in a particular ThreadGroup or give them a descriptive name.
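
PooledExecutor's direct descendant in JDK 1.5 is ThreadPoolExecutor. The following sketch (names invented for the example) shows the knobs the text describes: pool-size tuning, a bounded work queue, and a saturation policy:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a bounded thread pool with the JDK 1.5 ThreadPoolExecutor.
public class BoundedPoolDemo {
    public static ThreadPoolExecutor newBoundedPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4,                       // core and maximum pool size
                30, TimeUnit.SECONDS,       // idle threads above core die after 30s
                new ArrayBlockingQueue<Runnable>(100)); // at most 100 queued tasks
        // Saturation policy: when the queue is full and all threads are busy,
        // run the task in the submitting thread (throttles the submitter).
        pool.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        return pool;
    }

    // Summarize the configuration as "core/max/queue-capacity"
    public static String describe() {
        ThreadPoolExecutor pool = newBoundedPool();
        String s = pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize()
                 + "/" + pool.getQueue().remainingCapacity();
        pool.shutdown();
        return s; // "2/4/100"
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```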

FutureResult
Sometimes you want to start a process asynchronously, in the hopes that the results of that process will be available when you need it later. The FutureResult utility class makes this easy. FutureResult represents a task that may take some time to execute and which can execute in another thread, and the FutureResult object serves as a handle to that execution process. Through it, you can find out if the task has completed, wait for it to complete, and retrieve its result. FutureResult can be combined with Executor; you can create a FutureResult and queue it to an executor, keeping a reference to the FutureResult. Listing 1 shows a simple example of FutureResult and Executor together, which starts the rendering of an image asynchronously and continues with other processing:

Listing 1. FutureResult and Executor in action

Executor executor = ...
ImageRenderer renderer = ...

FutureResult futureImage = new FutureResult();
Runnable command = futureImage.setter(new Callable() {
public Object call() { return renderer.render(rawImage); }
});

// start the rendering process
executor.execute(command);

// do other things while executing
drawBorders();
drawCaption();

// retrieve the future result, blocking if necessary
drawImage((Image)(futureImage.get())); // use future
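
For comparison, here is how Listing 1's pattern looks with the JDK 1.5 descendants of these classes, where ExecutorService.submit() returns a Future directly, so no setter()/FutureResult wrapping is needed (the slow rendering step is faked and the names are invented for the example):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: Listing 1 rewritten with JDK 1.5 classes.
public class FutureDemo {
    static String render(String raw) {
        return "rendered:" + raw;  // stand-in for a slow rendering computation
    }

    public static String renderAsync(final String rawImage) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // start the rendering process asynchronously
        Future<String> futureImage = executor.submit(new Callable<String>() {
            public String call() { return render(rawImage); }
        });
        // ...do other things while it executes (drawBorders(), drawCaption())...
        try {
            return futureImage.get(); // retrieve the result, blocking if necessary
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(renderAsync("pixels"));
    }
}
```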



FutureResult and caching
You can also use FutureResult to improve the concurrency of load-on-demand caches. By placing a FutureResult into the cache, rather than the result of the computation itself, you can reduce the time that you hold the write lock on the cache. While it won't speed up the first thread to place an item in the cache, it will reduce the time that the first thread blocks other threads from accessing the cache. It will also make the result available earlier to other threads, since they can retrieve the FutureResult from the cache. Listing 2 is an example of using FutureResult for caching:

Listing 2. Using FutureResult to improve caching

public class FileCache {
    private Map cache = new HashMap();
    private Executor executor = new PooledExecutor();

    public Object get(final String name)
            throws InterruptedException, InvocationTargetException {
        FutureResult result;

        synchronized (cache) {
            result = (FutureResult) cache.get(name);
            if (result == null) {
                result = new FutureResult();
                executor.execute(result.setter(new Callable() {
                    public Object call() { return loadFile(name); }
                }));
                cache.put(name, result);
            }
        }
        return result.get();
    }
}



This approach allows the first thread to get in and out of the synchronized block quickly, and allows other threads to have the result of the first thread's computation as quickly as the first thread does, with no chance of two threads both trying to compute the same object.
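
The same trick carries over to the JDK 1.5 classes: a ConcurrentMap of FutureTasks replaces the synchronized HashMap, and putIfAbsent() makes the check-then-put step atomic. This is a sketch; loadFile() is faked for the example, and the class name is invented:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.FutureTask;

// Sketch: a load-on-demand cache of FutureTasks (JDK 1.5 equivalent of Listing 2).
public class FutureFileCache {
    private final ConcurrentMap<String, FutureTask<String>> cache =
            new ConcurrentHashMap<String, FutureTask<String>>();

    String loadFile(String name) {
        return "contents of " + name; // stand-in for slow file I/O
    }

    public String get(final String name) {
        FutureTask<String> task = cache.get(name);
        if (task == null) {
            FutureTask<String> newTask = new FutureTask<String>(new Callable<String>() {
                public String call() { return loadFile(name); }
            });
            task = cache.putIfAbsent(name, newTask); // atomic check-then-put
            if (task == null) {   // we won the race: run the computation
                task = newTask;
                task.run();       // could also be handed to an executor instead
            }
        }
        try {
            return task.get();    // block until the computation finishes
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        FutureFileCache cache = new FutureFileCache();
        System.out.println(cache.get("a.txt"));
        System.out.println(cache.get("a.txt")); // second call hits the cache
    }
}
```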

Summary
The util.concurrent package contains many useful classes, some of which you may recognize as better versions of classes you've already written, perhaps even more than once. They are battle-tested, high-performance implementations of many of the basic building blocks of multithreaded applications. util.concurrent was the starting point for JSR 166, which will be producing a set of concurrency utilities that will become the java.util.concurrent package in JDK 1.5, but you don't have to wait until then. In a future article, I'll look at some of the custom synchronization classes in util.concurrent, and explore some of the ways in which the util.concurrent and java.util.concurrent APIs differ.




Copyright © 2002-2021 Cjsdn Team. All Rights Reserved. 闽ICP备05005120号-1