Java Developer Network (CJSDN) - Diagnosing Java code: Design for easy code maintenance


Author

Diagnosing Java code: Design for easy code maintenance

palatum

CJSDN senior member

Posts: 451
Points: 80

Posted 2003-04-15 17:04

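Diagnosing Java code: Design for easy code maintenance

Avoid unnecessary mutation and access to make code robust and easier to maintain
Level: Introductory

Eric E. Allen (eallen@cs.rice.edu)
Ph.D. candidate, Java programming languages team, Rice University
April 2003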

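This month, Eric Allen explains how avoiding and controlling gratuitous mutation is key to keeping code robust while making it easier to maintain. He focuses on concepts such as writing code in a functional style and on ways of marking fields, methods, and classes to handle and prevent mutability. Eric also explains the role of unit testing and refactoring in this task and offers two tools to aid in refactoring.
Effective debugging begins with good programming. Designing a program to be easy to maintain is one of the most difficult challenges a programmer faces, in part because programs are often maintained by programmers other than those who wrote the code. To maintain such programs effectively, new programmers have to be able to learn quickly how the program works, and that is easiest when small parts of the program can be understood in isolation from the whole.

By looking at mutability, decipherability, private methods, final methods, final classes, local code, unit tests, and refactoring, we will outline some ways of writing programs that make them easier to understand and maintain.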

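Mutability and decipherability
First up is the issue of mutability. Parts of a program are most easily understood in isolation if the data that each part works on is not altered by other, remote parts of the program during a computation.

Too much information
For example, consider a program using an instance of a container class whose constituent links can be modified. Every time the container is passed to a method from one part of the program to another, and every time a new expression is evaluated with the container passed as an argument, the container may be altered outside the control of the calling method.

We can't really be sure that we understand the calling method, much less diagnose a bug in it, until we understand how each of the methods it calls modifies the container. If each of these called methods calls other modifying methods in turn, the amount of code a maintenance programmer must read to understand a single method can quickly balloon out of control.

For this reason, it can be highly advantageous to use different classes for mutable and immutable containers. In the immutable versions, the fields of the container can be marked as final.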

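Functional style to the rescue
Writing code so that it constructs new data rather than modifying old data is known as functional style, because the methods of the program act like mathematical functions whose behavior is described solely in terms of the output returned for each input.

The often overlooked advantage of functional style is that the individual components of the program are far more easily understood in isolation. If the data manipulated by a method is never altered by any of the operations performed in its body, then all a programmer has to do to understand that method is understand the results returned by those operations. Compare this to the earlier scenario in which a method calls several other methods, each of which modifies the very data structures the method operates on.

One nice feature of the Java language is that it allows us to use the final keyword, as a directive to the type checker, to declare when we want certain data to be immutable.

Avoiding mutation with the final keyword is a good way to nail down the behavior of a class's methods. Every time a field is modified, the behavior of the methods that refer to it may change. Additionally, marking a field as final lets other programmers who read the program know instantly that the field is never modified, no matter how large the whole program is. For example, consider the following class hierarchy for representing immutable lists.

Listing 1. Class hierarchy for representing immutable lists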

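abstract class List { /* ... */ }
class Empty extends List { /* ... */ }
class Cons extends List {
    private final Object first;
    private final List rest;

    // final fields must be assigned exactly once, here in the constructor
    Cons(Object first, List rest) {
        this.first = first;
        this.rest = rest;
    }
}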

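All fields in these classes are marked as final. Is that enough to ensure that instances of these classes are immutable? Not quite. Even when a field is marked as final, it's important to remember that the components of the object it refers to may not be final. When those components are altered, the behavior of any part of the program that refers to them may change, regardless of whether the field itself is altered. In the example above, although the fields holding the list's elements can't be reassigned, we have to check that those elements themselves don't contain non-final fields that may be modified.

In this case, although a list may contain mutable elements, we can see that the sequence of elements stored in a given list is immutable by reasoning as follows: instances of Empty (that is, lists of length zero) contain no elements at all; therefore, they can't be modified. Instances of Cons (non-empty lists) contain two fields, both final. The first field contains the first element of the list and can't be modified; the second contains a list holding all remaining elements. If the contents of this list are immutable, then so is the containing list.

But the list contained in this second field has a length one less than the length of the containing list, so if we know that all lists of length n are immutable, we know that lists of length n + 1 are immutable too. Since we already know that zero-length lists are immutable, we also know that lists of length 1, 2, 3, and so on are immutable.

Tracing through the connections of a data structure like this can be tedious, but it pays off when you can determine global properties of such a structure, such as immutability.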

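Controlling mutation
The best strategy to defend against unexpected mutation is simply to avoid all mutation whenever possible. Only when there is a compelling reason to mutate (for example, when it vastly simplifies the structure of the code) should we make use of it. When mutation can be avoided, the payoff can be enormous in terms of lower maintenance costs and increased robustness.

Even when there is a compelling reason to mutate data, it's best to try to control that mutation, to limit the potential damage as much as possible. Iterators and streams are great examples of data structures explicitly designed to control mutation by allowing us to walk over a series of elements in a regular and well-defined fashion, rather than explicitly modifying some handle on the elements.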

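Private methods
Just as setting fields to final helps limit outside influences on the value of a field, setting them to private helps limit the influence they have on other parts of the program. If a field is private, we can be certain that no other part of the program depends on it directly. If we eliminate the field and replace the internal representation of the class's data, we need only worry about fixing the methods inside the class to access the new data properly.

In the earlier example, notice that the fields of class Cons are private. That way, we can control how those elements are accessed through getters and the like. If a future maintainer of our lists ever wants to modify their internal representation (for example, perhaps array-based lists turn out to be more efficient on certain platforms), that programmer can do so without modifying or even looking at any of the clients of those lists. The maintainer simply has to rewrite the getters to take the appropriate action with the new data.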

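Final methods, final classes, and understanding code locally
In contrast to marking fields as final, marking a method as final is often claimed to be at odds with OO design goals because it inhibits inheritance polymorphism. But when trying to understand the behavior of a large program, it helps to know which methods are not overridden.

It's absolutely true that good OO design involves using a great deal of inheritance. In fact, inheritance is central to many OO design patterns. But that doesn't mean that we should allow every method we write to be overridden. Often a program will implicitly rely on certain key methods not being overridden. By marking such methods as final, we allow other programmers to better understand the behavior of expressions that call the method.

Additionally, marking classes as final can be a great boost to decipherability. It can really help to know at a glance which classes are never subclassed in a program. In fact, I would argue that the only classes that shouldn't be marked as final are classes that are actually subclassed in the program and classes that, as an inherent part of the program design, are intended to be subclassed from outside components.

Some may say that this approach will straitjacket future maintainers of the code, keeping them from being able to extend it. I say it will certainly not restrict them. If future maintainers of a program need to extend it to include a subclass where none existed before, it's not hard to delete the final keyword on the corresponding class and recompile, provided they have access to the source code (and if they don't have access to it, in what sense are they "maintainers" of that code?).

Meanwhile, that added keyword serves as a form of automatically verified documentation of an important invariant about the program ("automatically verified" because the program won't even compile if the documentation is violated). By forcing developers to consciously choose when they want to eliminate such an invariant, we can help reduce the introduction of errors.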

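Unit tests and mutation
As always, unit tests can help in understanding side-effecting code. If a suite of unit tests adequately documents the effects of the methods in a program, then a programmer can understand each method more quickly just by reading its unit tests. Of course, the big question is whether the unit tests really do cover the effects adequately. Coverage analysis tools such as Clover can help here to some degree.

Notice, however, that unit tests themselves are much easier to write for strictly functional methods. To test a strictly functional method, all that's involved is to call it with various representative inputs and check its outputs (and make sure it throws exceptions when it should).

When testing methods that modify the state of data structures, one must first perform the operations necessary to put the input data into the state expected by the method and then, after calling the method, check that every modification of the data expected by clients was performed correctly.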

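Wrapping up with refactoring tools
These tips can be great when writing new code, but what about when you have to maintain old code that is barely decipherable? Refactor, refactor, refactor.

Although refactoring old code takes time, that time is well spent, especially with all the tool support for refactoring now available for Java code. There are now many powerful tools for automatically refactoring Java code, tools that preserve key invariants automatically.

One of the most full-featured tools for refactoring Java code is the IDEA development environment. It provides automatic support for a significant chunk of Martin Fowler's refactoring patterns. Another tool I have found very useful is CodeGuide, a German IDE. Although its list of automatic refactorings is small compared to IDEA's, it showcases an extraordinarily powerful feature: continuous compilation. While you're typing new code, CodeGuide analyzes it and tells you if anything in the project has broken (of course, there is a short delay to prevent it from signaling errors on every keystroke).

Although continuous compilation negatively affects responsiveness, it can be well worth the wait in certain contexts. For example, you can type final in front of a field and instantly see if anything in the project breaks. If not, you know that the field isn't modified anywhere in the program. Similarly, you can type private in front of a field and instantly get a list of all outside accesses to the field (in the form of errors).

Another great feature of CodeGuide is that it provides seamless support for the JSR-14 experimental extension with generic types (scheduled for official addition in Java 1.5).

Although writing code for decipherability can take a lot more time and effort, it can help to increase the lifetime and the robustness of your code, and it can significantly enhance the quality of life for those who face the task of maintaining it. Finally, refactoring old code to be more maintainable takes time but pays for itself the next time you have to fix a bug.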

Author

Diagnosing Java code: Design for easy code maintenance [Re:palatum]

palatum

CJSDN senior member

Posts: 451
Points: 80

Posted 2003-04-15 17:05

Diagnosing Java code: Design for easy code maintenance

Avoid unnecessary mutation and access to make code robust and easier to maintain
Level: Introductory

Eric E. Allen (eallen@cs.rice.edu)
Ph.D. candidate, Java programming languages team, Rice University
January 2003

This month, Eric Allen explains how avoiding and controlling gratuitous mutation is key to retaining code robustness while making the code easier to maintain. He focuses on such concepts as functional style code crafting and ways of marking fields, methods, and classes to handle and prevent mutability. Also, Eric explains the role of unit testing and refactoring in this task, and offers two tools to aid in refactoring efforts.
Effective debugging begins with good programming. Designing a program to be easy to maintain is one of the most difficult challenges a programmer faces, in part because programs are often maintained by programmers other than those who originated the code. To maintain such programs effectively, new programmers have to be able to quickly learn how the program works, a task that's done most easily if small parts of the program can be understood in isolation from the whole.

We'll outline some of the ways that programs can be written to help make them more easily understood and maintained, looking at the issues of mutability, decipherability, private methods, final methods, final classes, local code, unit tests, and refactoring.

Mutability and decipherability
First up is the issue of mutability. Parts of a program are most easily understood in isolation if the data that each part works on is not altered by other, remote parts of a program during a computation.

Too much information
For example, consider a program using an instance of a container class whose constituent links can be modified. Every time the container is passed to a method from one part of a program to another, and every time a new expression is evaluated with the container passed as an argument, an opportunity exists for the container to be altered outside the control of the calling method.

We can't really be sure of our understanding of the calling method, much less our ability to diagnose a bug, until we first understand how each of the methods it calls modifies the container. If each of these called methods calls other modifying methods in turn, the amount of code the maintenance programmer must read to understand a single method can quickly balloon out of control.

For this reason, it can be highly advantageous to use different classes for mutable and immutable containers. In the immutable versions, the fields of the container can be marked as final.

Functional style to the rescue
Writing code so that it constructs new data as opposed to modifying old data is known as functional style because the methods of the program act like mathematical functions whose behavior is described solely in terms of the output returned for each input.

The often overlooked advantage of functional style is that the individual components of the program are far more easily understood in isolation. If the data manipulated by a method is never altered by any of the operations performed in its body, then all a programmer has to do to understand that method is understand the results returned by those operations. Compare this to the scenario above in which a method calls several other methods, each of which modifies the very data structures the method operates on.
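
As a minimal sketch of that contrast, consider the following. The Point class and its translate method are hypothetical and do not appear in the article; they only illustrate a method that builds new data instead of altering old data.

```java
// Hypothetical example: Point and translate are illustrative names only.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }

    // Functional style: build a new Point instead of changing this one.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}

class FunctionalStyleDemo {
    public static void main(String[] args) {
        Point origin = new Point(0, 0);
        Point moved = origin.translate(3, 4);
        // origin is untouched, so any code still holding it sees (0, 0).
        System.out.println(origin.getX() + "," + origin.getY()); // prints 0,0
        System.out.println(moved.getX() + "," + moved.getY());   // prints 3,4
    }
}
```

To understand a caller of translate, a reader only needs to know what it returns; nothing the caller already holds can have changed.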

One nice feature of the Java language is that it allows us to use the final keyword, as a directive to the type checker, to declare when we want certain data to be immutable.

Avoiding mutation with the final keyword is a good way to nail down the behavior of a class's methods. Every time a field is modified, it has the potential to alter the behavior of the methods that refer to it. Additionally, marking a field as final lets other programmers who read the program know instantly that the field is never modified, no matter how large the whole program is. For example, consider the following class hierarchy for representing immutable lists.

Listing 1. Class hierarchy for representing immutable lists

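abstract class List { /* ... */ }
class Empty extends List { /* ... */ }
class Cons extends List {
    private final Object first;
    private final List rest;

    // final fields must be assigned exactly once, here in the constructor
    Cons(Object first, List rest) {
        this.first = first;
        this.rest = rest;
    }
}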

All fields in these classes are marked as final. Is that enough to ensure that instances of these classes are immutable? Not quite. Even when a field is marked as final, it's important to remember that the components of the object it refers to may not be final. When those components are altered, the behavior of any part of the program that refers to them may change, regardless of whether the field itself is altered. In the example above, although the fields holding the list's elements can't be reassigned, we have to check that those elements themselves don't contain non-final fields that may be modified.
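
The point about final fields whose components are not final can be sketched as follows. Holder is a hypothetical class, not part of the article; it assumes only the standard StringBuilder class.

```java
// Hypothetical example: Holder is an illustrative class, not from the article.
final class Holder {
    // The reference is final, but the StringBuilder it points to is mutable.
    private final StringBuilder contents;

    Holder(StringBuilder contents) {
        this.contents = contents;
    }

    String describe() {
        return contents.toString();
    }
}

class ShallowFinalDemo {
    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("before");
        Holder holder = new Holder(shared);
        String first = holder.describe();
        // No field of holder is reassigned, yet its observable behavior changes.
        shared.append(" and after");
        String second = holder.describe();
        System.out.println(first);  // prints: before
        System.out.println(second); // prints: before and after
    }
}
```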

In this case, although a list may contain mutable elements, we can see that the sequence of elements stored in a given list is immutable by reasoning as follows: instances of Empty lists (that is, lists of length zero) contain no elements at all; therefore, they can't be modified. Instances of Cons (non-empty lists) contain two fields, both final. The first field contains the first element of the list and can't be modified; the second contains a list containing all remaining elements. If the contents of this list are immutable, then so is the containing list.

But the list contained in this second field has a length one less than the length of the containing list, so if we knew that all lists of length n were immutable, we'd know that lists of length n + 1 were also immutable. Since we already know that zero-length lists are immutable, we also know that lists of length 1, 2, 3, and so on are immutable.

Tracing through the connections of a data structure like this can be tedious, but it pays off when you can determine global properties of such a structure, such as immutability.
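
The inductive argument can be made concrete with a fuller version of the hierarchy. This is only a sketch under stated assumptions: the base class is renamed ConsList here (a hypothetical name) so that it does not clash with java.util.List, and the length, prepend, and getter methods are additions that the article's listing does not show.

```java
// Names are hypothetical; ConsList stands in for the article's List class
// so that it does not clash with java.util.List.
abstract class ConsList {
    abstract int length();

    // Functional style: returns a new, longer list and leaves this one alone.
    final ConsList prepend(Object element) {
        return new Cons(element, this);
    }
}

final class Empty extends ConsList {
    @Override
    int length() { return 0; }   // base case: no elements, nothing to modify
}

final class Cons extends ConsList {
    private final Object first;
    private final ConsList rest;

    Cons(Object first, ConsList rest) {
        this.first = first;
        this.rest = rest;
    }

    Object getFirst() { return first; }
    ConsList getRest() { return rest; }

    @Override
    int length() { return 1 + rest.length(); }   // inductive case
}
```

The reasoning holds only as long as Empty and Cons are the sole subclasses of ConsList, which is an assumption of this sketch; a mutable third subclass would break it.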

Controlling mutation
The best strategy to defend against unexpected mutation is simply to avoid all mutation whenever possible. Only when there is a compelling reason to mutate (for example, when it vastly simplifies the structure of the code) should we make use of it. When mutation can be avoided, the payoff can be enormous in terms of lower maintenance costs and increased robustness.

Even when there is a compelling reason to mutate data, it's best to try to control that mutation, to limit the potential damage as much as possible. Iterators and Streams are great examples of data structures explicitly designed to control mutation by allowing us to walk over a series of elements in a regular and well-defined fashion, rather than explicitly modifying some handle on the elements.
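
One way to read this advice, sketched under assumptions: a class keeps its mutable collection private, changes it in a single method, and lets clients walk the elements only through an iterator. Playlist and its methods are hypothetical names; the sketch uses generics, which postdate the article.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical example: Playlist and its methods are illustrative only.
final class Playlist implements Iterable<String> {
    private final List<String> titles = new ArrayList<>();

    // The single, well-defined place where the playlist is mutated.
    void add(String title) {
        titles.add(title);
    }

    // Clients can walk the titles but cannot add or remove through this iterator.
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableList(titles).iterator();
    }
}
```

A client that calls remove() on the returned iterator gets an UnsupportedOperationException, so every change to the list has to go through add.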

Private methods
Just as setting fields to final helps limit outside influences on the value of a field, setting them to private helps to limit the influence they have on other parts of the program. If a field is private, we can be certain that no other parts of the program depend on it directly. If we eliminate the field and replace the internal representation of the class data, we need only worry about fixing the methods inside of the class to access the new data properly.

In the earlier example, notice that the fields of class Cons are private. That way, we can control how those elements are accessed through getters and the like. If a future maintainer of our lists ever wanted to modify their internal representation (for example, perhaps it turns out that on certain platforms, array-based lists are more efficient), the programmer can do so without modifying or even looking at any of the clients of those lists. The maintainer simply has to rewrite the getters to take the appropriate action with the new data.
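
As a rough illustration of such a change of representation, here is what an array-based variant might look like. ArrayBackedSequence is a hypothetical class; the getter names getFirst and getRest are assumptions, since the article does not show the getters of Cons.

```java
// Hypothetical example: ArrayBackedSequence is illustrative, not from the article.
final class ArrayBackedSequence {
    // Private: no client can depend on the fact that an array is used.
    private final Object[] elements;

    ArrayBackedSequence(Object... elements) {
        // Defensive copy so the caller's array cannot alter this object later.
        this.elements = elements.clone();
    }

    // Assumes a non-empty sequence; an empty one throws an exception here.
    Object getFirst() { return elements[0]; }

    // Assumes a non-empty sequence as well.
    ArrayBackedSequence getRest() {
        return new ArrayBackedSequence(
            java.util.Arrays.copyOfRange(elements, 1, elements.length));
    }

    int length() { return elements.length; }
}
```

Clients that only call the getters would not need to change if a linked representation were swapped for this one, which is the benefit the private fields buy.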

Final methods, final classes, and understanding code locally
In contrast to marking fields as final, marking a method as final is often claimed to be at odds with OO design goals because it inhibits inheritance polymorphism. But when trying to understand the behavior of a large program, it helps to know what methods are not overridden.

Now it's absolutely true that good OO design involves using a great deal of inheritance. In fact, inheritance is central to many OO design patterns. But that doesn't mean that we should allow every method we write to be overridden. Often a program will implicitly rely on certain key methods not being overridden. By marking such methods as final, we will allow other programmers to better understand the behavior of expressions that call the method.

Additionally, marking classes as final can be a great boost to decipherability. It can really help to know at a glance which classes are never subclassed in a program. In fact, I would argue that the only classes that shouldn't be marked as final are classes that are actually subclassed in a program and classes that, as an inherent part of the program design, are intended to be subclassed from outside components.
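
A small sketch of how these markings might be combined is given here. Account and SavingsAccount are hypothetical classes invented for illustration: one method is left overridable on purpose, the methods that the program relies on are final, and the leaf class is final.

```java
// Hypothetical example: Account and SavingsAccount are illustrative only.
abstract class Account {
    private final long balanceInCents;

    Account(long balanceInCents) {
        this.balanceInCents = balanceInCents;
    }

    // final: every caller knows exactly which code runs here.
    final long getBalanceInCents() {
        return balanceInCents;
    }

    // Intended extension point, so it is deliberately left overridable.
    abstract long monthlyFeeInCents();

    // final: relies on getBalanceInCents() not being overridden.
    final long balanceAfterFee() {
        return getBalanceInCents() - monthlyFeeInCents();
    }
}

// final: documents, and has the compiler verify, that nothing subclasses this class.
final class SavingsAccount extends Account {
    SavingsAccount(long balanceInCents) {
        super(balanceInCents);
    }

    @Override
    long monthlyFeeInCents() {
        return 100;
    }
}
```

Account itself is not final because it is actually subclassed, which matches the rule of thumb stated above.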

Some may say that this concept will straitjacket future maintainers of the code, keeping them from being able to extend the code. I say it will certainly not restrict them. If future maintainers of a program need to extend it to include a subclass where none existed before, it's not hard to delete the final keyword on the corresponding class and recompile it, provided they have access to the source code (and if they don't have access to it, in what sense are they "maintainers" of that code?).

Meanwhile, that added keyword serves as a form of automatically verified documentation of an important invariant about the program ("automatically verified" because the program won't even compile if the documentation is violated). By forcing developers to consciously choose when they want to eliminate such an invariant, we can help to reduce the introduction of errors.

Unit tests and mutation
As always, unit tests can help in understanding side-effecting code. If a suite of unit tests adequately documents the effects of the methods in a program, then a programmer can understand each method more quickly just by reading its unit tests. Of course, the big question is whether the unit tests really do cover the effects adequately. Coverage analysis tools like Clover can help here to some degree.

Notice, however, that unit tests themselves are much easier to write for strictly functional methods. To test strictly functional methods, all that's involved is to call these methods with various representative inputs and check their outputs (and make sure they throw exceptions when they should).

When testing methods that modify the state of data structures, one must first perform the operations necessary to put the input data into the state expected by the method and then, after calling the method, check that every modification of the data expected by clients was performed correctly.
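
The difference in testing effort can be sketched with plain checks rather than a test framework. Doubling and DoublingChecks are hypothetical classes; a real project would more likely use a unit testing library, which this sketch avoids so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: both methods are illustrative only.
final class Doubling {
    // Strictly functional: the result depends only on the argument.
    static int doubled(int n) {
        return 2 * n;
    }

    // Mutating: changes the list it is given.
    static void doubleAll(List<Integer> values) {
        for (int i = 0; i < values.size(); i++) {
            values.set(i, 2 * values.get(i));
        }
    }
}

class DoublingChecks {
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        // Functional method: call it and compare the output.
        check(Doubling.doubled(21) == 42, "doubled(21)");

        // Mutating method: set up state, call, then inspect every expected change.
        List<Integer> values = new ArrayList<>();
        values.add(1);
        values.add(2);
        Doubling.doubleAll(values);
        check(values.size() == 2, "size unchanged");
        check(values.get(0) == 2, "first element doubled");
        check(values.get(1) == 4, "second element doubled");
    }
}
```

The functional check is one line, while the mutating check needs setup beforehand and a separate inspection of each effect afterwards.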

Wrapping up with refactoring tools
These tips can be great when writing new code, but what about when you have to maintain old code that is barely decipherable? Refactor, refactor, refactor.

Although refactoring old code takes time, that time is well spent, especially with all the tool support for refactoring now available for Java code. There are now many powerful tools for automatically refactoring Java code, tools that preserve key invariants automatically.

One of the most full-featured tools for refactoring Java code is the IDEA development environment. This environment provides automatic support for a significant chunk of Martin Fowler's refactoring patterns. Another tool I have found to be very useful is CodeGuide, a German IDE. Although its list of automatic refactorings is small compared to IDEA, it showcases an extraordinarily powerful feature -- continuous compilation. While you're typing new code, CodeGuide analyzes it and tells you if anything in the project has broken (of course, there is a short delay to prevent it from signaling errors on every keystroke).

Although continuous compilation negatively affects responsiveness, it can be well worth the wait in certain contexts. For example, you can type final in front of a field and instantly see if anything in the project breaks. If not, you know that the field isn't modified anywhere in the program. Similarly, you can type private in front of a field and instantly get a list of all outside accesses to the field (in the form of errors).

Another great feature of CodeGuide is that it provides seamless support for the JSR-14 experimental extension with generic types (scheduled for official addition in Java 1.5).

Although writing code for decipherability can take a lot more time and effort, it can help to increase the lifetime and the robustness of your code, and it can significantly enhance the quality of life for those who face the task of maintaining it. Finally, refactoring old code to be more maintainable takes time but pays for itself the next time you have to fix a bug.

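Author	Re: Diagnosing Java code: Design for easy code maintenance [Re:palatum]
xzpy00007 Posts: 37 Points: 10	Posted 2003-04-26 20:18 You're an expert. I should learn from you and read more English articles.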
