Mohsen Vakilian's Blog

May 5, 2010

Should Types be Non-null by Default?

Filed under: language,refactoring — mohsenvakilian @ 2:32 pm

Patrice Chalin et al. studied 700K lines of open-source Java code and found that, on average, 75% of reference type declarations (other than local variables) are meant to be non-null, based on design intent.

They first presented their empirical study in an ECOOP 2007 paper entitled “Non-null References by Default in Java: Alleviating the Nullity Annotation Burden”. In this paper, they argued that since most type declarations are non-null by design intent, the default semantics of Java should be changed to non-null by default. I personally don’t think it makes sense for Java to make such a backward-incompatible change. But I do think that a solid refactoring tool for adding non-null types would be valuable.

Patrice leads the JmlEclipse project, an Eclipse-based verification environment that extends the Eclipse JDT Core plugin. It supports non-null types by default and uses JML as its specification notation. They’ve chosen the JML notation for their non-null type system until JSR 305 is finalized. But I’ve heard that JSR 305 is not going to be supported in Java 7, whereas JSR 308 is.
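
To illustrate what non-null by default means in practice, here is a minimal sketch using JML’s nullity modifiers (my own example, not one from JmlEclipse’s documentation):

public class Registry {
    // Under non-null defaults, an unannotated reference type is non-null,
    // so this modifier is redundant:
    private /*@ non_null @*/ String name = "";

    // Only declarations explicitly marked nullable may hold null:
    private /*@ nullable @*/ String alias = null;
}

Under Java’s current semantics the burden is reversed: every declaration that must not be null needs an explicit annotation, which is exactly the annotation burden the paper measures.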

Patrice Chalin et al. published an extended version of their ECOOP 2007 paper in IET Software 2008, entitled “Reducing the use of nullable types through non-null by default and monotonic non-null”. In this paper, they also report a study on the usage of null. They noticed an interesting use of null that reduces memory usage: developers sometimes set a reference to a large object to null to allow the garbage collector to free it. The other category of nullable types is what they call monotonic non-null types. A field of a monotonic type can hold null before it’s initialized, but once it gets a non-null value, it never admits null again. They found that approximately 60% of nullable fields were monotonic non-null.
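
A typical monotonic non-null field looks something like the following sketch (my own illustration, not an example from the paper):

import java.io.IOException;
import java.net.Socket;

public class Connection {
    // Null until connect() is called; after its first assignment it is
    // never set back to null, which makes it monotonic non-null.
    private Socket socket;

    public void connect(String host, int port) throws IOException {
        socket = new Socket(host, port); // the only transition: null -> non-null
    }

    public void send(byte[] data) throws IOException {
        if (socket == null) {
            throw new IllegalStateException("not connected yet");
        }
        socket.getOutputStream().write(data); // safe once connected
    }
}

Because the field never goes back to null, a checker that understands monotonicity can treat it as non-null everywhere after initialization, instead of forcing it to be fully nullable.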

February 15, 2009

OOPSLA 2008–Annotation Refactoring

Filed under: evolution,refactoring — mohsenvakilian @ 12:27 am

Annotation refactoring is a refactoring-by-demonstration system for upgrading Java programs to use annotations. The example the paper focuses on is JUnit: JUnit 4 introduced annotations, and annotation refactoring can be used to upgrade tests written against JUnit 3 to use JUnit 4. The authors try not to limit the tool to JUnit by inferring the transformations from a given example of upgrading a class. First, the differences between the two given versions of the class are computed. Then, the transformation is inferred and represented in a domain-specific language, where it can be corrected and refined by the user.
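
For concreteness, here is a hedged sketch of the kind of upgrade such a tool automates (my own example, not one from the paper):

// Before (JUnit 3): tests are discovered by inheritance and naming convention.
class CalculatorTestV3 extends junit.framework.TestCase {
    public void testAdd() {
        assertEquals(4, 2 + 2);
    }
}

// After (JUnit 4): the @Test annotation replaces both conventions, and
// assertions come from org.junit.Assert instead of TestCase.
class CalculatorTestV4 {
    @org.junit.Test
    public void testAdd() {
        org.junit.Assert.assertEquals(4, 2 + 2);
    }
}

Demonstrating this upgrade on one test class and letting the tool infer the general transformation is exactly the workflow the paper proposes.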

The addressed problem is important, and there are still opportunities to improve the accuracy of the inferencer. Upgrading programs to JUnit 4 is a simple case of annotation refactoring, and handling more complex Java frameworks will be challenging.

OOPSLA 2008–Sound and Extensible Renaming for Java

Filed under: refactoring — mohsenvakilian @ 12:05 am

Sound and extensible renaming addresses the difficulty of coming up with a correct implementation of the rename refactoring in Java. The presenter did a good job of motivating the audience at OOPSLA. At the beginning of his talk, he showed several Java programs on which all major Java IDEs fail to perform the rename refactoring correctly. The authors mention two reasons for the complexity of the rename refactoring:

  1. addition of new constructs to the Java language
  2. complex name lookup

They introduce a systematic way to ensure that the binding structure of the program is preserved when the refactoring is applied. In addition, they claim their approach is modular and thus easy to extend to support new constructs of the Java language.
They have implemented the so-called inverted lookup functions in JastAdd. By implementing the inverted lookup function corresponding to each lookup function, they claim no corner cases are missed and thus the binding structure is kept unchanged. While performing the rename refactoring, it might be necessary to replace some names by their fully qualified names. Some existing Java IDEs abort the refactoring in such cases because their preconditions are too strong. In their approach, the refactoring still goes through: a name is automatically replaced by its fully qualified form when necessary, as the sketch below illustrates.
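
Here is a hedged example of the kind of name-capture problem involved (my own, not one from the paper). Renaming the field count to size would silently change a binding unless the refactoring also requalifies the captured reference:

class Counter {
    int count = 0;

    int total(int size) {
        // Before the refactoring, "count" refers to the field. A naive
        // rename of count to size would produce "return size + size;",
        // binding both names to the parameter. A sound rename instead
        // produces "return this.size + size;" to preserve the binding.
        return count + size;
    }
}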

Their paper demonstrates how tricky it is to implement the rename refactoring correctly. However, I still think there are easier ways to fix the problem. Most IDEs, such as Eclipse, can list all references to a particular variable, which means they can easily build the binding structure of the program. If we compare the binding structures before and after the refactoring, we can refuse to perform any rename that changes the binding structure.

The paper addresses an important problem, and it might be possible to fix deficiencies of other refactorings in a similar way.

March 22, 2008

JunGL: a Scripting Language for Refactoring

Filed under: evolution,refactoring — mohsenvakilian @ 12:45 pm

JunGL is a hybrid functional-logic language in the tradition of ML and Datalog. The main data structure manipulated by JunGL is a graph representation of the program, and edges can be added to the graph lazily.

The following function in JunGL computes the control flow edges emanating from a conditional statement:

let IfStmtCFSucc(node) =
  match (node.thenBranch, node.elseBranch) with
  | (null, null) -> [DefaultCFSucc(node)]
  | (t, null) -> [t; DefaultCFSucc(node)]
  | (null, e) -> [DefaultCFSucc(node); e]
  | (t, e) -> [t; e] ;;

In JunGL, one can use predicates in functions via a stream comprehension.

{ ?x | P(?x) }

will return a stream of all x that satisfy the predicate P.

Path queries are regular expressions that identify paths in the program graph.


[var]
parent+
[?m:Kind("MethodDecl")] child
[?dec:Kind("ParamDecl")]
& ?dec.name == var.name

In the above path query, components between square brackets are conditions on nodes, whereas parent and child match edge labels. The above predicate thus describes a path from a variable occurrence var to its declaration as a method parameter.

They implement the Rename Variable and Extract Method refactorings in JunGL for a subset of C#. Some refactorings are complex and require various analyses, so I didn’t expect their language for describing refactorings to be simple. However, it seems to me that they’ve made the process of defining refactorings easier by combining features of functional and logic programming. There are several systems for defining refactorings out there, and it would be worthwhile to evaluate and compare them with real programmers.

March 16, 2008

Jackpot Rule Language

Filed under: evolution,refactoring,technology — mohsenvakilian @ 3:18 pm

Jackpot is a NetBeans module that lets you define custom transformations. Jackpot transformations can be specified in two ways:

  1. Using Jackpot rules matching program segments or
  2. Using Jackpot API to manipulate the AST.

The Jackpot API is still under development. The rule language was designed by James Gosling. In the following, we’ll get to know the Jackpot rule language through two examples.

The first rule removes unnecessary casts. In this rule, meta-variables such as $T and $a are used to match various program elements. As you can see in this example, Jackpot supports type-matching facilities as guard expressions after the "::" operator.

($T)$a => $a :: $a instanceof $T;
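
Applied to concrete code, the rule rewrites a redundant cast as follows (my own illustration of the rule’s effect):

class CastExample {
    static String copy(String s) {
        String t = (String) s; // matches ($T)$a with $T = String and $a = s
        return t;              // after the rule fires: String t = s;
    }
}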

The second rule transforms a regular for-loop into an enhanced for-loop. This rule uses both meta-lists and built-in guards. The $stmts$ symbol is a meta-list, which matches a list of statements. The expression referencedIn($i, $stmts$) is a built-in guard that checks whether the meta-variable $i is referenced anywhere in the meta-list $stmts$.

for(int $i = 0; $i < $array.length; $i++) {
    $T $var = ($T)$array[$i];
    $stmts$;
} =>
for($T $var : $array) {
    $stmts$;
} :: !referencedIn($i, $stmts$);
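
On concrete code, the rule performs the following rewrite (a hedged illustration of my own):

class LoopExample {
    static void printAll(String[] names) {
        // Before: an index-based loop whose body uses i only as a subscript.
        for (int i = 0; i < names.length; i++) {
            String name = (String) names[i]; // the cast matches ($T)$array[$i]
            System.out.println(name);
        }
        // After: the guard !referencedIn($i, $stmts$) holds, so the rule
        // rewrites the loop into its enhanced form.
        for (String name : names) {
            System.out.println(name);
        }
    }
}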

Jackpot rules are readable, and if somebody needs a more sophisticated transformation, they can use the Jackpot API to perform it programmatically. Jackpot seems suitable for cases where you want to migrate client code to a new API. However, it lacks constructs for specifying scope, so it may not be appropriate for transforming the library itself.
In Jackpot, no flow analysis is done during the execution of a rule file. And in an interview, Tom Ball, the Jackpot team leader, mentions that Jackpot cannot be used outside of the NetBeans IDE.

February 6, 2008

Automatic Change Inference

Filed under: evolution,refactoring — mohsenvakilian @ 12:12 am

The research question addressed in this paper is:

“Given two versions of a program, what changes occurred with respect to a particular vocabulary of changes?”

Of course, we are looking for approaches that report changes at a higher level than traditional diff tools, whose change vocabulary consists of added, deleted, and moved lines.

Miryung Kim et al. presented their approach to change inference in a paper titled “Automatic Inference of Structural Changes for Matching Across Program Versions”.

The following example gives a feel for what the tool’s output looks like:

for all x in chart.*Plot.get*Range()
except {chart.MarkerPlot.getVerticalRange}
argAppend(x, [ValueAxis])

This change rule means that all methods matching the chart.*Plot.get*Range() pattern take an additional ValueAxis argument, except the getVerticalRange method in the MarkerPlot class.

As shown in the above example, their approach represents structural changes as a set of high-level change rules, automatically infers likely change rules, and determines method-level matches based on those rules.

The set of transformations they support includes (1) refactorings such as rename package/class/field/method, add parameter, … and (2) other transformations such as change argument types, change return types, change input argument list, …
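
To make the argAppend rule above concrete, here is a hedged, self-contained sketch of what such a change means for client code (the types are minimal stand-ins, not JFreeChart’s real API):

class ValueAxis {}

class XYPlot {
    // Before the change this was: double[] getDataRange()
    // After argAppend, every matching method takes a trailing ValueAxis:
    double[] getDataRange(ValueAxis axis) {
        return new double[] { 0.0, 1.0 };
    }
}

class Client {
    double[] use(XYPlot plot) {
        // Call sites matched by chart.*Plot.get*Range() must pass the new argument.
        return plot.getDataRange(new ValueAxis());
    }
}

Inferring one rule with an except clause, rather than listing dozens of individual method matches, is what keeps the tool’s output concise.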

They did a good job of printing the change rules in a concise manner. As they mention, the except clause sometimes signals a bug arising from incomplete or inconsistent changes.

I think more types of transformations should be taken into account for better descriptions of changes. Besides, one might handle changes due to refactorings by simply recording them as they are performed, rather than trying to infer them by comparing two versions of the program.

October 29, 2007

iXj: A Language for Interactive Transformation of Java Programs

Filed under: refactoring — mohsenvakilian @ 1:00 am

First, I suggest you watch a demo of iXj. There is no publicly available version of the tool, but the demo will give you a feel for how iXj works. It’s great, isn’t it?

I read Marat’s PhD dissertation to find out how the tool could be improved. In the following, I explain some properties of iXj along with some ideas for improving it.

I believe the best improvement to iXj would be making its patterns reusable. Currently, iXj is built around one-time use of patterns. By making transformations reusable, it might be possible to capture all the transformations of source code concisely. Reusable transformations are a step toward defining custom refactorings.

We know that the power of regular expressions is limited. If we want to use regular expressions as building blocks for refactorings, this limitation becomes a major problem. Here is the author’s explanation of the power of iXj:

Clearly there are TXL and refactoring transformations that are not expressible in iXj. Notably, those transformations that require control- and data-flow information, such as the Extract Method refactoring, are not supported.

Now, let’s consider the idea of making the tool language-independent more closely. On page 57, the author says:

Our experience suggests that designing custom models on a case-by-case basis is the right approach for building language-based tools. However, extracting model instances from source code text and maintaining the correspondence between various models is a topic for another dissertation.

This statement and other parts of the dissertation made me somewhat skeptical about the language-independence approach. A fully language-independent tool can only rely on text-based transformations, which is obviously not adequate for safe transformation of source code. One solution might be to perform the transformation in two phases: a language-independent first phase, and a second phase that checks language-specific features. This way, the first phase could be reused to implement the transformation tool for different languages.

Several developers participated in the evaluation of iXj. One interesting point the participants made was that the best existing tool for their maintenance tasks is javac (the Java compiler). They say:

The Java compiler is the most ubiquitous for its ability to locate places in source code that are semantically or syntactically inconsistent after a change.

I think this statement emphasizes the importance of semantic analysis in such tools. Semantic checks are not easy to implement even for a single language, so a language-independent approach would make them much harder.

In iXj, patterns are composed of several boxes representing different elements of source code. Each box has a concept name which determines the structural role of the element, such as package, method, type, …
I think these names could be inferred automatically from a grammar-like specification file, so there is an opportunity to make the tool language-independent in this regard.

However, there were subtle issues in iXj’s use of concept names for the arguments of a method. In iXj, arguments are referred to by position. I think it might be better to use the formal parameter names as the concept names, as they give the developer a better clue.

Although iXj was developed with a strong emphasis on usability, there are still opportunities to improve its ease of use. For example, as mentioned in the dissertation, users are not always sure whether they have used the right number of wildcards. Users should have a way to find out whether their pattern is general enough. One remedy is to let the developer refine the pattern by example: the developer adds or removes matching instances, and the tool automatically makes the pattern more general or more specific accordingly.

In iXj, the developer can use demarcation for capturing groups in regular expressions, written “\( … \)” in regular-expression languages. These captured groups have no specific name and are referenced by their position in the pattern (“\1”, “\2”, …). I think group capturing is one of the reasons regular expressions are hard to understand: it clutters the expression, and the developer has to look up the relative position of a group inside the pattern to refer to it. My suggestion is to let the developer draw named boxes instead of using a special syntax. Boxes would make the pattern more visual and conform better to the overall model of iXj.
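
For comparison, some regular-expression dialects address the same readability problem with named groups. A short, hedged Java sketch (named groups require Java 7 or later, so this postdates iXj):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    public static void main(String[] args) {
        // (?<getter>...) names the captured group, so it can be referenced
        // by name rather than by position ("\\1").
        Pattern p = Pattern.compile("(?<getter>get\\w+)\\(\\)");
        Matcher m = p.matcher("int w = obj.getWidth();");
        if (m.find()) {
            System.out.println(m.group("getter")); // prints "getWidth"
        }
    }
}

Named boxes in iXj would play the same role, but visually.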

Finally, there are some minor issues that could be fixed easily; more effective use of coloring is one example.

September 18, 2007

Refactoring Using Type Constraints

Filed under: refactoring — mohsenvakilian @ 7:02 pm

Several of Eclipse’s refactorings are based on the results of this paper by Frank Tip.

Although the main idea behind the paper is simple, it definitely takes a lot of effort to apply it to all the subtleties and combinations of Java language elements. I think the implementation and wide use of the results in Eclipse is the best evaluation of this work. The good point is that he has extended the work to cover various complex uses of Java generics.
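
To give a flavor of the idea, here is a hedged sketch (my own, not the paper’s notation) of how type constraints justify the Generalize Declared Type refactoring:

import java.util.ArrayList;
import java.util.List;

class Example {
    void process() {
        // Declared as ArrayList, but every use below requires only the
        // List interface, so the collected constraints permit weakening
        // the declaration to: List<String> names = new ArrayList<String>();
        ArrayList<String> names = new ArrayList<String>();
        names.add("ann");                 // constraint: declared type must declare add()
        System.out.println(names.size()); // constraint: declared type must declare size()
    }
}

Each use of a variable contributes a constraint on its declared type, and the refactoring is safe exactly when the new typing still satisfies all the constraints.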

A weakness of the paper might be that it is not clear whether all the constructs of Java are considered or some are ignored. For example, it is not easy to tell whether rarely used constructs such as anonymous classes are supported, or what the impact of the final keyword on subclassing is. It is not even clear whether his method supports the Extract Interface or Generalize Declared Type refactorings on generics. Perhaps the limitations of the work are discussed in other papers.

An idea that came to mind while reading the paper is whether these refactorings are useful for dynamically typed languages such as Ruby. Since these refactorings just manipulate class hierarchies and types, it seems that they won’t be useful for dynamic languages. However, with the growth of Java-compatible dynamic languages such as JRuby and Groovy, these refactorings might deserve further consideration.

As I said before, some of Eclipse’s refactorings are based on the methods introduced in this paper. My question is how other IDEs, such as NetBeans and IntelliJ IDEA, have dealt with the issue. Have they used the same concepts? If not, how do their methods compare with the one presented in this paper?
