Mohsen Vakilian's Blog

February 15, 2009

OOPSLA 2008–Annotation Refactoring

Filed under: evolution,refactoring — mohsenvakilian @ 12:27 am

Annotation refactoring is a refactoring by demonstration system to upgrade Java programs to use annotations. The example which the paper focuses on is JUnit. JUnit introduced some annotations in version 4 and annotation refactoring can be used to upgrade tests written using JUnit 3 to use JUnit 4. The authors try not to limit the tool to JUnit by inferring the transformations from a given example of upgrading a class. First, the differences between the two given versions of the class are computed. Then, the transformation is inferred and represented in a domain specific language which can be corrected and refined by the user.

The addressed problem is important and there are still more opportunities to improve the accuracy of the inferencer.  Upgrading programs to JUnit is a simple case of annotation refactoring and handling more complex Java frameworks will be challenging.

October 9, 2008

Identifying Syntactic Differences Between Two Programs

Filed under: evolution — mohsenvakilian @ 1:04 pm

I recently came across an old paper (1991) by Wuu Yang. The paper describes a diff program for C by matching syntax trees. The matching between two trees is defined to be a set of pairs of nodes, one from each tree, such that

  1. two nodes in a pair contain identical symbols,
  2. a node can match at most one node in the other tree, and
  3. the parent-child relationship as well as the order between sibling nodes are respected.

A maximum matching is a matching with the maximum number of pairs. A straight forward dynamic algorithm is proposed to find the maximum matching with the constraints above.

It’s good to make the matching restrictions such that one could come up with a nice algorithm to maximize a parameter. But, respecting the order of siblings in the matching, limits the algorithm in describing changes which involve moving program elements around.

We should keep in mind that the goal of developing a tool for displaying changes, is to assist the developer in understanding the differences. I believe we should pay attention to the vocabulary that developers use in describing their changes to each other. Do they only tell each other which parts of their codes match in two versions? I say no. Programmers use various vocabulary to explain how they have modified their code by talking about things that have moved, renamed, deleted, added, split, …

May 8, 2008

Tracking Code Clones

Filed under: evolution — mohsenvakilian @ 11:37 pm

In a paper presented at ICSE’07, Duala-Ekoko and Robillard introduced their Eclipse plugin for clone tracking.

Their plugin offers two main features:

  1. The ability to identify code clone groups and track them as software evolves.
  2. Enabling the developer to make his/her edits on several copies of code clones simultaneously.

To track the code clones as software evolves, they propose the idea of CRD which is an approximate identifier for a block of code. To be able to locate a CRD in a future version of code, it has to be independent of line numbers. In fact, CRD is an address for a block based on the enclosing constructs of the block.

For the simultaneous editing feature, they use Levenshtein distance to match lines.

The algorithms used seem pretty straight forward and the reason for choosing them seems to be the goal to build a lightweight interactive system for tracking and editing code clones.

April 25, 2008

Change Distilling

Filed under: evolution — mohsenvakilian @ 9:52 pm

The ChangeDistiller is an Eclipse plugin based on an algorithm described in a paper entitled “Change Distilling: Tree Differencing for Fine Grained Source Code Change Extraction“. It’s an improvement over the existing algorithm by Chawathe et al. for change detection in hierarchically structured information.

Chawathe et al. extract source code changes based on tree differencing. The algorithm splits the problem into two tasks:

  • Finding a “good” matching between the nodes of the trees T1 and T2.
  • Finding a minimum “conforming” edit script that transforms T1 into T2, given the computed matching.

Chawathe et al. define two fundamental matching criteria (one for leaves and one for inner nodes) necessary for the algorithm to produce a “good” matching set with which a minimum conforming edit script is achieved. The assumption behind these matching criteria is that:

For each leaf in T1 only one similar leaf in T2 can be found.

The problem is that this assumption fails for source codes. Thus, we don’t get a good matching due to mismatch propagation. Consequently, the resulting edit script won’t be optimum anymore. When the assumption fails, the algorithm applies a postprocessing step. However, Fluri et al. claim that there are a number of tree constellations in which the postprocessing step doesn’t improve the matching.

To solve these shortcomings, several options are proposed:

  1. Using best match instead of first match for leaves
  2. Using bigrams as string similarity matching instead of Levenshtein
  3. Trying out different values for thresholds
  4. Enabling dynamic thresholds
  5. Using inner node similarity weighting

They evaluated each configuration against the benchmark they had collected from ArgoUML, Azureus, and JDT.

In their implementation, they use the Eclipse compare plug-in to extract the methods and attributes that have changed. This prefiltering results lead to smaller trees for comparison. ASTs of both revisions are created using the AST visitor from JDT. And, the intermediate ASTs T1 and T2 are then fed into the change distilling algorithm. The output is a set of basic tree edit operations.

In spite of various improvements they used, the algorithm is still limited in finding the appropriate number of move operations. They tried to make their edit script optimum but not so readable. The script is described in terms of basic AST operations but it could be presented in a more intuitive way.

March 22, 2008

JunGL a Scripting Language for Refactoring

Filed under: evolution,refactoring — mohsenvakilian @ 12:45 pm

JunGL is a hybrid functional-logic language in the tradition of ML and Datalog. The main data structure manipulated by JunGL is a graph representation of the program. Edges can be added to the graph lazily

The following function in JunGL computes the control flow edges emanating from a conditional statement:

let IfStmtCFSucc(node)=
match (node.thenBranch,node.elseBranch) with
| (null, null) -> [DefaultCFSucc(node)]
| (t, null) -> [t; DefaultCFSucc(node)]
| (null, e) -> [DefaultCFSucc(node); e]
| (t, e) -> [t; e] ;;

In JunGL, one can use predicates in functions via a stream comprehension.

{ ?x | P(?x) }

will return a stream of all x that satisfy the predicate P.

Path queries are regular expressions that identify paths in the program graph.

? ==

In the above path query, components between square brackets are conditions on nodes, whereas parent and child match edge labels. The above predicate thus describes a path from a variable occurrence var to its declaration as a method parameter.

They implement Rename Variable and Extract Method refactorings in JunGL for a subset of C#. Some refactoings are complex and require various analysis. So, I didn’t expect their language for describing refactorings to be simple. However, it seems to me that they’ve made the process of defining refactorings easier by using features of both functional programming and logic programming. Actually, there are several systems for defining refactorings out there and one could evaluate them against real programmers and compare them.

March 16, 2008

Jackpot Rule Language

Filed under: evolution,refactoring,technology — mohsenvakilian @ 3:18 pm

Jackpot is a NetBeans module that lets you define custom transformations. Jackpot transformations can be specified in two ways:

  1. Using Jackpot rules matching program segments or
  2. Using Jackpot API to manipulate the AST.

The Jackpot API is still under development. The rule language has been designed by James Gosling. And, in the following we’re going to know more about the Jackpot rule language by explaining two examples.

The first rule, removes all unnecessary casts. In this rule, meta-variables such as $T and $a are used to match various program elements. As you see in this example, type matching facilities are supported in Jackpot as guard expressions after the “::” operator.

($T)$a => $a :: $a instanceof $T;

The second rule transforms a regular for-loop into an enhanced for-loop. This rule uses both meta-lists and built-in guards. The $stmts$ symbol is a meta-list which matches a list of statements. The expression referencedIn($i, $stmts$), is a built-in guard which checks if the meta-variable $i has ever been referenced in the meta-list $stmts$.

for(int $i; $i < $array.length; i++) {
$T $var = ($T)$array[$i];
} =>
for($T $var : $array) {
} :: !referencedIn($i, $stmts$);

Jackpot rules are readable and if somebody needs a more sophisticated transformation he/she can use the Jackpot API to programmatically perform the transformation. Jackpot seems to be suitable for cases when you want to migrate your client code to use a new API. However, it lacks constructs to specify the scope. So, it may not be appropriate for transforming the library itself.
In Jackpot, no flow analysis is done during execution of a rule file. And, in an interview with Tom Ball, the Jackpot team leader, he mentions that Jackpot cannot be used outside of the NetBeans IDE.

March 12, 2008

Continuous Integration in Open Source

Filed under: agile,evolution — mohsenvakilian @ 6:14 pm

Have open source projects increasingly been using continuous integration?

Dirk Riehle et. al. try to answer the above question in a paper entitled Continuous Integration in Open Source Software Development.

They considered 5122 open source projects from the database of the open source analytics firm

To come to the conclusion that continuous integration is being used in open source projects, they expect to see that the average size of code contributions, the individual check-in into a source code repository, would have gone down over the last ten years, and they would expect to see that the average frequency of such check-ins has gone up.

However, their analysis does not validate the hypothesis that open source software developers are practicing continuous integration. Their indicator, the average size of a commit, remained almost constant over the years with no significant trend or pattern.

One might argue that they should have categorized the projects based on the languages or the platforms they use as there might be some continuous integration tools available for one platform but not the other. Nevertheless, I think it gives enough evidence to prove the lack of continuous integration in open source software development.

So, why aren’t open source developers using continuous integration? Is it due to the lack of decent continuous integration tools for open source projects? Aren’t open source developers aware of the advantages of continuous integration? Is continuous integration useful for open source development at all?

The answer is not so clear to me. But, I assume we should give the community more time to adopt new technologies.

At the end, I’d like to emphasize the fact that one cannot conclude that a group is not agile for it’s not doing a particular agile practice. Kent Beck confirms my statement in an interview where he says:

“If somebody understood a bunch of practices and tried to do them, you could do agile development without being agile and it’s a disaster because you’re acting out of harmony with what you really believe when you do that.”

February 6, 2008

Automatic Change Inference

Filed under: evolution,refactoring — mohsenvakilian @ 12:12 am

The research question addressed in this paper is:

“Given two versions of a program, what changes occurred with respect to a particular vocabulary of changes?”

Of course, we are looking for higher level approaches for reporting changes than traditional diff tools which define a change vocabulary in terms of delete, add, and move line.

Miryung Kim et al., presented their approach for change inference in a paper titled “Automatic Inference of Structural Changes for Matching Across Program Versions“.

By the following example you can get a feel of how the output of their tool looks like:

for all x in chart.*Plot.get*Range()
except {chart.MarketPlot.getVerticalRange}
argAppend(x, [ValueAxis)

This change rule, means that all methods that match the chart.*Plot.get*Range() pattern take an additional ValueAxis argument, except the getVerticalRange method in the MarkerPlot class.

As shown in the above example, their approach represents structural changes as a set of high-level change rules, automatically infers likely change rules and determines method-level matches based on the rules.

The set of transformation that they support are (1) refactorings such as rename package/class/field/method, add parameter, … (2) other transformations such as change argument types, change return types, change input argument list, …

They did a good job of printing the change rules in a concise manner. As they mention, the except clause sometimes signals a bug arising from incomplete or inconsistent changes.

I think more types of transformations should be taken into account for better descriptions of changes. Besides, one might try to handle changes due to refactorings by just recording them while they are being performed rather than trying to infer them by comparing the two versions of program.

February 2, 2008

Subclipse Annotate View

Filed under: evolution — mohsenvakilian @ 5:26 pm

The SVN Annotate View is part of the SVN Repository Exploring Perspective and allows you to review the revision history for a specific file, right down to the individual line of code. As illustrated in the figure, The view is comprised of three distinct areas:

  1. The first pane shows a list of all the individuals that have changed this file along with the revision number and the total number of lines changed as part of that revision.Subclipse Annotate View
  2. The Eclipse text viewer showing the contents of the selected file at the head revision.
  3. The History View for the selected file.

The History View contains the list of all changes made to the selected file. For each change, the svn log and the list of all files affected in that commit are shown.

The History View and the view of the file contents are dynamically linked to the revision panel. Left clicking an entry in the revision pane will highlight all of the lines that were modified in the file in the text viewer. In the same way, when a line in the text viewer is selected, the revision author will be highlighted in the revision pane. The History View is automatically updated to provide the full details of each revision selected.

Subclipse Quick Diff

Filed under: evolution — mohsenvakilian @ 5:06 pm

Instead of using a compare editor, which will show changes between 2 or 3 files by showing each file side-by-side, you can enable quick diff support and see the changes within the text editor.

If you enable showing the differences in the overview ruler, you can, at a glance, get an idea of how many changes you have Subclipse Quick Diffmade to a file since your last commit. As shown in the figure, the overview ruler is segmented such that each segment corresponds to a change made by a developer. The darker a segment is the newer the change is. If you hover over a particular segment, the svn log of its corresponding commit will pop up.

Next Page »

Create a free website or blog at