Mohsen Vakilian's Blog

April 25, 2008

Change Distilling

Filed under: evolution — mohsenvakilian @ 9:52 pm

The ChangeDistiller is an Eclipse plugin based on an algorithm described in a paper entitled “Change Distilling: Tree Differencing for Fine Grained Source Code Change Extraction“. It’s an improvement over the existing algorithm by Chawathe et al. for change detection in hierarchically structured information.

Chawathe et al. extract source code changes based on tree differencing. The algorithm splits the problem into two tasks:

  • Finding a “good” matching between the nodes of the trees T1 and T2.
  • Finding a minimum “conforming” edit script that transforms T1 into T2, given the computed matching.

Chawathe et al. define two fundamental matching criteria (one for leaves and one for inner nodes) necessary for the algorithm to produce a “good” matching set with which a minimum conforming edit script is achieved. The assumption behind these matching criteria is that:

For each leaf in T1 only one similar leaf in T2 can be found.

The problem is that this assumption fails for source codes. Thus, we don’t get a good matching due to mismatch propagation. Consequently, the resulting edit script won’t be optimum anymore. When the assumption fails, the algorithm applies a postprocessing step. However, Fluri et al. claim that there are a number of tree constellations in which the postprocessing step doesn’t improve the matching.

To solve these shortcomings, several options are proposed:

  1. Using best match instead of first match for leaves
  2. Using bigrams as string similarity matching instead of Levenshtein
  3. Trying out different values for thresholds
  4. Enabling dynamic thresholds
  5. Using inner node similarity weighting

They evaluated each configuration against the benchmark they had collected from ArgoUML, Azureus, and JDT.

In their implementation, they use the Eclipse compare plug-in to extract the methods and attributes that have changed. This prefiltering results lead to smaller trees for comparison. ASTs of both revisions are created using the AST visitor from JDT. And, the intermediate ASTs T1 and T2 are then fed into the change distilling algorithm. The output is a set of basic tree edit operations.

In spite of various improvements they used, the algorithm is still limited in finding the appropriate number of move operations. They tried to make their edit script optimum but not so readable. The script is described in terms of basic AST operations but it could be presented in a more intuitive way.

Create a free website or blog at WordPress.com.