Mohsen Vakilian's Blog

May 5, 2010

Should Types be Non-null by Default?

Filed under: language,refactoring — mohsenvakilian @ 2:32 pm

Patrice Chalin et al. studied 700K lines of open source Java code and found that, on average, 75% of reference type declarations (other than local variables) are meant to be non-null by design intent.

They first presented their empirical study in an ECOOP 2007 paper entitled "Non-null References by Default in Java: Alleviating the Nullity Annotation Burden". In this paper, they argued that since most type declarations are non-null by design intent, the default semantics of Java should be changed to non-null by default. I personally don’t think it makes sense for Java to make such a backward-incompatible change. But, I do think that a solid refactoring tool for adding non-null types is valuable.

Patrice leads the JmlEclipse project, an Eclipse-based verification environment that extends the Eclipse JDT Core plugin. It supports non-null types by default and uses JML as its specification notation. They’ve chosen the JML notation for their non-null type system until JSR 305 is finalized. But, I’ve heard that JSR 305 is not going to be supported in Java 7, whereas JSR 308 is.
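
To make the annotation burden concrete, here is a small Java sketch of the contrast the paper draws; the class and fields are made up, and it assumes the JSR 305 annotations (javax.annotation.Nonnull/Nullable) are on the classpath. Under today’s null-by-default semantics, every reference that must not be null needs an explicit marker, either in a JML annotation comment or as a Java annotation; under non-null by default, only the genuinely nullable declarations would need marking.

import javax.annotation.Nonnull;
import javax.annotation.Nullable;

class Customer {
    // Null by default (today): non-null intent has to be stated explicitly,
    // e.g. with a JML annotation comment ...
    private /*@ non_null @*/ String name;
    // ... or with a JSR 305-style annotation.
    private @Nonnull String id;

    // Non-null by default would flip the burden: only the roughly 25% of
    // declarations that are meant to be nullable would need an annotation.
    private @Nullable String middleName;

    Customer(@Nonnull String name, @Nonnull String id) {
        this.name = name;
        this.id = id;
    }
}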

Patrice Chalin et al. published an extended version of their ECOOP 2007 paper in IET Software 2008, entitled "Reducing the use of nullable types through non-null by default and monotonic non-null". In this paper, they also report a study on the usage of null. One interesting use of null they observed reduces memory usage: developers sometimes set a reference to a large object to null so that the garbage collector can free it. The other category of nullable types is what they call monotonic non-null types. A field of a monotonic non-null type can hold null before it is initialized, but once it gets a non-null value, it never admits null again. They found that approximately 60% of nullable fields were monotonic non-null.
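
As I understand it, a monotonic non-null field is essentially a lazily initialized field: null only until its first assignment, never null afterwards. A hedged Java sketch (the class and names are made up for illustration):

class Report {
    // Monotonic non-null: starts out null, but once assigned a non-null
    // value it never goes back to null.
    private Body body;

    Body getBody() {
        if (body == null) {
            body = new Body();  // the only transition: null -> non-null
        }
        return body;
    }

    static class Body {
    }
}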


April 14, 2010

Void Safety in Eiffel

Filed under: language — mohsenvakilian @ 12:20 pm

In a QCon 2009 talk, Tony Hoare called the invention of the null reference in 1965 his billion-dollar mistake.

Dereferencing null objects is known as the void safety problem. Bertrand Meyer et al. describe how they’ve added void safety to Eiffel in the paper "Avoid a Void: The eradication of null dereferencing".

The authors argue that attempts to remove Void completely fail because there are valid uses of it. Moreover, they claim that terminating linked structures seems to be the only case that truly requires void references.

They implement a static type checker that ensures void safety for Eiffel. The main rule it enforces is that a call x.f(...) is permitted only if x is statically attached. An attached type does not admit Void as one of its values.

They’ve developed straightforward rules to enforce preservation of the attachment property through assignment and argument passing.

To make the void safety mechanism easier to use, the type checker infers non-void values in certain contexts. For example, if x is a local variable, the call in if x /= Void then x.f(...) is void-safe. However, if x is not a local variable, the call is no longer guaranteed to be void-safe because of the possibility of side effects in a multithreaded environment. In such cases, we need to use object tests. The object test in the statement if attached x as l then l.f(...) end checks that x is dynamically attached to an object and binds it to the local name l. The scope of l in this case is the then clause.
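
The same distinction shows up in Java with ordinary null checks; the following is my own analogue, not the paper’s notation. A test on a local variable remains valid at the call, but a test on a field can be invalidated between the check and the use, which is roughly what the object test guards against by binding the value to a fresh local name:

class Screen {
    void refresh() {
    }
}

class Example {
    private Screen screen;  // may be set to null by another thread

    void safeLocal(Screen s) {
        if (s != null) {
            s.refresh();  // fine: a local variable cannot change in between
        }
    }

    void unsafeField() {
        if (screen != null) {
            screen.refresh();  // not guaranteed: the field may have been nulled
        }
    }

    void objectTestAnalogue() {
        Screen l = screen;  // roughly what "if attached screen as l" does:
        if (l != null) {    // copy into a local, then test and use the local
            l.refresh();
        }
    }
}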

Eiffel has a Check instruction, which is a kind of assert statement. The paper suggests using this instruction to limit the use of explicit object tests. But one still has to write an object test when using the Check instruction, so I don’t see how the Check instruction replaces object tests. My guess is that what they mean by limiting object tests is reducing the number of object tests performed at runtime: according to the semantics of the Check instruction, if the compiler proves that the object test always succeeds, the test isn’t performed at runtime.

The paper also proposes stable attributes as a way of reducing the number of object tests. Stable attributes are defined as follows:

A detachable attribute is stable if it does not appear as the target of an assignment whose source is a detachable expression.

The paper mentions that the compiler can check the stability of an attribute just by checking the assignments in the class. I’m not familiar with Eiffel, and it’s not clear to me how the compiler handles assignments outside the class.

The paper also extends the void safety mechanism to generic types. But arrays are treated specially because the elements of an array are not initialized when the array is created, so they introduce a new array constructor that fills the array with a given initial value.
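
I don’t know the exact form of the new Eiffel constructor, but the idea corresponds to something like the following in Java, where a freshly allocated array of references is full of nulls until it is explicitly filled with an initial value:

import java.util.Arrays;

class ArrayInit {
    public static void main(String[] args) {
        String[] names = new String[5];  // every element is null at this point
        Arrays.fill(names, "");          // fill with a non-null initial value
        System.out.println(Arrays.toString(names));
    }
}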

Finally, the paper reports on the conversion effort for making Eiffel code void-safe. The percentages of lines changed in EiffelBase and EiffelVision were about 11% and 3%, respectively. These figures confirm that making system-level libraries void-safe takes much more effort than converting user applications.

February 15, 2009

OOPSLA 2008–Annotation Refactoring

Filed under: evolution,refactoring — mohsenvakilian @ 12:27 am

Annotation refactoring is a refactoring-by-demonstration system for upgrading Java programs to use annotations. The example the paper focuses on is JUnit: JUnit introduced annotations in version 4, and annotation refactoring can be used to upgrade tests written for JUnit 3 to JUnit 4. The authors try not to limit the tool to JUnit by inferring the transformations from a given example of upgrading one class. First, the differences between the two given versions of the class are computed. Then, the transformation is inferred and represented in a domain-specific language, which the user can correct and refine.
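
As a concrete instance of the kind of transformation the tool has to infer, here is a typical JUnit 3 test class and its JUnit 4 counterpart; the class and method bodies are made up, but the conventions are the standard ones:

// Before: JUnit 3 relies on inheritance and naming conventions.
import junit.framework.TestCase;

public class StackTest extends TestCase {
    protected void setUp() {
        // create the fixture
    }

    public void testPush() {
        // assertions
    }
}

// After: JUnit 4 replaces the conventions with annotations.
import org.junit.Before;
import org.junit.Test;

public class StackTest {
    @Before
    public void setUp() {
        // create the fixture
    }

    @Test
    public void push() {
        // assertions
    }
}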

The addressed problem is important, and there are still opportunities to improve the accuracy of the inferencer. Upgrading programs to JUnit 4 is a simple case of annotation refactoring; handling more complex Java frameworks will be challenging.

OOPSLA 2008–Sound and Extensible Renaming for Java

Filed under: refactoring — mohsenvakilian @ 12:05 am

Sound and extensible renaming addresses the difficulty of coming up with a correct implementation of the rename refactoring in Java. The presenter did a good job of motivating the audience at OOPSLA. At the beginning of his talk, he showed several examples of Java programs that cause all major Java IDEs to perform the rename refactoring incorrectly. The authors mention two reasons for the complexity of the rename refactoring:

  1. addition of new constructs to the Java language
  2. complex name lookup

They introduce a systematic way to ensure that the binding structure of the program is maintained when the refactoring is applied. In addition, they claim their approach is modular and thus easy to extend to support new constructs of the Java language.
They have implemented the so-called inverted lookup functions in JastAdd. By implementing the inverted lookup function corresponding to each lookup function, they claim no corner cases are missed and thus the binding structure is kept unchanged. While performing the rename refactoring, it might be necessary to replace some names by their fully qualified names. Some existing Java IDEs abort the refactoring in such cases because of overly strong preconditions. In their approach, the user is still allowed to perform the refactoring, and a name is automatically replaced by its fully qualified name when necessary.
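
A small, made-up Java example of why qualification is sometimes needed: renaming the parameter total to count below would make the unqualified reference on the left-hand side bind to the parameter instead of the field. A sound rename preserves the binding structure by qualifying the field access.

class Counter {
    int count;

    // Before the refactoring:
    void update(int total) {
        count = total + 1;
    }

    // After renaming the parameter "total" to "count", the field reference
    // must be qualified with "this" to keep its original binding:
    //
    //     void update(int count) {
    //         this.count = count + 1;
    //     }
}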

Their paper demonstrates how tricky it is to implement the rename refactoring correctly. However, I still think there are easier ways to fix the problem. Most IDEs, such as Eclipse, can list all references to a particular variable, which means they can easily build the binding structure of the program. If we compare the binding structures before and after the refactoring, we can refuse to perform a rename refactoring that changes the binding structure.

The paper addresses an important problem and it might be possible to solve some deficiencies of other refactorings similarly.

January 12, 2009

OOPSLA 2008–Economics-Driven Architecting

Filed under: architecture — mohsenvakilian @ 3:32 pm

As a student volunteer, I was assigned to a tutorial session, which turned out to be the CBAM tutorial presented by Rick Kazman and Ipek Ozkaya. CBAM is a quantitative approach to architectural design decision making. CBAM augments ATAM by adding cost and benefit as attributes to be traded off. As an advantage of CBAM, Rick and Ipek mentioned that stakeholders like it because it helps them reach consensus through a rational process.

Having read the SAIP book two years earlier, I was already familiar with CBAM. However, this tutorial refreshed me on the subject of architecture analysis techniques. Especially effective was the hands-on approach they took by having the participants apply CBAM to a city information system example. They began by introducing several concepts such as quality attributes, value, and utility. Then, they went through the steps of CBAM on the NASA ECS project.

My comment on CBAM is that the technique looks reasonable once you have reliable data about response measures and the effects of architectural strategies on response levels. However, as Rick and Ipek pointed out throughout the tutorial, uncertainty should be considered in decision making as well. They explained how uncertainty is taken into account in later iterations of CBAM, but that looked quite simplistic and insufficient; uncertainty was not treated as a central element of architectural decision making. Nevertheless, I think CBAM has already gone quite far in trying to quantify costs and benefits, and coming up with values for uncertainties would require guidelines, formulas, estimates, voting, and so on, which would make CBAM quite complicated.

November 24, 2008

OOPSLA 2008–Scala

Filed under: language — mohsenvakilian @ 11:51 pm

Several papers at OOPSLA, such as Generics of a Higher Kind, The Visitor Pattern as a Reusable, Generic, Type-Safe Component, and Constrained Types for Object-Oriented Languages, were more or less related to Scala. I hadn’t had the chance to educate myself about Scala until I attended the tutorial "Programming in Scala" offered by Bill Venners.

Scala interoperates with Java. By integrating features of both object-oriented and functional programming, it achieves a concise syntax. Scala is statically typed, and the OOPSLA papers related to Scala mainly dealt with its type system.

Bill went through concrete Scala examples and explained basic Scala syntax. Then, he went on to explain pattern matching in Scala, which models algebraic data types from functional languages. The tutorial was a good introduction to Scala, and I hope to learn about more advanced features such as abstract members, extractors, actors and concurrency, variance annotations, and lower/upper type bounds.

November 17, 2008

OOPSLA 2008–Architecture as Language

Filed under: architecture,dsl — mohsenvakilian @ 1:18 pm

Markus Voelter presented three tutorials on DSLs at OOPSLA this year. The one I attended was entitled "Architecture as Language". In this tutorial, he illustrated how textual DSLs can be used as a succinct, precise, technology-independent, and tool-processable description of a software architecture.

Markus went through building an example DSL and showed how the approach could be applied to product lines. He actually defined the grammar and added constraints as he discussed the architecture, and ended up with a customized editor (built using oAW's Xtext tool).

He defined components in his DSL like the following:

component DelayCalculator {
    provides IDelayCalculator
    requires IInfoScreen
}

Then he changed the grammar to express instances of components and wired instances together by expressing relationships between them. To describe relationships between instances and cardinalities of these relationships, he introduced the well-known concept of ports. Ports led to role-specific interfaces. Also, he added namespaces and extended the language to describe the publish/subscribe relationship between components. The resulting language looked like the following:

namespace datacenter {
    component DelayCalculator {
        provides aircraft: IAircraftStatus
        provides managementConsole: IManagementConsole
        requires screens[0..n]: IInfoScreen
        publishes flights { publication = onchange }
    }
}

namespace mobile {
    component InfoScreen {
        provides default: IInfoScreen
        consumes flights { init = all update = every(60) }
    }
}

Describing the architecture in a DSL has the benefit of making clear what the architect means. But it takes time and effort, and it’s usually only worthwhile if some analysis or code generation can be done with the formal description. The oAW tool seems to exploit both of these benefits of an architectural DSL, but I have to play with the tool further to see how it works in action. It’s not clear to me how well it can prevent developers from modifying the generated code.

DSLs seem to be a promising technique for helping customers and developers communicate, and I’m looking for research opportunities in this field.

October 9, 2008

Identifying Syntactic Differences Between Two Programs

Filed under: evolution — mohsenvakilian @ 1:04 pm

I recently came across an old paper (1991) by Wuu Yang. The paper describes a diff program for C that works by matching syntax trees. The matching between two trees is defined to be a set of pairs of nodes, one from each tree, such that

  1. two nodes in a pair contain identical symbols,
  2. a node can match at most one node in the other tree, and
  3. the parent-child relationship as well as the order between sibling nodes are respected.

A maximum matching is a matching with the maximum number of pairs. A straightforward dynamic programming algorithm is proposed to find a maximum matching subject to the constraints above.
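
My reading of the approach, sketched in Java: two subtrees contribute to the matching only if their root symbols are identical, and the best pairing of their child sequences is found with an LCS-style dynamic program that respects sibling order. This is my own reconstruction of the idea, not the paper’s code.

import java.util.List;

class TreeNode {
    final String symbol;
    final List<TreeNode> children;

    TreeNode(String symbol, List<TreeNode> children) {
        this.symbol = symbol;
        this.children = children;
    }
}

class TreeMatcher {
    // Size of the largest matching between the subtrees rooted at a and b.
    static int match(TreeNode a, TreeNode b) {
        if (!a.symbol.equals(b.symbol)) {
            return 0;  // roots with different symbols cannot be paired
        }
        int m = a.children.size();
        int n = b.children.size();
        int[][] best = new int[m + 1][n + 1];  // LCS-style table over the child sequences
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int paired = best[i - 1][j - 1]
                        + match(a.children.get(i - 1), b.children.get(j - 1));
                best[i][j] = Math.max(paired, Math.max(best[i - 1][j], best[i][j - 1]));
            }
        }
        return 1 + best[m][n];  // the matched roots plus the best child matching
    }
}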

It’s good to set up the matching restrictions so that one can come up with a clean algorithm that maximizes a parameter. But respecting the order of siblings in the matching limits the algorithm’s ability to describe changes that move program elements around.

We should keep in mind that the goal of developing a tool for displaying changes is to assist the developer in understanding the differences. I believe we should pay attention to the vocabulary that developers use when describing their changes to each other. Do they only tell each other which parts of their code match in two versions? I say no. Programmers use a richer vocabulary to explain how they have modified their code, talking about things that have been moved, renamed, deleted, added, split, and so on.

May 8, 2008

Tracking Code Clones

Filed under: evolution — mohsenvakilian @ 11:37 pm

In a paper presented at ICSE’07, Duala-Ekoko and Robillard introduced their Eclipse plugin for clone tracking.

Their plugin offers two main features:

  1. The ability to identify code clone groups and track them as software evolves.
  2. Enabling the developer to make his/her edits on several copies of code clones simultaneously.

To track code clones as the software evolves, they propose the idea of a CRD (clone region descriptor), an approximate identifier for a block of code. To be able to locate a CRD in a future version of the code, it has to be independent of line numbers. In fact, a CRD is an address for a block based on the block’s enclosing constructs.

For the simultaneous editing feature, they use Levenshtein distance to match lines.
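
For reference, Levenshtein distance is the standard edit distance between two strings; a plain Java implementation (mine, not theirs) looks roughly like this:

class Levenshtein {
    // Minimum number of insertions, deletions, and substitutions
    // needed to turn string a into string b.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }
}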

The algorithms used seem pretty straightforward, and the reason for choosing them seems to be the goal of building a lightweight, interactive system for tracking and editing code clones.

April 25, 2008

Change Distilling

Filed under: evolution — mohsenvakilian @ 9:52 pm

The ChangeDistiller is an Eclipse plugin based on an algorithm described in a paper entitled "Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction". It’s an improvement over the existing algorithm by Chawathe et al. for change detection in hierarchically structured information.

The Chawathe et al. algorithm extracts changes by tree differencing. It splits the problem into two tasks:

  • Finding a “good” matching between the nodes of the trees T1 and T2.
  • Finding a minimum “conforming” edit script that transforms T1 into T2, given the computed matching.

Chawathe et al. define two fundamental matching criteria (one for leaves and one for inner nodes) that are necessary for the algorithm to produce a "good" matching set from which a minimum conforming edit script can be derived. The assumption behind these matching criteria is that:

For each leaf in T1 only one similar leaf in T2 can be found.

The problem is that this assumption fails for source code, so we don’t get a good matching due to mismatch propagation, and the resulting edit script is no longer optimal. When the assumption fails, the algorithm applies a postprocessing step; however, Fluri et al. claim that there are a number of tree constellations in which the postprocessing step doesn’t improve the matching.

To solve these shortcomings, several options are proposed:

  1. Using best match instead of first match for leaves
  2. Using bigrams for string similarity instead of Levenshtein distance (see the sketch after this list)
  3. Trying out different values for thresholds
  4. Enabling dynamic thresholds
  5. Using inner node similarity weighting
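
If I understand option 2 correctly, bigram-based similarity compares the sets of two-character substrings of the strings instead of computing an edit distance. The following Java sketch uses a Dice-style coefficient over character bigrams; the paper may normalize differently, so treat this as an illustration only.

import java.util.HashSet;
import java.util.Set;

class BigramSimilarity {
    // Similarity of the character-bigram sets of two strings, in [0, 1].
    static double similarity(String a, String b) {
        Set<String> bigramsA = bigrams(a);
        Set<String> bigramsB = bigrams(b);
        if (bigramsA.isEmpty() && bigramsB.isEmpty()) {
            return 1.0;  // strings too short to have bigrams count as identical
        }
        Set<String> common = new HashSet<String>(bigramsA);
        common.retainAll(bigramsB);
        return 2.0 * common.size() / (bigramsA.size() + bigramsB.size());
    }

    private static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<String>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }
}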

They evaluated each configuration against the benchmark they had collected from ArgoUML, Azureus, and JDT.

In their implementation, they use the Eclipse compare plug-in to extract the methods and attributes that have changed. This prefiltering results in smaller trees for comparison. ASTs of both revisions are created using the AST visitor from JDT, and the intermediate ASTs T1 and T2 are then fed into the change distilling algorithm. The output is a set of basic tree edit operations.

In spite of the various improvements, the algorithm is still limited in finding the appropriate number of move operations. Also, they aimed to make the edit script minimal rather than readable: the script is described in terms of basic AST operations, but it could be presented in a more intuitive way.

