Publications
Below you can find a list of my latest publications. These tie in with my
software page, which lists the open source software implementations associated with these papers. Another list can be found on
researchr.org.
2010
Papers in Refereed Proceedings
- Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth. Pure and Declarative Syntax Definition. Paradise Lost and Regained. In Onward! 2010, ACM, 2010. [draft] [bib] [abstract]
Abstract. Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.
- Lennart C. L. Kats, Eelco Visser. Encapsulating Software Platform Logic by Aspect-Oriented Programming: A Case Study in Using Aspects for Language Portability. In IEEE International Working Conference on Source Code Analysis and Manipulation 2010 (SCAM 2010), IEEE, 2010. [draft] [bib] [abstract]
Abstract. Software platforms such as the Java Virtual Machine or the CLR .NET
virtual machine have their own ecosystem of a core programming
language or instruction set, libraries, and developer community.
Programming languages can target multiple software platforms to
increase interoperability or to boost performance. Introducing a new
compiler backend for a language is the first step towards targeting a
new platform, translating the language to the platform's language or
instruction set. Programs written in modern languages generally make
extensive use of APIs, based on the runtime system of the software
platform, introducing additional portability concerns. They may use
APIs that are implemented by platform-specific libraries. Libraries
may perform platform-specific operations, make direct native calls, or
make assumptions about performance characteristics of operations or
about the file system. This paper proposes to use aspect weaving to
invasively adapt programs and libraries to address such portability
concerns, and identifies four classes of aspects for this purpose. We
evaluate this approach through a case study where we retarget the
Stratego program transformation language towards the Java Virtual
Machine.
- Lennart C. L. Kats, Eelco Visser. The Spoofax Language Workbench. Rules for Declarative Specification of Languages and IDEs. In Object-Oriented Programming, Systems, Languages, and Applications (SPLASH/OOPSLA 2010), ACM, 2010. [draft] [bib] [abstract]
Abstract. Spoofax is a language workbench for efficient, agile development of
textual domain-specific languages with state-of-the-art IDE support.
Spoofax integrates language processing techniques for parser
generation, meta-programming, and IDE development into a single
environment. It uses concise, declarative specifications for
languages and IDE services. In this paper we describe the
architecture of Spoofax and introduce idioms for high-level
specifications of language semantics using rewrite rules, showing how
analyses can be reused for transformations, code generation, and
editor services such as error marking, reference resolving, and
content completion. The implementation of these services is supported
by language-parametric editor service classes that can be dynamically
loaded by the Eclipse IDE, allowing new languages to be developed and
used side-by-side in the same Eclipse environment.
2009
Papers in Refereed Proceedings
- Maartje de Jonge, Emma Nilsson-Nyman, Lennart C. L. Kats, Eelco Visser. Natural and Flexible Error Recovery for Generated Parsers. In Software Language Engineering (SLE 2009), Volume 5969 of Lecture Notes in Computer Science, pages 204—223, Springer, 2010. [pdf] [doi] [bib] [abstract]
Abstract. Parser generators are an indispensable tool for rapid language development.
However, they often fall short of the finesse of a hand-crafted parser, built
with the language semantics in mind. One area where generated parsers have provided
unsatisfactory results is that of error recovery. Good error recovery is both
natural, giving recovery suggestions in line with the intention of the programmer;
and flexible, allowing it to be adapted according to language insights and language
changes. This paper describes a novel approach to error recovery, taking
into account not only the context-free grammar, but also indentation usage. We
base our approach on an extension of the SGLR parser that supports fine-grained
error recovery rules and can be used to parse complex, composed languages. We
take a divide-and-conquer approach to error recovery: using indentation, erroneous
regions of code are identified. These regions constrain the search space
for applying recovery rules, improving performance and ensuring recovery suggestions
local to the error. As a last resort, erroneous regions can be discarded.
Our approach also integrates bridge parsing to provide more accurate suggestions
for indentation-sensitive language constructs such as scopes. We evaluate our approach
by comparison with the JDT Java parser used in Eclipse.
- Lennart C. L. Kats, Maartje de Jonge, Emma Nilsson-Nyman, Eelco Visser. Providing Rapid Feedback in Generated Modular Language Environments. Adding Error Recovery to Scannerless Generalized-LR Parsing. In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2009), pages 445—464, ACM, October 2009. [pdf] [doi] [bib] [presentation] [abstract]
Abstract. Integrated development environments (IDEs) increase programmer
productivity, providing rapid, interactive feedback based on the
syntax and semantics of a language. A heavy burden lies on developers
of new languages to provide adequate IDE support. Code generation
techniques provide a viable, efficient approach to semi-automatically
produce IDE plugins. Key components for the realization of plugins
are the language's grammar and parser. For embedded languages and
language extensions, constituent IDE plugin modules and their grammars
can be combined. Unlike conventional parsing algorithms, scannerless
generalized-LR parsing supports the full set of context-free grammars,
which is closed under composition, and hence can parse language
embeddings and extensions composed from separate grammar modules. To
apply this algorithm in an interactive environment, this paper
introduces a novel error recovery mechanism, which allows it to be
used with files with syntax errors -- common in interactive
editing. Error recovery is vital for providing rapid feedback in case
of syntax errors, as most IDE services depend on the parser -- from
syntax highlighting to semantic analysis and cross-referencing. We
base our approach on the principles of island grammars, and derive
permissive grammars with error recovery productions from normal SDF
grammars. To cope with the added complexity of these grammars, we
adapt the parser to support backtracking. We evaluate the recovery
quality and performance of our approach using a set of composed
languages, based on Java and Stratego.
- Lennart C. L. Kats, Anthony M. Sloane, Eelco Visser. Decorated Attribute Grammars. Attribute Evaluation Meets Strategic Programming. In Compiler Construction
(CC 2009),
Volume 5501 of Lecture Notes in Computer Science, pages 142—157.
Springer, March 2009. [pdf] [doi] [bib] [presentation] [abstract]
Abstract. Attribute grammars are a powerful specification formalism for
tree-based computation, particularly for software language
processing. Various extensions have been proposed to abstract over
common patterns in attribute grammar specifications. These include
various forms of copy rules to support non-local dependencies,
collection attributes, and expressing dependencies that are evaluated
to a fixed point. Rather than implementing extensions natively in an
attribute evaluator, we propose attribute decorators that
describe an abstract evaluation mechanism for attributes, making it
possible to provide such extensions as part of a library of
decorators. Inspired by strategic programming, decorators are
specified using generic traversal operators. To demonstrate their
effectiveness, we describe how to employ decorators in name, type, and
flow analysis.
- Lennart C. L. Kats, Karl T. Kalleberg, Eelco Visser. Domain-Specific Languages for Composable Editor Plugins. In T. Ekman and J. Vinju, editors, Ninth Workshop on Language Descriptions, Tools, and Applications
(LDTA'09), Electronic Notes in Theoretical Computer Science.
Elsevier, March 2009. [pdf] [bib] [presentation] [abstract]
Abstract. Modern IDEs increase developer productivity by incorporating many
different kinds of editor services. These can be purely syntactic,
such as syntax highlighting, code folding, and an outline for
navigation; or they can be based on the language semantics, such as
in-line type error reporting and resolving identifier declarations.
Building all these services from scratch requires both the extensive
knowledge of the sometimes complicated and highly interdependent APIs
and extension mechanisms of an IDE framework, and an in-depth
understanding of the structure and semantics of the targeted language.
This paper describes Spoofax/IMP, a meta-tooling suite that provides
high-level domain-specific languages for describing editor services,
relieving editor developers from much of the framework-specific
programming. Editor services are defined as composable modules of
rules coupled to a modular SDF grammar. The composability provided by
the SGLR parser and the declaratively defined services allows embedded
languages and language extensions to be easily formulated as
additional rules extending an existing language definition. The
service definitions are used to generate Eclipse editor plugins. We
discuss two examples: an editor plugin for WebDSL, a domain-specific
language for web applications, and the embedding of WebDSL in
Stratego, used for expressing the (static) semantic rules of WebDSL.
- Anthony M. Sloane, Lennart C. L. Kats, Eelco Visser. A Pure Object-Oriented Embedding of Attribute Grammars. In T. Ekman and J. Vinju, editors, Ninth Workshop on Language Descriptions, Tools, and Applications
(LDTA'09), Electronic Notes in Theoretical Computer Science.
Elsevier, March 2009. [pdf] [bib] [abstract]
Abstract. Attribute grammars are a powerful specification paradigm for many
language processing tasks, particularly semantic analysis of
programming languages. Recent attribute grammar systems
use dynamic scheduling algorithms to evaluate
attributes by need. In this paper, we show how to remove the need for
a generator, by embedding a dynamic approach in a modern,
object-oriented programming language to implement a small, lightweight
attribute grammar library. The Kiama attribution library has similar
features to current generators, including cached, uncached, circular,
higher-order and parameterised attributes, and implements new
techniques for dynamic extension and variation of attribute equations.
We use the Scala programming language because of its combination of
object-oriented and functional features, support for domain-specific
notations and emphasis on scalability. Unlike generators with
specialised notation, Kiama attribute grammars use standard Scala
notations such as pattern-matching functions for equations and mixins
for composition. A performance analysis shows that our approach is
practical for realistic language processing.
- Francisco Durán, Manuel Roldán, Emilie Balland, and Mark van den Brand and Steven Eker, Karl T. Kalleberg, Lennart C. L. Kats, Pierre-Etienne Moreau, Ruslan Shevchenko, Eelco Visser. The Second Rewrite Engines Competition. Volume 283 of Electronic Notes in Theoretical Computer Science, pages 281—291, Elsevier, May 2009. [pdf] [doi] [bib] [abstract]
Abstract. The 2nd Rewrite Engines Competition (REC) was celebrated as part of
the 7th Workshop on Rewriting Logic and its Applications (WRLA
2008). In this edition of the competition participated five systems,
namely ASF+SDF, Maude, Stratego/XT, Termware, and Tom. We explain here
how the competition was organized and conducted, and present its main
results and conclusions.
Papers In Journals
- Zef Hemel, Lennart C. L. Kats, Danny M. Groenewegen, Eelco Visser. Code Generation by Model Transformation. A Case Study in Transformation Modularity. Software and Systems Modeling, 2009. [doi] [bib] [abstract]
Abstract. The realization of model-driven software development requires
effective techniques for implementing code generators for
domain-specific languages. This paper identifies techniques for
improving separation of concerns in the implementation of generators.
The core technique is code generation by model transformation,
that is, the generation of a structured representation (model) of the
target program instead of plain text. This approach enables the
transformation of code after generation, which in turn enables the
extension of the target language with features that allow better
modularity in code generation rules. The technique can also be applied
to 'internal code generation' for the translation of high-level
extensions of a DSL to lower-level constructs within the same DSL
using model-to-model transformations.
This paper refines our earlier description of code generation by model
transformation with an improved architecture for the composition of
model-to-model normalization rules, solving the problem of combining
type analysis and transformation. Instead of coarse-grained stages
that alternate between normalization and type analysis, we have
developed a new style of type analysis that can be integrated with
normalizing transformations in a fine-grained manner. The
normalization strategy has a simple extension interface and integrates
non-local, context-sensitive transformation rules.
We have applied the techniques in a realistic case study of
domain-specific language engineering, i.e. the code generator for
WebDSL, using Stratego, a high-level transformation language that
integrates model-to-model, model-to-code, and code-to-code
transformations.
2008
Papers in Refereed Proceedings
- Lennart C. L. Kats, Martin Bravenboer, Eelco Visser. Mixing Source and Bytecode. A Case for Compilation by Normalization. In Object-Oriented Programming, Systems, Languages, and Applications
(OOPSLA 2008),
ACM SIGPLAN Notices 43(10), pages 91—108.
ACM, October 2008. [pdf] [doi] [bib] [presentation] [abstract]
Abstract. Language extensions increase programmer productivity by providing
concise, often domain-specific syntax, and support for static
verification of correctness, security, and style constraints.
Language extensions can often be realized through translation to the
base language, supported by preprocessors and extensible
compilers. However, various kinds of extensions require further
adaptation of a base compiler's internal stages and components, for
example to support separate compilation or to make use of low-level
primitives of the platform (e.g., jump instructions or unbalanced
synchronization). To allow for a more loosely coupled approach, we
propose an open compiler model based on normalization steps from a
high-level language to a subset of it, the core language. We developed
such a compiler for a mixed Java and (core) bytecode language, and
evaluate its effectiveness for composition mechanisms such as traits,
as well as statement-level and expression-level language extensions.
- Zef Hemel, Lennart C. L. Kats, Eelco Visser. Code Generation by Model Transformation. A Case Study in Transformation Modularity. In International Conference on Model Transformation
(ICMT 2008),
volume 5063 of Lecture Notes in Computer Science, pages 183—198.
Springer, June 2008. (An updated, extended version was published in 2009 in SoSyM.). [pdf] [doi] [bib] [abstract]
Abstract. The realization of model-driven software development requires
effective techniques for implementing code generators. In this paper,
we present a case study of code generation by model
transformation with Stratego, a high-level transformation language
based on the paradigm of rewrite rules with programmable strategies
that integrates model-to-model, model-to-code, and code-to-code
transformations. The use of concrete object syntax guarantees
syntactic correctness of code patterns, and enables the subsequent
transformation of generated code. The composability of strategies
supports two dimensions of transformation modularity.
Vertical modularity is achieved by designing a generator as a pipeline
of model-to-model transformations that gradually transforms a
high-level input model to an implementation.
Horizontal modularity is achieved by supporting the definition of
plugins which implement all aspects of a language feature.
We discuss the application of these techniques in the implementation
of WebDSL, a domain-specific language for dynamic web applications
with a rich data model.
- Lennart C. L. Kats, Karl T. Kalleberg, Eelco Visser. Generating Editors for Embedded Languages. Integrating SGLR into IMP. In A. Johnstone and J. Vinju, editors, Eighth Workshop on Language Descriptions, Tools, and Applications
(LDTA'08), pages 168—173
April 2008. [pdf] [bib] [abstract]
Abstract. Integrated Development Environments (IDEs) increase productivity by
providing a rich user interface and rapid feedback for a specific
language. Creating an editor for a specific language is not a trivial
undertaking, and is a cumbersome task even when working with an
extensible framework such as Eclipse. A new IBM-guided effort, the
IMP framework, relieves the IDE developer from a significant portion
of the required work by providing various abstractions for this. For
embedded languages, such as embedded regular expressions, SQL queries,
or code generation templates, its LALR parser generator falls short,
however. Scannerless parsing with SGLR enables concise, modular
definition of such languages. In this paper, we present an
integration of SGLR into IMP, demonstrating that a scannerless parser
can be successfully integrated into an IDE. Given an SDF syntax
definition, the sdf2imp tool automatically generates an editor plugin
based on the IMP API, complete with syntax checking, syntax
highlighting, outline view, and code folding. Using declarative
domain-specific languages, these services can be customized, and using
the IMP metatooling framework it can be extended with other features.
- Danny M. Groenewegen, Zef Hemel, Lennart C. L. Kats, Eelco Visser. When Frameworks Let You Down. Platform-Imposed Constraints on the Design and Evolution of Domain-Specific Languages. In Jeff Gray et al, editors, Eighth OOPSLA Workshop on Domain Specific Modeling
(DSM'08), pages 64—66.
October 2008. [pdf] [bib] [abstract]
Abstract. Application frameworks encapsulate domain knowledge in a reusable
library, providing abstractions for a particular domain. As such, they
can form the basis for domain-specific languages, which may offer
notational constructs, static analysis, and optimizations specific for
the domain. Additional abstractions can be incrementally added on top
of a domain-specific, following an inductive approach towards its
design, evolving the language as new domain insights are acquired. A
problem arises when such additions do not align well with the
underlying framework. In this paper, we provide different examples of
this problem and describe scenarios of dealing with it.
- Danny M. Groenewegen, Zef Hemel, Lennart C. L. Kats, Eelco Visser. WebDSL: A Domain-Specific Language for Dynamic Web Applications. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications
(OOPSLA 2008), pages 779—780,
ACM, October 2008. [pdf] [doi] [bib] [abstract]
Abstract. WebDSL is a domain-specific language for the implementation of dynamic
web applications with a rich data model. It consists of a core
language with constructs to define entities, pages and business
logic. Higher-level abstractions, modeling access control and work
flow, are defined in a modular fashion as extensions of the core
language.
2007
- Lennart C. L. Kats. Supporting Language Extension and Separate Compilation by Mixing Java and Bytecode. Master's Thesis INF/SCR-07-02, Institute of Information and Computing Sciences,
Utrecht University.
August 31, 2007. [pdf] [bib] [abstract]
Abstract. Language extensions, such as embedded domain-specific languages, are
often implemented by assimilating (rewriting) the extended language
constructs to the host language. The result can then be compiled by a
standard compiler. This approach is limited by the host language,
which may not be designed with code generation in mind. An example of
this is Java, which provides insufficient protection against name
capture of host language identifiers, and does not provide the same
low-level primitives that exist in the underlying Java Virtual
Machine. For example, it does not provide equivalents for a jump or
jump subroutine instruction, unbalanced synchronization, stack
manipulation, or specification of debugging information. Code
generated from a language that does not match Java's structure can
therefore require inefficient or laborious alternatives instead.
We propose a new open compiler model to provide generators direct
access to the underlying compiled code. With conventional open
compilers, leveraging the bytecode-generating back-end is an intricate
process, requiring adaptations tangled throughout the system. The
result is hard to develop, understand, and maintain. By providing a
mixed source language of Java and the underlying bytecode instruction
language, we can provide access to the back-end at the
source-level. Compiled instructions can be used in place of statements
or expressions, which can aid language extensions but also
applications of separate compilation. For example, it can simplify
aspect weavers by enabling direct composition of source code aspects
with compiled classes, or vice versa. As such, we also introduce a
Java traits compiler that allows operation on classes and traits in
both source and compiled form.