
UML vs. Domain-Specific Languages

Mark Dalgarno, Software Acumen, http://www.software-acumen.com/
Matthew Fowler, New Technology / enterprise

Introduction

The software industry has a big problem as it tries to build bigger, more complex software systems in less time and with less money. With C++ and Java failing to deliver significantly improved developer productivity over their predecessors, it's no surprise that around 40% of developers [1] are already using, or are planning to use, code generation approaches to tackle this problem.

There are by now many case studies of the successful application of code generation tools and technologies, and it is our view that allowing developers to raise the level of abstraction above that supported by general-purpose programming languages is the best bet for development organisations wishing to address the productivity problem noted above. There are other approaches to raising the level of abstraction, such as framework evolution or even programming language evolution, but code generation, because it is flexible and fast, has the advantage of being able to adapt to new environments relatively quickly.

In this article, we consider the two most popular starting points for code generation:

  • UML for program modelling, part of the OMG's Model Driven Architecture (MDA) approach [2], and
  • Domain-Specific Languages (DSLs), little languages that are created specifically to model some problem domain.

As well as introducing both approaches, our aim is to offer advice on their usefulness for real-world development. We also ask whether they are mutually exclusive or if in some circumstances it can make sense to combine them.

UML and MDA

Experience of using UML as a modelling language is widespread, so using UML to express what is required in a system, and generating code from that model, is acceptable to many organisations.

"Code generation" in UML originally meant a very low level of generation - converting classes on a diagram into classes in C or Java. Experience has shown that this level of modelling does not give any business benefit when applied to complete systems. However, by using more specialised or abstract modelling elements it is possible to increase the amount of generation, as we shall see. This approach was adopted by the OMG in 2001 as part of its MDA standard.

MDA was developed to enable organisations to protect their software investment in the face of continually changing technology "platforms" (languages, operating systems, interoperability solutions, architecture frameworks, etc.). If the design and implementation are tied to the platform, then a platform change means a complete rewrite of the software system.

To avoid this, MDA proposed to separate "the specification of system functionality from the specification of the implementation of that functionality on a specific technology platform". The specification of system functionality is a Platform Independent Model (PIM); the specification on a particular platform is a Platform-Specific Model (PSM). The PSM can be annotated by developers to provide advice or guidance for the final code "text" generation step - which creates the source code and configuration files.
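To make the final "text" generation step concrete, here is a minimal sketch in Java. The EntityElement class is a hypothetical stand-in for a platform-specific model element; a real MDA tool would use a richer transformation or template language, but the principle - walking the model and emitting source text - is the same.

// Minimal sketch of the model-to-text step: a (hypothetical) PSM element
// describing a persistent entity is rendered into Java source code.
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityCodeGenerator {

    /** A tiny stand-in for a platform-specific model element. */
    static class EntityElement {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>(); // field name -> Java type

        EntityElement(String name) { this.name = name; }
    }

    /** Renders the model element as Java source - the "text" generation step. */
    static String generate(EntityElement entity) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(entity.name).append(" {\n");
        entity.attributes.forEach((field, type) ->
            src.append("    private ").append(type).append(' ').append(field).append(";\n"));
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        EntityElement customer = new EntityElement("Customer");
        customer.attributes.put("name", "String");
        customer.attributes.put("creditLimit", "java.math.BigDecimal");
        System.out.println(generate(customer));
    }
}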

To reap the business benefits of this approach, the PIM must survive platform change and be reusable across platforms. The implications are that:

  • models become first-class artefacts in the development process, rather than being ignored after a certain point: if you change the PIM, the functionality of the delivered system will change
  • code generation becomes important: mapping the PIM to the PSM by hand is costly and error prone, whereas automatic mapping to the PSM can significantly reduce the cost of a transition to a new or upgraded platform.

MDA defines a set of standards for transforming models that was finally completed in 2007. These standards are well supported in the telecom and defence sectors, where there is a history of investing in development tools as part of large projects. In the commercial world, the lack of standards led to companies supporting the "model-driven" approach (MDD - development, MDE - engineering etc.) using a variety of tools to transform UML models into working systems - "pragmatic MDA", as it was called.

The industry position of UML also means that developers can choose from a wide variety of vendors for their MDA tooling. Furthermore, vendors typically provide additional products based on the MDA approach, reducing the investment for an individual company to adopt MDA.

However, there are some issues in the use of MDA. First is the expression of detailed business logic. While 90-95% of a commercial information system can be generated from a UML model, there is a point where the business logic is not general and so not amenable to a code generation solution. There are two approaches to expressing the business logic. The "purist" approach is to model the business logic; one of the MDA specifications covers this approach. The "pragmatic" approach is to leave holes in the generated application for the hand-written business logic; this is most popular where there is a rich, standardised development environment, like Java or C#/.NET.
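To illustrate the "pragmatic" approach, below is a minimal sketch of the widely used generation-gap idiom; the class names are our own and not taken from any particular MDA tool. The generator owns an abstract base class that it can regenerate at will, while the hand-written business logic lives in a subclass that the generator never touches.

// File: OrderServiceBase.java - GENERATED; regenerated on every run, do not edit.
public abstract class OrderServiceBase {

    // Generated plumbing around the business-logic "hole".
    public final void placeOrder(Order order) {
        validate(order);            // generated validation
        applyBusinessRules(order);  // the hole left for hand-written logic
        save(order);                // generated persistence code
    }

    void validate(Order order) { /* generated checks */ }
    void save(Order order)     { /* generated persistence call */ }

    /** Hand-written business logic is supplied by a subclass. */
    protected abstract void applyBusinessRules(Order order);
}

// File: OrderService.java - HAND-WRITTEN; created once, never overwritten.
public class OrderService extends OrderServiceBase {
    @Override
    protected void applyBusinessRules(Order order) {
        // Irregular, domain-specific logic that is not worth trying to generate.
        if (order.total().compareTo(order.creditLimit()) > 0) {
            throw new IllegalStateException("Credit limit exceeded for order " + order.id());
        }
    }
}

// File: Order.java - a minimal domain class, shown only to keep the sketch self-contained.
public class Order {
    private final String id;
    private final java.math.BigDecimal total;
    private final java.math.BigDecimal creditLimit;

    public Order(String id, java.math.BigDecimal total, java.math.BigDecimal creditLimit) {
        this.id = id; this.total = total; this.creditLimit = creditLimit;
    }
    public String id() { return id; }
    public java.math.BigDecimal total() { return total; }
    public java.math.BigDecimal creditLimit() { return creditLimit; }
}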

Another issue is the low level of UML and the looseness (or generality, to put a positive slant on it) of its semantics: a common criticism is that UML is too big and vague to be effective. This assumes that the only "code generation" possible is the very low-level code generation described earlier - the assumption is that UML can't express more abstract or specialised concepts.

But this criticism ignores UML's profile feature, which "pragmatic MDA" vendors use to specialise UML. They define profiles so that developers can create models with a more specialised terminology and associated data, and on top of that they add their own validation to tighten up the UML semantics. The result is, if you like, a domain-specific subset of UML.

Using UML profiles gives as much expressive power as DSLs: stereotyped classes typically equate to the DSL terminology (the 'nouns' - see sidebar) and stereotyped relationships equate to the relationships of a graphical DSL. In other words, either approach can express concepts at arbitrary levels of abstraction.

There are two main problems with using UML with profiles to define new modelling languages:

  • With current UML tools it is usually hard to remove parts of UML that are not relevant or need to be restricted in a specialised language
  • All the diagram types have restrictions based on the UML semantics.

For example, NT/e is in the process of building a graphical DSL for a novel middleware product. The key to this is being able to model methods as first-class model elements. In theory we should be able to do this using action diagrams, but in practice too much other baggage drags along with them. As we will see below, DSLs are built from the ground up, so the modeller is not confronted with extraneous UML semantics or modelling elements.

Despite this, defining a high-level UML profile has historically been the best commercial approach to realising MDA. Producing a new profile is relatively cheap. On the marketing front, the installed base of UML tools and the understanding of the practice and benefits of modelling mean that MDA products can be positioned as 'add-ons' rather than as a completely new paradigm.

Domain-Specific Languages

Introduction

Although DSLs and DSL tools have been around for a while now, it is only in the past few years that interest in this area has really taken off - partly in response to Microsoft's entry into this space with its DSL Tools for Visual Studio.

As noted above, DSLs are little languages that can be used to directly model concepts in specific problem domains. These languages can be textual, like most programming languages, or graphical. Underpinning each DSL is a domain-specific code generator that maps models created with the DSL into the required code.

One way to think of how to use a (graphical) DSL is to imagine a palette containing the boxes and lines that correspond to key concepts and relationships in your problem domain. Modelling with this palette involves selecting the concepts you wish to model and 'painting' them onto your canvas. 'Painting' different types of lines between these concepts can then create different types of relationships between the concepts. An advantage of the DSL approach is that the modelling environment can constrain and validate a created model for the domain's semantics, something that is not possible with UML profiles.
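As a hypothetical illustration of that last point, the sketch below shows the kind of domain-specific check a DSL environment can run before any code is generated - here, the (assumed) rule that every entity in the model must declare attributes and an identifier. The model types are stand-ins of our own, not any tool's API.

// Sketch: domain-specific validation run by the modelling environment
// before code generation. The model types are hypothetical stand-ins.
import java.util.ArrayList;
import java.util.List;

public class DomainModelValidator {

    record Entity(String name, List<String> attributes, String identifierAttribute) { }

    /** Returns a list of domain-specific errors; an empty list means the model is valid. */
    static List<String> validate(List<Entity> model) {
        List<String> errors = new ArrayList<>();
        for (Entity entity : model) {
            if (entity.attributes().isEmpty()) {
                errors.add(entity.name() + ": an entity must declare at least one attribute");
            }
            if (entity.identifierAttribute() == null) {
                errors.add(entity.name() + ": an entity must declare an identifier");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Entity> model = List.of(
            new Entity("Customer", List.of("name", "creditLimit"), "name"),
            new Entity("Order", List.of(), null)); // invalid on both counts
        validate(model).forEach(System.out::println);
    }
}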

What about the 'domain' in 'Domain-Specific Language'?

The first description of DSLs I (MF) heard was, "you use the concepts from the target industry as your modelling types". I was confused about how that related to modelling and programming. Thinking about nouns and verbs helped me understand what is going on here. Let me try to explain...

Computer languages have a syntax (e.g. ';' to terminate commands), a general semantics (e.g. expressing conditional clauses, loops) and a way of defining types for your problem domain (e.g. Customer, Order, Product). This is precisely what we do with day-to-day language:

  • we invent new nouns to describe new types that we encounter
  • but we don't change the syntax or underlying semantics of English to write about a particular problem industry or technical domain.

Most people now recognise that syntax is not worth arguing about - most syntaxes are transformable into each other. It's more interesting to start defining types as extensions to the language:

  • For a DSL in a technical domain, these types can be things like 'page', 'entity', 'service', which can be mapped onto a specific technology domain;
  • For an industry-specific DSL, such as for the insurance industry, we have types like 'policy', 'cover' or 'party' etc.

So, first of all a DSL can give you a range of new nouns, drawn from the domain's standard terminology. So far, this can be done without changing the general semantics of the language. But then we get to the "verbs" - the relationships between the nouns. For a domain-specific language, these relationships are the active part of the domain - they're what the players do to each other! The end result is that the nouns and verbs you use to describe industry-specific situations end up forming their own language within a language, as any programmer's spouse will tell you. The DSL approach is a formalisation of this natural process.

The ability to express the relationships between concepts of the domain is what makes a DSL specific, and potentially very powerful. In basic UML - without profiles - the relationships tend to apply generically. For example, a UML association can relate any type of class. But in a DSL, relationships can be constrained - e.g. an insurance policy can only be composed of fire/theft/damage cover etc., not people. Furthermore, industry-specific information about the relationship can be added to the model. This can produce a graphical modelling language that is industry-specific, precise and powerful. For domains with limited variation, the relationships may carry enough information to generate all the implementation code.
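As a small sketch of that insurance example, the fragment below approximates a DSL metamodel with plain Java types (the names are our own): the composition relationship accepts only Cover elements, so a model that tries to attach a Person to a Policy cannot even be expressed.

// Sketch: encoding a domain-specific relationship constraint in the metamodel.
// A Policy may only be composed of Cover elements (fire, theft, damage, ...),
// never of arbitrary model elements such as Person.
import java.util.ArrayList;
import java.util.List;

public class InsuranceMetamodel {

    interface Cover { String kind(); }

    record FireCover(double sumInsured)  implements Cover { public String kind() { return "fire"; } }
    record TheftCover(double sumInsured) implements Cover { public String kind() { return "theft"; } }

    static class Person { }   // a model element that must NOT appear inside a Policy

    static class Policy {
        private final List<Cover> covers = new ArrayList<>();

        // The relationship is typed: only Cover elements can be added.
        void addCover(Cover cover) { covers.add(cover); }

        List<Cover> covers() { return List.copyOf(covers); }
    }

    public static void main(String[] args) {
        Policy policy = new Policy();
        policy.addCover(new FireCover(100_000));
        policy.addCover(new TheftCover(25_000));
        // policy.addCover(new Person());   // does not compile - the constraint is enforced by the metamodel
        policy.covers().forEach(c -> System.out.println(c.kind()));
    }
}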

If this makes the whole process sound very simple, it is ... up to a point. The complexities start when you implement those 'nouns' and 'verbs' on a real implementation platform: then you have to use existing components built in another language (as for a software product line, for example), or use another layer of code generation. But once these implementation issues are resolved, the DSL approach, particularly graphical DSLs, gives a working presentation of a system that is appealing and understandable to business stakeholders.

Tool Support

Tools to support the definition of DSLs and domain-specific code generators have been around for a while now but have been far less commonly available than MDA-based toolsets, with only one or two vendors offering mature products. Given this, many developers using DSLs have chosen to go down the road of implementing their own generators with varying degrees of success due to the complexity of this type of work.

This is now changing with the increasing availability of tooling to support DSL and generator creation from companies such as MetaCase and Microsoft, and as part of the Eclipse Modeling Framework (EMF). To some extent, these tools have reduced the skill level required to create DSLs and domain-specific generators.
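As one concrete illustration, the Eclipse Modeling Framework lets a metamodel be defined graphically or through its Ecore API, and can then generate model code and a basic editor from it. The fragment below is a minimal sketch assuming the EMF ecore library is on the classpath; the insurance metamodel and its namespace URI are our own invention.

// Sketch: defining a tiny insurance metamodel programmatically with EMF Ecore.
import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.EReference;
import org.eclipse.emf.ecore.EcoreFactory;
import org.eclipse.emf.ecore.EcorePackage;

public class InsuranceEcoreModel {

    public static EPackage build() {
        EcoreFactory factory = EcoreFactory.eINSTANCE;

        EPackage insurance = factory.createEPackage();
        insurance.setName("insurance");
        insurance.setNsPrefix("ins");
        insurance.setNsURI("http://example.com/insurance");   // hypothetical namespace

        EClass cover = factory.createEClass();
        cover.setName("Cover");
        EAttribute kind = factory.createEAttribute();
        kind.setName("kind");
        kind.setEType(EcorePackage.Literals.ESTRING);
        cover.getEStructuralFeatures().add(kind);

        EClass policy = factory.createEClass();
        policy.setName("Policy");
        EReference covers = factory.createEReference();
        covers.setName("covers");
        covers.setEType(cover);
        covers.setContainment(true);   // a Policy is composed of its Covers
        covers.setUpperBound(-1);      // -1 = unbounded multiplicity
        policy.getEStructuralFeatures().add(covers);

        insurance.getEClassifiers().add(cover);
        insurance.getEClassifiers().add(policy);
        return insurance;
    }

    public static void main(String[] args) {
        System.out.println(build().getName() + " metamodel built");
    }
}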

Which to use?

Given that both approaches now have momentum behind them in the form of vendor support, successful case studies, and increasing industry awareness, the question arises for developers of which approach to adopt (assuming developers are completely open-minded!).

Perhaps the first thing to note is that developers in organisations, or supply-chains, where use of UML or Microsoft technologies is mandated may find it politically difficult to choose a 'competing' approach. Modelling and code generation is just one part of the software life cycle, albeit an important part, and must fit in with the rest of the organisation's tooling and processes.

Similarly, in industry sectors such as real-time systems engineering, where intensive work has already been undertaken to support the particular modelling needs and constraints of the sector (e.g. with the development of the SysML customisation of UML [3]), developers may not find it cost-effective to create their own unique UML profiles or DSLs that don't take advantage of this prior work.

As noted above, a basic DSL can be produced using UML profiles, and this will often be a viable and relatively quick approach for a first-time code generator. However, the baggage that UML brings to the problem can confuse novice modellers; to avoid this, generator developers may choose to proceed directly to building their own DSLs - either with tool support or in a completely bespoke manner.

It's also worth mentioning that in many cases software systems can only be implemented with multiple modelling languages and code generators addressing different aspects of the overall problem. There's nothing to stop developers, who on the whole are a pragmatic bunch, from using a hybrid approach that combines UML with DSLs to create solutions that draw on the strengths of each, and indeed this is what some organisations, such as NT/e, have done very successfully.

Conclusion

So what is the outlook for the industry? It's our belief that as a basis for modelling for code generation, UML tools - in their current form - will gradually lose "market share" to DSLs: DSLs can be more direct, appealing and easier to use for a wider range of users. However, UML vendors, with their strong background in code generation approaches, can compete by adding a non-UML modelling and meta-modelling 'surface'. Combined with their tool's existing UML features, this would make an attractive combination product for many companies.

References

[1] http://www.developereye.com/info_view.aspx?id=41184

[2] http://www.omg.org/mda/index.htm

[3] http://www.sysml.org/

[4] http://www.codegeneration.net/tiki-read_article.php?articleId=81

[5] http://www.codegeneration.net/audio/cgn-episode1.mp3 - a podcast in which leading experts, including the OMG's Andrew Watson, Microsoft's Steve Cook and this article's co-author Matthew Fowler, discuss code generation.

This article was originally published in the Summer 2008 issue of Methods & Tools
