Code and Data in the Light of DSLs

Posted by Kai Kreuzer on July 3, 2007
In the keynote at TSSJS, Martin Fowler talked about one of his favorite topics: Domain Specific Languages (DSLs). In his view, this is all about writing readable code that speaks for itself and doesn't need any further inline documentation to be understood. According to Fowler, the fact that general purpose languages such as Java do not fulfill this requirement can be seen in the vast number of proprietary XML configuration files, each of which is based on its own little DSL. I agree with him that extensive XML configurations do not help readability at all and rather result in the infamous XML configuration hell. With the advent of Spring, more and more information is being moved from the code to XML files, and although the Spring team doesn't see any problem with this, projects like SpringIDE hint that these files lack readability and that a tool-supported representation of the information might help.

But didn't we just mix up two different things: Java and XML? Java is used to write code, while XML merely holds some configuration data. Is that really the case? And what is actually the difference between code and data?

Code vs. Data
At the machine level, it's all just bytes. A memory range can store machine code and/or data, and nothing indicates which of the two the information is supposed to be. It is easy to execute data - buffer overflow exploits demonstrate this daily. Higher level programming languages such as Java introduce a strong separation of code and data in order to prevent such security holes. XML is widely used for configuration data and for the exchange of runtime data (think of web services or message queues). But then there are things like HTML, BPEL, etc. Do these merely hold configuration data or are they actually code? What about a script that is stored inside a database and executed via the Java Scripting API? And what is a rule in jRules? The boundary between pure code that defines the sequence of operations and data that influences the decisions in this sequence has become completely blurred. It is a matter of personal taste where configuration ends and code starts - only the expressive power of the syntax might serve as a guide and as a starting point for discussions.
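To make the blur a bit more tangible, here is a minimal sketch in Java. The rule syntax "<field> <op> <number>" and the class name RuleAsData are invented for this illustration and are not taken from jRules or any other product. The rule is an ordinary string that could just as well live in an XML file or a database column, yet it decides which branch the program takes.

```java
import java.util.Map;

public class RuleAsData {

    // Evaluates a rule of the hypothetical form "<field> <op> <number>",
    // for example "amount > 1000". The rule arrives as plain data.
    static boolean evaluate(String rule, Map<String, Double> facts) {
        String[] parts = rule.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected '<field> <op> <number>': " + rule);
        }
        Double value = facts.get(parts[0]);
        if (value == null) {
            throw new IllegalArgumentException("Unknown field: " + parts[0]);
        }
        double threshold = Double.parseDouble(parts[2]);
        switch (parts[1]) {
            case ">": return value > threshold;
            case "<": return value < threshold;
            case "=": return value == threshold;
            default:
                throw new IllegalArgumentException("Unknown operator: " + parts[1]);
        }
    }

    public static void main(String[] args) {
        // Configuration data or code? The string below controls program flow.
        String rule = "amount > 1000";
        Map<String, Double> order = Map.of("amount", 1500.0);

        if (evaluate(rule, order)) {
            System.out.println("Order needs manual approval");
        } else {
            System.out.println("Order is approved automatically");
        }
    }
}
```

With only three operators this still feels like configuration. Add boolean connectives, variables and loops to the rule syntax and most people would call it a programming language, which is exactly the point about the power of the syntax.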

Use Cases of DSLs
Let's come back to DSLs. Do they address code or configuration? After the discussion above, the answer is both. Although the general discussion of DSLs often refers to DSL-enabling languages like Lisp and Ruby, and thus to the description of code, there are also frameworks like openArchitectureWare's xText that help to describe models, i.e. data structures, with a DSL. Fowler distinguishes between external and internal DSLs. Without wanting to draw a hard line here, I am still tempted to say that internal DSLs are usually a better match for code, while external DSLs are a good candidate for data.
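The difference between the two kinds can be sketched in a few lines of Java. Everything in this example (the Route model, the RouteBuilder, and the one-line "from ... filter ... to ..." text format) is hypothetical and only serves as an illustration; it is neither Fowler's example nor the API of any framework. The internal DSL is ordinary Java that reads like a sentence, while the external DSL is a piece of text that a small parser turns into the same model.

```java
public class DslSketch {

    // The semantic model that both DSL flavours populate.
    static final class Route {
        final String source;
        final String filter;
        final String target;

        Route(String source, String filter, String target) {
            this.source = source;
            this.filter = filter;
            this.target = target;
        }

        @Override
        public String toString() {
            return source + " -[" + filter + "]-> " + target;
        }
    }

    // Internal DSL: a fluent API written in the host language itself.
    static final class RouteBuilder {
        private String source;
        private String filter;

        static RouteBuilder from(String source) {
            RouteBuilder builder = new RouteBuilder();
            builder.source = source;
            return builder;
        }

        RouteBuilder filter(String filter) {
            this.filter = filter;
            return this;
        }

        Route to(String target) {
            return new Route(source, filter, target);
        }
    }

    // External DSL: a text with its own syntax, e.g.
    // "from inbox filter urgent to pager", which has to be parsed.
    static Route parse(String text) {
        String[] tokens = text.trim().split("\\s+");
        boolean wellFormed = tokens.length == 6
                && tokens[0].equals("from")
                && tokens[2].equals("filter")
                && tokens[4].equals("to");
        if (!wellFormed) {
            throw new IllegalArgumentException("Cannot parse route: " + text);
        }
        return new Route(tokens[1], tokens[3], tokens[5]);
    }

    public static void main(String[] args) {
        Route internal = RouteBuilder.from("inbox").filter("urgent").to("pager");
        Route external = parse("from inbox filter urgent to pager");
        System.out.println(internal);
        System.out.println(external);
    }
}
```

The internal variant is compiled together with the rest of the code, whereas the external variant can be stored and exchanged like any other data, which is roughly why I associate the former with code and the latter with data.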

DSLs and XML
So will DSLs be a solution to the XML configuration hell? I do not believe so. As I said, on the one hand the public discussion is much more about dynamic languages that are seen as the saviour for many problems - for code. On the other hand, XML languages continue to spread unhindered and there is no end in sight - but that's what XML was about, right? Having an XML Schema as your DSL simply gives you a platform independent format with tool support (XML editors and parsers) for all you ever wanted to express. What is lacking is just the readability that Fowler asks for. But instead of replacing XML and XML Schema with some well-readable DSL, the market seems to strive for a different solution: It regards XML merely as the serialisation format and provides graphical editors on top of it. Now that is readable, as a diagram looks so much better than XML code. We might just end up in an editor hell now - you will need a vast set of editors (all with a different user experience of course) just to configure your application. So how will we change the configuration if all we have is "vi" on a remote console? Well, let's hope that the XML format is still human-readable and that the XSD is somewhere near!

Summary
The dilemma is that XML Schemas are the most widespread DSLs, yet they do not offer good readability. The current discussions on DSLs won't have any impact on this fact - on the contrary, in our heterogeneous world XML will continue to spread not only as a configuration or data format, but also as a means to express programmatic logic (e.g. BPEL). Instead of getting an easily readable format, we will see more and more graphical editors that ease the work with these XML formats and make them accessible to (business-oriented) people who are not at ease with XML.