DataWeave Language Fundamentals

Written By: Joshua Erney

The purpose of this post is to fill in a gap that exists in the Mule community, addressing what Java programmers need to know about DataWeave (DW) to become proficient with the language. The fundamental difference between Java and DW is the programming paradigm under which they operate. Java falls under the object-oriented paradigm and DW falls under the functional paradigm. This post will help explain the functional programming (FP) concepts that DW implements and how they compare to the imperative (OO) concepts that Java programmers are familiar with.

Don’t worry if you don’t know what OO or FP is, or if this sounds intimidating. I will primarily focus on the pragmatic implications of DW’s design as a language, only discussing theory when I think it’s especially beneficial to you as a practitioner. In fact, this is the last time I’ll ever mention FP and OO in this post. I only bring it up because it’s important to know these labels if you decide to learn more on your own in the future (e.g., the search “other languages like DW” probably won’t turn up what you need but “functional programming languages” probably will). The labels are not important in this context but will assist you with understanding how DW might be different from languages you’ve probably used in the past.

I will assume you have cursory experience with the DW language. Namely, I expect that you know how to use map, filter and mapObject, and how to assign variables using %var and create functions using %function. If not, I’d recommend you check out the guides published by MuleSoft on these features.

I’ll start by examining expressions in DW and how they’re different from statements in Java. Then, I will discuss immutable data and pure functions: how they help us reason about code and how they can make some operations a little more cumbersome. Next, I will discuss first-class functions: what they are and how to take advantage of them. Afterwards, I will discuss higher-order functions: what they are, how to use them and how to make your own. Finally, I’ll discuss lambdas and how they make languages with higher-order functions easier to use.

Everything is an Expression

In DW, everything is an expression. An expression is a chunk of code that evaluates and returns something. This is opposed to a statement which can evaluate but not return anything. Here is an example of a statement in Java which doesn’t return anything:
if(x == 0) { y = x; } else { x = 0; y = x; }

We couldn’t slap a z = … in front of the statement and expect it to return anything (ignoring the fact that it’s a syntax error and will not compile). Compare this to how DW implements conditional logic:

0 when x == 0 otherwise 1

This line of code returns something; it’s an expression. I could assign it to a variable:

%var y = 0 when x == 0 otherwise 1

I’m not the language designer but I’d guess this has something to do with why DW doesn’t have a return keyword. It’s not necessary because everything is an expression and therefore it is implied that everything returns. So, you can write functions like this:

%function add2(n) n + 2

Which might look a little weird to those of you coming from a Java background. First, there are no curly braces around the body of the function. Curly braces in DW are reserved for creating simple objects (e.g. {"Hello":"world"}), and are not needed to define the body of the function (unless the body of the function creates an object). Second, there is no return keyword. You might ask yourself, what does a function return in the case it contains multiple nested expressions? In this case, the function returns the result of evaluating the last expression in the body. For example:

%function add2IfOdd(n) (n + 2) when (n mod 2) != 0 otherwise …

The above code will return the result of n + 2 when n is odd. If it’s not odd, DW will keep evaluating what expressions “…” contains until there is nothing left to evaluate. Finally, it will return whatever value was returned by the final expression.

As a final example let’s check out printing to the console in Java:

System.out.println("Hello, world");

This is a statement; it doesn’t return anything. Compare this to printing to the console in DW:

%var onePlusOne = log("the result of 1 + 1 is", 1 + 1)

This will do two things: print “the result of 1 + 1 is – 2” to the console and return the value 2 from the log function. The variable assignment is not necessary to use log; it’s just put there to illustrate that log returns a value that you can use later. If you’re curious about how log works in action, check out my post about debugging DW code.
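If you want to picture the shape of log, here is a rough JavaScript sketch. This is not how DW implements it: the helper below is a hypothetical stand-in I wrote for illustration, and the separator it prints is only an approximation. The point is just that a function can print something and still hand its value back to the caller:

```javascript
// Hypothetical stand-in for DW's log: print a labelled value, then return it.
function log(prefix, value) {
  console.log(prefix + " - " + JSON.stringify(value));
  return value;
}

// Because log returns its second argument, the result is still usable.
const onePlusOne = log("the result of 1 + 1 is", 1 + 1);
console.log(onePlusOne + 1); // 3
```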

The lesson here is to remember that every block of code in DW is an expression and therefore returns a value. This is a significant break from the Java-esque paradigm where statements are also available.
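To see both styles side by side in one language, here is a small JavaScript sketch (JavaScript, not DW; the function names are made up for illustration). The if/else version has to assign inside each branch because the statement itself has no value, while the conditional operator is an expression and behaves much like when/otherwise:

```javascript
// Statement style: if/else produces no value, so each branch must assign.
function withStatement(x) {
  let y;
  if (x === 0) {
    y = 0;
  } else {
    y = 1;
  }
  return y;
}

// Expression style: the conditional operator evaluates to a value,
// similar in spirit to DW's `0 when x == 0 otherwise 1`.
function withExpression(x) {
  const y = x === 0 ? 0 : 1;
  return y;
}

console.log(withStatement(0), withExpression(0)); // 0 0
console.log(withStatement(5), withExpression(5)); // 1 1
```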

Immutable Data and Pure Functions

Immutable data is a feature of DW that might not be immediately apparent to developers coming from a Java background. For example, in the following script:

%dw 1.0
%output application/java
%var input = [1,2,3,4,5,6]
---
input filter $ > 3
// Output: [4,5,6]

You might think the filter function removed the first three values from the input array and returned the modified array. However, since DW has immutable data, you cannot remove values from an existing array. Instead, DW creates a new array without the first three values of the input array. You can see this here:

%dw 1.0
%output application/java
%var input = [1,2,3,4,5,6]
%var big = input filter $ > 3
---
input
// Output: [1,2,3,4,5,6]
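JavaScript's array filter behaves the same way in this one respect, so it makes a handy comparison if you know a little JS. This is only an analogy: JavaScript arrays are mutable in general, and the variable names simply mirror the script above.

```javascript
const input = [1, 2, 3, 4, 5, 6];

// filter builds and returns a new array; nothing is removed from `input`.
const big = input.filter((n) => n > 3);

console.log(big.length);   // 3 (the new array holds 4, 5 and 6)
console.log(input.length); // 6 (the original still holds all six values)
```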

For visual learners, Figure 1 below might better explain what’s going on. DW is passing each value through the filter to determine if it makes it into the output array:

This might seem like a “six of one, half a dozen of the other” situation, but the implications are very important to understand. There are two big consequences of immutable data which will affect the design of your DW code vs. your Java code: imperative looping constructs, like for, are gone (they rely on modifying a variable so that the loop terminates), and values will never be modified (this gives us pure functions, more on them later).

The lack of imperative looping constructs is not as disruptive as you might think. The pattern of looping is abstracted away by functions you’ll use daily, like map and filter. However, sometimes map and filter won’t be the right tool for the job. Luckily, reduce is always available. Lastly, there are recursive function calls (i.e., functions that call themselves). As a warning, situations where you need to use reduce or recursion will be disruptive until you are comfortable with them. Practice makes perfect here. If you need practice with reduce within the context of DW, or you do not yet understand how it works, I’d recommend checking out my blog post on Using reduce in DataWeave.
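If reduce and recursion are new to you, it may help to see both replace a for loop in a language you can run quickly. Here is a JavaScript sketch that sums an array two ways, without ever reassigning a loop counter. It is an illustration of the two techniques rather than DW code, and the names are my own:

```javascript
const numbers = [1, 2, 3, 4, 5];

// reduce: fold the array into one value, carrying an accumulator forward.
const sumWithReduce = numbers.reduce((acc, n) => acc + n, 0);

// Recursion: the sum of an array is its first element plus the sum of the rest.
function sumRecursive(arr) {
  if (arr.length === 0) {
    return 0; // base case: nothing left to add
  }
  const [head, ...tail] = arr;
  return head + sumRecursive(tail);
}

console.log(sumWithReduce);         // 15
console.log(sumRecursive(numbers)); // 15
```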

The fact that data cannot be modified almost always benefits us. It allows us to more easily predict the behavior of code because we don’t need to wonder whether a function will modify a variable that we pass to it. This language feature gives us a couple of powerful concepts that we can leverage. The first is pure functions: functions that return the same output when given the same input, no matter what. The second is referential transparency, meaning we can substitute any expression with the result of that expression. These concepts give our code a math-like quality when it comes to trying to understand how code works and predicting its behavior. With the mathematical expression (1 + 2) * 3, we can substitute 3 for (1 + 2) to give us the simpler expression 3 * 3. In DW, we can reason about the code in the same way:

%dw 1.0
%output application/java
%function add2(n) n + 2
%function multiplyBy3(n) n * 3
---
multiplyBy3(add2(1))

We can confidently substitute the expression add2(1) with the value 3, relieving ourselves of the mental burden of needing to understand how that function works in the context of the larger goal. Now we only need to understand how multiplyBy3(3) works. This idea probably seems trivial in this context; after all, how hard is it to reason about simple arithmetic? However, this idea scales nicely as our DW scripts become more complicated. For example, let’s take a script that calls 10 different functions. We can look at the script at a macro-level, saying that instead of an orchestration of 10 individual functions it’s one huge function. As an alternative, we can also zoom into a micro-level and make the same kinds of assertions against individual functions and expressions. This is the power of pure functions and immutable data.
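It can be easier to appreciate pure functions by seeing what an impure one looks like, which DW will not let you write. The JavaScript sketch below is only an analogy: nextId and counter are hypothetical names I added for contrast, while add2 and multiplyBy3 mirror the DW functions above.

```javascript
// Impure: the result depends on (and changes) state outside the function.
let counter = 0;
function nextId() {
  counter = counter + 1;
  return counter;
}

// Pure: the result depends only on the arguments.
function add2(n) {
  return n + 2;
}
function multiplyBy3(n) {
  return n * 3;
}

// Substituting add2(1) with 3 never changes the answer.
console.log(multiplyBy3(add2(1)) === multiplyBy3(3)); // true

// The same substitution is unsafe for nextId: two calls give different results.
console.log(nextId() === nextId()); // false
```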

I’m sure you noticed I said this idea almost always benefits us. We don’t get this predictability of code behavior for nothing. One downside is that it forces you to take a new perspective on how your programs work. This takes time to get comfortable with. The other downside is that some cases that are easy to implement in languages like Java become more tedious in DW. Let’s take the example of incrementing a value in an existing object. In Java, you can do this in a single line of code if you choose not to use a temporary variable:

// The variable m could be represented as {"one": 1, "two": 2}
m.put("one", m.get("one") + 1);
// Results in {"one": 2, "two": 2}

To achieve a similar result in DW, we need to create a new object from the existing object where all the key-value pairs are the same except the target value:

%dw 1.0
%output application/java
%var m = { 'one': 1, 'two': 2 }
---
m mapObject (
  { $$: $ } unless ('$$' == 'one') otherwise { $$: $ + 1 }
)
// Output:
// {
//   'one': 2,
//   'two': 2
// }

That’s seven lines of DW to one line in Java (of course we could compress all the DW code into one line, but who wants to read that?). However, this example is an outlier. So, while in this example we are writing more DW code than the equivalent Java program, on average we will end up writing less. This is a huge win because less code means less potential for bugs. In my experience, I’ve found occasional inconveniences are a small cost for the ability to more easily predict how code is supposed to work.
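The same "build a new object instead of changing the old one" pattern can be written in JavaScript, which may help if the mapObject version feels opaque. This is a sketch of the idea and not a translation of DW semantics; the variable names are my own.

```javascript
const m = { one: 1, two: 2 };

// Build a new object: copy every pair, changing only the value under "one".
const updated = Object.fromEntries(
  Object.entries(m).map(([key, value]) =>
    key === "one" ? [key, value + 1] : [key, value]
  )
);

console.log(updated.one, updated.two); // 2 2
console.log(m.one, m.two);             // 1 2 (the original is untouched)
```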

Now that we’ve covered expressions, immutable data, and pure functions, you could probably stop here and go about the rest of your Mule career writing great DW code. You now know that you need to think about DW code differently than you do Java code. Instead of thinking about code in terms of how you can modify existing data, you need to think about code in terms of how to build new data from immutable data. At this point, I’d recommend you take this idea and educate yourself on the internal mechanics of map, filter, and reduce.

First-Class Functions

In DW, functions are first-class citizens. This means that we can use them in the same way we do arrays, objects, and primitives. We can assign them to variables, pass them to functions, return them from functions, add them to arrays, etc. This is what allows for functions like map, filter, and reduce to take other functions as parameters. Here’s a pedagogical example that illustrates some of the ways you can use functions:

%dw 1.0
%output application/java
%var input = [1,2,3,4,5,6]
%function even(n) (n mod 2) == 0
%var isEven = even
%var fnArr = [isEven]
---
input filter fnArr[0]($)
// Output: [2,4,6]

*Note: the above code shows an error in my Anypoint IDE, but it runs fine.

In the above example, we assign the function even to the variable isEven, put isEven in an array, fnArr, and pass that variable as the second parameter to the filter function. This leverages three pieces of functionality gained from first-class functions. Again, this is an example used to show what can be done with functions when they’re first-class citizens. Please don’t write code like this for your company or clients.

First-class functions are typically a pain-point for programmers with a Java background because Java has no free-standing functions: before Java 8 they were unsupported, and even lambdas have to be typed as functional interfaces. If you’re a long-time Java programmer, this concept of functions running around without having their hands held by classes may be unfamiliar. If you’ve done any programming in JavaScript (I hope this assumption isn’t too far off, as most programmers in the world have some experience with JS), you’ve probably dealt with first-class functions before:

[1,2,3,4,5].forEach(function(n) { console.log(n + 1); });

In this case, the forEach function is given a function as its sole argument.

You may not find much information about first-class functions and how they’re used in DW (or really any of the distinctly functional concepts of the language, for that matter) but there are plenty of learning materials centered around them for other languages. If you’re looking for another source to learn about first-class functions and higher-order functions (covered in the next section), I’d recommend using a more popular language with ample documentation and blog posts, like JavaScript. JavaScript shares a few important things with DW, like first-class functions and being a dynamic language (more or less). However, JavaScript has mutable data by default, so be aware of this when learning about these concepts through the language. I’m confident that the functional concepts you learn in JavaScript will easily carry over to DW.

Higher-Order Functions

First-class functions enable us to create higher-order functions. A higher-order function (HOF) is a function that takes another function as input, returns a function as output, or both. HOFs are a powerful tool in terms of abstraction, separation of concerns, and code reuse (much like classes).
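The "returns a function as output" half of that definition is the less familiar one, so here is a short JavaScript sketch of it. The helper greaterThan is hypothetical and written for illustration; it takes a limit and hands back a brand new function that remembers that limit.

```javascript
// Hypothetical helper: takes a limit and returns a new predicate function.
function greaterThan(limit) {
  return (n) => n > limit;
}

// The returned function can be stored, called, or passed along like any value.
const greaterThan100 = greaterThan(100);

console.log(greaterThan100(101)); // true
console.log(greaterThan100(100)); // false
console.log([98, 99, 100, 101, 102].filter(greaterThan100).length); // 2
```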

I like to use the example of counting things in an array to illustrate the type of separation of concerns that HOFs enable. For example, let’s say you have a requirement that asks you to determine the count of numbers in an array greater than 100. We’ll implement this using filter and sizeOf, but keep in mind this is a great use case for reduce as well.

%dw 1.0
%output application/java
%var input = [98,99,100,101,102]
%function countGreaterThan100(arr) sizeOf (arr filter $ > 100)
---
countGreaterThan100(input)
// Output: 2

Then you get another requirement where you need to count how often a string, defined by a client, appears in an array. You might be eyeing your countGreaterThan100 function and trying to figure out a way you could make it generalized so that it works for both cases. But you see that it’s counting based on numeric comparisons and now you need to compare strings. Eventually, you get discouraged and implement this function:

%function countOfIdMatches(arr) sizeOf (arr filter $ == flowVars.id)

But notice the similarities:

%function countGreaterThan100(arr) sizeOf (arr filter $ > 100)
%function countOfIdMatches(arr) sizeOf (arr filter $ == flowVars.id)

We pass in an array, we’re using filter and sizeOf in the exact same way, and the shape of the two functions is almost identical. With so many similarities, chances are we’ll be able to abstract them out into a separate function. You may have noticed that the only thing that’s meaningfully different is how to filter the array, which is determined by an expression that evaluates to true or false. What if we could pass in that expression? We can. Just wrap it in a function and call it within the counting function with the appropriate arguments. The logic within this function will dictate to the counting function when to increment the counter. Then you could use it to count things dealing with strings, numbers, objects, arrays, etc.:

%function countBy(arr, fn) sizeOf (arr filter fn($))
%function greaterThan100(n) n > 100
%function idMatch(id) id == flowVars.id
%var over100 = countBy([98,99,100,101,102], greaterThan100)
%var idMatches = countBy(["1", "2", "2", "3", "2"], idMatch)

This is how you can use HOFs to create utility functions that will likely find use across multiple projects. Keep in mind that this kind of abstraction is available to you in Java (for example, sorting arrays by defining a Comparator, and passing that Comparator to a sort method), but requires considerably more boilerplate code.
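If you would like to experiment with this pattern outside of Mule, the same countBy idea can be sketched in JavaScript. This is an analogy and not DW code; in particular, clientId is a made-up stand-in for flowVars.id, which only exists inside a Mule flow.

```javascript
// countBy takes the "when to count" decision as a function argument.
function countBy(arr, fn) {
  return arr.filter(fn).length;
}

function greaterThan100(n) {
  return n > 100;
}

// Stand-in for flowVars.id (an assumption made for this sketch).
const clientId = "2";
function idMatch(id) {
  return id === clientId;
}

console.log(countBy([98, 99, 100, 101, 102], greaterThan100)); // 2
console.log(countBy(["1", "2", "2", "3", "2"], idMatch));      // 3
```

The counting logic is written once, and each requirement only supplies the small function that decides what counts.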

Lambdas

Lambdas are just a fancy way of saying functions that don’t have a name. They are also called anonymous functions, function literals, and unnamed functions. Here’s what they look like in DW:

((n) -> n + 1)

You define your input parameters in parentheses, add an arrow, then define the body of the function. Finally, you wrap the whole thing in parentheses. In this case, we have a function that takes in a single parameter, n, and returns the result of adding 1 to n. You can take in more than one parameter to your function, if necessary:

((n, m) -> n + m)

Lambdas are rarely necessary in the sense that we can get along just fine without them. This is because you can always define a function before you use it. However, lambdas allow us to create functions on the fly. This allows us to more easily work with HOFs without any boilerplate code. If you’ve been using filter, you’ve probably been using lambdas without knowing it:

["foo","bar","foo","foo","bar"] filter $ == "foo"

Here, $ == "foo" is a lambda. You might be thinking to yourself “Yes, I’ve done that before, but the syntax doesn’t match what you said earlier.” You’re correct. What I just showed is a syntactic convenience for this:

["foo","bar","foo","foo","bar"] filter ((str) -> str == "foo")

$ in the previous code just refers to the current value of the iteration. Just keep in mind that $ (and $$ for some functions) is only available for built-in DW functions, not functions that you define.
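JavaScript's arrow functions play the same role as DW lambdas, so the comparison may help if you have written some JS. The sketch below is illustrative only (isFoo and the variable names are my own) and shows that a function defined ahead of time and one written in place do the same job:

```javascript
const words = ["foo", "bar", "foo", "foo", "bar"];

// Named function defined ahead of time.
function isFoo(str) {
  return str === "foo";
}

// The same logic, once by name and once as an anonymous function written in place.
const withNamed = words.filter(isFoo);
const withLambda = words.filter((str) => str === "foo");

console.log(withNamed.length, withLambda.length); // 3 3
```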

Let’s see how we’d use the countBy function in our previous section with a lambda instead of a previously defined function:

%function countBy(arr, fn) sizeOf (arr filter fn($))
---
countBy([1,2,3,4,5], ((n) -> n > 3))
// Output: 2

We need to add an input parameter, fn, that will represent our function, which defines when to increment the counter. Then we will replace the conditional section of the Boolean expression with a call to fn, being sure to pass it the current value of the iteration. And that’s it!

Conclusion

In summary, we went over expressions, immutable data, pure functions, first-class functions, higher-order functions, and lambdas. We discussed what each of these is, how they require a change in perspective, and how they work together to enable DW to excel at its given task. Please understand that this high-level change in the way you think about your code is not going to happen overnight. But if you’re diligent and work a little bit at it each day, you’ll get it.

About the Author

Joshua Erney has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.

Comments

Gopi says

August 14, 2018 at 11:10 pm

Thanks for the article and comparison between imperative programming and DW functional programming. What are the differences between Java8 lambda and DW and performance considerations?

On a separate note for blog layout, can the “spin the wheel for clarity” be pinned to the top of the page? It is affecting the page layout when the font size is increased. Just a thought.
