Written By: Joshua Erney

The purpose of this post is to fill in a gap that exists in the Mule community, addressing what Java programmers need to know about DataWeave (DW) to become proficient with the language. The fundamental difference between Java and DW is the programming paradigm under which they operate. Java falls under the object-oriented paradigm and DW falls under the functional paradigm. This post will help explain the functional programming (FP) concepts that DW implements and how they compare to the imperative (OO) concepts that Java programmers are familiar with.

Don’t worry if you don’t know what OO or FP is, or if this sounds intimidating. I will primarily focus on the pragmatic implications of DW’s design as a language, only discussing theory when I think it’s especially beneficial to you as a practitioner. In fact, this is the last time I’ll ever mention FP and OO in this post. I only bring it up because it’s important to know these labels if you decide to learn more on your own in the future (e.g., the search “other languages like DW” probably won’t turn up what you need but “functional programming languages” probably will). The labels are not important in this context but will assist you with understanding how DW might be different from languages you’ve probably used in the past.

I will assume you have cursory experience with the DW language. Namely, I expect that you know how to use map, filter and mapObject, and how to assign variables using %var and create functions using %function. If not, I’d recommend you check out the guides published by MuleSoft on these features.

I’ll start by examining expressions in DW and how they’re different from statements in Java. Then, I will discuss immutable data and pure functions: how they help us reason about code and how they can make some operations a little more cumbersome. Next, I will discuss first-class functions: what they are and how to take advantage of them. Afterwards, I will discuss higher-order functions: what they are, how to use them and how to make your own. Finally, I’ll discuss lambdas and how they make languages with higher-order functions easier to use.

Everything is an Expression

In DW, everything is an expression. An expression is a chunk of code that evaluates and returns something. This is opposed to a statement which can evaluate but not return anything. Here is an example of a statement in Java which doesn’t return anything:
if (x == 0) {
    y = x;
} else {
    x = 0;
    y = x;
}

We couldn’t slap a z = … in front of the statement and expect it to return anything (ignoring the fact that it’s a syntax error and will not compile). Compare this to how DW implements conditional logic:

0 when x == 0 otherwise 1

This line of code returns something; it’s an expression. I could assign it to a variable:

%var y = 0 when x == 0 otherwise 1

I’m not the language designer but I’d guess this has something to do with why DW doesn’t have a return keyword. It’s not necessary because everything is an expression and therefore it is implied that everything returns. So, you can write functions like this:

%function add2(n)
n + 2

Which might look a little weird to those of you coming from a Java background. First, there are no curly braces around the body of the function. Curly braces in DW are reserved for creating simple objects (e.g. {"Hello":"world"}), and are not needed to define the body of the function (unless the body of the function creates an object). Second, there is no return keyword. You might ask yourself, what does a function return in the case it contains multiple nested expressions? In this case, the function returns the result of evaluating the last expression in the body. For example:

%function add2IfOdd(n)
(n + 2) when (n mod 2) != 0 otherwise …

The above code will return the result of n + 2 when n is odd. If it’s not odd, DW will keep evaluating what expressions “…” contains until there is nothing left to evaluate. Finally, it will return whatever value was returned by the final expression.
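For illustration (this is a made-up example of my own, not the code elided above), here’s a similar function written out in full, where the otherwise branch is itself just another expression:

%dw 1.0
%output application/java

%function add2IfOddElseDouble(n)
  (n + 2) when (n mod 2) != 0
    otherwise (n * 2)
---
[add2IfOddElseDouble(3), add2IfOddElseDouble(4)]

// Output: [5, 8] -- 3 is odd, so 3 + 2; 4 is even, so 4 * 2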

As a final example let’s check out printing to the console in Java:

System.out.println("Hello, world");

This is a statement; it doesn’t return anything. Compare this to printing to the console in DW:

%var onePlusOne = log("the result of 1 + 1 is", 1 + 1)

This will do two things: print “the result of 1 + 1 is - 2” to the console, and return the value 2 from the log function. The variable assignment is not necessary to use log; it’s just there to illustrate that log returns a value you can use later. If you’re curious about how log works in action, check out my post about debugging DW code.

The lesson here is to remember that every block of code in DW is an expression and therefore returns a value. This is a significant break from the Java-esque paradigm where statements are also available.

Immutable Data and Pure Functions

Immutable data is a feature of DW that might not be immediately apparent to developers coming from a Java background. For example, in the following script:

%dw 1.0
%output application/java  

%var input = [1,2,3,4,5,6]
---
input filter $ > 3

// Output: [4,5,6]

You might think the filter function removed the first three values from the input array and returned the modified array. However, since DW has immutable data, you cannot remove values from an existing array. Instead, think of DW as creating a new array that does not contain the first three values of the input array. You can see this here:

%dw 1.0
%output application/java

%var input = [1,2,3,4,5,6]

%var big = input filter $ > 3
---
input

//Output: [1,2,3,4,5,6]

For visual learners, it might help to picture DW passing each value through the filter to determine whether it makes it into the output array.

This might seem like a “six of one, half a dozen of the other” situation, but the implications are very important to understand. There are two big consequences of immutable data that will affect the design of your DW code vs. your Java code: imperative looping constructs, like for, are gone (they rely on modifying a variable so that the loop terminates), and values will never be modified (this gives us pure functions, more on them later).

The lack of imperative looping constructs is not as disruptive as you might think. The pattern of looping is abstracted away by functions you’ll use daily, like map and filter. However, sometimes map and filter won’t be the right tool for the job. Luckily, reduce is always available, and so are recursive function calls (i.e., functions that call themselves). As a warning, situations where you need to use reduce or recursion will feel disruptive until you are comfortable with them. Practice makes perfect here. If you need practice with reduce within the context of DW, or you do not yet understand how it works, I’d recommend checking out my blog post on using reduce in DataWeave.
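For instance, here’s a small sketch (a toy illustration of my own) of two jobs a Java for loop would typically handle, written with reduce and with a recursive function instead:

%dw 1.0
%output application/java

%var input = [1,2,3,4,5]

// A recursive function stands in for a counting loop
%function factorial(n)
  1 when n <= 1 otherwise n * factorial(n - 1)
---
{
  // reduce stands in for a "sum it all up" loop
  sum:  input reduce ((curVal, acc = 0) -> acc + curVal),
  fact: factorial(5)
}

// Output: {sum: 15, fact: 120}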

The fact that data cannot be modified almost always benefits us. It allows us to more easily predict the behavior of code because we don’t need to wonder whether a function will modify a variable that we pass to it. This language feature gives us a couple of powerful concepts that we can leverage. One is called pure functions. Pure functions are functions that return the same output when given the same input, no matter what. The second is referential transparency, meaning we can substitute any expression with the result of that expression. These concepts give our code a math-like quality when it comes to understanding how it works and predicting its behavior. With the mathematical expression (1 + 2) * 3, we can substitute 3 for (1 + 2) to give us the simpler expression 3 * 3. In DW, we can reason about code in the same way:

%dw 1.0
%output application/java

%function add2(n)
n + 2

%function multiplyBy3(n)
n * 3
---
multiplyBy3(add2(1))

We can confidently substitute the expression add2(1) with the value 3, relieving ourselves of the mental burden of needing to understand how that function works in the context of the larger goal. Now we only need to understand how multiplyBy3(3) works. This idea probably seems trivial in this context; after all, how hard is it to reason about simple arithmetic? However, this idea scales nicely as our DW scripts become more complicated. For example, let’s take a script that calls 10 different functions. We can look at the script at a macro-level, saying that instead of an orchestration of 10 individual functions it’s one huge function. As an alternative, we can also zoom into a micro-level and make the same kinds of assertions against individual functions and expressions. This is the power of pure functions and immutable data.

I’m sure you noticed I said this idea almost always benefits us. We don’t get this predictability of code behavior for nothing. One downside is that it forces you to take a new perspective on how your programs work, which takes time to get comfortable with. The other downside is that some things that are easy to implement in languages like Java become more tedious in DW. Let’s take the example of incrementing a value in an existing object. In Java, you can do this in a single line of code if you choose not to use a temporary variable:

// The variable m could be represented as {"one": 1, "two": 2}
m.put("one", m.get("one") + 1);

// Results in {"one": 2, "two": 2}

To achieve a similar result in DW, we need to create a new object from the existing object where all the key-value pairs are the same except the target value:

%dw 1.0
%output application/java

%var m = {
 'one': 1,
 'two': 2
}
---
m mapObject ({
  ($$): $
}
unless (($$ as :string) == 'one') otherwise
{
  ($$): $ + 1
})

// Output:
// {
//    'one': 2,
//    'two': 2
// }

That’s seven lines of DW to one line in Java (of course we could compress all the DW code into one line, but who wants to read that?). However, this example is an outlier. So, while in this example we are writing more DW code than the equivalent Java program, on average we will end up writing less. This is a huge win because less code means less potential for bugs. In my experience, I’ve found occasional inconveniences are a small cost for the ability to more easily predict how code is supposed to work.

Now that we’ve covered expressions, immutable data, and pure functions, you could probably stop here and go about the rest of your Mule career writing great DW code. You now know that you need to think about DW code differently than you do Java. Instead of thinking about code in terms of how you can modify existing data, you need to think about code in terms of how to build new data from immutable data. At this point, I’d recommend you take this idea and educate yourself on the internal mechanics of map, filter, and reduce.

First-Class Functions

In DW, functions are first-class citizens. This means that we can use them in the same way we do arrays, objects, and primitives. We can assign them to variables, pass them to functions, return them from functions, add them to arrays, etc. This is what allows functions like map, filter, and reduce to take other functions as parameters. Here’s a pedagogical example that illustrates some of the ways you can use functions:

%dw 1.0
%output application/java

%var input = [1,2,3,4,5,6]

%function even(n)
  (n mod 2) == 0

%var isEven = even
%var fnArr = [isEven]
---
input filter fnArr[0]($)

// Output: [2,4,6]

    *Note: the above code shows an error in my Anypoint IDE, but it runs fine.

In the above example, we assign the function even to the variable isEven, put isEven in an array, fnArr, and then call the function stored at fnArr[0] inside the expression we pass to filter. This leverages three pieces of functionality gained from first-class functions. Again, this is an example used to show what can be done with functions when they’re first-class citizens. Please don’t write code like this for your company or clients.

First-class functions are typically a pain point for programmers with a Java background because Java historically did not support them (Java 8’s lambdas and method references narrow the gap, but functions are still not standalone values the way they are in DW). If you’re a long-time Java programmer, this concept of functions running around without having their hands held by classes may be unfamiliar. If you’ve done any programming in JavaScript (I hope this assumption isn’t too far off, as most programmers in the world have some experience with JS), you’ve probably dealt with first-class functions before:

[1,2,3,4,5].forEach(function(n) {
console.log(n + 1);
});

In this case, the forEach function is given a function as its sole argument.

You may not find much information about first-class functions and how they’re used in DW (or really any of the distinctly functional concepts of the language, for that matter) but there are plenty of learning materials centered around them for other languages. If you’re looking for another source to learn about first-class functions and higher-order functions (covered in the next section), I’d recommend using a more popular language with ample documentation and blog posts, like JavaScript. JavaScript shares a few important things with DW, like first-class functions and being a dynamic language (more or less). However, JavaScript has mutable data by default, so be aware of this when learning about these concepts through the language. I’m confident that the functional concepts you learn in JavaScript will easily carry over to DW.

Higher-Order Functions

First-class functions enable us to create higher-order functions. A higher-order function (HOF) is a function that takes another function as input, returns a function as output, or both. HOFs are a powerful tool in terms of abstraction, separation of concerns, and code reuse (much like classes).

I like to use the example of counting things in an array to illustrate the type of separation of concerns that HOFs enable. For example, let’s say you have a requirement that asks you to determine the count of numbers in an array greater than 100. We’ll implement this using filter and sizeOf, but keep in mind this is a great use case for reduce as well.

%dw 1.0
%output application/java

%var input = [98,99,100,101,102]

%function countGreaterThan100(arr)
  sizeOf (arr filter $ > 100)
---
countGreaterThan100(input)

// Output: 2

Then you get another requirement where you need to get the frequency with which a string, defined by a client, appears in an array. You might be eyeing your countGreaterThan100 function and trying to figure out a way to generalize it so that it works for both cases. But you see that it’s counting based on numeric comparisons and now you need to compare strings. Eventually, you get discouraged and implement this function:

%function countOfIdMatches(arr)
  sizeOf (arr filter $ == flowVars.id)

But notice the similarities:

%function countGreaterThan100(arr)
  sizeOf (arr filter $ > 100)

%function countOfIdMatches(arr)
  sizeOf (arr filter $ == flowVars.id)

We pass in an array, we’re using filter and sizeOf in the exact same way, and the shape of the two functions is almost identical. With so many similarities, chances are we’ll be able to abstract them out into a separate function. You may have noticed that the only thing that’s meaningfully different is how to filter the array, which is determined by an expression that evaluates to true or false. What if we could pass in that expression? We can. Just wrap it in a function and call it within the counting function with the appropriate arguments. The logic within this function will dictate to the counting function when to increment the counter. Then you could use it to count things dealing with strings, numbers, objects, arrays, etc.:

%function countBy(arr, fn)
  sizeOf (arr filter fn($))

%function greaterThan100(n)
  n > 100

%function idMatch(id)
  id == flowVars.id

%var over100    = countBy([98,99,100,101,102], greaterThan100)
%var idMatches  = countBy(["1", "2", "2", "3", "2"], idMatch)

This is how you can use HOFs to create utility functions that will likely find use across multiple projects. Keep in mind that this kind of abstraction is available to you in Java (for example, sorting arrays by defining a Comparator, and passing that Comparator to a sort method), but requires considerably more boilerplate code.

Lambdas

Lambdas are just a fancy way of saying functions that don’t have a name. They are also called anonymous functions, function literals, and unnamed functions. Here’s what they look like in DW:

((n) -> n + 1)

You define your input parameters in parentheses, add an arrow, then define the body of the function. Finally, you wrap the whole thing in parentheses. In this case, we have a function that takes in a single parameter, n, and returns the result of adding n + 1. You can take in more than one parameter to your function, if necessary:

((n, m) -> n + m)
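You can also assign a lambda to a variable and call it like any other function. Here’s a small sketch of my own (as with the fnArr example earlier, Anypoint Studio may flag this style even though it runs):

%dw 1.0
%output application/java

%var add = ((n, m) -> n + m)
---
add(2, 3)

// Output: 5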

Lambdas are rarely necessary in the sense that we can get along just fine without them. This is because you can always define a function before you use it. However, lambdas allow us to create functions on the fly. This allows us to more easily work with HOFs without any boilerplate code. If you’ve been using filter, you’ve probably been using lambdas without knowing it:

["foo","bar","foo","foo","bar"] filter $ == "foo"

Here, $ == “foo” is a lambda. You might be thinking to yourself “Yes, I’ve done that before, but the syntax doesn’t match what you said earlier.” You’re correct. What I just showed is a syntactic convenience for this:

["foo","bar","foo","foo","bar"] filter ((str) -> str == "foo")

$ in the previous code just refers to the current value of the iteration. Just keep in mind that $ (and $$ for some functions) is only available for built-in DW functions, not functions that you define.
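For example, map hands the shorthand variables both the current value ($) and its index ($$); when you write an explicit lambda, you name those parameters yourself. A small sketch to illustrate (mine, for demonstration only):

%dw 1.0
%output application/java
---
{
  shorthand: ["a","b","c"] map { index: $$, value: $ },
  explicit:  ["a","b","c"] map ((v, i) -> { index: i, value: v })
}

// Both keys hold: [{index: 0, value: "a"}, {index: 1, value: "b"}, {index: 2, value: "c"}]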

Let’s see how we’d use the countBy function in our previous section with a lambda instead of a previously defined function:

%function countBy(arr, fn)
  sizeOf (arr filter fn($))
---
countBy([1,2,3,4,5], ((n) -> n > 3))

// Output: 2

We need to add an input parameter, fn, that will represent our function, which defines when to increment the counter. Then we will replace the conditional section of the Boolean expression with a call to fn, being sure to pass it the current value of the iteration. And that’s it!

Conclusion

In summary, we went over expressions, immutable data, pure functions, first-class functions, higher-order functions, and lambdas. We discussed what each of these is, how they require a change in perspective, and how they work together to enable DW to excel at its given task. Please understand, this is a high-level change in the way you think about your code; it is not going to happen overnight. But if you’re diligent and work a little bit at it each day, you’ll get it.

About the Author

Joshua Erney has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.

More Posts by this Author

  • Tips on Debugging DataWeave Code
  • How to Modify All Values of an Object in DataWeave
  • When Should You Use reduce in DataWeave

Filed Under: Mulesoft Tagged With: Coding, DataWeave, Developer Tips

Written By: Joshua Erney

 
 
This post will examine the reduce function in the DataWeave (DW) language. It will first illustrate reduce at a high level using a simple example. Then, it will explain how reduce works, what it expects as arguments, and how those arguments need to be constructed. It’ll dive into a more complex, real-world example that illustrates how reduce can be used to make existing map/filter code more efficient. Additionally, it will provide an example of how we can use reduce to create a utility function that separates the concern of counting the number of times something occurs in an array from the concern of deciding when to count. And finally, it will go over how and when to use the default values for the input function to reduce, and some additional considerations when using reduce in the workplace. This post will also take the opportunity to highlight a few functional programming concepts like higher-order functions and immutable data structures. If you’re familiar with these concepts already, feel free to skim over those sections; you won’t miss anything.

reduce is one of those functions in DW that doesn’t seem to get as much love as its companions, map and filter. Most people know about map and filter; use map to create a new array that’s the result of applying a transformation function to each element of the original array, and use filter to create a new array with elements from the original array removed according to a predicate function. When it comes to data transformation, it’s relatively easy to identify use cases for map and filter: mapping fields from one data source to another, or removing elements that shouldn’t go to a certain data source. They’re relatively specific functions (not to mention used all the time in Mule integrations and data sequences in general), so identifying when they should be used is straightforward once you understand what they do. Identifying where reduce comes into play is a little bit more difficult because it is incredibly general, and unfortunately most of the examples out there don’t really paint a great picture of what it’s capable of. Most of us who were curious about reduce in the past have already seen the addition example everywhere:

Typical reduce Examples

%dw 1.0
%output application/java
 
%var input = [1,2,3,4]
---
input reduce ((curVal, acc = 0) -> acc + curVal)
 
// Output: 10

If you’re not familiar, this code adds all the numbers in the array. Incredible!

The code is trivial, and chances are you’re never going to simply add an array of numbers together in real code, but this example illustrates something important that I overlooked for a long time, and maybe you did, too: reduce, like map and filter, takes an array and a function as input, but unlike map and filter, its primary job is to reduce all the elements in the array to a single element, where an element can be a number, string, object, or array. In this case, we have an array of numbers that’s reduced into a single number.

Let’s unwrap the mechanics of reduce to make sure we really understand how to use it before moving on. First things first: just like map and filter, reduce is a higher-order function. What’s a higher-order function, you ask? It is a function that takes a function as one of its inputs. reduce takes two parameters: on its left side it takes an array, and on its right side it takes a function that it will use to operate on each value of the array. The left side is trivial; the right side is where things can get confusing. The function passed to reduce needs to take two arguments: the first will represent the current value of the iteration, and the second will be an accumulator (which could be anything: a number, object, array, etc.). Just like it’s your responsibility to make sure the function passed to filter returns a boolean, it’s your responsibility to make sure the function passed to reduce returns the new value for the accumulator. Let’s look at how the accumulator changes with each step through the input array by using the log function on the example above (refer to the blog post on Debugging DataWeave Code for more info on how log works). If you’re unclear on how reduce works, log will be your best friend when debugging reduce functions. We will also log the current value of the iteration.

Typical reduce Examples with log

%dw 1.0
%output application/java
 
%var input = [1,2,3,4]
---
input reduce ((curVal, acc = 0) ->
  log("acc = ", acc) + log("curVal = ", curVal))

Here’s what the log looks like (formatted for clarity):

acc    = 0
curVal = 1
 
acc    = 1
curVal = 2
 
acc    = 3
curVal = 3
 
acc    = 6
curVal = 4

Keep in mind that in the above code, we’re logging acc before it is replaced by the expression acc + curVal. Let’s take that log file and look at pieces of it to see what reduce is doing.

acc    = 0
curVal = 1

0 + 1 = 1. What’s the next value for acc? 1!

acc    = 1
curVal = 2

1 + 2 = 3. What’s the next value for acc? 3!

acc    = 3
curVal = 3

By now you see where this is going.

Let’s make this example a little bit more complicated to illustrate that we can use something more complex than a number for the accumulator. What if we wanted to add all the even numbers together, add all the odd numbers together, and return both? First, we already know we’re going to need a container to hold the two values. Let’s decide now that for this we will use an object with two keys, odd and even. We’ll also create a function, isEven, to help future developers understand our code. We’ll slap on the log now so we can see how the accumulator changes with each iteration.

A More Complex reduce Example

%dw 1.0
%output application/java
 
%var input = [1,2,3,4,5]
 
%function isEven(n) (n mod 2) == 0
---
input reduce ((curVal, acc = {odd: 0, even: 0}) -> log("acc = ",
{
  odd:  (acc.odd + curVal
           unless isEven(curVal)
           otherwise acc.odd),
  even: (acc.even + curVal
           when isEven(curVal)
           otherwise acc.even)
}))
 
// Output: {odd: 9, even: 6}

Here’s what the log file looks like:

acc = {odd: 1, even: 0}
acc = {odd: 1, even: 2}
acc = {odd: 4, even: 2}
acc = {odd: 4, even: 6}
acc = {odd: 9, even: 6}

Since the array we passed to reduce alternates between odd and even numbers, the function we passed reduce alternates between adding to the odd value and the even value as well. And notice that the function passed to reduce creates a new object to return as the accumulator every time. We’re not modifying the existing accumulator object. Data structures in DW are immutable by design, so we couldn’t modify the accumulator even if we wanted to. Avoiding the modification of an existing object is an important functional programming concept; map and filter work the same way. This might seem confusing at first, but look at it this way: for reduce, the data that you return must be in the same shape as your accumulator. In the first example, our accumulator was a number, so we return a number. In this example, our accumulator was an object with two keys, odd and even, so we return an object with the keys odd and even.
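The accumulator can just as easily be an array. Here’s a small sketch (a toy example of my own) that uses reduce to drop duplicate values, returning a new array at every step:

%dw 1.0
%output application/java

%var input = ["a","b","a","c","b"]
---
input reduce ((curVal, acc = []) ->
  acc when acc contains curVal otherwise acc + curVal)

// Output: ["a", "b", "c"]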

The above are just pedagogical examples, though. How might we use reduce in the workplace? A typical use case is to count the number of times something occurs (why “something” is in quotes will be revealed later). Say we receive an array of payment transactions from a data source, and we want to know how many of these transactions were over a certain threshold, say, $100.00, and we want a list of all the merchants that charged us over $100.00, with no duplicates. The requirements dictate that this must all be in a single object. Here’s how we might do that without reduce:

Real-World Example without reduce

%dw 1.0
%output application/java
 
%var input = [
  {
    "merchant" : "Casita Azul",
    "amount"   : 51.70
  },
  {
    "merchant" : "High Wire Airlines",
    "amount"   : 378.80
  },
  {
    "merchant" : "Generic Fancy Hotel Chain",
    "amount"   : 555.33
  },
  {
    "merchant" : "High Wire Airlines",
    "amount"   : 288.88
  }
]
 
%var threshold = 100
 
%function overThreshold(n) n > threshold
 
%var transactionsOverThreshold = input filter overThreshold($.amount)
 
%var merchants = transactionsOverThreshold map $.merchant distinctBy $
---
{
  count: sizeOf transactionsOverThreshold,
  merchants: merchants
}
// Output:
// {
//   count: 3,
//   merchants: [ 'High Wire Airlines', 'Generic Fancy Hotel Chain' ]
// }

This is nice, and does the job quite well for a small input payload. But notice that we need to loop through the input payload once to filter out objects with amounts over the threshold, and then we need to loop through the resulting array again to map the values to get a list of merchant names, and then loop through that resulting array to filter out duplicate merchants. This is expensive! Since this is a real-world example, what if there were 400K records instead of just 4? At this point you might be thinking to yourself “I can just use Java instead, and I will only have to loop through the payload once with a for loop.” Not so fast! Don’t give up on DW just yet. What if we could use a single reduce instead of multiple map/filter combinations? Here’s what that would look like:

Real-World Example with reduce

%dw 1.0
%output application/java
 
%var input = ... // Same as above except for 400K instead of 4
 
%var threshold = 100
 
%function overThreshold (n) n > threshold
---
input reduce ((curVal, acc = {count: 0, merchants: []}) -> ({
  count: acc.count + 1,
  merchants: (acc.merchants + curVal.merchant
                unless acc.merchants contains curVal.merchant
                otherwise acc.merchants)
}) when overThreshold(curVal.amount) otherwise acc)
 
// Output:
// {
//   count: 3,
//   merchants: [ 'High Wire Airlines', 'Generic Fancy Hotel Chain' ]
// }

Much better. Now we can deal with everything we need to in one loop over the input payload. Keep this in mind when you’re combining map, filter, and other functions to create a solution: reduce can be used to simplify these multi-step operations and make them more efficient (thanks to Josh Pitzalis and his article ‘How JavaScript’s Reduce method works, when to use it, and some of the cool things it can do’, for this insight. Check it out to see how you can create a pipeline of functions to operate on an element using reduce. It is very cool).

Notice that again we’re never mutating the accumulator, because data structures are immutable in DataWeave. We either pass on the existing accumulator (otherwise acc), or we create a new object and pass that on to the next step of iteration. Also notice that we’ve reduced an array of elements into a single object, and built an array within the object in the process; much more complex than adding a series of integers, but still a completely valid use case for reduce.
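To give a taste of the pipeline idea mentioned a moment ago in DW terms, here’s a rough sketch of my own: reduce walks an array of functions and threads a value through each one (Anypoint Studio may flag calling a function held in a variable, but the expression should still run):

%dw 1.0
%output application/java

%function add2(n) n + 2
%function multiplyBy3(n) n * 3

// The array is the pipeline; the accumulator is the value being transformed
%var pipeline = [add2, multiplyBy3]
---
pipeline reduce ((fn, acc = 5) -> fn(acc))

// Output: 21, i.e. (5 + 2) * 3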

Let’s simplify the problem above to illustrate another point. This time, we’ll only get the count of every transaction over $100.00. Counting the number of times something occurs in an array is a very common use case for reduce. It’s so common that we should separate the concern of how to count from the concern of when to increment the counter. Here goes nothing:

Using reduce to Abstract Away Counting

%dw 1.0
%output application/java
 
%var input = ... // Same as above
 
%var threshold = 100
 
%function countBy(arr, predicate)
  arr reduce ((curVal, acc = 0) -> acc + 1
                                    when predicate(curVal)
                                    otherwise acc)
---
{
  count: countBy(input, ((obj) -> obj.amount > threshold))
}
 
// Output:
// {
//   count: 3
// }

Now we have a higher-order function, countBy, that takes in an array, and a function that defines exactly under what conditions we should increment the counter. We use that function in another higher-order function, reduce, which deals with the actual iteration and reduction to a single element. How cool is that? Now, with tools like readUrl, we can define the countBy function in a library, throw it into a JAR, and reuse it across all our projects that need it. Very cool.

Using the Default Values for reduce’s Input Function

The examples shown above do not use the default arguments to reduce’s function, $ and $$. I think it’s easier to teach how reduce works by explicitly defining the parameters to the input function, but in some situations, this won’t work, and you’ll need to rely on the defaults. For example, let’s implement the function maxBy using reduce, which will get us the maximum value in an array according to a predicate function that defines what makes one value larger than another.

%function maxBy(arr, fn)
  arr reduce ((curVal, acc = 0) ->
    curVal when fn(curVal, acc) otherwise acc)

Do you see the problem here? We initialize the accumulator with 0. If we pass in the array [-3,-2,-1], and the function ((curVal, max) -> curVal > max), we’d expect a function called maxBy to return -1, but this one will return 0, a value that’s not even in the array, because curVal > max will return false for every element in the array. Even worse, what if arr wasn’t an array of numbers? We might try to get around this by doing this instead:

%function maxBy(arr, fn)
  arr reduce ((curVal, acc = arr[0]) ->
    curVal when fn(curVal, acc) otherwise acc)

Which will work just fine, but it will waste the first iteration by comparing the first value to the first value. At this point we might as well avoid getting the value by index and take advantage of the default arguments: $, which is the current value of the iteration, and $$, which is the current value of the accumulator. By default, $$ will be initialized with the first value of the array passed in, and $ will be initialized with the second:

%function maxBy(arr, fn)
  arr reduce ($ when fn($, $$) otherwise $$)

So the lambda ($ when fn($, $$) otherwise $$) can be explained as, “Set the accumulator ($$) to the current value ($) when the function fn returns true, otherwise, set the accumulator as the current accumulator.”
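Putting maxBy to use with the array and comparison function from before (a quick sketch):

%dw 1.0
%output application/java

%function maxBy(arr, fn)
  arr reduce ($ when fn($, $$) otherwise $$)
---
maxBy([-3,-2,-1], ((curVal, max) -> curVal > max))

// Output: -1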

Additional Considerations

Before wrapping up there are three things I’d like to point out. First, we’ve seen that we can replace map/filter combinations with reduce, so it follows that we can implement both map and filter in terms of reduce. Here’s filter if you need proof:

%function filter(arr, predicate)
  arr reduce ((curVal, acc=[]) ->
    acc + curVal when predicate(curVal) otherwise acc
  )
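And map can be sketched the same way; it’s named myMap here to avoid clashing with the built-in (again, just an illustration):

%function myMap(arr, transform)
  arr reduce ((curVal, acc = []) -> acc + transform(curVal))

// myMap([1,2,3], ((n) -> n * 10)) returns [10, 20, 30]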

This means that there are times when you may try to use reduce where map or filter would be the more specific and appropriate tool to get the job done. Try not to use reduce in these circumstances, and instead reach for the more specific function. The intent of your code will be more obvious, and future developers reading your code will thank you. I’ve found that a good rule of thumb is, if I’m going to use reduce to reduce to an array, chances are my intentions would be clearer using map or filter.

Second, these examples use the variables curVal and acc to denote the current value of the iteration, and the accumulator. I’ve used these names to help illustrate how reduce works. I do not recommend using these names when you write code. Use names that describe what you’re working with. For example, when trying to find the count of transactions over a threshold to generate a report like we did earlier, we might use trans and report instead of curVal and acc.

Third, this is more general advice for consultants: reduce isn’t a concept that is easily understood by most programmers (I wrote this article partly to better understand how it works myself), especially those who come from a Java/C++/C# background where mutable data structures and imperative looping constructs are the name of the game. At ArganoMS3 we have a multitude of clients, some heavily adopting MuleSoft products across their organizations with years of internal MuleSoft expertise, others having no internal expertise and needing just a few small integrations. As consultants, we need to leave these organizations with code that they can change, maintain, and expand on when we’re gone. Get a feel for the organization you’re working with. Do the developers there understand functional concepts? Are most of them long-time Java programmers who’ve never seen reduce in their lives? If you’re dealing with the former, using reduce to lessen the amount of code you need to write to accomplish a task is a good move; those developers might already understand your code, and if not, they have other people within their organization who can help. If you’re dealing with the latter, you’ll probably cause a fair share of headaches and fist-shakings at the clever code you left behind that the client is now having trouble understanding. The point being, reduce is not for everyone or every organization, and the client needs to come before the code.

Conclusion

In conclusion, we use reduce to break down an array of elements into something smaller, like a single object, a number, or a string. But it also does so much more. reduce is an important and highly flexible tool in functional programming, and therefore in DW as well. Like most highly flexible programming constructs, reduce is a double-edged sword and can easily be abused, but when you use it right, there’s nothing quite like it.

About the Author

Joshua Erney has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.

More Articles by this Author

  • Tips on Debugging DataWeave Code
  • How to Modify All Values of an Object in DataWeave

Filed Under: Mulesoft Tagged With: Coding, DataWeave

An Overview

By: Joshua Erney

Recently, I was getting data from a stored procedure that contained a very large amount of whitespace on certain fields but not on others, and, occasionally, those fields were null. I wanted to remove this white space on these fields. Below, I will document the process to get to the final iteration of the code by using more and more functional programming concepts as I go. Here was my first go at it:

First Iteration:

%dw 1.0
%output application/java
 
%function removeWhitespace (v) trim v when v is :string otherwise v
---
payload map {
  field1: removeWhitespace($.field1),
  field2: removeWhitespace($.field2)
...
}

This works. But it has two problems: it obscures the mapping by adding a repetitive function call for EVERY field and, if you don’t want to do every field, you would need to individually identify which fields have extra whitespace and which ones don’t. This is time consuming and potentially impossible given that this could change from row-to-row in the database. So, this solution might work now, but it might not work for every field. Additionally, if things change, it is incredibly brittle. Here’s my second iteration:

Second Iteration

%dw 1.0
%output application/java
 
%function removeWhitespaceFromValues(obj)
  obj mapObject {($$): trim $ when $ is :string otherwise $}
 
%var object = removeWhitespaceFromValues (payload)
---
object map {
  field1: $.field1,
  field2: $.field2
...
}

Awesome! We no longer have to individually identify which fields have a bunch of white space because the function doesn’t care. It will trim every value that is a string, which is exactly what we want. But… could it be better? What if someone wanted to use this code in the future to do the same thing to a JSON object with nested objects and lists? The second iteration of the function will not accomplish this; it will not apply the trim to nested objects and arrays. The next sample uses the match operator, which brings pattern matching to DataWeave. If this is your first exposure to either match or pattern matching, you can find out more about the match operator in the MuleSoft documentation, and more about pattern matching in general elsewhere. Now let’s take a look at the code.

Third Iteration

%dw 1.0
%output application/java
 
%function removeWhitespaceFromValues(e)
  e match {
    :array  -> $ map removeWhitespaceFromValues($),
    :object -> $ mapObject {($$): removeWhitespaceFromValues($)},
    default -> trim $ when $ is :string otherwise $
  }

%var object = removeWhitespaceFromValues(payload)
---
...

Cool. Now we have a function that will remove the white space from every value in an element that is a string, including deeply-nested elements. You might think we’re done. But we can do a lot better using something called higher-order functions. In other words, we’re going to pass a function into our existing function to specify exactly how we want it to work. Check it out (thanks to Leandro Shokida at MuleSoft for his help with this function):

Final Iteration

%dw 1.0
%output application/java
 
%function applyToValues(e, fn)
  e match {
    :array  -> $ map applyToValues($, fn),
    :object -> $ mapObject {($$): applyToValues($, fn)},
    default -> fn($)
}
 
%function trimWhitespace(v)
  trim v when v is :string otherwise v
 
%var object = applyToValues(payload, trimWhitespace)
---
...

This effectively makes the act of looping through every value in an element completely generic (applyToValues). From there, we define exactly what we want to happen to each value of the element (trimWhitespace). We’ve effectively separated the concern of targeting every value in an element from the concern of what to do with that value. What if we wanted to do something other than trim each value in the object? Just change the function you pass in. Maybe you want to trim the value if it’s a string, and increment it if it’s a number. Let’s see what that would look like:

Final Iteration (New Functionality)

%dw 1.0
%output application/java
 
%function applyToValues(e, fn)
  e match {
    :array  -> $ map applyToValues($, fn),
    :object -> $ mapObject {($$): applyToValues($, fn)},
    default -> fn($)
}
 
%function trimOrIncrement(v)
  v match {
    :string -> trim v,
    :number -> v + 1,
    default -> v
}
 
%var object = applyToValues(payload, trimOrIncrement)
---
...

Notice the most important thing here, the applyToValues function did not need to change at all. The only thing we changed was the function we passed into it. One last point, we don’t even need to give our function a name; we can create the second argument to applyToValues on the spot using a lambda or anonymous function. Here we will use a lambda to increment the value if it’s a number:

Final Iteration (Lambda)

%dw 1.0
%output application/java
 
%function applyToValues(e, fn)
  e match {
    :array  -> $ map applyToValues($, fn),
    :object -> $ mapObject {($$): applyToValues($, fn)},
    default -> fn($)
}
 
%var object = applyToValues(payload, ((v) ->
  v + 1 when v is :number otherwise v))
---
...

In Summary

In conclusion, the idea is that DataWeave is flexible enough that you can have a chunk of code that steps through all the values of an element, and that code can be reused across projects (see the MuleSoft documentation on readUrl). At the same time, you don’t need to permanently wire in how each value is modified (i.e., it could trim, it could remove all spaces, it could add 1 to every number, etc.). This kind of functionality is readily available to you in DataWeave, so take advantage of it!

Josh Erney  has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.  

Filed Under: Mulesoft Tagged With: Coding

MuleSoft’s ESB (Enterprise Service Bus) product suite is offered in three primary forms: the licensed On-Premise version, the PaaS (Platform as a Service) Anypoint Platform (CloudHub), and MuleSoft CE (Community Edition). For the purposes of this post, the focus will be a comparative analysis of CloudHub and On-Premise. Pros and cons exist for both products, and each has distinctive features that should be considered during your organization’s decision-making process. In a nutshell, some of the key feature differences are:

  • Configurable VPC, Whitelisting (CloudHub: Yes | On-Premise: Yes)
    CloudHub offers a nice interface for setting up your Virtual Private Cloud (VPC), whitelisting connections to it, and much more. You can set up the same in your own data center, with your network team and a proper allocation of time and resources.
  • Load Balancing (CloudHub: Yes | On-Premise: No)
    CloudHub comes with a load balancing feature that lets you configure load balancing once you have set up a VPC in your environment. With the on-premise version, you would need to purchase your own load balancer and licenses, hire an expert to set it all up, and have that resource work in close collaboration with your Mule resources.
  • Hands-Off Server & Host Management (CloudHub: Yes | On-Premise: No)
    CloudHub (CH), as the name implies, is all cloud based, and you don’t have to worry about anything (they just hand you the tab). The On-Premise version, on the other hand, requires you to have all of your infrastructure set up and ready on your own, with your own resources (you know how that goes, right?).
  • Simple Vertical Scaling of Your Applications (CloudHub: Yes | On-Premise: No)
    CH allows you to add more vCores and workers to a particular application with, literally, a couple of clicks; this is a very cool feature. With On-Prem, on the other hand, you need to buy a server, commission it, install all the prerequisites, and much more just to add one more server.
  • Customizable Hosting Settings (CloudHub: No | On-Premise: Yes)
    On-Prem offers the flexibility to set up your load balancing, domains, VPC, and anything else you want in your environment, since you build it in your own data center. With CH you get many features; however, they are only customizable to a certain extent, and you’re limited by what CH allows you to do in the web interface.
  • High Availability Architecture (CloudHub: Yes | On-Premise: Yes)
    You can implement this in both versions of the product. However, note that for On-Prem you would need to configure your Mule Management Console or Anypoint Runtime Manager and have the necessary servers for your HA setup. With CH it’s just a couple of clicks.
  • Custom log4j for Logging (CloudHub: No/Optional | On-Premise: Yes)
    CH only offers limited customization of log messages, whereas with the On-Prem version you can configure and append to your log4j.xml, as well as roll your logs as you please.
  • One-Click Trivial Mule Runtime Upgrade (CloudHub: Yes | On-Premise: No)
    Upgrading Mule On-Premise is painful. You must install the new Runtime, move all your projects to the server again, and make sure you have the applicable Java version, all environment properties, domains, additional dependencies, and more on each of your servers. In CH, it is as simple as a dropdown (a couple of clicks; you just have to make sure you test properly in your local environment with the correct runtime).
  • Application Properties Management (CloudHub: Yes | On-Premise: Yes)
    You can have externalized properties files in both versions of the product. One cool aspect of CH is that each application has a properties tab in which you can specify properties and then restart your application. This allows for the very quick and painless property changes that developers tend to make.
  • On-the-Fly Process Kickoff and/or Scheduler Modification (CloudHub: Yes | On-Premise: No)
    In CH, schedulers (poll components) can be enabled, disabled, changed, or kicked off immediately at runtime without changing the underlying application, which is pretty neat if you have integrations that need to fire on a schedule. This feature is not available On-Premise.
  • Automated Alerting System (CloudHub: Yes | On-Premise: Yes)
    Both versions allow for automated emails when thresholds are not met on your desired infrastructure metrics.
  • Clustering Specific Servers (CloudHub: No | On-Premise: Yes)
    With On-Prem, you can cluster specific servers in your infrastructure and configure them accordingly. With CH you can only scale on a per-application basis, not with specific workers (you simply specify the number of vCores and workers you need, and the replication and clustering are taken care of for you).
  • Static IP Assignment (CloudHub: Yes | On-Premise: Yes)
    Both versions offer simple ways to create static IPs. The CH interface, again, takes just a couple of clicks, but you can do the same on-premise with your network team’s help and coordination.

General Pros and Cons of CloudHub versus On-Premise:

Based on the table above, the CloudHub version looks like the no-brainer option, right? However, like anything in this world, really nice features and convenience do not come cheap. The CloudHub option only allows up to 10 applications per vCore, and each vCore is expensive. However, from the experience acquired by working with CloudHub over many years, some of the features might be worth paying for (depending on your particular situation). Overall, MuleSoft’s CloudHub, like many similar cloud-based technologies, has evolved tremendously over the last few years, giving customers almost all of the features they would have if the solution were implemented in their own data center. Among the most useful features implemented on CloudHub are the VPC offering, custom firewall rules, proxy server implementation, and out-of-the-box load balancing (for more details, please see the MuleSoft Runtime Manager docs). The only real technical drawbacks are that you cannot access local file systems or transfer files among applications, and that there are some constraints when using object stores (an in-memory DB). There are alternative cloud solutions, like using Amazon queues, S3, and other cloud technologies, to mitigate the impact. With that being said, customers usually don’t need a whole lot of extras when facing a particular CloudHub limitation.

As for the On-Premise version of the Mule Runtime, there are some things to consider. In my opinion, the most burdensome is the time and knowledge required to correctly install, maintain, and upgrade the Mule Runtime, along with the Java SDK/OpenJDK version, the operating system, and all the other components your application stack is going to need. This requirement might not seem like a large burden for the technical or computer-geek types, like most of us in the IT industry; however, from a CIO/executive-management and human-resources perspective, it is a real nightmare and a potential point of failure that should be avoided, especially if all it takes to do so is some dollar bills upfront. Additional costs include the load balancers, firewalls, and all the other network components that might be necessary to make your application compliant with your organizational and departmental standards. Furthermore, as an on-premise installation will take place in your data center, you would be responsible for ensuring it is done correctly and properly maintained for the entire lifetime of your Enterprise Service Bus (ESB) needs. Having all the needed elements installed and implemented correctly by qualified individuals is a lot easier said than done, for many reasons, but that’s a topic for a later discussion. The point here is that correct installation and best practices will involve a whole lot more than just a couple of clicks and unzipping files.

Have You Ever Seen a Hybrid Unicorn?

With the rise of services such as Amazon’s EC2 platform and Windows Azure, among many others, there are “hybrid options” that some organizations have been able to successfully leverage. In my own, very subjective opinion, the most prominent and common option is to install a Mule Runtime with an on-premise license on a cloud-based Linux server, such as those offered by Amazon Web Services. With this “hybrid solution” you avoid some of the limitations of both the CloudHub and the on-premise installations, but not all of them.

With the hybrid solution, you eliminate the constraint of only being able to deploy up to 10 applications per vCore. Additionally, hand-installing an on-premise Mule Runtime on a cloud-hosted server can avoid several of the drawbacks mentioned above, such as the need to provide your own load balancing, firewalls, proxy servers, and API gateways, along with all the other disadvantages of a physical hosting server (services like Amazon offer these solutions out of the box at very affordable pricing). However, a hybrid solution still requires qualified resources to install the Mule Runtime, Java, and all the other components that your particular solution would need. In my honest opinion, though, this is not a bad middle-ground solution if you are required to go this route.

Myths and Pitfalls:

  1. CloudHub does not offer the necessary security features our executives and upper management would like.

    False. Today, with cybersecurity and data protection being a major concern for most companies, CloudHub has features that enable its customers to implement a Virtual Private Cloud, API proxy layers, Secure Sockets Layer (SSL), API security protocols using LDAP, OAuth2, etc., static IPs, custom firewall rules, secure property placeholders, and many more components and features that can keep your application stack and your data very secure. Additionally, CloudHub is successfully and widely used in the banking, financial, and healthcare industries, where sensitive data is sent back and forth in countless transactions.

  2. We can get all needed on-premise components easily installed by our staff.

    Easier said than done. Even though most IT teams are fairly competent on average, ESB installation and testing require a level of expertise that is not all that common. A Mule DevOps engineer must be really familiar with the intricacies of the Mule server, Linux, HTTP ports, networking, firewalls, Mule startup properties, and sometimes many other technologies like Maven and Artifactory in order to install, maintain, monitor, and upgrade a Mule environment.

  3. An enterprise ESB solution can be done inexpensively.

    Regardless of which ESB technology you use, and whether you decide on a cloud-based technology or an on-premise solution, an ESB implementation requires significant labor, materials, and expertise to be successful.

  4. Our network team will be able to implement what is necessary very quickly and inexpensively for an on-premise solution.

    Probably one of the most common pitfalls. Network problems like port openings, firewall rules, and load balancer testing, among many others, are by their very nature difficult and can be tricky to solve. More often than not, it takes a very proficient and knowledgeable Mule resource, in close communication and coordination with a network resource, to detect and resolve a network issue. The wrong resources aiming to solve issues of this nature may actually cause your entire implementation to fail.

  5. With CloudHub we can ensure that our platform will be up 100% of the time and it will be 100% reliable.

    Nothing in life is 100% guaranteed, and neither is CloudHub. Even though it’s a very reliable platform and seldom goes down, the truth is that outages have happened in the past and undoubtedly will happen in the future. MuleSoft does have proactive monitoring and alerting, as well as outstanding after-incident analysis and reporting for its customers; however, like any other company run by humans, its systems are not 100% perfect, and things can and do go wrong.

  6. Changes to an application and deployments are a lot easier in CloudHub as compared to On-Premise.

    Solid best practices and a release management policy must be implemented in your team regardless of which implementation you choose. Yes, CloudHub does facilitate some aspects of the deployment process, however, bad practices and rogue deployments of apps can easily cause lots of headaches.

  7. Architecture is much simpler in CloudHub compared to On-Premise.

    False. The architectural advantages of CloudHub rely mostly on the hands-free maintenance and scalability aspects of an application stack. This allows your architects to shift their focus toward the architecture of your actual solution rather than the infrastructure architecture of your organization.

About the Author

Rene Lucena is an Integration Solutions Architect with Mountain State Software Solutions (ArganoMS3). He graduated from the University of Oklahoma with a Bachelor of Science in Computer Engineering and has since accumulated 14+ years of experience in IT positions including those in Gaming, Web Development, Research, Hardware Design and Troubleshooting, Analysis, Test Engineering, Sales, Project Coordination, IT Software Procurement, DevOps, SQL Development, IT Governance, SDLC, Test-Driven Development, APIs, and Leadership. He also has the honor of being a Mule Certified Developer.

Disclaimer: This article specifically and factually outlines the features of a CloudHub implementation and those features as compared to an On-Premise solution. Despite the definite advantages to using CloudHub, it is important to note that CloudHub is not necessarily the right architecture for every organization. Every organization is different, and the specific needs of your organization will help to determine whether CloudHub or On-Prem is right for you. Consultants at ArganoMS3 are fully equipped and trained to assist your organization in choosing the best architecture for your needs and to support you through both CloudHub and On-Prem implementations. For more information about choosing the right solution for you please contact us at contact@ms3-inc.com.
