Written By: Joshua Erney
This post will examine the reduce function in the DataWeave (DW) language. It will first illustrate reduce at a high level using a simple example. Then, it will explain how reduce works, what it expects as arguments, and how those arguments need to be constructed. It’ll dive into a more complex, real-world example that illustrates how reduce can be used to make existing map/filter code more efficient. Additionally, it will provide an example of how we can use reduce to create a utility function that separates the concerns involved in counting the number of times something occurs in an array. Finally, it will go over how and when to use default values for the input function to reduce, and some additional considerations when using reduce in the workplace. This post will also take the opportunity to highlight a few functional programming concepts like higher-order functions and immutable data structures. If you’re familiar with these concepts already, feel free to skim over those sections; you won’t miss anything.
reduce is one of those functions in DW that doesn’t seem to get as much love as its companions, map and filter. Most people know about map and filter; use map to create a new array that’s the result of applying a transformation function to each element of the original array, and use filter to create a new array with elements from the original array removed according to a predicate function. When it comes to data transformation, it’s relatively easy to identify use cases for map and filter: mapping fields from one data source to another, or removing elements that shouldn’t go to a certain data source. They’re relatively specific functions (not to mention used all the time in Mule integrations and data sequences in general), so identifying when they should be used is straightforward once you understand what they do. Identifying where reduce comes into play is a little bit more difficult because it is incredibly general, and unfortunately most of the examples out there don’t really paint a great picture of what it’s capable of. Most of us who were curious about reduce in the past have already seen the addition example everywhere:
Typical reduce Examples
%dw 1.0
%output application/java
%var input = [1,2,3,4]
---
input reduce ((curVal, acc = 0) -> acc + curVal)
// Output: 10
If you’re not familiar, this code adds all the numbers in the array. Incredible!
The code is trivial and chances are you’re never going to simply add an array of numbers together in any code, but this example illustrates something important that I overlooked for a long time, and maybe you did, too: reduce, like map and filter, takes an array and a function as input, but unlike map and filter, its primary job is to reduce all the elements in the array to a single element, where an element can be a number, string, object or array. In this case, we have an array of numbers that’s reduced into a single number.
Let’s unwrap the mechanics of reduce to make sure we really understand how to use it before moving on. First things first: just like map and filter, reduce is a higher-order function. What’s a higher-order function, you ask? It is a function that takes a function as one of its inputs. reduce takes two parameters: on its left side it takes an array, and on its right side it takes a function that it will use to operate on each value of the array. The left side is trivial; the right side is where things can get confusing. The function passed to reduce needs to take two arguments: the first represents the current value of the iteration, and the second is an accumulator (which could be anything: a number, object, array, etc.). Just like it’s your responsibility to make sure the function passed to filter returns a boolean, it’s your responsibility to make sure the function passed to reduce returns the new value for the accumulator. Let’s look at how the accumulator changes with each step through the input array by using the log function on the example above (refer to the blog post on Debugging DataWeave Code for more info on how log works). If you’re unclear on how reduce works, log will be your best friend when debugging reduce functions. We will also log the current value of the iteration.
Typical reduce Examples with log
%dw 1.0
%output application/java
%var input = [1,2,3,4]
---
input reduce ((curVal, acc = 0) -> log("acc = ", acc) + log("curVal = ", curVal))
Here’s what the log looks like (formatted for clarity):
acc = 0
curVal = 1
acc = 1
curVal = 2
acc = 3
curVal = 3
acc = 6
curVal = 4
Keep in mind that in the above code, we’re logging acc before it is replaced by the expression acc + curVal. Let’s take that log file and look at pieces of it to see what reduce is doing.
acc = 0
curVal = 1
0 + 1 = 1. What’s the next value for acc? 1!
acc = 1
curVal = 2
1 + 2 = 3. What’s the next value for acc? 3!
acc = 3
curVal = 3
By now you see where this is going.
Let’s make this example a little bit more complicated to illustrate that we can use something more complex than a number for the accumulator. What if we wanted to add all the even numbers together, add all the odd numbers together, and return both? First, we already know we’re going to need a container to hold the two values. Let’s decide now that for this we will use an object with two keys, odd and even. We’ll also create a function, isEven, to help future developers understand our code. We’ll slap on the log now so we can see how the accumulator changes with each iteration.
A More Complex reduce Example
%dw 1.0
%output application/java
%var input = [1,2,3,4,5]
%function isEven (n) n%2 == 0
---
input reduce ((curVal, acc = {odd: 0, even: 0}) -> log("acc = ",
  {
    odd: (acc.odd + curVal
      unless isEven(curVal)
      otherwise acc.odd),
    even: (acc.even + curVal
      when isEven(curVal)
      otherwise acc.even)
  }))
// Output: {odd: 9, even: 6}
Here’s what the log file looks like:
acc = {odd: 1, even: 0}
acc = {odd: 1, even: 2}
acc = {odd: 4, even: 2}
acc = {odd: 4, even: 6}
acc = {odd: 9, even: 6}
Since the array we passed to reduce alternates between odd and even numbers, the function we passed reduce alternates between adding to the odd value and the even value as well. And notice that the function passed to reduce creates a new object to return as the accumulator every time. We’re not modifying the existing accumulator object. Data structures in DW are immutable by design, so we couldn’t modify the accumulator even if we wanted to. Avoiding the modification of an existing object is an important functional programming concept; map and filter work the same way. This might seem confusing at first, but look at it this way: for reduce, the data that you return must be in the same shape as your accumulator. In the first example, our accumulator was a number, so we return a number. In this example, our accumulator was an object with two keys, odd and even, so we return an object with the keys odd and even.
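To make the “same shape” rule concrete, here is a minimal sketch in which the accumulator is a string, so the function returns a string on every step. The names words, word, and sentence are made up for this illustration, and the parentheses around the condition and the concatenation are only there to avoid relying on operator precedence.

```dataweave
%dw 1.0
%output application/java
%var words = ["reduce", "is", "flexible"]
---
// The accumulator starts as an empty string, so every step returns a string.
words reduce ((word, sentence = "") ->
  word when (sentence == "") otherwise (sentence ++ " " ++ word))
// Output: "reduce is flexible"
```

On the first step the accumulator is still empty, so the function returns the word on its own; on every later step it returns a new string built from the previous accumulator, without ever modifying it.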
Above are just pedagogical examples, though. How might we use reduce in the workplace? A typical use case is to count the number of times something occurs (why “something” was italicized will be revealed later). Say we receive an array of payment transactions from a data source, and we want to know how many of these transactions were over a certain threshold, say, $100.00, and we want a list of all the merchants that charged us over $100.00, with no duplicates. The requirements dictate that this must all be in a single object. Here’s how we might do that without reduce:
Real-World Example without reduce
%dw 1.0
%output application/java
%var input = [
{
"merchant" : "Casita Azul",
"amount" : 51.70
},
{
"merchant" : "High Wire Airlines",
"amount" : 378.80
},
{
"merchant" : "Generic Fancy Hotel Chain",
"amount" : 555.33
},
{
"merchant" : "High Wire Airlines",
"amount" : 288.88
}
]
%var threshold = 100
%function overThreshold(n) n > threshold
%var transactionsOverThreshold = input filter overThreshold($.amount)
%var merchants = (transactionsOverThreshold map $.merchant) distinctBy $
---
{
count: sizeOf transactionsOverThreshold,
merchants: merchants
}
// Output:
// {
// count: 3,
// merchants: [ 'High Wire Airlines', 'Generic Fancy Hotel Chain' ]
// }
This is nice, and does the job quite well for a small input payload. But notice that we need to loop through the input payload once to keep only the objects with amounts over the threshold, then loop through the resulting array again to map the values to a list of merchant names, and then loop through that resulting array to remove duplicate merchants. This is expensive! Since this is a real-world example, what if there were 400K records instead of just 4? At this point you might be thinking to yourself “I can just use Java instead, and I will only have to loop through the payload once with a for loop.” Not so fast! Don’t give up on DW just yet. What if we could use a single reduce instead of multiple map/filter combinations? Here’s what that would look like:
Real-World Example with reduce
%dw 1.0
%output application/java
%var input = ... // Same as above except for 400K instead of 4
%var threshold = 100
%function overThreshold (n) n > threshold
---
input reduce ((curVal, acc = {count: 0, merchants: []}) -> ({
  count: acc.count + 1,
  merchants: (acc.merchants + curVal.merchant
    unless (acc.merchants contains curVal.merchant)
    otherwise acc.merchants)
}) when overThreshold(curVal.amount) otherwise acc)
// Output:
// {
// count: 3,
// merchants: [ 'High Wire Airlines', 'Generic Fancy Hotel Chain' ]
// }
Much better. Now we can deal with everything we need to in one loop over the input payload. Keep this in mind when you’re combining map, filter, and other functions to create a solution: reduce can be used to simplify these multi-step operations and make them more efficient (thanks to Josh Pitzalis and his article ‘How JavaScript’s Reduce method works, when to use it, and some of the cool things it can do’, for this insight. Check it out to see how you can create a pipeline of functions to operate on an element using reduce. It is very cool).
Notice that again we’re never mutating the accumulator, because data structures are immutable in DataWeave. We either pass on the existing accumulator (otherwise acc), or we create a new object and pass that on to the next step of iteration. Also notice that we’ve reduced an array of elements into a single object, and built an array within the object in the process; much more complex than adding a series of integers, but still a completely valid use case for reduce.
Let’s simplify the problem above to illustrate another point. This time, we’ll only get the count of every transaction over $100.00. Counting the number of times something occurs in an array is a very common use case for reduce. It’s so common that we should separate the concern of how to count from the concern of when to increment the counter. Here goes nothing:
Using reduce to Abstract Away Counting
%dw 1.0
%output application/java
%var input = ... // Same as above
%var threshold = 100
%function countBy(arr, predicate)
arr reduce ((curVal, acc = 0) -> acc + 1
when predicate(curVal)
otherwise acc)
---
{
count: countBy(input, ((obj) -> obj.amount > threshold))
}
// Output:
// {
// count: 3
// }
Now we have a higher-order function, countBy, that takes in an array, and a function that defines exactly under what conditions we should increment the counter. We use that function in another higher-order function, reduce, which deals with the actual iteration and reduction to a single element. How cool is that? Now, with tools like readUrl, we can define the countBy function in a library, throw it into a JAR, and reuse it across all our projects that need it. Very cool.
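As a rough sketch of that reuse, the same countBy can answer different questions simply by swapping the predicate. The variable name transactions and the output keys highWire and underThreshold are made up for this illustration; the data is the four-record payload from earlier in the post.

```dataweave
%dw 1.0
%output application/java
%var transactions = [
  {merchant: "Casita Azul", amount: 51.70},
  {merchant: "High Wire Airlines", amount: 378.80},
  {merchant: "Generic Fancy Hotel Chain", amount: 555.33},
  {merchant: "High Wire Airlines", amount: 288.88}
]
%var threshold = 100
// Same countBy as above: reduce handles the counting,
// the predicate decides when to increment.
%function countBy(arr, predicate)
  arr reduce ((curVal, acc = 0) -> acc + 1
    when predicate(curVal)
    otherwise acc)
---
{
  highWire: countBy(transactions, ((t) -> t.merchant == "High Wire Airlines")),
  underThreshold: countBy(transactions, ((t) -> t.amount < threshold))
}
// Output:
// {
//   highWire: 2,
//   underThreshold: 1
// }
```

Neither call knows anything about how the counting is done, and countBy knows nothing about transactions.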
Using the Default Values for reduce’s Input Function
The examples shown above do not use the default arguments to reduce’s function, $ and $$. I think it’s easier to teach how reduce works by explicitly defining the parameters to the input function, but in some situations, this won’t work, and you’ll need to rely on the defaults. For example, let’s implement the function maxBy using reduce, which will get us the maximum value in an array according to a predicate function that defines what makes one value larger than another.
%function maxBy(arr, fn)
arr reduce ((curVal, acc = 0) ->
curVal when fn(curVal, acc) otherwise acc)
Do you see the problem here? We initialize the accumulator with 0. If we pass in the array [-3,-2,-1], and the function ((curVal, max) -> curVal > max), we’d expect a function called maxBy to return -1, but this one will return 0, a value that’s not even in the array, because curVal > max will return false for every element in the array. Even worse, what if arr wasn’t an array of numbers? We might try to get around this by doing this instead:
%function maxBy(arr, fn)
arr reduce ((curVal, acc = arr[0]) ->
curVal when fn(curVal, acc) otherwise acc)
Which will work just fine, but it will waste the first iteration by comparing the first value to the first value. At this point we might as well avoid getting the value by index and take advantage of the default arguments: $, which is the current value of the iteration, and $$, which is the current value of the accumulator. By default, $$ will be initialized with the first value of the array passed in, and $ will be initialized with the second:
%function maxBy(arr, fn)
arr reduce ($ when fn($, $$) otherwise $$)
So the lambda ($ when fn($, $$) otherwise $$) can be explained as, “Set the accumulator ($$) to the current value ($) when the function fn returns true, otherwise, set the accumulator as the current accumulator.”
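Here is a small sketch of that version of maxBy in use, first with the array of negative numbers from the discussion above and then with an array of objects. The transactions variable and the output keys largestNumber and largestTransaction are assumptions made for this illustration.

```dataweave
%dw 1.0
%output application/java
%var transactions = [
  {merchant: "Casita Azul", amount: 51.70},
  {merchant: "High Wire Airlines", amount: 378.80},
  {merchant: "Generic Fancy Hotel Chain", amount: 555.33},
  {merchant: "High Wire Airlines", amount: 288.88}
]
// $$ starts as the first element and $ as the second,
// so no artificial initial value is needed.
%function maxBy(arr, fn)
  arr reduce ($ when fn($, $$) otherwise $$)
---
{
  largestNumber: maxBy([-3, -2, -1], ((curVal, max) -> curVal > max)),
  largestTransaction: maxBy(transactions,
    ((trans, largest) -> trans.amount > largest.amount))
}
// Output:
// {
//   largestNumber: -1,
//   largestTransaction: {merchant: "Generic Fancy Hotel Chain", amount: 555.33}
// }
```

Because the accumulator is seeded from the array itself, the same function works for numbers and objects alike; the comparison function is the only part that needs to know what the elements look like.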
Additional Considerations
Before wrapping up there are three things I’d like to point out. First, we’ve seen that we can replace map/filter combinations with reduce, so it follows that we can implement both map and filter in terms of reduce. Here’s filter if you need proof:
%function filter(arr, predicate)
arr reduce ((curVal, acc=[]) ->
acc + curVal when predicate(curVal) otherwise acc
)
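And for completeness, a comparable sketch of map. It is named mapWithReduce here (a made-up name) so it isn’t confused with the built-in map, and it assumes, as the merchants example above does, that + appends a single element to an array.

```dataweave
%dw 1.0
%output application/java
// Append the transformed value to a new accumulator array on every step.
%function mapWithReduce(arr, fn)
  arr reduce ((curVal, acc = []) -> acc + fn(curVal))
---
mapWithReduce([1, 2, 3], ((n) -> n * 2))
// Output: [2, 4, 6]
```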
This means that there are times when you may try to use reduce where map or filter would be the more specific and appropriate tool to get the job done. Try not to use reduce in these circumstances, and instead reach for the more specific function. The intent of your code will be more obvious, and future developers reading your code will thank you. I’ve found that a good rule of thumb is, if I’m going to use reduce to reduce to an array, chances are my intentions would be clearer using map or filter.
Second, these examples use the variables curVal and acc to denote the current value of the iteration, and the accumulator. I’ve used these names to help illustrate how reduce works. I do not recommend using these names when you write code. Use names that describe what you’re working with. For example, when trying to find the count of transactions over a threshold to generate a report like we did earlier, we might use trans and report instead of curVal and acc.
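As a sketch of what that renaming might look like, here is the threshold count written with trans and report. The transactions variable name is an assumption for this illustration; the data is the same four-record payload used earlier.

```dataweave
%dw 1.0
%output application/java
%var transactions = [
  {merchant: "Casita Azul", amount: 51.70},
  {merchant: "High Wire Airlines", amount: 378.80},
  {merchant: "Generic Fancy Hotel Chain", amount: 555.33},
  {merchant: "High Wire Airlines", amount: 288.88}
]
%var threshold = 100
---
// trans is the current transaction; report is the accumulator being built.
transactions reduce ((trans, report = {count: 0}) ->
  {count: report.count + 1}
    when (trans.amount > threshold)
    otherwise report)
// Output:
// {
//   count: 3
// }
```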
Third, this is more of general advice for consultants: reduce isn’t a concept that is easily understood by most programmers (I wrote this article for myself to better understand how it works), especially those that come from a Java/C++/C# background where mutable data structures and imperative looping constructs are the name of the game. At ArganoMS3 we have a multitude of clients, some heavily adopting MuleSoft products across their organizations with years of internal MuleSoft expertise, others having no internal expertise and needing just a few small integrations. As consultants, we need to leave these organizations with code that they can change, maintain, and expand on when we’re gone. Get a feel for the organization you’re working with. Do the developers there understand functional concepts? Are most of them long-time Java programmers who’ve never seen reduce in their lives? If you’re dealing with the former, using reduce to lessen the amount of code you need to write to accomplish a task is a good move; others might already understand your code, and if not, they have other people within their organization that can help. If you’re dealing with the latter, you’ll probably cause a fair share of headaches and fist-shakings at the clever code you left behind that the client is now having trouble understanding. Point being, reduce is not for everyone or every organization, and the client needs to come before the code.
Conclusion
In conclusion, we use reduce to break down an array of elements into something smaller, like a single object, a number, or a string. But it also does so much more. reduce is an important and highly flexible tool in functional programming, and therefore, in DW as well. Like most highly flexible programming constructs, reduce is a double-edged sword, and can easily be abused, but when you use it right, there’s nothing quite like it.
About the Author
Joshua Erney has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.