
Written By: Joshua Erney
[For original post visit: https://www.jerney.io/mule-style-guide-dataweave/]

In my previous post about Mule programming style, I discussed a couple things: first, why the readability of your code is so important, and second, how having a single flow that describes the overall intent of the code through descriptive doc:name attributes can really improve the readability of your code. In this post, I will discuss how I format my DataWeave code to achieve the same effects.

My Code

Again, I’d like to start with an example of what my DataWeave code typically looks like. Here’s a sample. Don’t worry about understanding what it does.

%dw 2.0
output application/json
 
import modules::Utils as utils
 
var invoices    = payload
var allocations = vars.allocations
 
fun formatDate(s: String) =
  s as Date   {format: "MM/dd/yyyy"}
    as String {format: "yyyyMMdd"}
 
fun replaceNullEverywhere(e, replacement="") =
  utils::applyToValues(e, ((v) -> if (v == null) replacement else v))
---
replaceNullEverywhere(
  invoices map ((invoice) -> {
    invoiceId:     invoice.invoice_id,
    invoiceNumber: invoice.invoice_number,
    amount:        invoice.total as Number,
    invoiceDate:   formatDate(invoice.date),
    allocations:   allocations[invoice.invoice_id] map ((allocation) -> {
      percentage: allocation.percentage,
      amount:     allocation.amount,
      datePaid:   formatDate(allocation.paid_date)
    })
  })
)

Now let’s talk about what I’m doing and why.

Reassign payload and vars when appropriate

When you’re looking at a script for the first time that someone else wrote, payload doesn’t carry much meaning other than just being the payload of the message. There’s nothing there that cues the reader to what the payload represents, unless metadata has been set up. Reassigning payload to be something more meaningful can go a long way towards helping someone understand the intention of your transformation. For example, in the example above, it’s obvious we’re mapping invoice data. Perhaps that’s obvious from the fields as well, but that won’t always be the case.
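As a minimal sketch (the customer payload and field names here are hypothetical, not from the example above):

```
%dw 2.0
output application/json

// "customers" tells the reader what the payload represents;
// a bare "payload" reference would not
var customers = payload
---
customers map ((customer) -> {
  id:   customer.customer_id,
  name: customer.customer_name
})
```

The reassignment costs one line in the header and pays for itself every time the name appears in the body.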

Prefer Naming Your Lambda/Callback Arguments Instead of Using the Default Values

You won’t find $, $$, or $$$ too often in my DataWeave code anymore. In my opinion, the saved keystrokes are just not worth the sacrifice in readability. If I do something like:

invoices map {
  id:        $$,
  invoiceId: $.invoice_id
  ...
}

I will surely know what $ and $$ represent at the time of writing the code, but will I remember in 6 months when I come back to this code, or will I need to look up in the documentation again? What about the junior dev who’s going to maintain this? What if I did this instead:

invoices map ((invoice, index) -> {
  id:        index,
  invoiceId: invoice.invoice_id
  ...
})

Which one is easier to understand?

Your Mapping Code Should Match the Shape of the Output, When Applicable

If you look at my example code, you’ll see that I indent once for the first map and add an additional indent for the second map. I do this because it mirrors how the output of the script will look:

[
  {
    "invoiceId": 1,
    "invoiceNumber": "J503KL",
    "amount": 3.5,
    "invoiceDate": "20181212",
    "allocations": [
      {
        "percentage": 100,
        "amount": 100,
        "datePaid": "20181222"
      }
    ]
  }
]

Justify Your Code

I’m assuming this will be the most polarizing preference because:

  1. I’ve never seen anyone voluntarily do this in ANY code (except my college professor. Shout out to Dr. Morgan Benton), and
  2. People assume it’s going to take a long time to format code this way

It’s going to take a bigger example to justify why I take the time to do this. Would you rather read this:

invoices map ((invoice, index) -> {
  "insertionCriteria": createInsertionCriteria(invoice),
  "Invoice Line Number": index + 1,
  "Business Unit": buName,
  "Source": upper source,
  "Invoice Number": invoice.INVOICE_NUMBER,
  "Invoice Amount": invoice.INVOICE_AMOUNT,
  "Invoice Date": formatDate(invoice.INVOICE_DATE),
  "Supplier Name": configs.SUPPLIER_NAME,
  "Supplier Number": configs.SUPPLIER_NUMBER,
  "Supplier Site": configs.SUPPLIER_SITE_CODE,
  "Invoice Description": trim invoice.INV_DESCRIPTION,
  "Invoice Type": invoiceType(invoice.INVOICE_AMOUNT),
  "Payment Terms": configs.PAYMENT_TERM,
  "Line Type": configs.LINE_TYPE,
  "Amount": invoice.LINE_AMOUNT,
  "Line Description": invoice.LINE_DESC,
  "Distribution Combination": buildDistributionCombination(invoice),
  "Terms Date": null,
  "Goods Received Date": null,
  "Invoice Received Date": null,
  "Accounting Date": null,
  "Payment Method": null,
  "Pay Group": null,
  "Pay Alone": null,
  "Discountable Amount": null,
  "Prepayment Number": null,
  "Prepayment Line Number": null,
  "Prepayment Application Amount": null,
  "Prepayment Accounting Date": null,
  "Invoice Includes Prepayment": null,
  "Conversion Rate Type": null,
  "Conversion Date": null,
  "Conversion Rate": null,
  "Liability Combination": null,
  "Document Category Code": null
})

Or this:

invoices map ((invoice, index) -> {
  "insertionCriteria":                createInsertionCriteria(invoice),
  "Invoice Line Number":              index + 1,
  "Business Unit":                    buName,
  "Source":                           upper source,
  "Invoice Number":                   invoice.INVOICE_NUMBER,
  "Invoice Amount":                   invoice.INVOICE_AMOUNT,
  "Invoice Date":                     formatDate(invoice.INVOICE_DATE),
  "Supplier Name":                    configs.SUPPLIER_NAME,
  "Supplier Number":                  configs.SUPPLIER_NUMBER,
  "Supplier Site":                    configs.SUPPLIER_SITE_CODE,
  "Invoice Description":              trim invoice.INV_DESCRIPTION,
  "Invoice Type":                     invoiceType(invoice.INVOICE_AMOUNT),
  "Payment Terms":                    configs.PAYMENT_TERM,
  "Line Type":                        configs.LINE_TYPE,
  "Amount":                           invoice.LINE_AMOUNT,
  "Line Description":                 invoice.LINE_DESC,
  "Distribution Combination":         buildDistributionCombination(invoice),
  "Terms Date":                       null,
  "Goods Received Date":              null,
  "Invoice Received Date":            null,
  "Accounting Date":                  null,
  "Payment Method":                   null,
  "Pay Group":                        null,
  "Pay Alone":                        null,
  "Discountable Amount":              null,
  "Prepayment Number":                null,
  "Prepayment Line Number":           null,
  "Prepayment Application Amount":    null,
  "Prepayment Accounting Date":       null,
  "Invoice Includes Prepayment":      null,
  "Conversion Rate Type":             null,
  "Conversion Date":                  null,
  "Conversion Rate":                  null,
  "Liability Combination":            null,
  "Document Category Code":           null
})

This is a no-brainer for me. One looks like a giant blob of unstructured code, the other looks like a clear set of key-value mappings. You may be thinking “Wow, who has time to structure their code like this?”. Frankly, I did this for years before taking an afternoon to write a script that does it for me: that’s how important I think it is, and how dumb I was for not writing the script earlier. It’s not perfect, but it’s good enough that it saves me a ton of time. You can find it on my GitHub here.

Wrap Generic Functions in a Simple Interface, Even if you Don’t Have to

This one is a bit more complicated. You may have noticed the following in the header of my first example:

import modules::Utils as utils
...
fun replaceNullEverywhere(e, replacement="") =
  utils::applyToValues(e, ((v) -> if (v == null) replacement else v))

I could’ve skipped defining a new function and done this, instead:

---
utils::applyToValues(
  invoices map ((invoice) -> {
    ...
  }),
  ((v) -> if (v == null) "" else v)
)

But I didn’t. Why?

applyToValues is a very generic function. As its first parameter, it can take a single value: a string, an object, an array, or any combination of those and more. Its second parameter is a function with a single argument. This allows the user to pass in any kind of functionality they want to apply to the first argument. So, we now know two things about applyToValues: its first parameter can be just about any value, and its second parameter can be just about any single-argument function.

Wow. That’s a lot of flexibility. The downside to all this flexibility is that the reader needs to understand a lot of context to understand how the function is being used. They need to know what the value passed as the first argument is, and they need to understand how the function passed as the second argument is going to interact with that value to change it. That’s asking a lot of someone reading your code for the first time!

Is there an easy way we can alleviate that mental burden? Yes, and it takes just a few seconds: give it a more context-specific interface (i.e., wrap it with a function that’s easier to use). By giving applyToValues a wrapper called replaceNullEverywhere, it’s immediately obvious to the reader how applyToValues is being used: to replace all null values with another value. We also eliminate the need to expose the lambda to the caller, trading away all that flexibility. At the expense of flexibility, we gain a ton of readability.

In software development, there’s typically a tradeoff between readability and flexibility. The more flexible a tool is, the more time you need to spend learning how to use it properly. This is true in a lot of other areas of life, including photography. Think about how easy it is to use your phone camera versus a DSLR. The interface of a phone camera is simple. A DSLR can take fantastic photos in a much wider variety of conditions, but its interface is much more complicated. You can’t just flip a switch on a DSLR and make it easier to use. With programming, however, it is that easy, so take advantage and do a favor for those who will need to read your code in the future!

Above was a short overview of applyToValues. If you’re interested in a deep dive of how this function works and other ways it can be used in your transformations, check out my post about it here.

Conclusion

That’s how I handle formatting and writing my DataWeave code, in a nutshell. Little steps like assigning your payload to a descriptive variable name, naming your callback arguments, separating your functions from your mapping, justifying your code, and giving generic functions a more specific interface can go a long way in helping others understand your code. Don’t feel like you need to apply all these rules at once, or apply all of them at all. Obviously, these are my opinions, but if you can walk away with something that will help your clients and other developers, that’s great! Thanks for reading.

About the Author

Joshua Erney has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.

 More Posts by this Author

  • DataWeave Language Fundamentals
  • Tips on Debugging DataWeave Code
  • How to Modify All Values of an Object in DataWeave
  • When Should You Use reduce in DataWeave

Filed Under: Integration, Mulesoft Tagged With: DataWeave

INC. Magazine Recognizes Mountain State Software Solutions, LLC. (ArganoMS3) As One of its Fastest Growing Companies in the U.S. for the Second Year!

For the 37th year, INC. Magazine has released its annual list of the fastest-growing private companies in the United States. Mountain State Software Solutions, LLC (ArganoMS3) is excited to announce that, for the second year, it has been selected for this prestigious recognition. The list recognizes private businesses and awards them for their continued success in the American economy.

We are proud to share this honor with successful companies such as Microsoft, Dell, Pandora, Yelp, Zillow, and many other household names that gained early exposure on the Inc. 5000 list. In our second appearance on this exclusive list, ArganoMS3 ranked in the top 3%, at an astounding 73rd among the fastest-growing private companies in America. In addition to our impressive overall ranking, we ranked #1 among IT system development companies and #1 among WV-based companies.

Companies that appear on this list show exponential growth year after year. With the ever-evolving nature of the IT field, these results represent a strong trend of companies engaging with strategic, focused integration firms. Our rapid growth and success are due to our talented team and our distinctive partner-client concentrated business model. Our client base continues to grow daily, and we are continually striving to expand our engagements, setting us up for more success in the years to come.

“We are honored at being selected as one of INC500’s fastest growing companies for a 2nd year. We understand that we are among a very small and prestigious group of companies that have accomplished this” stated ArganoMS3 CEO and Founder, Aaron Weikle. “Our growth really shows the hard work and dedication to success we provide for our clients. The relationship that we establish coupled with continued client successes is what puts ArganoMS3 head and shoulders above the competition and this astronomical growth, moving us above 112th and landing us in the top 100, really proves our continued story of premium delivery and happy customers.”

ArganoMS3 is a global IT consulting firm based in the Washington DC metropolitan area that specializes in engineering Future Proof solutions for both commercial and federal customers. Our focus is on business acceleration and providing API driven solutions for today’s most complex integration challenges. With knowledge and experience in everything from MuleSoft Integration, to AWS Cloud Services, to Red Hat for DevOps capabilities and so much more, ArganoMS3 can leverage various innovative technologies to create a custom solution that is right for your business.

Visit the INC. Magazine site for the complete results of the Inc. 5000 & Inc. 500.

Filed Under: APIs, Mulesoft, News, Team Tagged With: Achievements, Awards, INC 500

Written By: Joshua Erney

The purpose of this post is to fill in a gap that exists in the Mule community, addressing what Java programmers need to know about DataWeave (DW) to become proficient with the language. The fundamental difference between Java and DW is the programming paradigm under which they operate. Java falls under the object-oriented paradigm and DW falls under the functional paradigm. This post will help explain the functional programming (FP) concepts that DW implements and how they compare to the imperative (OO) concepts that Java programmers are familiar with.

Don’t worry if you don’t know what OO or FP is, or if this sounds intimidating. I will primarily focus on the pragmatic implications of DW’s design as a language, only discussing theory when I think it’s especially beneficial to you as a practitioner. In fact, this is the last time I’ll ever mention FP and OO in this post. I only bring it up because it’s important to know these labels if you decide to learn more on your own in the future (e.g., the search “other languages like DW” probably won’t turn up what you need but “functional programming languages” probably will). The labels are not important in this context but will assist you with understanding how DW might be different from languages you’ve probably used in the past.

I will assume you have cursory experience with the DW language. Namely, I expect that you know how to use map, filter and mapObject, and how to assign variables using %var and create functions using %function. If not, I’d recommend you check out the guides published by MuleSoft on these features.

I’ll start by examining expressions in DW and how they’re different from statements in Java. Then, I will discuss immutable data and pure functions: how they help us reason about code and how they can make some operations a little more cumbersome. Next, I will discuss first-class functions: what they are and how to take advantage of them. Afterwards, I will discuss higher-order functions: what they are, how to use them and how to make your own. Finally, I’ll discuss lambdas and how they make languages with higher-order functions easier to use.

Everything is an Expression

In DW, everything is an expression. An expression is a chunk of code that evaluates and returns something. This is opposed to a statement which can evaluate but not return anything. Here is an example of a statement in Java which doesn’t return anything:
if (x == 0) {
  y = x;
} else {
  x = 0;
  y = x;
}

We couldn’t slap a z = … in front of the statement and expect it to return anything (ignoring the fact that it’s a syntax error and will not compile). Compare this to how DW implements conditional logic:

0 when x == 0 otherwise 1

This line of code returns something; it’s an expression. I could assign it to a variable:

%var y = 0 when x == 0 otherwise 1

I’m not the language designer but I’d guess this has something to do with why DW doesn’t have a return keyword. It’s not necessary because everything is an expression and therefore it is implied that everything returns. So, you can write functions like this:

%function add2(n)
  n + 2

Which might look a little weird to those of you coming from a Java background. First, there are no curly braces around the body of the function. Curly braces in DW are reserved for creating simple objects (e.g. {"Hello":"world"}), and are not needed to define the body of the function (unless the body of the function creates an object). Second, there is no return keyword. You might ask yourself, what does a function return in the case it contains multiple nested expressions? In this case, the function returns the result of evaluating the last expression in the body. For example:

%function add2IfOdd(n)
  (n + 2) when (n mod 2) != 0 otherwise …

The above code will return the result of n + 2 when n is odd. If it’s not odd, DW will keep evaluating what expressions “…” contains until there is nothing left to evaluate. Finally, it will return whatever value was returned by the final expression.
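To make that concrete, here is a complete, hypothetical version of the sketch above, where the elided otherwise branch simply returns n unchanged:

```
%dw 1.0
%output application/java

// Adds 2 only when n is odd; the otherwise branch is the
// implicit "return" for the even case
%function add2IfOdd(n)
  (n + 2) when (n mod 2) != 0 otherwise n
---
add2IfOdd(3)

// Output: 5
```

Because the whole function body is one expression, its result is the function's return value; no return keyword is needed.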

As a final example let’s check out printing to the console in Java:

System.out.println("Hello, world");

This is a statement; it doesn’t return anything. Compare this to printing to the console in DW:

%var onePlusOne = log("the result of 1 + 1 is", 1 + 1)

This will do two things: print "the result of 1 + 1 is - 2" to the console, and return the value 2 from the log function. The variable assignment is not necessary to use log; it’s just there to illustrate that log returns a value you can use later. If you’re curious about how log works in action, check out my post about debugging DW code.

The lesson here is to remember that every block of code in DW is an expression and therefore returns a value. This is a significant break from the Java-esque paradigm where statements are also available.

Immutable Data and Pure Functions

Immutable data is a feature of DW that might not be immediately apparent to developers coming from a Java background. For example, in the following script:

%dw 1.0
%output application/java  

%var input = [1,2,3,4,5,6]
---
input filter $ > 3

// Output: [4,5,6]

You might think the filter function removed the first three values from the input array and returned the modified array. However, since DW has immutable data, you cannot remove values from an existing array. Instead, DW creates a new array without the first three values of the input array. You can see this here:

%dw 1.0
%output application/java

%var input = [1,2,3,4,5,6]

%var big = input filter $ > 3
---
input

//Output: [1,2,3,4,5,6]

For visual learners, Figure 1 below might better explain what’s going on. DW is passing each value through the filter to determine if it makes it into the output array:

This might seem like a “six of one, half a dozen of the other” situation, but the implications are very important to understand. There are two big consequences of immutable data which will affect the design of your DW code vs. your Java code: imperative looping constructs, like for, are gone (they rely on modifying a variable so that the loop terminates) and values will never be modified (this gives us pure functions, more on them later).

The lack of imperative looping constructs is not as disruptive as you might think. The pattern of looping is abstracted away by functions you’ll use daily, like map and filter. However, sometimes map and filter won’t be the right tool for the job. Luckily, reduce is always available, as are recursive function calls (i.e., functions that call themselves). As a warning, situations where you need to use reduce or recursion will be disruptive until you are comfortable with them. Practice makes perfect here. If you need practice with reduce within the context of DW, or you do not yet understand how it works, I’d recommend checking out my blog post on Using reduce in DataWeave.
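For example, a running total that would be a for loop with a mutable counter in Java becomes a single reduce call. A minimal sketch (assuming DW 1.0’s default-accumulator syntax):

```
%dw 1.0
%output application/java

%var input = [1,2,3,4]
---
// The accumulator "total" plays the role of the mutable
// counter a Java for loop would use; it starts at 0
input reduce ((n, total = 0) -> total + n)

// Output: 10
```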

The fact that data cannot be modified almost always benefits us. It allows us to more easily predict the behavior of code because we don’t need to wonder whether a function will modify a variable that we pass to it. This language feature gives us a couple of powerful concepts that we can leverage. One is pure functions: functions that return the same output when given the same input, no matter what. The second is referential transparency, meaning we can substitute any expression with the result of that expression. These concepts give our code a math-like quality when it comes to understanding how it works and predicting its behavior. With the mathematical expression (1 + 2) * 3, we can substitute 3 for (1 + 2) to give us the simpler expression 3 * 3. In DW, we can reason about code in the same way:

%dw 1.0
%output application/java

%function add2(n)
n + 2

%function multiplyBy3(n)
n * 3
---
multiplyBy3(add2(1))

We can confidently substitute the expression add2(1) with the value 3, relieving ourselves of the mental burden of needing to understand how that function works in the context of the larger goal. Now we only need to understand how multiplyBy3(3) works. This idea probably seems trivial in this context; after all, how hard is it to reason about simple arithmetic? However, this idea scales nicely as our DW scripts become more complicated. For example, let’s take a script that calls 10 different functions. We can look at the script at a macro-level, saying that instead of an orchestration of 10 individual functions it’s one huge function. As an alternative, we can also zoom into a micro-level and make the same kinds of assertions against individual functions and expressions. This is the power of pure functions and immutable data.

I’m sure you noticed I said this idea almost always benefits us. We don’t get this predictability of code behavior for nothing. One downside is it forces you to take a new perspective on how your programs work. This takes time to get comfortable with. The other downside is that some cases that are easy to implement in languages like Java become more tedious in DW. Let’s take the example of incrementing a value in an existing object. In Java, you can do this in a single line of code if you choose not to use a temporary variable:

// The variable m could be represented as {"one": 1, "two": 2}
m.put("one", m.get("one") + 1);

// Results in {"one": 2, "two": 2}

To achieve a similar result in DW, we need to create a new object from the existing object where all the key-value pairs are the same except the target value:

%dw 1.0
%output application/java

%var m = {
 'one': 1,
 'two': 2
}
---
m mapObject ({
  $$: $
}
unless ('$$' == 'one') otherwise
{
  $$: $ + 1
})

// Output:
// {
//    'one': 2,
//    'two': 2
// }

That’s seven lines of DW to one line in Java (of course we could compress all the DW code into one line, but who wants to read that?). However, this example is an outlier. So, while in this example we are writing more DW code than the equivalent Java program, on average we will end up writing less. This is a huge win because less code means less potential for bugs. In my experience, I’ve found occasional inconveniences are a small cost for the ability to more easily predict how code is supposed to work.

Now that we’ve covered expressions, immutable data, and pure functions, you could probably stop here and go about the rest of your Mule career writing great DW code. You now know that you need to think about DW code differently than you do Java: instead of thinking about code in terms of how you can modify existing data, think in terms of how to build new data from immutable data. At this point, I’d recommend you take this idea and educate yourself on the internal mechanics of map, filter, and reduce.

First-Class Functions

In DW, functions are first-class citizens. This means that we can use them in the same way we do arrays, objects, and primitives. We can assign them to variables, pass them to functions, return them from functions, add them to arrays, etc. This is what allows functions like map, filter, and reduce to take other functions as parameters. Here’s a pedagogical example that illustrates some of the ways you can use functions:

%dw 1.0
%output application/java

%var input = [1,2,3,4,5,6]

%function even(n)
  (n mod 2) == 0

%var isEven = even
%var fnArr = [isEven]
---
input filter fnArr[0]($)

// Output: [2,4,6]

    *Note: the above code shows an error in Anypoint Studio, but it runs fine.

In the above example, we assign the function even to the variable isEven, put isEven in an array, fnArr, and call the function stored in that array from within the filter expression. This leverages three pieces of functionality gained from first-class functions. Again, this is an example used to show what can be done with functions when they’re first-class citizens. Please don’t write code like this for your company or clients.

First-class functions are typically a pain-point for programmers with a Java background because they’re unsupported in Java. If you’re a long-time Java programmer, this concept of functions running around without having their hands held by classes may be unfamiliar. If you’ve done any programming in JavaScript (I hope this assumption isn’t too far off, as most programmers in the world have some experience with JS), you’ve probably dealt with first-class functions before:

[1,2,3,4,5].forEach(function(n) {
  console.log(n + 1);
});

In this case, the forEach function is given a function as its sole argument.

You may not find much information about first-class functions and how they’re used in DW (or really any of the distinctly functional concepts of the language, for that matter) but there are plenty of learning materials centered around them for other languages. If you’re looking for another source to learn about first-class functions and higher-order functions (covered in the next section), I’d recommend using a more popular language with ample documentation and blog posts, like JavaScript. JavaScript shares a few important things with DW, like first-class functions and being a dynamic language (more or less). However, JavaScript has mutable data by default, so be aware of this when learning about these concepts through the language. I’m confident that the functional concepts you learn in JavaScript will easily carry over to DW.

Higher-Order Functions

First-class functions enable us to create higher-order functions. A higher-order function (HOF) is a function that takes another function as input, returns a function as output, or both. HOFs are a powerful tool in terms of abstraction, separation of concerns, and code reuse (much like classes).
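The examples in this post all pass functions in, but a HOF can also hand one back. A minimal sketch (the greaterThan name is mine, and this assumes a DW 1.0 named function can return a lambda):

```
%dw 1.0
%output application/java

// greaterThan returns a new single-argument predicate
// that closes over the threshold m
%function greaterThan(m)
  ((n) -> n > m)

%var over100 = greaterThan(100)
---
[98, 99, 100, 101, 102] filter over100($)

// Output: [101,102]
```

Returning functions lets you build a family of related predicates from one definition instead of writing each comparison by hand.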

I like to use the example of counting things in an array to illustrate the type of separation of concerns that HOFs enable. For example, let’s say you have a requirement that asks you to determine the count of numbers in an array greater than 100. We’ll implement this using filter and sizeOf, but keep in mind this is a great use case for reduce as well.

%dw 1.0
%output application/java

%var input = [98,99,100,101,102]

%function countGreaterThan100(arr)
  sizeOf (arr filter $ > 100)
---
countGreaterThan100(input)

// Output: 2

Then you get another requirement where you need to get the frequency of which a string, defined by a client, appears in an array. You might be eyeing your countGreaterThan100 function and trying to figure out a way you could make it generalized so that it works for both cases. But you see that it’s counting based off numeric comparisons and now you need to compare strings. Eventually, you get discouraged and implement this function:

%function countOfIdMatches(arr)
  sizeOf (arr filter $ == flowVars.id)

But notice the similarities:

%function countGreaterThan100(arr)
  sizeOf (arr filter $ > 100)

%function countOfIdMatches(arr)
  sizeOf (arr filter $ == flowVars.id)

We pass in an array, we’re using filter and sizeOf in the exact same way, and the shape of the two functions is almost identical. With so many similarities, chances are we’ll be able to abstract them out into a separate function. You may have noticed that the only thing that’s meaningfully different is how to filter the array, which is determined by an expression that evaluates to true or false. What if we could pass in that expression? We can. Just wrap it in a function and call it within the counting function with the appropriate arguments. The logic within this function will dictate to the counting function when to increment the counter. Then you could use it to count things dealing with strings, numbers, objects, arrays, etc.:

%function countBy(arr, fn)
  sizeOf (arr filter fn($))

%function greaterThan100(n)
  n > 100

%function idMatch(id)
  id == flowVars.id

%var over100    = countBy([98,99,100,101,102], greaterThan100)
%var idMatches  = countBy(["1", "2", "2", "3", "2"], idMatch)

This is how you can use HOFs to create utility functions that will likely find use across multiple projects. Keep in mind that this kind of abstraction is available to you in Java (for example, sorting arrays by defining a Comparator, and passing that Comparator to a sort method), but requires considerably more boilerplate code.

Lambdas

Lambdas are just a fancy way of saying functions that don’t have a name. They are also called anonymous functions, function literals, and unnamed functions. Here’s what they look like in DW:

((n) -> n + 1)

You define your input parameters in parentheses, add an arrow, then define the body of the function. Finally, you wrap the whole thing in parentheses. In this case, we have a function that takes in a single parameter, n, and returns the result of adding n + 1. You can take in more than one parameter to your function, if necessary:

((n, m) -> n + m)

Lambdas are rarely necessary in the sense that we can get along just fine without them. This is because you can always define a function before you use it. However, lambdas allow us to create functions on the fly. This allows us to more easily work with HOFs without any boilerplate code. If you’ve been using filter, you’ve probably been using lambdas without knowing it:

["foo","bar","foo","foo","bar"] filter $ == "foo"

Here, $ == "foo" is a lambda. You might be thinking to yourself "Yes, I've done that before, but the syntax doesn't match what you said earlier." You're correct. What I just showed is a syntactic convenience for this:

["foo","bar","foo","foo","bar"] filter ((str) -> str == "foo")

$ in the previous code just refers to the current value of the iteration. Just keep in mind that $ (and $$ for some functions) is only available for built-in DW functions, not functions that you define.
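For comparison, JavaScript works the same way, except the lambda is always written out explicitly; there is no $ shorthand (a sketch under that analogy, not part of the original DW examples):

```javascript
const words = ["foo", "bar", "foo", "foo", "bar"];

// The anonymous function plays the role of the DW lambda
// ((str) -> str == "foo"); filter calls it once per element.
const foos = words.filter((str) => str === "foo");
// foos is ["foo", "foo", "foo"]
```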

Let’s see how we’d use the countBy function in our previous section with a lambda instead of a previously defined function:

%function countBy(arr, fn)
  sizeOf (arr filter fn($))
---
countBy([1,2,3,4,5], ((n) -> n > 3))

// Output: 2

We need to add an input parameter, fn, that will represent our function, which defines when to increment the counter. Then we will replace the conditional section of the Boolean expression with a call to fn, being sure to pass it the current value of the iteration. And that’s it!

Conclusion

In summary, we went over expressions, immutable data, pure functions, first-class functions, higher-order functions, and lambdas. We discussed what each of these is, how they require a change in perspective, and how they work together to enable DW to excel at its given task. Please understand, this is a high-level change in the way you think about your code, and it is not going to happen overnight. But if you’re diligent and work at it a little bit each day, you’ll get it.

About the Author

Joshua Erney has been working as a software engineer at Mountain State Software Solutions (ArganoMS3) since November 2016, specializing in APIs, software integration, and MuleSoft products.

More Posts by this Author

  • Tips on Debugging DataWeave Code
  • How to Modify All Values of an Object in DataWeave
  • When Should You Use reduce in DataWeave

Filed Under: Mulesoft Tagged With: Coding, DataWeave, Developer Tips

Written By: Joshua Erney

 
 
This post will examine the reduce function in the DataWeave (DW) language. It will first illustrate reduce at a high level using a simple example. Then, it will explain how reduce works, what it expects as arguments, and how those arguments need to be constructed. It’ll dive into a more complex, real-world example that illustrates how reduce can be used to make existing map/filter code more efficient. Additionally, it will provide an example of how we can use reduce to create a utility function that separates the concerns involved in counting the number of times something occurs in an array. And finally, it will go over how and when to use default values for the input function to reduce, and some additional considerations when using reduce in the workplace. This post will also take the opportunity to highlight a few functional programming concepts like higher-order functions and immutable data structures. If you’re familiar with these concepts already, feel free to skim those sections; you won’t miss anything.

reduce is one of those functions in DW that doesn’t seem to get as much love as its companions, map and filter. Most people know about map and filter: use map to create a new array that’s the result of applying a transformation function to each element of the original array, and use filter to create a new array with elements from the original array removed according to a predicate function. When it comes to data transformation, it’s relatively easy to identify use cases for map and filter: mapping fields from one data source to another, or removing elements that shouldn’t go to a certain data source. They’re relatively specific functions (not to mention used all the time in Mule integrations and data transformations in general), so identifying when they should be used is straightforward once you understand what they do. Identifying where reduce comes into play is a little more difficult because it is incredibly general, and unfortunately most of the examples out there don’t paint a great picture of what it’s capable of. Most of us who were curious about reduce in the past have already seen the addition example everywhere:

Typical reduce Examples

%dw 1.0
%output application/java
 
%var input = [1,2,3,4]
---
input reduce ((curVal, acc = 0) -> acc + curVal)
 
// Output: 10

If you’re not familiar, this code adds all the numbers in the array. Incredible!

The code is trivial and chances are you’re never going to simply add an array of numbers together in any real code, but this example illustrates something important that I overlooked for a long time, and maybe you did, too: reduce, like map and filter, takes an array and a function as input, but unlike map and filter, its primary job is to reduce all the elements in the array to a single element, where an element can be a number, string, object, or array. In this case, we have an array of numbers that’s reduced into a single number.
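If it helps to see the same reduction outside of DW, JavaScript’s built-in reduce does the identical job, with the accumulator as the first lambda parameter and the seed value passed as a second argument (a comparison sketch only):

```javascript
const input = [1, 2, 3, 4];

// (acc, curVal) => new accumulator; the 0 seeds the accumulator,
// just like acc = 0 does in the DW example.
const sum = input.reduce((acc, curVal) => acc + curVal, 0);
// sum === 10
```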

Let’s unwrap the mechanics of reduce to make sure we really understand how to use it before moving on. First things first: just like map and filter, reduce is a higher-order function. What’s a higher-order function, you ask? It is a function that takes a function as one of its inputs. reduce takes two parameters: on its left side it takes an array, and on its right side it takes a function that it will use to operate on each value of the array. The left side is trivial; the right side is where things can get confusing. The function passed to reduce needs to take two arguments: the first will represent the current value of the iteration, and the second will be an accumulator (which could be anything: a number, object, array, etc.). Just like it’s your responsibility to make sure the function passed to filter returns a boolean, it’s your responsibility to make sure the function passed to reduce returns the new value for the accumulator. Let’s look at how the accumulator changes with each step through the input array by using the log function on the example above (refer to the blog post on Debugging DataWeave Code for more info on how log works). If you’re unclear on how reduce works, log will be your best friend when debugging reduce functions. We will also log the current value of the iteration.

Typical reduce Examples with log

%dw 1.0
%output application/java
 
%var input = [1,2,3,4]
---
input reduce ((curVal, acc = 0) ->
  log("acc = ", acc) + log("curVal = ", curVal))

Here’s what the log looks like (formatted for clarity):

acc    = 0
curVal = 1
 
acc    = 1
curVal = 2
 
acc    = 3
curVal = 3
 
acc    = 6
curVal = 4

Keep in mind that in the above code, we’re logging acc before it is replaced by the expression acc + curVal. Let’s take that log file and look at pieces of it to see what reduce is doing.

acc    = 0
curVal = 1

0 + 1 = 1. What’s the next value for acc? 1!

acc    = 1
curVal = 2

1 + 2 = 3. What’s the next value for acc? 3!

acc    = 3
curVal = 3

3 + 3 = 6. By now you see where this is going.

Let’s make this example a little bit more complicated to illustrate that we can use something more complex than a number for the accumulator. What if we wanted to add all the even numbers together, add all the odd numbers together, and return both? First, we already know we’re going to need a container to hold the two values. Let’s decide now that for this we will use an object with two keys, odd and even. We’ll also create a function, isEven, to help future developers understand our code. We’ll slap on the log now so we can see how the accumulator changes with each iteration.

A More Complex reduce Example

%dw 1.0
%output application/java
 
%var input = [1,2,3,4,5]
 
%function isEven (n) n%2 == 0
---
input reduce ((curVal, acc = {odd: 0, even: 0}) ->
  log("acc = ", {
    odd:  (acc.odd + curVal
             unless isEven(curVal)
             otherwise acc.odd),
    even: (acc.even + curVal
             when isEven(curVal)
             otherwise acc.even)
  }))
 
// Output: {odd: 9, even: 6}

Here’s what the log file looks like:

acc = {odd: 1, even: 0}
acc = {odd: 1, even: 2}
acc = {odd: 4, even: 2}
acc = {odd: 4, even: 6}
acc = {odd: 9, even: 6}

Since the array we passed to reduce alternates between odd and even numbers, the function we passed reduce alternates between adding to the odd value and the even value as well. And notice that the function passed to reduce creates a new object to return as the accumulator every time. We’re not modifying the existing accumulator object. Data structures in DW are immutable by design, so we couldn’t modify the accumulator even if we wanted to. Avoiding the modification of an existing object is an important functional programming concept; map and filter work the same way. This might seem confusing at first, but look at it this way: for reduce, the data that you return must be in the same shape as your accumulator. In the first example, our accumulator was a number, so we return a number. In this example, our accumulator was an object with two keys, odd and even, so we return an object with the keys odd and even.
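The same odd/even example in JavaScript makes the "always return a new accumulator" rule explicit, since object spread builds a fresh object instead of mutating acc (an analogy for comparison, not DW code):

```javascript
const isEven = (n) => n % 2 === 0;

const totals = [1, 2, 3, 4, 5].reduce(
  (acc, n) =>
    isEven(n)
      ? { ...acc, even: acc.even + n } // fresh object; acc itself is untouched
      : { ...acc, odd: acc.odd + n },
  { odd: 0, even: 0 }
);
// totals is { odd: 9, even: 6 }
```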

Above are just pedagogical examples, though. How might we use reduce in the workplace? A typical use case is to count the number of times something occurs (why "something" is deliberately vague will be revealed later). Say we receive an array of payment transactions from a data source, and we want to know how many of these transactions were over a certain threshold, say, $100.00, and we want a list of all the merchants that charged us over $100.00, with no duplicates. The requirements dictate that this must all be in a single object. Here’s how we might do that without reduce:

Real-World Example without reduce

%dw 1.0
%output application/java
 
%var input = [
  {
    "merchant" : "Casita Azul",
    "amount"   : 51.70
  },
  {
    "merchant" : "High Wire Airlines",
    "amount"   : 378.80
  },
  {
    "merchant" : "Generic Fancy Hotel Chain",
    "amount"   : 555.33
  },
  {
    "merchant" : "High Wire Airlines",
    "amount"   : 288.88
  }
]
 
%var threshold = 100
 
%function overThreshold(n) n > threshold
 
%var transactionsOverThreshold = input filter overThreshold($.amount)
 
%var merchants = transactionsOverThreshold map $.merchant distinctBy $
---
{
  count: sizeOf transactionsOverThreshold,
  merchants: merchants
}
// Output:
// {
//   count: 3,
//   merchants: [ 'High Wire Airlines', 'Generic Fancy Hotel Chain' ]
// }

This is nice, and does the job quite well for a small input payload. But notice that we need to loop through the input payload once to filter out objects with amounts over the threshold, and then we need to loop through the resulting array again to map the values to get a list of merchant names, and then loop through that resulting array to filter out duplicate merchants. This is expensive! Since this is a real-world example, what if there were 400K records instead of just 4? At this point you might be thinking to yourself “I can just use Java instead, and I will only have to loop through the payload once with a for loop.” Not so fast! Don’t give up on DW just yet. What if we could use a single reduce instead of multiple map/filter combinations? Here’s what that would look like:

Real-World Example with reduce

%dw 1.0
%output application/java
 
%var input = ... // Same as above except for 400K instead of 4
 
%var threshold = 100
 
%function overThreshold (n) n > threshold
---
input reduce ((curVal, acc = {count: 0, merchants: []}) -> ({
  count: acc.count + 1,
  merchants: acc.merchants + curVal.merchant
              unless acc.merchants contains curVal.merchant
              otherwise acc.merchants
}) when overThreshold(curVal.amount) otherwise acc)
 
// Output:
// {
//   count: 3,
//   merchants: [ 'High Wire Airlines', 'Generic Fancy Hotel Chain' ]
// }

Much better. Now we can deal with everything we need to in one loop over the input payload. Keep this in mind when you’re combining map, filter, and other functions to create a solution: reduce can be used to simplify these multi-step operations and make them more efficient (thanks to Josh Pitzalis and his article ‘How JavaScript’s Reduce method works, when to use it, and some of the cool things it can do’, for this insight. Check it out to see how you can create a pipeline of functions to operate on an element using reduce. It is very cool).

Notice that again we’re never mutating the accumulator, because data structures are immutable in DataWeave. We either pass on the existing accumulator (otherwise acc), or we create a new object and pass that on to the next step of iteration. Also notice that we’ve reduced an array of elements into a single object, and built an array within the object in the process; much more complex than adding a series of integers, but still a completely valid use case for reduce.
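For readers more at home in JavaScript, the same single-pass report looks like this (the sample data mirrors the four-transaction input above; this is a comparison sketch, not the DW original):

```javascript
const transactions = [
  { merchant: "Casita Azul", amount: 51.70 },
  { merchant: "High Wire Airlines", amount: 378.80 },
  { merchant: "Generic Fancy Hotel Chain", amount: 555.33 },
  { merchant: "High Wire Airlines", amount: 288.88 },
];
const threshold = 100;

// One pass: count matches and collect distinct merchants at the same time.
const report = transactions.reduce(
  (acc, t) => {
    if (t.amount <= threshold) return acc; // pass the accumulator on unchanged
    return {
      count: acc.count + 1,
      merchants: acc.merchants.includes(t.merchant)
        ? acc.merchants
        : [...acc.merchants, t.merchant],
    };
  },
  { count: 0, merchants: [] }
);
// report is { count: 3, merchants: ["High Wire Airlines", "Generic Fancy Hotel Chain"] }
```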

Let’s simplify the problem above to illustrate another point. This time, we’ll only get the count of every transaction over $100.00. Counting the number of times something occurs in an array is a very common use case for reduce. It’s so common that we should separate the concern of how to count from the concern of when to increment the counter. Here goes nothing:

Using reduce to Abstract Away Counting

%dw 1.0
%output application/java
 
%var input = ... // Same as above
 
%var threshold = 100
 
%function countBy(arr, predicate)
  arr reduce ((curVal, acc = 0) -> acc + 1
                                    when predicate(curVal)
                                    otherwise acc)
---
{
  count: countBy(input, ((obj) -> obj.amount > threshold))
}
 
// Output:
// {
//   count: 3
// }

Now we have a higher-order function, countBy, that takes in an array, and a function that defines exactly under what conditions we should increment the counter. We use that function in another higher-order function, reduce, which deals with the actual iteration and reduction to a single element. How cool is that? Now, with tools like readUrl, we can define the countBy function in a library, throw it into a JAR, and reuse it across all our projects that need it. Very cool.
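The same separation of concerns, sketched in JavaScript for comparison (countBy and the predicate names mirror the DW above; this is an analogy, not the original code):

```javascript
// How to count lives in countBy; when to count lives in the predicate.
function countBy(arr, predicate) {
  return arr.reduce((acc, cur) => (predicate(cur) ? acc + 1 : acc), 0);
}

const input = [
  { amount: 51.70 },
  { amount: 378.80 },
  { amount: 555.33 },
  { amount: 288.88 },
];

const count = countBy(input, (t) => t.amount > 100);
// count === 3
```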

Using the Default Values for reduce’s Input Function

The examples shown above do not use the default arguments to reduce’s function, $ and $$. I think it’s easier to teach how reduce works by explicitly defining the parameters to the input function, but in some situations, this won’t work, and you’ll need to rely on the defaults. For example, let’s implement the function maxBy using reduce, which will get us the maximum value in an array according to a predicate function that defines what makes one value larger than another.

%function maxBy(arr, fn)
  arr reduce ((curVal, acc = 0) ->
    curVal when fn(curVal, acc) otherwise acc)

Do you see the problem here? We initialize the accumulator with 0. If we pass in the array [-3,-2,-1], and the function ((curVal, max) -> curVal > max), we’d expect a function called maxBy to return -1, but this one will return 0, a value that’s not even in the array, because curVal > max will return false for every element in the array. Even worse, what if arr wasn’t an array of numbers? We might try to get around this by doing this instead:

%function maxBy(arr, fn)
  arr reduce ((curVal, acc = arr[0]) ->
    curVal when fn(curVal, acc) otherwise acc)

Which will work just fine, but it will waste the first iteration by comparing the first value to the first value. At this point we might as well avoid getting the value by index and take advantage of the default arguments: $, which is the current value of the iteration, and $$, which is the current value of the accumulator. By default, $$ will be initialized with the first value of the array passed in, and $ will be initialized with the second:

%function maxBy(arr, fn)
  arr reduce ($ when fn($, $$) otherwise $$)

So the lambda ($ when fn($, $$) otherwise $$) can be explained as, “Set the accumulator ($$) to the current value ($) when the function fn returns true; otherwise, keep the accumulator as it is.”
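JavaScript’s reduce behaves the same way when you omit the seed argument: the first element initializes the accumulator and iteration starts at the second element, which is exactly what maxBy needs (a comparison sketch; maxBy here is my analogue, not the DW code):

```javascript
function maxBy(arr, isGreater) {
  // No seed argument: arr[0] becomes the initial accumulator,
  // and the callback first runs with arr[1] as the current value.
  return arr.reduce((acc, cur) => (isGreater(cur, acc) ? cur : acc));
}

const max = maxBy([-3, -2, -1], (cur, acc) => cur > acc);
// max === -1, a value that is actually in the array
```

Because the starting accumulator comes from the array itself, this works for non-numeric elements too, e.g. picking the longest string.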

Additional Considerations

Before wrapping up, there are three things I’d like to point out. First, we’ve seen that we can replace map/filter combinations with reduce, so it follows that we can implement both map and filter in terms of reduce. Here’s filter if you need proof:

%function filter(arr, predicate)
  arr reduce ((curVal, acc=[]) ->
    acc + curVal when predicate(curVal) otherwise acc
  )

This means that there are times when you may try to use reduce where map or filter would be the more specific and appropriate tool to get the job done. Try not to use reduce in these circumstances, and instead reach for the more specific function. The intent of your code will be more obvious, and future developers reading your code will thank you. I’ve found that a good rule of thumb is, if I’m going to use reduce to reduce to an array, chances are my intentions would be clearer using map or filter.

Second, these examples use the variables curVal and acc to denote the current value of the iteration, and the accumulator. I’ve used these names to help illustrate how reduce works. I do not recommend using these names when you write code. Use names that describe what you’re working with. For example, when trying to find the count of transactions over a threshold to generate a report like we did earlier, we might use trans and report instead of curVal and acc.

Third, this is more of general advice for consultants: reduce isn’t a concept that is easily understood by most programmers (I wrote this article for myself to better understand how it works), especially those that come from a Java/C++/C# background where mutable data structures and imperative looping constructs are the name of the game. At ArganoMS3 we have a multitude of clients, some heavily adopting MuleSoft products across their organizations with years of internal MuleSoft expertise, others having no internal expertise and needing just a few small integrations. As consultants, we need to leave these organizations with code that they can change, maintain, and expand on when we’re gone. Get a feel for the organization you’re working with. Do the developers there understand functional concepts? Are most of them long-time Java programmers who’ve never seen reduce in their lives? If you’re dealing with the former, using reduce to lessen the amount of code you need to write to accomplish a task is a good move; others might already understand your code, and if not, they have other people within their organization that can help. If you’re dealing with the latter, you’ll probably cause a fair share of headaches and fist-shakings at the clever code you left behind that the client is now having trouble understanding. Point being, reduce is not for everyone or every organization, and the client needs to come before the code.

Conclusion

In conclusion, we use reduce to break down an array of elements into something smaller, like a single object, a number, or a string. But it can do so much more. reduce is an important and highly flexible tool in functional programming, and therefore in DW as well. Like most highly flexible programming constructs, reduce is a double-edged sword and can easily be abused, but when you use it right, there’s nothing quite like it.

Filed Under: Mulesoft Tagged With: Coding, DataWeave

An Overview

By: Joshua Erney

Recently, I was getting data from a stored procedure that contained a very large amount of whitespace on certain fields but not on others, and, occasionally, those fields were null. I wanted to remove this white space on these fields. Below, I will document the process to get to the final iteration of the code by using more and more functional programming concepts as I go. Here was my first go at it:

First Iteration

%dw 1.0
%output application/java
 
%function removeWhitespace (v) trim v when v is :string otherwise v
---
payload map {
  field1: removeWhitespace($.field1),
  field2: removeWhitespace($.field2)
...
}

This works. But it has two problems: it obscures the mapping by adding a repetitive function call for EVERY field, and if you don’t want to do every field, you need to individually identify which fields have extra whitespace and which don’t. This is time-consuming and potentially impossible, given that it could change from row to row in the database. So this solution might work now, but it is incredibly brittle if things change. Here’s my second iteration:

Second Iteration

%dw 1.0
%output application/java
 
%function removeWhitespaceFromValues(obj)
  obj mapObject {($$): trim $ when $ is :string otherwise $}
 
%var object = removeWhitespaceFromValues (payload)
---
object map {
  field1: $.field1,
  field2: $.field2
...
}

Awesome! We no longer have to individually identify which fields have a bunch of whitespace, because the function doesn’t care. It will trim every value that is a string, which is exactly what we want. But… could it be better? What if someone wanted to use this code in the future to do the same thing to a JSON object with nested objects and lists? The second iteration of the function will not accomplish this; it will not apply the trim to nested objects and arrays. The next sample uses the match operator, which brings pattern matching to DataWeave. If this is your first exposure to either match or pattern matching, you can find out more about the match operator in the MuleSoft documentation, and more about pattern matching in general elsewhere. Now let’s take a look at the code.

Third Iteration

%dw 1.0
%output application/java
 
%function removeWhitespace (e)
  e match {
    :array  -> $ map removeWhitespace($),
    :object -> $ mapObject {($$): removeWhitespace($)},
    default -> trim $ when $ is :string otherwise $
  }
%var object = removeWhitespace(payload)
---
...

Cool. Now we have a function that will remove the white space from every value in an element that is a string, including deeply-nested elements. You might think we’re done. But we can do a lot better using something called higher-order functions. In other words, we’re going to pass a function into our existing function to specify exactly how we want it to work. Check it out (thanks to Leandro Shokida at MuleSoft for his help with this function):

Final Iteration

%dw 1.0
%output application/java
 
%function applyToValues(e, fn)
  e match {
    :array  -> $ map applyToValues($, fn),
    :object -> $ mapObject {($$): applyToValues($, fn)},
    default -> fn($)
}
 
%function trimWhitespace(v)
  trim v when v is :string otherwise v
 
%var object = applyToValues(payload, trimWhitespace)
---
...

This effectively makes the act of looping through every value in an element completely generic (applyToValues). From here, we can define exactly what we want to happen for each value in the element (trimWhitespace). We’ve effectively separated the concern of targeting every value in an element from the concern of what to do with that value. What if we wanted to do something other than trim each value in the object? Just change the function you pass in. Maybe you want to trim the value if it’s a string, and increment it if it’s a number. Let’s see what that would look like:

Final Iteration (New Functionality)

%dw 1.0
%output application/java
 
%function applyToValues(e, fn)
  e match {
    :array  -> $ map applyToValues($, fn),
    :object -> $ mapObject {($$): applyToValues($, fn)},
    default -> fn($)
}
 
%function trimOrIncrement(v)
  v match {
    :string -> trim v,
    :number -> v + 1,
    default -> v
}
 
%var object = applyToValues(payload, trimOrIncrement)
---
...

Notice the most important thing here: the applyToValues function did not need to change at all. The only thing we changed was the function we passed into it. One last point: we don’t even need to give our function a name; we can create the second argument to applyToValues on the spot using a lambda, or anonymous function. Here we will use a lambda to increment the value if it’s a number:

Final Iteration (lambda)

%dw 1.0
%output application/java
 
%function applyToValues(e, fn)
  e match {
    :array  -> $ map applyToValues($, fn),
    :object -> $ mapObject {($$): applyToValues($, fn)},
    default -> fn($)
}
 
%var object = applyToValues(payload, ((v) ->
  v + 1 when v is :number otherwise v))
---
...

In Summary

In conclusion, the idea is that DataWeave is flexible enough that you can have a chunk of code that steps through all the values of an element, and that code can be reused across projects (see the MuleSoft documentation on readUrl). At the same time, you don’t need to permanently wire in how each value is modified (it could trim, it could remove all spaces, it could add 1 to every number, etc.). This kind of functionality is easily available to you in DataWeave, so take advantage of it!
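The recursive walk described above translates almost one-for-one into other languages. Here is a JavaScript sketch of the same idea (applyToValues and trimWhitespace mirror the DW names; this is an analogy, not the original code):

```javascript
// Recursively apply fn to every leaf value of nested arrays and objects.
function applyToValues(e, fn) {
  if (Array.isArray(e)) {
    return e.map((v) => applyToValues(v, fn));
  }
  if (e !== null && typeof e === "object") {
    return Object.fromEntries(
      Object.entries(e).map(([k, v]) => [k, applyToValues(v, fn)])
    );
  }
  return fn(e); // leaf value: the passed-in function decides what to do
}

const trimWhitespace = (v) => (typeof v === "string" ? v.trim() : v);

const cleaned = applyToValues(
  { name: "  Jo  ", tags: ["  a", "b  "], meta: { n: 1 } },
  trimWhitespace
);
// cleaned is { name: "Jo", tags: ["a", "b"], meta: { n: 1 } }
```

As in the DW version, swapping trimWhitespace for any other leaf-level function changes the behavior without touching applyToValues.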


Filed Under: Mulesoft Tagged With: Coding
