It’s OK to Play with Matches
I have a problem with my name. When someone calls me at work, I answer, “This is Mike Grucella.” My last name is pronounced grew-sel-uh. If I say my first and last names quickly, the person on the other end of the line sometimes thinks I’m saying “Microsoft.” Not kidding. And it’s not just me. Before joining InRule, I worked at a company called Microsystems. One day, my wife called the office:
Microsystems Receptionist: “This is Microsystems. How may I direct your call?”
Wife: “Can I speak with Mike Grucella?”
Microsystems Receptionist: “I’m sorry. That office is actually down the hall, but I can give you their number if you’d like.”
She then dials the other number…
Microsoft Receptionist: “This is Microsoft. How may I direct your call?”
Wife: “May I speak with Mike Grucella please?”
Microsoft Receptionist: “This is. How can I help you?”
“Microsystems”, “Microsoft”, “Mike Grucella”…it was a perfect storm for confusion. Sometimes things just aren’t clear. In my wife’s case, close wasn’t cutting it. However, there are times when “close enough” is exactly what you want.
Sometimes you don’t know the exact answer to a question up front, so approximation becomes your friend. If you can narrow the field to a smaller, more relevant set of choices, it becomes much easier to find exactly what you’re looking for.
Let’s say you want to start your own business, maybe a doughnut shop. One step in the process is registering your business name. You like the sound of “Donuts Rule!”, so you try to register it. The registration system comes back with a list of similar names:
“Doughnut House”, “InRule Technology”, “Nuts-n-Bolts Hardware”, “So Nuts for Donuts”, “Rules-of-the-Road Driving School”, etc.
The rule engine driving that registration check has fuzzy matching algorithms baked into it. Now you know whether “Donuts Rule!” is unique or too close to someone else’s business name, in which case you need to think of a better option.
Match the Algorithm to Your Need
There are a number of proven algorithms for fuzzy matching. I’m not going to get into the details of how they work, but I’ve provided links for your reference. Here are two worth considering:
The Levenshtein Distance algorithm measures the difference between two text values by counting the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn the first value into the second. That count determines how “close” the two words are: the smaller the number, the closer the match. For example, “same” and “framed” have a Levenshtein Distance of 3 (1 to change the “s” to an “f,” 1 to add the “r,” and 1 to add the “d”). “Same” and “fame” have a distance of 1.
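For readers who want to see the idea in code, here is a minimal Python sketch of the classic dynamic-programming approach (often called Wagner-Fischer). It is purely illustrative: the function name `levenshtein` is my own, and this is not how irAuthor or any InRule product implements the algorithm. It keeps only one row of the edit-distance table at a time and reproduces the two examples above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] = distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("same", "framed"))  # 3
print(levenshtein("same", "fame"))    # 1
```

A real name check would usually normalize case and punctuation first and then compare the distance against a threshold you choose for your rules.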
The Soundex algorithm is really interesting. It dates from the early part of the 20th century. Yes, that would be pre-computer. The U.S. Census needed a system for phonetically matching surnames that sounded the same but were spelled differently, like “Smith” and “Smyth.” Soundex translates a name into a code of one letter followed by three digits: it keeps the first letter, drops vowels and the letters “h,” “w,” and “y,” and converts the remaining consonants to digits, with similar-sounding consonants sharing a digit. Both “Smith” and “Smyth” become S530, and filing records under that code made it easier to locate relatives who may not have used the same spelling.
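Here is a simplified Python sketch of the American Soundex rules, again for illustration only and not InRule’s implementation. The `CODES` table and the `soundex` function are my own names. One assumption to flag: implementations differ slightly on edge cases. This sketch treats “h” and “w” as transparent (repeated codes on either side collapse into one), while vowels and “y” separate repeated codes so both are kept.

```python
# Digit groups for American Soundex; letters not listed here are not coded.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a 1-letter + 3-digit Soundex code (simplified sketch)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's code still counts when checking for repeats.
    last_code = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y map to ""
        if code and code != last_code:
            digits.append(code)
        last_code = code  # a vowel resets this, so a repeated code after it is kept
    # Pad with zeros or trim so the result is always one letter + three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Because the code depends on sound-alike groups rather than exact spelling, it is well suited to surnames, but it will miss matches whose first letters differ.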
These are two very different options for fuzzy matching, each with its own purpose: Levenshtein Distance compares spelling, while Soundex compares sound. The best part is that if you need to incorporate these types of algorithms into your rules, you can do so in InRule’s irAuthor. One option is to create them as user-defined functions using InRule’s irScript language. Alternatively, if you already have fuzzy matching functions in another part of your application, perhaps wrapped in a web service or a .NET assembly, you can call either of those through InRule’s Endpoint configuration. The important thing to note is that very little is outside the realm of possibility with InRule. From the simplest “If…Then” statement to the more complex matching algorithms described above, you will never feel shortchanged when it comes to finding the right match for your application’s needs.