So this isn’t strictly a GIS or mapping tool or tip that I thought I’d share, but considering the amount of time and importance of joins, spreadsheets, datasets and databases in the profession, or even outside geospatial work, this little tool called ‘Fuzzy Lookup’ may come in handy.
Often when trying to query, search or join up one table to another, or for whatever purpose, VLOOKUP can do just the job, ‘vertically looking up’ one value, in order to display another. If you are new or unfamiliar to VLOOKUP, Debra Dalgleish has an awesome blog, Contextures to help with this and much more even for experienced users. There is the ‘range_lookup’ operator, FALSE, which will look for a identical match, or TRUE, an approximate match; the latter of these is useful for numbers, less so for text.
However, quite often data isn’t this tidy or helpful to make this work 100% and there will be exceptions which can make things messy, or fuzzy, so to speak. For example, I may have national data, which has the name, “Hong Kong, China SAR”, whereas my dataset from my spatial dataset says just “Hong Kong (China). We can see that this means the same thing in reality, but VLOOKUP can’t logically get round this, without changing the name in the data itself. This is fine for one, but what if you have many more. A plugin, called “Fuzzy Lookup Add-In for Excel“, free and simple to download and install from Microsoft for Excel 2007 upwards can help resolve this.
As shown above, the plugin allows you to check for similarities to join together two data tables, providing a similarity score, much like the geocoder tool in ArcGIS does. With this you can join and isolate any remaining non-joining data, importantly before, attempting to join this in ArcGIS and discovering the join errors there.
Hope this helps if you weren’t already aware of this tool.