Subscribe now and choose from over 30 free gifts worth up to £49 - Plus get £25 to spend in our shop
is there a way to find partial matches on two columns of data? My wife has a list of 1000 studies and another list with those 1000 studies plus 300 or so more. She needs to be able to show the extras that have appeared.
Conditional format on duplicate values doesn't work because some of the records in the first list have maybe a colon or full stop at the end as they've been copied over, whereas the new list has been tidied with text to columns and the like so is lot tidier.
A quick google finds a number of providers who do "fuzzy" match things as a plug in - these promise to be able to find partial matches - but we can't do plugins as it's a work laptop.
Any way of doing it in excel alone?
do they have DOIs ?
I tried various fuzzy matches macros previously; didn’t have much luck. I’d suggest spending some time cleaning up the original data list with global search/replace. E.g replace all “:” with “”. Then name the range of your source list and use vlookup to check if each value of the new list occurs anywhere in the named range.
You might find something useful on the SuperUser site, search for Comparing Similar Text Strings in Excel.
I use this:
https://www.microsoft.com/en-gb/download/details.aspx?id=15011
It's pretty handy. I've installed it on a number of different work machines at various places.
Trim() is also useful when cleaning data to remove any additional spaces which might have crept in.
So if you clean it up the numbers will match?
Find a replace as described above will sort that in about 5 seconds.
Trim is also a good shout.
Conditional format on duplicate values doesn’t work because some of the records in the first list have maybe a colon or full stop at the end as they’ve been copied over, whereas the new list has been tidied with text to columns and the like so is lot tidier.
If it's just one character at the end and the clean values are a consistent length you could use LEFT(A1,[the length]). If the unwanted character doesn't appear anywhere else in the value you could use SUBSTITUTE. If they're varying lengths you could use IF((RIGHT(A1,1) = ".") + (RIGHT(A1,1) = ":"), LEFT(A1,LEN(A1)-1), A1).
Another possibility is to use LEFT to create new columns with just the first 20 characters, find matches with vlookup, sort the matches together and hide or delete them. In combination with TRIM etc that might get it down to an easy manual check.
IIRC wont one of the lookup functions only find the closest match? So if you sort alphabetically it looks down the list untill it finds the last one that matches.
e.g. Searching for 6 in 1 3 5 7 9 would return 5 (or 7 i cant remember). But that would hopefully leave you with a shorter list to check through.
Otherwise go with greybeards idea, delete all spaces and compare the first 20 characters, should be enough to generate something unique but searchable.
Also, don't get too lost in the excel match rabbit hole if this isn't working. Two alphabetically sorted side by side lists of 1000 items wouldn't take long to do manually.
IIRC wont one of the lookup functions only find the closest match? So if you sort alphabetically it looks down the list untill it finds the last one that matches.
e.g. Searching for 6 in 1 3 5 7 9 would return 5 (or 7 i cant remember). But that would hopefully leave you with a shorter list to check through.
Vlookup with 4th argument 1. The list needs sorting - as above it would find 5
Levenshtein algorithm in VBA?
https://stackoverflow.com/questions/4243036/levenshtein-distance-in-vba
Edit: actually not sure that helps you at all, as you were
If the list of 1000 +300 has values that have additional characters but will still contain the string of the list your are matching against you can do this with wildcards and vlookups
So if your original list says
Study1
Study2
Study3
And your new one says
Study1,
Study2::
Study3 .,
Study4:
Vlookups will allow you to identify Study4
Thankful every day that i don't have to do this sort of rubbish in Excel 🙂
