AppletTalk.com Java discussions newsgroups
Keith Green Guest
|
Posted: Tue Apr 20, 2004 8:35 pm Post subject: string class with method approximatelyEquals(String s) |
|
|
I'm looking for a string class that will do approximate comparisons.
This stuff should be in the public domain. Ideally, s, the comparison
string could be either a specific string or a regular expression, but
I'd settle for an approximate comparison only to another specific
string.
thanks,
k
Jeff Schwab Guest
|
Posted: Tue Apr 20, 2004 9:23 pm Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
Keith Green wrote:
| Quote: | I'm looking for a string class that will do approximate comparisons.
This stuff should be in the public domain. Ideally, s, the comparison
string could be either a specific string or a regular expression, but
I'd settle for an approximate comparison only to another specific
string.
|
How do you define "approximate"? Here's some code to tell you how
different two strings are.
http://www.merriampark.com/ld.htm#JAVA
Regular expression support is in java.util.regex:
http://java.sun.com/j2se/1.4.2/docs/api/index.html
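For readers who cannot reach the first link, here is a minimal sketch of one common measure of "how different two strings are": Levenshtein (edit) distance. It is not the code from the linked page, and the class and method names are assumptions made up for illustration.

```java
// Sketch only: EditDistance and distance() are illustrative names, not a library API.
class EditDistance {
    /**
     * Returns the minimum number of single-character insertions,
     * deletions, or substitutions needed to turn a into b.
     */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // "" -> first j chars of b needs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // first i chars of a -> "" needs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

A threshold on this number (or on the distance divided by the longer length) is one possible definition of "approximately equal", but where to put the threshold depends on the data.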
Chris Guest
|
Posted: Wed Apr 21, 2004 11:07 am Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
| Quote: | I'm looking for a string class that will do approximate comparisons.
|
What do you mean by approximate?
I wrote a custom 'fuzzy' string-comparison algorithm for an
address-matching project:
e.g. it would match the following:
"Number 37, The Old Barn at Homesteads"
and
"37 Old Barn Housesteads"
Based on:
- removal of standard key-words (e.g. At, The)
- removal of white-space and punctuation
- capitalisation
- character-by-character comparison against a percentage matched
threshold
- and a few other tricks
I can post the code if you want.
- sarge
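The steps listed above might look roughly like the following. This is a sketch under stated assumptions, not the code offered in the post: the keyword list (including "Number"), the class and method names, and the threshold parameter are all hypothetical, and the "few other tricks" are not reproduced.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch only: every name here is an assumption made for illustration.
class FuzzyAddress {
    // Hypothetical keyword list. The post names only "At" and "The";
    // "Number" is an added assumption so that the example pair lines up.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("AT", "THE", "NUMBER"));

    /** Upper-cases, drops keywords, and strips whitespace and punctuation. */
    static String normalise(String s) {
        StringBuilder out = new StringBuilder();
        for (String word : s.toUpperCase(Locale.ROOT).split("[^A-Z0-9]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                out.append(word);
            }
        }
        return out.toString();
    }

    /** Fraction of positions, relative to the longer string, holding the same character. */
    static double similarity(String a, String b) {
        String x = normalise(a);
        String y = normalise(b);
        int longer = Math.max(x.length(), y.length());
        if (longer == 0) {
            return 1.0; // two empty strings are trivially identical
        }
        int matches = 0;
        for (int i = 0; i < Math.min(x.length(), y.length()); i++) {
            if (x.charAt(i) == y.charAt(i)) {
                matches++;
            }
        }
        return (double) matches / longer;
    }

    /** True if the percentage matched reaches the caller's threshold (0.0 to 1.0). */
    static boolean approximatelyEquals(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }
}
```

With this sketch the two addresses quoted above normalise to "37OLDBARNHOMESTEADS" and "37OLDBARNHOUSESTEADS" and score 0.55, because the one extra letter in "Housesteads" shifts every character after it. A measure that tolerates insertions, such as edit distance, would be more forgiving than a strict position-by-position comparison.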
Keith Green Guest
|
Posted: Wed Apr 21, 2004 11:59 am Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
Jeff Schwab <jeffplus (AT) comcast (DOT) net> wrote
| Quote: | Keith Green wrote:
I'm looking for a string class that will do approximate comparisons.
How do you define "approximate?" Here's some code to tell you how
different two strings are.
|
Here's the rub. I don't understand enough yet about the data to frame
the question (of what makes two strings approximately equal) clearly.
But I think the link you supplied gives me something to think about.
thanks,
k
Eric Guest
|
Posted: Wed Apr 21, 2004 1:33 pm Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
I'd like to see your code very much.
Thanks,
- Eric
"Chris" <sarge_chris (AT) hotmail (DOT) com> wrote
| Quote: | I'm looking for a string class that will do approximate comparisons.
What do you mean by approximate?
I wrote a custom 'fuzzy' string-comparison algorithm for an
address-matching project:
e.g. it would match the following:
"Number 37, The Old Barn at Homesteads"
and
"37 Old Barn Housesteads"
Based on:
- removal of standard key-words (e.g. At, The)
- removal of white-space and punctuation
- capitalisation
- character-by-character comparison against a percentage matched
threshold
- and a few other tricks
I can post the code if you want.
- sarge
|
Roedy Green Guest
|
Posted: Thu Apr 22, 2004 5:34 am Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
On 20 Apr 2004 13:35:34 -0700, TheFallibleFiend (AT) hotmail (DOT) com (Keith
Green) wrote or quoted :
| Quote: | I'm looking for a string class that will do approximate comparisons.
This stuff should be in the public domain.
|
The only generic scheme I know of is called Soundex. See
http://mindprod.com/jgloss/soundex.html
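For a rough idea of what Soundex does, here is a minimal sketch of the common American variant: keep the first letter, turn the following consonants into digits, collapse repeated codes, and pad to four characters. It is not taken from the linked page, and the class and method names are assumptions.

```java
// Sketch only: SoundexSketch and encode() are illustrative names.
class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (the vowels plus H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    /** Returns the four-character Soundex code, or "" if s has no letters A-Z. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(s.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore digits, spaces, punctuation
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
            } else if (c != 'H' && c != 'W') {
                // H and W are skipped entirely, so they do not break a run
                // of same-coded letters; vowels reset the run instead.
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

Two names then count as "approximately equal" when their codes match: "Robert" and "Rupert" both encode to R163. Soundex was designed for English surnames, so it may be a poor fit for other kinds of strings.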
Here is another algorithm off the top of my head.
If you had a dictionary of words, you could do something like a spell
checker to convert your string into a string of standard words. Then
you get X points for matching words and Y points for matching words in
the wrong place by 1, and Z points for matching words in the wrong
place by 2....
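A minimal sketch of that scoring idea, assuming both strings have already been converted to standard words. The point values standing in for X, Y and Z, and all the names, are hypothetical.

```java
// Sketch only: names and point values are assumptions made for illustration.
class WordScore {
    // Hypothetical values for X (same place), Y (off by 1) and Z (off by 2).
    static final int[] POINTS = {10, 5, 2};

    /** Adds up points for each word of a that appears at or near the same position in b. */
    static int score(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");
        int total = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i].isEmpty()) {
                continue; // an empty input splits into one empty word
            }
            // Try the exact position first, then one away, then two away.
            for (int offset = 0; offset < POINTS.length; offset++) {
                if (matchesAt(x[i], y, i - offset) || matchesAt(x[i], y, i + offset)) {
                    total += POINTS[offset];
                    break;
                }
            }
        }
        return total;
    }

    private static boolean matchesAt(String word, String[] words, int index) {
        return index >= 0 && index < words.length
                && words[index].equalsIgnoreCase(word);
    }
}
```

This simple version does not stop one word in the second string from being matched by two different words in the first, and the raw total would still need to be scaled by the number of words before comparing it against a threshold.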
*******************
If what you are doing is trying to convert addresses to standard form,
I used to handle this with punch cards in the old days. I wrote a
mouse powered version in assembler years ago.
Basically you use a hashtable to accumulate all possible street names,
spelled correctly or not, anywhere in your database. Then you print
them out, one per card, deduped.
The user then manually sorts the cards with the desired name first
followed by all the screwy names that should be translated to the good
name.
Whenever you get new screwy names, you punch out more cards to be
inserted somewhere in the deck.
If nobody gets the street name right, they punch up a card with the
correct spelling as the lead card.
This takes much less labour than manually correcting all the erroneous
entries or making people try to guess the correct name during data
entry.
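In present-day terms the sorted deck amounts to a lookup table from every known spelling to the preferred one. A minimal sketch, with hypothetical class and method names and made-up street names:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: StreetNames and its methods are illustrative names.
class StreetNames {
    private final Map<String, String> preferred = new HashMap<>();

    /** Registers one "deck": the first name is the lead card, the rest are its variants. */
    void addGroup(String... names) {
        for (String name : names) {
            preferred.put(key(name), names[0]);
        }
    }

    /** Returns the preferred spelling, or null if nobody has filed this name yet. */
    String standardise(String name) {
        return preferred.get(key(name));
    }

    // Ignore case and surrounding blanks when looking a name up.
    private static String key(String name) {
        return name.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        StreetNames streets = new StreetNames();
        streets.addGroup("Main Street", "Mian Street", "Main St");
        System.out.println(streets.standardise("mian street")); // prints Main Street
    }
}
```

Names that come back as null are the "new screwy names" that a person still has to file under the right lead entry.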
In the twilight days of the punch card I discovered they were
excellent for associating things, e.g. transformers and telephone
poles. It was much faster than the "modern" OCR.
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Keith Green Guest
|
Posted: Fri Apr 23, 2004 4:39 pm Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
sarge_chris (AT) hotmail (DOT) com (Chris) wrote in message news:<568394b1.0404210307.3124c914 (AT) posting (DOT) google.com>...
| Quote: | I'm looking for a string class that will do approximate comparisons.
What do you mean by approximate?
|
It's difficult to say. I would like strings like "BOC 1" and "BOC 2"
to return false, but "maybe" "BOC 1a" and "BOC 1" would return true
(that they are approxEqual). The code would be insightful. My
intuition was that this would be very application specific, but then I
thought maybe there was some way to tune these things. Now I'm back
to thinking they (approxEqual methods) must be app specific.
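To illustrate how application-specific this gets: plain edit distance cannot separate the two cases above, since each pair differs by exactly one single-character edit. One possible rule that does separate them is sketched below; the rule itself, the names, and the maxExtra tuning parameter are all assumptions, not something the data is known to support.

```java
// Sketch only: one hypothetical rule, with maxExtra as the tuning knob.
class PrefixMatch {
    /** True if one string starts with the other and at most maxExtra characters are left over. */
    static boolean approxEqual(String a, String b, int maxExtra) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        return longer.startsWith(shorter)
                && longer.length() - shorter.length() <= maxExtra;
    }

    public static void main(String[] args) {
        System.out.println(approxEqual("BOC 1", "BOC 2", 1));  // prints false
        System.out.println(approxEqual("BOC 1a", "BOC 1", 1)); // prints true
    }
}
```

The same rule also accepts "BOC 1" and "BOC 12", which may or may not be wanted.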
thanks,
k
| Quote: | I wrote a custom 'fuzzy' string-comparison algorithm for an
address-matching project:
e.g. it would match the following:
"Number 37, The Old Barn at Homesteads"
and
"37 Old Barn Housesteads"
Based on:
- removal of standard key-words (e.g. At, The)
- removal of white-space and punctuation
- capitalisation
- character-by-character comparison against a percentage matched
threshold
- and a few other tricks
I can post the code if you want.
- sarge
|
Eric Sosman Guest
|
Posted: Fri Apr 23, 2004 5:36 pm Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
Keith Green wrote:
| Quote: |
sarge_chris (AT) hotmail (DOT) com (Chris) wrote in message news:<568394b1.0404210307.3124c914 (AT) posting (DOT) google.com>...
I'm looking for a string class that will do approximate comparisons.
What do you mean by approximate?
It's difficult to say.
|
Then it's going to be doubly difficult to program.
The hardest part of software development is not in telling
the computer what to do, but in deciding what to tell it.
| Quote: | I would like strings like "BOC 1" and "BOC 2"
to return false, but "maybe" "BOC 1a" and "BOC 1" would return true
(that they are approxEqual). The code would be insightful. My
intuition was that this would be very application specific, but then I
thought maybe there was some way to tune these things. Now I'm back
to thinking they (approxEqual methods) must be app specific.
|
I'd imagine so. "Bush" and "Rush" may be either 75% or
100% similar, depending on your point of view.
--
Eric.Sosman (AT) sun (DOT) com
Roedy Green Guest
|
Posted: Fri Apr 23, 2004 7:49 pm Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
On Fri, 23 Apr 2004 13:36:30 -0400, Eric Sosman <Eric.Sosman (AT) sun (DOT) com>
wrote or quoted :
| Quote: |
Then it's going to be doubly difficult to program.
The hardest part of software development is not in telling
the computer what to do, but in deciding what to tell it
|
The way someone might tackle such a problem in the near future is to
have the computer show you "random" (not really) pairs of strings and
ask you how close they are on a scale of 1 to 10. Then a neural net
algorithm works out a scheme to put your intuition into algorithmic
(neural net) form. You never do understand what the rules are, any
more than you understand precisely why you instantly like or dislike
some people.
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Keith Green Guest
|
Posted: Sun Apr 25, 2004 8:20 pm Post subject: Re: string class with method approximatelyEquals(String s) |
|
|
Eric Sosman <Eric.Sosman (AT) sun (DOT) com> wrote
| Quote: | Keith Green wrote:
sarge_chris (AT) hotmail (DOT) com (Chris) wrote in message news:<568394b1.0404210307.3124c914 (AT) posting (DOT) google.com>...
I'm looking for a string class that will do approximate comparisons.
What do you mean by approximate?
It's difficult to say.
Then it's going to be doubly difficult to program.
The hardest part of software development is not in telling
the computer what to do, but in deciding what to tell it.
|
Yip. But I have time to think about this. I can work on other
analyses while I figure out what definition of "approximately equals"
might be appropriate for this problem space. Currently this part of
the analysis is being done by sorting and eyeballing the data (about
1.2 GB worth of it, summarized to about 20 MB, synopsized further to a
single worksheet). I don't think this is going to work in the long
run. Working on other stuff first will give me more time to
understand the interrelationships amongst the data.
My intuition is that this is a very common problem and I was really
hoping there might be something tunable, free, and easy. I don't mind
doing it from scratch if that's what's required (probably do this part
in perl, though).
Anyway, thanks to everyone in the thread for the ideas.
k