AppletTalk.com
Java discussions newsgroups

string class with method approximatelyEquals(String s)

 
Keith Green
Posted: Tue Apr 20, 2004 8:35 pm



I'm looking for a string class that will do approximate comparisons.
This stuff should be in the public domain. Ideally, s, the comparison
string could be either a specific string or a regular expression, but
I'd settle for an approximate comparison only to another specific
string.

thanks,
k

Jeff Schwab
Posted: Tue Apr 20, 2004 9:23 pm



Keith Green wrote:
Quote:
I'm looking for a string class that will do approximate comparisons.
This stuff should be in the public domain. Ideally, s, the comparison
string could be either a specific string or a regular expression, but
I'd settle for an approximate comparison only to another specific
string.


How do you define "approximate?" Here's some code to tell you how
different two strings are.

http://www.merriampark.com/ld.htm#JAVA
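
For readers who cannot reach the link: it covers Levenshtein (edit) distance. A minimal Java sketch of that measure follows. The class and method names are assumptions for illustration, and this is not the code from the linked page.

```java
// Hypothetical helper illustrating Levenshtein distance; not the linked page's code.
final class EditDistance {
    private EditDistance() {}

    /**
     * Returns the minimum number of single-character insertions, deletions,
     * or substitutions needed to turn a into b.
     */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

A distance of 0 means the strings are identical; what counts as "close enough" is still a threshold the caller has to pick.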

Regular expression support is in java.util.regex:

http://java.sun.com/j2se/1.4.2/docs/api/index.html
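
As a small illustration of the java.util.regex route, the sketch below accepts two spellings of one word with a single pattern. The pattern and the class name are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical example: one pattern that tolerates a known spelling variation.
final class RegexCompare {
    // Matches "grey" or "gray", ignoring case. The pattern is an assumption.
    private static final Pattern GREY =
            Pattern.compile("gr[ae]y", Pattern.CASE_INSENSITIVE);

    private RegexCompare() {}

    /** Returns true only if the whole string matches the pattern. */
    static boolean isGrey(String s) {
        return GREY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGrey("Gray")); // prints true
        System.out.println(isGrey("grow")); // prints false
    }
}
```

This only helps when the acceptable variations are known in advance and can be written down as a pattern.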

Chris
Posted: Wed Apr 21, 2004 11:07 am



Quote:
I'm looking for a string class that will do approximate comparisons.

What do you mean by approximate?

I wrote a custom 'fuzzy' string-comparison algorithm for an
address-matching project:

e.g. it would match the following:
"Number 37, The Old Barn at Homesteads"
and
"37 Old Barn Housesteads"

Based on:
- removal of standard key-words (e.g. At, The)
- removal of white-space and punctuation
- capitalisation
- character-by-character comparison against a percentage matched
threshold
- and a few other tricks

I can post the code if you want.

- sarge
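
Chris's code is not in the thread, so the following is only a guess at the steps listed above: drop keywords, strip white-space and punctuation, upper-case, then compare character by character against a threshold. The keyword list (including "Number"), the names, and the way the percentage is computed are all assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical reconstruction of the listed steps; not the poster's actual code.
final class AddressMatcher {
    // Assumed keyword list. The post names only "At" and "The"; "Number" is a guess.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("AT", "THE", "NUMBER"));

    private AddressMatcher() {}

    /** Upper-cases, drops keywords, and removes white-space and punctuation. */
    static String normalize(String address) {
        StringBuilder sb = new StringBuilder();
        for (String token : address.toUpperCase(Locale.ROOT).split("[^A-Z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                sb.append(token);
            }
        }
        return sb.toString();
    }

    /** Fraction of positions holding the same character, relative to the longer string. */
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        int shortest = Math.min(x.length(), y.length());
        int same = 0;
        for (int i = 0; i < shortest; i++) {
            if (x.charAt(i) == y.charAt(i)) {
                same++;
            }
        }
        return (double) same / longest;
    }

    static boolean approximatelyEquals(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        String a = "Number 37, The Old Barn at Homesteads";
        String b = "37 Old Barn Housesteads";
        System.out.println(normalize(a)); // prints 37OLDBARNHOMESTEADS
        System.out.println(approximatelyEquals(a, b, 0.5));
    }
}
```

With this purely positional comparison the two example addresses score 11 matching positions out of 20, because the extra letter in "Housesteads" shifts everything after it. The "few other tricks" in the post presumably deal with that kind of shift; an edit distance would be one way to do so.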

Keith Green
Posted: Wed Apr 21, 2004 11:59 am

Jeff Schwab <jeffplus (AT) comcast (DOT) net> wrote

Quote:
Keith Green wrote:
I'm looking for a string class that will do approximate comparisons.

How do you define "approximate?" Here's some code to tell you how
different two strings are.

Here's the rub. I don't understand enough yet about the data to frame
the question (of what makes two strings approximately equal) clearly.
But I think the link you supplied gives me something to think about.

Quote:

http://www.merriampark.com/ld.htm#JAVA

thanks,
k

Eric
Posted: Wed Apr 21, 2004 1:33 pm

I'd like to see your code very much.

Thanks,
- Eric

"Chris" <sarge_chris (AT) hotmail (DOT) com> wrote

Quote:
I'm looking for a string class that will do approximate comparisons.

What do you mean by approximate?

I wrote a custom 'fuzzy' string-comparison algorithm for an
address-matching project:

e.g. it would match the following:
"Number 37, The Old Barn at Homesteads"
and
"37 Old Barn Housesteads"

Based on:
- removal of standard key-words (e.g. At, The)
- removal of white-space and punctuation
- capitalisation
- character-by-character comparison against a percentage matched
threshold
- and a few other tricks

I can post the code if you want.

- sarge



Roedy Green
Posted: Thu Apr 22, 2004 5:34 am

On 20 Apr 2004 13:35:34 -0700, TheFallibleFiend (AT) hotmail (DOT) com (Keith
Green) wrote or quoted :

Quote:
I'm looking for a string class that will do approximate comparisons.
This stuff should be in the public domain.

The only generic scheme I know of is called Soundex. See
http://mindprod.com/jgloss/soundex.html
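
For a rough idea of what Soundex does, here is a minimal sketch of the common American Soundex rules: keep the first letter, code the remaining consonants as digits, collapse repeats, and pad to four characters. The class name is an assumption, and this is not the code from the linked glossary page.

```java
// Hypothetical sketch of American Soundex; not taken from the linked page.
final class SimpleSoundex {
    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (vowels plus H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    private SimpleSoundex() {}

    /** Returns a letter followed by three digits, or "" if the input has no letters A-Z. */
    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                lastCode = code;
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                if (c != 'H' && c != 'W') {
                    lastCode = code; // H and W do not separate letters with equal codes
                }
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert")); // prints R163
        System.out.println(encode("Rupert")); // prints R163
    }
}
```

Two strings would then count as approximately equal when their codes match. Soundex was designed for English surnames, so it may not suit other kinds of data.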


Here is another algorithm off the top of my head.

If you had a dictionary of words, you could do something like a spell
checker to convert your string into a string of standard words. Then
you get X points for matching words and Y points for matching words in
the wrong place by 1, and Z points for matching words in the wrong
place by 2....
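
A rough sketch of that scoring idea, assuming the spell-checker step has already turned both strings into standard words. The point values standing in for X, Y and Z, and all names, are invented for illustration.

```java
import java.util.Locale;

// Hypothetical sketch of position-weighted word matching.
final class WordPositionScore {
    // Assumed weights: X for the same place, Y for off by one, Z for off by two.
    private static final int[] POINTS = {3, 2, 1};

    private WordPositionScore() {}

    /** Scores each word of a by the nearest place (within two) where it occurs in b. */
    static int score(String a, String b) {
        String[] wordsA = a.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] wordsB = b.trim().toLowerCase(Locale.ROOT).split("\\s+");
        int total = 0;
        for (int i = 0; i < wordsA.length; i++) {
            for (int offset = 0; offset < POINTS.length; offset++) {
                if (matchesAt(wordsA[i], wordsB, i - offset)
                        || matchesAt(wordsA[i], wordsB, i + offset)) {
                    total += POINTS[offset];
                    break; // count only the closest match for this word
                }
            }
        }
        return total;
    }

    private static boolean matchesAt(String word, String[] words, int index) {
        return !word.isEmpty()
                && index >= 0
                && index < words.length
                && words[index].equals(word);
    }

    public static void main(String[] args) {
        System.out.println(score("old barn road", "old barn road")); // prints 9
        System.out.println(score("the old barn", "old barn"));       // prints 4
    }
}
```

The raw score grows with the number of words, so a real comparison would probably divide by the best possible score before applying a threshold.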

*******************

If what you are doing is trying to convert addresses to standard form,
I used to handle this with punch cards in the old days. I wrote a
mouse powered version in assembler years ago.

Basically you use a hashtable to accumulate all possible street names,
spelled correctly or not, anywhere in your database. Then you print
them out, one per card, deduped.

The user then manually sorts the cards with the desired name first
followed by all the screwy names that should be translated to the good
name.

Whenever you get new screwy names, you punch out more cards to be
inserted somewhere in the deck.

If nobody gets the street name right, they punch up a card with the
correct spelling as the lead card.

This takes much less labour than manually correcting all the erroneous
entries or making people try to guess the correct name during data
entry.
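
In present-day Java the card deck could be approximated with two collections: a sorted set for the deduped names, and a map from each screwy name to the good one. This is only a sketch; the class and method names are assumptions, and the manual sorting step is represented by whoever calls addGroup.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the card-deck scheme as a lookup table.
final class StreetNameTable {
    private final Map<String, String> canonicalByVariant = new HashMap<>();

    /** The "one card per name" step: every distinct name, sorted for manual review. */
    static SortedSet<String> distinctNames(Collection<String> rawNames) {
        return new TreeSet<>(rawNames);
    }

    /** The "lead card followed by screwy names" step, recorded by a person. */
    void addGroup(String canonical, String... variants) {
        canonicalByVariant.put(canonical, canonical);
        for (String variant : variants) {
            canonicalByVariant.put(variant, canonical);
        }
    }

    /** Returns the good name for a known variant, or the input unchanged if unknown. */
    String standardize(String name) {
        return canonicalByVariant.getOrDefault(name, name);
    }

    public static void main(String[] args) {
        StreetNameTable table = new StreetNameTable();
        table.addGroup("Main Street", "Main St", "Mian Street");
        System.out.println(table.standardize("Mian Street")); // prints Main Street
    }
}
```

Names that come back unchanged and are not lead cards are the new screwy names that still need to be filed.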

In the twilight days of the punch card I discovered they were
excellent for associating things, e.g. transformers and telephone
poles. It was much faster than the "modern" OCR.





--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.

Keith Green
Posted: Fri Apr 23, 2004 4:39 pm

sarge_chris (AT) hotmail (DOT) com (Chris) wrote in message news:<568394b1.0404210307.3124c914 (AT) posting (DOT) google.com>...
Quote:
I'm looking for a string class that will do approximate comparisons.

What do you mean by approximate?


It's difficult to say. I would like strings like "BOC 1" and "BOC 2"
to return false, but "maybe" "BOC 1a" and "BOC 1" would return true
(that they are approxEqual). The code would be insightful. My
intuition was that this would be very application specific, but then I
thought maybe there was some way to tune these things. Now I'm back
to thinking they (approxEqual methods) must be app specific.
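
These two pairs show why a generic measure struggles here: each pair differs by exactly one edit, so a plain edit-distance threshold cannot treat them differently. One possible application-specific rule, purely as a guess at the intent, is to accept a single trailing letter as a minor variant. The rule and the names below are assumptions.

```java
// Hypothetical application-specific rule for identifiers such as "BOC 1a".
final class SuffixTolerantCompare {
    private SuffixTolerantCompare() {}

    /**
     * Returns true if the strings are equal, or if one is the other
     * plus exactly one trailing letter.
     */
    static boolean approxEqual(String a, String b) {
        if (a.equals(b)) {
            return true;
        }
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        return longer.length() == shorter.length() + 1
                && longer.startsWith(shorter)
                && Character.isLetter(longer.charAt(longer.length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(approxEqual("BOC 1", "BOC 2"));  // prints false
        System.out.println(approxEqual("BOC 1a", "BOC 1")); // prints true
    }
}
```

Whether "BOC 1a" really belongs with "BOC 1" depends on what the identifiers mean in the data, which is the application-specific part.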

thanks,
k


Eric Sosman
Posted: Fri Apr 23, 2004 5:36 pm

Keith Green wrote:
Quote:

sarge_chris (AT) hotmail (DOT) com (Chris) wrote in message news:<568394b1.0404210307.3124c914 (AT) posting (DOT) google.com>...
I'm looking for a string class that will do approximate comparisons.

What do you mean by approximate?


It's difficult to say.

Then it's going to be doubly difficult to program.
The hardest part of software development is not in telling
the computer what to do, but in deciding what to tell it.

Quote:
I would like strings like "BOC 1" and "BOC 2"
to return false, but "maybe" "BOC 1a" and "BOC 1" would return true
(that they are approxEqual). The code would be insightful. My
intuition was that this would be very application specific, but then I
thought maybe there was some way to tune these things. Now I'm back
to thinking they (approxEqual methods) must be app specific.

I'd imagine so. "Bush" and "Rush" may be either 75% or
100% similar, depending on your point of view.

--
Eric.Sosman (AT) sun (DOT) com

Roedy Green
Posted: Fri Apr 23, 2004 7:49 pm

On Fri, 23 Apr 2004 13:36:30 -0400, Eric Sosman <Eric.Sosman (AT) sun (DOT) com>
wrote or quoted :

Quote:

Then it's going to be doubly difficult to program.
The hardest part of software development is not in telling
the computer what to do, but in deciding what to tell it

The way someone might tackle such a problem in the near future is to
have the computer show you "random" (not really) pairs of strings and
ask you how close they are on a scale of 1 to 10. Then a neural net
algorithm works out a scheme to put your intuition into algorithmic
(neural net) form. You never do understand what the rules are, any
more than you understand precisely why you instantly like or dislike
some people.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.

Keith Green
Posted: Sun Apr 25, 2004 8:20 pm

Eric Sosman <Eric.Sosman (AT) sun (DOT) com> wrote

Quote:
Keith Green wrote:

sarge_chris (AT) hotmail (DOT) com (Chris) wrote in message news:<568394b1.0404210307.3124c914 (AT) posting (DOT) google.com>...
I'm looking for a string class that will do approximate comparisons.

What do you mean by approximate?


It's difficult to say.

Then it's going to be doubly difficult to program.
The hardest part of software development is not in telling
the computer what to do, but in deciding what to tell it.

Yip. But I have time to think about this. I can work on other
analyses while I figure out what definition of "approximately equals"
might be appropriate for this problem space. Currently this part of
the analysis is being done by sorting and eyeballing the data (about
1.2 GB worth of it, summarized to about 20 MB, synopsized further to a
single worksheet). I don't think this is going to work in the long
run. Working on other stuff first will give me more time to
understand the interrelationships amongst the data.

My intuition is that this is a very common problem and I was really
hoping there might be something tunable, free, and easy. I don't mind
doing it from scratch if that's what's required (probably do this part
in perl, though).

Anyway, thanks to everyone in the thread for the ideas.

k
