AppletTalk.com Forum Index AppletTalk.com
Java discussions newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

String comparison

 
Post new topic   Reply to topic    AppletTalk.com Forum Index -> Java Help
View previous topic :: View next topic  
Author Message
Québec
Guest





PostPosted: Wed Nov 19, 2003 5:26 pm    Post subject: String comparison Reply with quote



I would like to have a clue on how to compare two strings and trying to
get for example:

"msedcationoflarynhill" to match "the_miseducation_of_lauryn_hill".

Jean


Back to top
Québec
Guest





PostPosted: Wed Nov 19, 2003 8:56 pm    Post subject: Re: String comparison Reply with quote



Using Array.sort() I am up to this.
=========
public class Array1 {
static String ssource = "msed_ca_ti_ono_flarynhill";
static char[] src = ssource.toCharArray();
static char[] titre = {'t', 'h', 'e', '_', 'm', 'i', 's', 'e', 'd', 'u',
'c', 'a', 't', 'i', 'o', 'n', '_', 'o', 'f', '_', 'l', 'a', 'u', 'r', 'y',
'n', '_', 'h', 'i', 'l', 'l'};


---------- Java1.3 ----------
the_miseducation_of_lauryn_hill
____aacdeefhhiiilllmnnoorsttuuy
____aacdefhiilllmnnoorsty
Normal Termination
Output completed (0 sec consumed).


Back to top
Alex Hunsley
Guest





PostPosted: Thu Nov 20, 2003 12:08 pm    Post subject: Re: String comparison Reply with quote



Québec wrote:

Quote:
I would like to have a clue on how to compare two strings and trying to
get for example:

"msedcationoflarynhill" to match "the_miseducation_of_lauryn_hill".

Jean

What you're wanting is not normal string comparison, but a form of
inexact string matching. It's like fuzzy matching, but I'm not sure if
fuzzy matching is exactly the right term. Inexact string matching isn't
boolean - doesn't return "true" or "false" - but returns a confidence
factor, e.g. 70%, which you then interpret as you want.

I've googled around for an existing java class to do this but not turned
up much, maybe I'll write one...

See http://makeashorterlink.com/?M16024896 for some details/ideas on
inexact string matching.

alex



Back to top
Alex Hunsley
Guest





PostPosted: Thu Nov 20, 2003 3:26 pm    Post subject: Re: String comparison Reply with quote

Thomas Schodt wrote:
Quote:
Make a regex from the master string.

eg.

import java.util.regex.Pattern;

class FuzzyMatch {
public static void main(String[] argv) {
String str1 = "msedcationoflarynhill";
String str2 = "the_miseducation_of_lauryn_hill";

char[] va = str2.toCharArray();
StringBuffer regex = new StringBuffer();
for (int i=0; i regex.append(va[i]);
regex.append('?');
}
if (Pattern.matches(regex.toString(),
str1.subSequence(0,str1.length()-1)))
System.out.println("Match");
else
System.out.println("No match");
}
}

Isn't this quite a limited way to do it? You're basically allowing an
extra random char between each matched real char, and it's a boolean
decision you have, not a goodness of match score.


Back to top
Alex Hunsley
Guest





PostPosted: Thu Nov 20, 2003 3:27 pm    Post subject: Re: String comparison Reply with quote

Alex Hunsley wrote:

Quote:
Québec wrote:

I would like to have a clue on how to compare two strings and trying to
get for example:

"msedcationoflarynhill" to match "the_miseducation_of_lauryn_hill".

Jean


What you're wanting is not normal string comparison, but a form of
inexact string matching. It's like fuzzy matching, but I'm not sure if
fuzzy matching is exactly the right term. Inexact string matching isn't
boolean - doesn't return "true" or "false" - but returns a confidence
factor, e.g. 70%, which you then interpret as you want.

I've googled around for an existing java class to do this but not turned
up much, maybe I'll write one...

See http://makeashorterlink.com/?M16024896 for some details/ideas on
inexact string matching.

alex

OK, I've knocked together a working sketch of a very simlpe character
proximity based matching alogrithm:


/**
* InexactStringMatcher class sketch.
* Use freely, enjoy kittens
*
* @author Alex Hunsley
*/

public class InexactStringMatcher
{

public int getMatchScore(String str1,
String str2,
String spaceChars) {
str1 = removeChars(str1, spaceChars).toLowerCase();
str2 = removeChars(str2, spaceChars).toLowerCase();
return getMatchScore(str1, str2);
}

/**
* returns the given string with all chars in charsToRemove
* removed from it
*/
public String removeChars(String str,
String charsToRemove)
{
for (int charIndex = 0; charIndex < charsToRemove.length();
charIndex ++) {
char charToRemove = charsToRemove.charAt(charIndex);

StringBuffer buf = new StringBuffer();

int strIndex = 0;
int charOccurencePos = -1;
do {
charOccurencePos = str.indexOf(charToRemove);
if (charOccurencePos >= 0) {
buf.append(str.substring(0, charOccurencePos));
str = str.substring(charOccurencePos + 1,
str.length());
}
} while (charOccurencePos >= 0);

// now put remainder of string in buf
buf.append(str);
// set str to be our newly made string with possibly
// chars removed
str = buf.toString();
}
return str;
}

/**
* See how closely two strings match each other
* by checking for proximity of similar characters.
* the lower the return value, the closer the match
*/
public int getMatchScore(String str1,
String str2) {
int scoreA = getOneWayMatchScore(str1, str2);
int scoreB = getOneWayMatchScore(str2, str1);

// System.out.println("___scoreA="+scoreA
// + " scoreB="+scoreB);

// average the two scores
// (could always take the minimum or maximum instead,
// the average seem to work)

return (int) ((scoreA + scoreB) / 2);
}

public int getOneWayMatchScore(String str1, String str2)
{
int totalScore = 0;
int str1Len = str1.length();
int str2Len = str2.length();
for (int i = 0; i < str1Len; i++) {
char c = str1.charAt(i);
int proximity = findCharProximity(str2, i, c);
totalScore += proximity;
}
return totalScore;
}

private int findCharProximity(String str,
int position,
char c)
{
int strlen = str.length();
int closestCharDist = -1;
boolean doRightSearch = true;
int startPosition = position;

// may need to fix start position if it's past end
// of string!
if (position >= str.length()) {
doRightSearch = false;
startPosition = str.length() - 1;
}

// work left along the string from the given position,
// looking for matching char
for (int i = startPosition; i >= 0; i--) {
if (str.charAt(i) == c) {
int thisCharDist = position - i;
if (thisCharDist < closestCharDist
Quote:
| closestCharDist == -1) {
closestCharDist = thisCharDist;

break;
}
}
}

if (doRightSearch == true) {
// work right along the string from the give position,
// looking for matching char
for (int i = position; i < strlen; i++) {
if (str.charAt(i) == c) {
int thisCharDist = i - position;
if (thisCharDist < closestCharDist
Quote:
| closestCharDist == -1) {
closestCharDist = thisCharDist;

break;
}
}
}
}
return closestCharDist;
}

public static void main(String[] args)
{
InexactStringMatcher matcher
= new InexactStringMatcher();
System
.out
.println("score from closely matched items is "
+matcher
.getMatchScore("msedcationoflarynhill",
"the_miseducation_of_lauryn_hill",
"_ "));
System
.out
.println("score from not so matched item is "
+matcher
.getMatchScore("msedcationoflarynhill",
"another_track_name_entirely",
"_ "));
}
}



.... try running the class, the main method shows you that you get a
better (lower!) score for the more similar strings.

alex



Back to top
Alex Hunsley
Guest





PostPosted: Thu Nov 20, 2003 4:11 pm    Post subject: Re: String comparison Reply with quote

Alex Hunsley wrote:

Quote:
OK, I've knocked together a working sketch of a very simlpe character
proximity based matching alogrithm:
[snip code]


Aeeeiii! what is it with my idnentation getting munged? Tabs I bet.



Back to top
Québec
Guest





PostPosted: Thu Nov 20, 2003 4:41 pm    Post subject: Re: String comparison Reply with quote


"Alex Hunsley" <lard (AT) tardis (DOT) ed.ac.molar.uk> wrote

Quote:
Thomas Schodt wrote:
Make a regex from the master string.

eg.

import java.util.regex.Pattern;

class FuzzyMatch {
public static void main(String[] argv) {
String str1 = "msedcationoflarynhill";
String str2 = "the_miseducation_of_lauryn_hill";

char[] va = str2.toCharArray();
StringBuffer regex = new StringBuffer();
for (int i=0; i regex.append(va[i]);
regex.append('?');
}
if (Pattern.matches(regex.toString(),
str1.subSequence(0,str1.length()-1)))
System.out.println("Match");
else
System.out.println("No match");
}
}

I did not touched java regex yet.


Is this a 1.4 java application?

---------- Javac1.3 ----------
FuzzyMatch.java:1: cannot resolve symbol
symbol : class Pattern
location: package regex
import java.util.regex.Pattern;
^
FuzzyMatch.java:15: cannot resolve symbol
symbol : method subSequence (int,int)
location: class java.lang.String
str1.subSequence(0,str1.length()-1)))
^
FuzzyMatch.java:14: cannot resolve symbol
symbol : variable Pattern
location: class FuzzyMatch
if (Pattern.matches(regex.toString(),
^
3 errors
Normal Termination
Output completed (3 sec consumed).



Back to top
Québec
Guest





PostPosted: Thu Nov 20, 2003 7:55 pm    Post subject: Re: String comparison Reply with quote

It does work . :-)



"Alex Hunsley" <lard (AT) tardis (DOT) ed.ac.molar.uk> wrote

Quote:
Alex Hunsley wrote:

OK, I've knocked together a working sketch of a very simlpe character
proximity based matching alogrithm:
[snip code]

Aeeeiii! what is it with my idnentation getting munged? Tabs I bet.





Back to top
Roedy Green
Guest





PostPosted: Thu Nov 20, 2003 8:17 pm    Post subject: Re: String comparison Reply with quote

On Thu, 20 Nov 2003 12:08:07 +0000, Alex Hunsley
<lard (AT) tardis (DOT) ed.ac.molar.uk> wrote or quoted :

Quote:
I've googled around for an existing java class to do this but not turned
up much, maybe I'll write one...

see http://mindprod.com/jgloss/soundex.html

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.

Back to top
Alex Hunsley
Guest





PostPosted: Thu Nov 20, 2003 10:43 pm    Post subject: Re: String comparison Reply with quote

Québec wrote:
Quote:
It does work . Smile

Excellent.
It's a start, anyway.
I forgot to say two things about the code I posted:

1) it ignores the case of the strings, as you'd want "LAURYN" to match
"lauryn"
2) you can specify a string containing 'space' characters, which are
character you're not interesting in matching on. In my example code, I
provided the string " _", i.e. a space char and the underline _, as
these would appear to be not important characters when you're matching
mp3 file names. Add any other characters you like to the " _" string as
appropriate.

alex


Back to top
Alex Hunsley
Guest





PostPosted: Fri Nov 21, 2003 10:20 am    Post subject: Re: String comparison Reply with quote

Thomas Schodt wrote:
Quote:


Isn't this quite a limited way to do it? You're basically allowing an
extra random char between each matched real char, and it's a boolean
decision you have, not a goodness of match score.


Huh?
The question mark does not match an extra random char, it says the previous
char was optional.

Ah yes, my bad, I was thinking of something else.

Quote:
It was my understanding that the OP just wanted match or not match.

I thought she was merely presenting an typical example, and that she
wanted an algorithm for the general situation...

Quote:
There is one caveat tho - an empty string will match. o_O

Oh no! :)


Back to top
Alex Hunsley
Guest





PostPosted: Fri Nov 21, 2003 2:33 pm    Post subject: Re: String comparison Reply with quote

Alex Hunsley wrote:
Quote:
Québec wrote:

It does work . Smile

I've refined it.
I realised that "lauryn" would not match "zzzzzzzzzzzzzzzzzzzzzzlauryn"
very well, which it proboably should.
The solution is mentioned in the comment at the top of the code as it
now stands:



/**
* InexactStringMatcher class sketch v2.
* History:
* v1. used total of char proximity offsets for score.
* v2. uses the total of differences between successive char prox.
offsets,
* means that "aTrackName" will get low score for
"zzzzzzzzzzzzaTackNme"
* which is desirable.
* Also, have realised that if one of the match strings is empty/close
* to empty, you may get falsely low scores (e.g. score=0 for
* empty string)
*
* Use freely, enjoy kittens
*
* @author Alex Hunsley
*/

public class InexactStringMatcher
{
// hack for the moment - char proximity will have this
// number for case where no char matches at all
private static int NO_CHAR_MATCH = 65535;
private static int A_HIGH_SCORE = 1000;
private static int BEST_SCORE = 0;

public int getMatchScore(String str1,
String str2,
String spaceChars) {
str1 = removeChars(str1, spaceChars).toLowerCase();
str2 = removeChars(str2, spaceChars).toLowerCase();
return getMatchScore(str1, str2);
}

/**
* returns the given string with all chars in charsToRemove
* removed from it
*/
public String removeChars(String str,
String charsToRemove)
{
for (int charIndex = 0; charIndex < charsToRemove.length();
charIndex ++) {
char charToRemove = charsToRemove.charAt(charIndex);

StringBuffer buf = new StringBuffer();

int strIndex = 0;
int charOccurencePos = -1;
do {
charOccurencePos = str.indexOf(charToRemove);
if (charOccurencePos >= 0) {
buf.append(str.substring(0, charOccurencePos));
str = str.substring(charOccurencePos + 1,
str.length());
}
} while (charOccurencePos >= 0);

// now put remainder of string in buf
buf.append(str);
// set str to be our newly made string with possibly
// chars removed
str = buf.toString();
}
return str;
}

/**
* See how closely two strings match each other
* by checking for proximity of similar characters.
* the lower the return value, the closer the match
*/
public int getMatchScore(String str1,
String str2) {
if (str1 == null || str2 == null) {
return A_HIGH_SCORE;
}
if (str1.equals(str2)) {
return BEST_SCORE;
}

int scoreA = getOneWayMatchScore(str1, str2);
int scoreB = getOneWayMatchScore(str2, str1);

// System.out.println("___scoreA="+scoreA
// + " scoreB="+scoreB);

// average the two scores
// (could always take the minimum or maximum instead,
// the average seem to work)
return (int) ((scoreA + scoreB) / 2);
}

public int getOneWayMatchScore(String str1, String str2)
{
int totalScore = 0;
int str1Len = str1.length();
int str2Len = str2.length();
boolean havePreviousProximity = false;
int prevProximity = 0;

for (int i = 0; i < str1Len; i++) {
char c = str1.charAt(i);
int proximity = findCharProximity(str2, i, c);

if (proximity != NO_CHAR_MATCH) {
if (havePreviousProximity == true) {
int proxDifference
= proximity - prevProximity;
totalScore += Math.abs(proxDifference);
// System.out.println("char="+c
// + " prox="+proximity
// + " *** proxDiff="
// + proxDifference);
}
prevProximity = proximity;
havePreviousProximity = true;
}
}
return totalScore;
}

private int findCharProximity(String str,
int position,
char c)
{
int strlen = str.length();
int closestCharDist = NO_CHAR_MATCH;
boolean doRightSearch = true;
int startPosition = position;

// may need to fix start position if it's past end
// of string!
if (position >= str.length()) {
doRightSearch = false;
startPosition = str.length() - 1;
}

// work left along the string from the given position,
// looking for matching char
for (int i = startPosition; i >= 0; i--) {
if (str.charAt(i) == c) {
int thisCharDist = position - i;
if (thisCharDist < closestCharDist
Quote:
| closestCharDist == NO_CHAR_MATCH) {
// we negate the distance, as we're looking

// to the left
closestCharDist = - thisCharDist;
break;
}
}
}

if (doRightSearch == true) {
// work right along the string from the give position,
// looking for matching char
for (int i = position; i < strlen; i++) {
if (str.charAt(i) == c) {
int thisCharDist = i - position;
if (thisCharDist < closestCharDist
Quote:
| closestCharDist == NO_CHAR_MATCH) {
closestCharDist = thisCharDist;

break;
}
}
}
}
return closestCharDist;
}

public static void main(String[] args)
{
InexactStringMatcher matcher
= new InexactStringMatcher();
System
.out
.println("score from closely matched items is "
+matcher
.getMatchScore("msedcationoflarynhill",
"the_miseducation_of_lauryn_hill",
"_ "));

System
.out
.println("score from another closely matched item is "
+matcher
.getMatchScore("msedcationoflarynhill",
"zzzzzzzzzzzzzzzzzzzzzzzthe_miseducation_of_lauryn_hill",
"_ "));
System
.out
.println("score from not so matched item is "
+matcher
.getMatchScore("msedcationoflarynhill",
"another_track_name_entirely",
"_ "));

System
.out
.println("score from not so matched item is "
+matcher
.getMatchScore("msedcationoflarynhill",
"",
"_ "));
System
.out
.println("score from not so matched item is "
+matcher
.getMatchScore("",
"msedcationoflarynhill",
"_ "));
}
}



Back to top
Québec
Guest





PostPosted: Mon Nov 24, 2003 3:41 pm    Post subject: Re: String comparison Reply with quote

I did not knowd it was called an edit diastance ;-)

http://citeseer.nj.nec.com/47440.html


Back to top
Québec
Guest





PostPosted: Wed Nov 26, 2003 4:43 pm    Post subject: Re: String comparison Reply with quote


Thanks to all of you.
Good Enough for me.
==========


public class TempString {

public static void main(String[] args) {
String str1 = "Écrire_la date-dans un_fichier.";
String str2 = "Écrirela dat w-dans _n_fichier.";
int len1 = str1.length();
int len2 = str2.length();
int len = ((len1> len2)? len2 :len1) -1;

StringBuffer strBuf = new StringBuffer(len);
StringBuffer strBuf2 = new StringBuffer(len);

int j = 0;

try{
for (int i = j ;i < len ;i++)
{
for (j =i; j {
if ((str1.substring(i, j)).equals(str2.substring(i, j)))
{}else{
strBuf2.append("*");
break;
}
}
strBuf.append(str1.substring(i, j-1));
strBuf2.append(str1.substring(i, j-1));
i = j-1;
}
}catch(StringIndexOutOfBoundsException e){
System.out.println("n==============" + strBuf);
}
String tmp = strBuf2.toString().trim();
System.out.println("n" + tmp.substring(1,strBuf2.length()));
System.out.println(str1);
double longueur = (double)strBuf.length()/len1;
System.out.println("n" + longueur);

}
}//end of class
============

import java.util.*;

public class MyCollection{

public static double comp(String str1, String str2){

int len1 = str1.length();
int len2 = str2.length();

int len = (len1 > len2) ? len1 : len2;

Collection c1 = new ArrayList();
Collection c2 = new ArrayList();

for (int k = 0;k c1.add(String.valueOf(str1.charAt(k)));
for (int m = 0;m c2.add(String.valueOf(str2.charAt(m)));

Collection smallest = new ArrayList();


if(len1 < len2){
c2.retainAll(c1);
smallest = c1;
}else{
c1.retainAll(c2);
smallest = c2;
}

Iterator it = smallest.iterator();
while(it.hasNext())
System.out.print(it.next());

return ((double)smallest.size()/(double)len);
}

public static void main(String[] args) {
String str3 = "the_miseducation_of_lauryn_hill";
String str4 = "msedcationoflarynhill";
System.out.println("n" + comp(str3, str4));
}
}


Back to top
newszilla
Guest





PostPosted: Thu Nov 27, 2003 10:39 am    Post subject: Re: String comparison Reply with quote


"Roedy Green" <roedy (AT) seewebsite (DOT) com> wrote

Quote:

see http://mindprod.com/jgloss/soundex.html


Er, Roedy, you might want to update that page - soundex was patented in
1918, 20 years before Knuth was born!

http://www.avotaynu.com/soundex.html

Henk van Voorthuijsen



Back to top
Display posts from previous:   
Post new topic   Reply to topic    AppletTalk.com Forum Index -> Java Help All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.