AppletTalk.com Java discussions newsgroups
*bicker* (Guest)
Posted: Sat Jun 18, 2005 10:26 am
Post subject: Problem with java.regex.Matcher? - Test.java (0/1)
We've isolated a problem we're encountering to java.util.regex.
We're trying to apply the Pattern ^[^~!@#$%^&*|]+$ to
strings passed in from a method that returns data from an
XML file. Unfortunately, the method that obtains the data
is in the platform we develop on, and we don't have the
source code. The platform provider has reviewed the problem
and has indicated that they feel that it is a bug in Java.
The string you see in the attached Java file is windows-1253
character (decimal) 146. As you can see, it is not in the
Pattern, and since the Pattern is trying to find strings
that do not have any of the indicated characters, we should
get a Pattern match. We don't. Not with that character.
The same seems to be the case with all characters between
131 and 160. Beyond that it seems to be okay.
Does anyone have any insight into why this would occur?
Perhaps we're doing something wrong in our Pattern, or the
platform developer is doing something wrong in how they
provide the data from the XML file?
Thanks!
(I'll paste the Java code here, but understand that we've
found that pasting the data tends to translate it to a
code-page other than the one the customer data was
originally in, so the problem seems to magically "go away".
Of course, it doesn't, because we still have to work with
the actual customer data, and preserve the integrity of the
data the customer actually entered.)
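import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
    public static boolean test(String strInput)
    {
        // Overwrite the argument with the problem character (byte 146 in the
        // customer's code page) so the test is self-contained.
        strInput = "’";
        System.out.println("strInput=" + strInput);
        // Negated class: matches only if none of the listed characters appear.
        Pattern patternTitle =
            Pattern.compile("^[^~!@#$%^&*|]+$");
        Matcher m = patternTitle.matcher(strInput);
        // The ^ and $ anchors make find() cover the whole string.
        return m.find();
    }
}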
Alan Krueger (Guest)
Posted: Sat Jun 18, 2005 8:28 pm
Post subject: Re: Problem with java.regex.Matcher? - Test.java (0/1)
*bicker* wrote:
| Quote: | We've isolated a problem we're encountering to java.regex.
We're trying to apply the Pattern ^[^~!@#$%^&*|]+$ to
strings passed in from a method that returns data from an
XML file. Unfortunately, the method that obtains the data
is in the platform we develop on, and we don't have the
source code. The platform provider has reviewed the problem
and has indicated that they feel that it is a bug in Java.
|
You didn't specify the platform on which you're having the problem.
| Quote: | The string you see in the attached Java file is windows-1253
character (decimal) 146. As you can see, it is not in the
Pattern, and since the Pattern is trying to find strings
that do not have any of the indicated characters, we should
get a Pattern match. We don't. Not with that character.
|
Running the code you posted under jdk1.5.0_03 on Windows XP, it returns
true. What result are you seeing and under which JVM?
*bicker* (Guest)
Posted: Sat Jun 18, 2005 8:39 pm
Post subject: Re: Problem with java.regex.Matcher? - Test.java (0/1)
On Sat, 18 Jun 2005 15:28:09 -0500, Alan Krueger
<wgzkid502 (AT) sneakemail (DOT) com> wrote:
| Quote: | You didn't specify the platform on which you're having the problem.
|
Sorry. This is JDK1.4.2_08 on Windows XP.
| Quote: | Running the code you posted under jdk1.5.0_03 on Windows XP, it returns
true. What result are you seeing and under which JVM?
|
We're getting false.
How did you access the code? If you copied and pasted it
out of the message, you'll indeed get the correct result
(true). It may be necessary to replace the character in
strInput manually (using the key-code Alt-0146).
--
bicker®
Dale King (Guest)
Posted: Mon Jun 20, 2005 2:59 am
Post subject: Re: Problem with java.regex.Matcher? - Test.java (0/1)
*bicker* wrote:
| Quote: | On Sat, 18 Jun 2005 15:28:09 -0500, Alan Krueger
<wgzkid502 (AT) sneakemail (DOT) com> wrote:
> You didn't specify the platform on which you're having the problem.
Sorry. This is JDK1.4.2_08 on Windows XP.
> Running the code you posted under jdk1.5.0_03 on Windows XP, it returns
> true. What result are you seeing and under which JVM?
We're getting false.
|
I get true as well in JDK1.5. Perhaps it was a bug in 1.4.
| Quote: | How did you access the code? If you copied and pasted it
out of the message, you'll indeed get the correct result
(true). It may be necessary to replace the character in
strInput manually (using the key-code Alt-0146).
|
Putting any non-ASCII character into a Java source file without escaping
it is a very bad idea. It means that your code can have different
behavior depending on which machine the code is compiled on. The Java
source file is just a stream of bytes. That stream must be translated
into characters using some character encoding. If you don't specify an
encoding on the command line it will use the default one for the
platform. The particular byte you are using (0x92) will be translated
very differently on Windoze and on Linux. On Windoze that curly quote
will translate to the Unicode character 0x2019. On Linux (which likely
uses ISO-8859-1) the 0x92 will get translated into Unicode 0x0092,
which is the PU2 control character.
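To make the byte-versus-character point concrete, here is a minimal sketch that decodes the single byte 0x92 under two encodings. The class and method names are made up for illustration, and it assumes the JVM ships the windows-1252 charset (the usual Windows default in Western locales).

```java
import java.nio.charset.Charset;

final class DecodeDemo {
    // Decodes one byte under the named charset and returns the resulting
    // character's code point formatted as U+XXXX.
    static String decodeToHex(int b, String charsetName) {
        String s = new String(new byte[] { (byte) b }, Charset.forName(charsetName));
        return String.format("U+%04X", (int) s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(decodeToHex(0x92, "windows-1252")); // U+2019 (curly quote)
        System.out.println(decodeToHex(0x92, "ISO-8859-1"));   // U+0092 (PU2 control)
    }
}
```

The same byte in the source file therefore becomes a different character depending on which encoding the compiler assumes.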
This is the reason that Sun includes the native2ascii program.
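As a rough illustration of what that kind of conversion does, the sketch below replaces every non-ASCII character in a string with its Unicode escape. It is a hand-rolled stand-in, not the tool itself; the class and method names are hypothetical, and the real tool's exact output format may differ.

```java
final class AsciiEscaper {
    // Replaces every char outside 7-bit ASCII with a backslash-u escape,
    // leaving plain ASCII characters untouched.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // The doubled backslash yields one literal backslash in the output.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The curly quote is built from its code point so this file stays pure ASCII.
        String curly = String.valueOf((char) 0x2019);
        System.out.println(escapeNonAscii("strInput = \"" + curly + "\";"));
    }
}
```

Run on the line from the original test, this prints the assignment with the curly quote replaced by its six-character escape, which compiles identically on any platform.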
If it is a bug in 1.4.2 it is probably that it did not properly handle
the full Unicode set for negated character classes. I see bug 4872664
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4872664> in the
database that sounds exactly like what you describe, but it was
supposedly fixed in 1.4.2_04.
--
Dale King
*bicker* (Guest)
Posted: Mon Jun 20, 2005 9:52 am
Post subject: Re: Problem with java.regex.Matcher? - Test.java (0/1)
On Mon, 20 Jun 2005 02:59:03 GMT, Dale King
<DaleWKing (AT) insightbb (DOT) nospam.com> wrote:
| Quote: | Putting any non-ASCII character into a Java source file without escaping
it is a very bad idea. It means that your code can have different
behavior depending on which machine the code is compiled on. The Java
source file is just a stream of bytes. That stream must be translated
into characters using some character encoding.
|
That's why I immediately went to the platform vendor. They
provide us a facility to bring XML data into our
application. They assured me that they read the encoding
from the XML file (in this case "windows-1250") and use
that. I confirmed what you suggested, that 0x92 is
converted to 0x2019, so it seems they're doing the right
thing there.
| Quote: | If it is a bug in 1.4.2 it is probably that it did not properly handle
the full Unicode set for negated character classes. I see bug 4872664
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4872664> in the
database that sounds exactly like what you describe, but it was
supposedly fixed in 1.4.2_04.
|
Thank you! I just checked my IDE, and I'm using
JDK1.4.2_01. I have JDK1.4.2_08 installed on my unit test
box, but all our customers have JVMs at 1.4.2_01 and all the
rest of my team is on JDK1.4.2_01. This must be the problem
we're encountering. I'll have everyone upgrade.
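For checking each machine after the upgrade, something like the following sketch might help. It prints the running JVM version and sweeps the byte range reported above (131 to 160) through the same Pattern. The class and method names are made up, the charset name is taken from the first post, and the code is written for a current JDK; on 1.4 the generics would have to go and the String constructor taking a charset name would be needed instead.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PatternSweep {
    private static final Pattern TITLE = Pattern.compile("^[^~!@#$%^&*|]+$");

    // Returns the byte values in [from, to] whose decoded character
    // fails to match the pattern.
    static List<Integer> failingBytes(int from, int to, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        List<Integer> failures = new ArrayList<>();
        for (int b = from; b <= to; b++) {
            String s = new String(new byte[] { (byte) b }, cs);
            if (!TITLE.matcher(s).find()) {
                failures.add(b);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Shows which JVM is actually running, e.g. inside the IDE.
        System.out.println("java.version=" + System.getProperty("java.version"));
        System.out.println("failing bytes: " + failingBytes(131, 160, "windows-1253"));
    }
}
```

On a JVM without the bug the list of failing bytes should come back empty, since none of those characters appear in the excluded set.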
--
bicker®
Dale King (Guest)
Posted: Tue Jun 21, 2005 3:14 am
Post subject: Re: Problem with java.regex.Matcher? - Test.java (0/1)
*bicker* wrote:
| Quote: | On Mon, 20 Jun 2005 02:59:03 GMT, Dale King
<DaleWKing (AT) insightbb (DOT) nospam.com> wrote:
> Putting any non-ASCII character into a Java source file without escaping
> it is a very bad idea. It means that your code can have different
> behavior depending on which machine the code is compiled on. The Java
> source file is just a stream of bytes. That stream must be translated
> into characters using some character encoding.
That's why I immediately went to the platform vendor. They
provide us a facility to bring XML data into our
application. They assured me that they read the encoding
from the XML file (in this case "windows-1250") and use
that. I confirmed what you suggested, that 0x92 is
converted to 0x2019, so it seems they're doing the right
thing there.
|
The character conversion I was talking about had nothing to do with your
XML vendor; it happens in the Java compiler. Your example showed this line:
strInput = "’";
which had the byte 0x92 in it. How the compiler handles that will differ
from one platform to another. In reality that text probably comes from
your XML parser rather than from your program, but I just wanted to make
sure you knew that anything other than ASCII may not work like you think
it will in a Java source file.
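A version of the test that does not depend on the compiler's encoding could look like this sketch, with the curly quote written as a Unicode escape so the source file is pure ASCII. The class and method names are only illustrative; the Pattern is the one from the original post.

```java
import java.util.regex.Pattern;

final class EscapedLiteralTest {
    private static final Pattern TITLE = Pattern.compile("^[^~!@#$%^&*|]+$");

    // True when the input is non-empty and contains none of the excluded characters.
    static boolean isValidTitle(String input) {
        return TITLE.matcher(input).find();
    }

    public static void main(String[] args) {
        // U+2019 written as a Unicode escape instead of a raw 0x92 byte
        String curlyQuote = "\u2019";
        System.out.println(isValidTitle(curlyQuote)); // true on a JVM without the bug
        System.out.println(isValidTitle("a|b"));      // false: contains '|'
    }
}
```

Because the escape is resolved the same way regardless of the default encoding, any remaining false result points at the regex implementation rather than at the source file.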
| Quote: | > If it is a bug in 1.4.2 it is probably that it did not properly handle
> the full Unicode set for negated character classes. I see bug 4872664
> <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4872664> in the
> database that sounds exactly like what you describe, but it was
> supposedly fixed in 1.4.2_04.
Thank you! I just checked my IDE, and I'm using
JDK1.4.2_01. I have JDK1.4.2_08 installed on my unit test
box, but all our customers have JVMs at 1.4.2_01 and all the
rest of my team is on JDK1.4.2_01. This must be the problem
we're encountering. I'll have everyone upgrade.
|
In the future when you suspect a bug you might want to do what I did and
search the bug database.
--
Dale King
*bicker* (Guest)
Posted: Tue Jun 21, 2005 11:04 am
Post subject: Re: Problem with java.regex.Matcher? - Test.java (0/1)
On Tue, 21 Jun 2005 03:14:34 GMT, Dale King
<DaleWKing (AT) insightbb (DOT) nospam.com> wrote:
| Quote: | > Thank you! I just checked my IDE, and I'm using
> JDK1.4.2_01. I have JDK1.4.2_08 installed on my unit test
> box, but all our customers have JVMs at 1.4.2_01 and all the
> rest of my team is on JDK1.4.2_01. This must be the problem
> we're encountering. I'll have everyone upgrade.
In the future when you suspect a bug you might want to do what I did and
search the bug database.
|
To be honest, I was still convinced that either we or our
vendor was doing something wrong. I'll be sure to not make
such a hasty conclusion again! <grin>
--
bicker®