Definitive UTF-8 response available?


3wi
05-08-2005, 09:47 AM
Hi,

I have some problems with SOAP messages and the French "é" character, which gets translated into "Ã©".

More generally, I found a lot of similar questions in the forum.

Could anyone from Laszlo tell us whether accented characters can be sent from the Laszlo client to the server through SOAP messages without this kind of problem?

I forgot: I tried with 3.01b and the latest 3.0 version.

Regards,

hqm
05-08-2005, 12:36 PM
Laszlo sends and receives characters in UTF-8 encoding. However, when it submits data via the GET or POST method, the Flash player does not identify the charset encoding in the Content-Type HTTP header, and there is no way to add the headers from the client side, so some web servers will assume the content is in some other default charset, and that might be the problem you are seeing.

What server are you using for the LPS and for SOAP?
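
To illustrate the charset mismatch described above, here is a minimal sketch of what happens when UTF-8 bytes are decoded by a server that assumes ISO-8859-1. The class and method names (MojibakeDemo, misdecode) are made up for this example and are not part of Laszlo or Tomcat.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8 (what the client sends), then
    // decode those bytes as ISO-8859-1 (what a server may wrongly assume).
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(Charset.forName("UTF-8"));
        return new String(utf8Bytes, Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00e9"); // "é"
        // Print the code points so the result does not depend on the console encoding.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        // prints U+00C3 then U+00A9, i.e. the two characters "Ã©"
    }
}
```

The single character "é" becomes two characters because its UTF-8 form is the two bytes C3 A9, and ISO-8859-1 maps every byte to exactly one character.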

3wi
05-08-2005, 12:45 PM
I'm using the standard Tomcat installation for LPS.

For SOAP, I'm using InterSystems Caché's application server. I ran some trials directly from an independent SOAP client (XmlSpy) to Caché, and the accented characters are interpreted fine.

Could you tell me more about where the problem is: is it between the Flash client and Tomcat, or between Tomcat and the SOAP server?

I added some traces at the SOAP server level, and that's where I found the "Ã©" encoding instead of "é". But I don't know where this error occurred.

hqm
05-08-2005, 01:10 PM
The way the information flows in a SOAP request is that the client in the Flash player makes a request, which is submitted via HTTP to the LPS server. The server then proxies the request to the SOAP back end server, accepts the response, and then sends it back to the Flash client.

Do you have a minimal size test case you can send me that demonstrates the problem? I would like to try to reproduce it using our system. We have some tests written using the Apache SOAP service library, and I would like to try it with that. Maybe if you have a simple SOAP service which just echoes the request, or something very simple like that. I just want to try to run the same example as you are running, because I know the maintainer here of the SOAP library went through some effort to make sure that UTF-8 charsets were working.

3wi
05-08-2005, 01:57 PM
Your explanations gave me the following idea:
adding "-Dfile.encoding=UTF-8" to the "set JAVA_OPTS" line in catalina.bat.

And ... that solved my problem with POSTing "é" through my SOAP server: I succeeded in storing "é" in the database without encoding problems.
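
A likely reason the flag matters: -Dfile.encoding sets the charset the JVM uses wherever code converts between bytes and strings without naming a charset. The sketch below is only an illustration (the class name DefaultCharsetCheck and the helper decode are invented for it); it prints the JVM default and shows that naming the charset explicitly gives the same answer whatever the default is.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Hypothetical helper: decode bytes with an explicit charset instead of
    // relying on the JVM default (which -Dfile.encoding influences).
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of "é"
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};

        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Depends on the JVM default, so the result varies between machines.
        String viaDefault = new String(utf8);
        System.out.println("decoded with default, length = " + viaDefault.length());

        // Independent of the JVM default.
        System.out.println("decoded as UTF-8, length = " + decode(utf8, "UTF-8").length());
        System.out.println("decoded as ISO-8859-1, length = " + decode(utf8, "ISO-8859-1").length());
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 should show whether the default changes on a given installation.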

But it's not completely resolved, because the GET is not correct. Even though I verified that the web service returns an "é", the Laszlo application displays an unreadable character (a square).

Here is an extract from the debugging trace :
«LzDataElement#62| <Field Value="�"/>

hqm
05-08-2005, 02:13 PM
Are you sure that your web service is encoding its response in UTF-8? Can you get a raw HTTP response to see what the Content-Type is, and also to verify by looking at the bytes that UTF-8 is being sent, rather than ISO-8859-1?

3wi
05-08-2005, 03:01 PM
The Content-Type of my soap response is text/xml; charset=UTF-8 (seen with IEWatch).

I have no idea right now how to analyse the bytes of the response! Any suggestions?

hqm
05-08-2005, 03:46 PM
A quick check would be if you look at the raw response and there is a single character in there for your accented char, it is probably ISO-8859-1. If there are two or three chars representing it, it is probably UTF-8.

If you have a command line utility like "od" in Unix, you can get a hex dump of the file and look at the characters there, or you can use some other hexadecimal editor, or use Emacs Hexl mode.
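
If no hex editor is at hand, the same byte-level check can be sketched in a few lines of Java. This is only an illustration under the assumption that the response body has already been saved as a byte array; EncodingProbe, toHex and isValidUtf8 are names invented here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class EncodingProbe {
    // Render bytes as space-separated lowercase hex, like a tiny hex dump.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Strict check: true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};            // "é" in ISO-8859-1: one byte
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // "é" in UTF-8: two bytes
        System.out.println(toHex(latin1) + " valid UTF-8? " + isValidUtf8(latin1));
        System.out.println(toHex(utf8) + " valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

Note that pure ASCII text passes the check too, so the test only tells the two encodings apart when the response contains at least one accented character.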

3wi
05-08-2005, 04:03 PM
Sorry for the question, but how can I get the raw response?

I can see the response in IE or Firefox, for instance, but how do I get the raw response?

hqm
05-08-2005, 04:51 PM
Sometimes you can use a command line program like "wget" to make an HTTP request and then save the response (including headers) to a file.

There are some other command line clients that people use, but wget is the one I am familiar with.

3wi
05-09-2005, 09:44 AM
I ran some tests with different text editors using this snippet:
<canvas height="50" width="500">
<simplelayout axis="y"/>
<text text="é"/>
<text text="é"/>
<text text="& # 233;" /> <!-- I left the blanks in so that the character reference stays visible in this thread -->
</canvas>


The output results are:

- For Eclipse Plug-in :
[]
é
é

- For XMLSpy :
é
é
é


I tried with UltraEdit, but was unable to write this code in UTF-8 (the [] replacing the "é" automatically).


For me XMLSpy is the winner.
Does anyone understand what is going on here?

What is the best IDE for coding Laszlo XML files?

k07032
11-30-2005, 07:09 AM
First, I'd like to thank you guys for sharing good ideas for working around my problem with displaying foreign characters in Laszlo. However, I'm still having some issues and hope someone can lend a hand.

In my web application, unfortunately, I have to use the GET method, and I have a database that stores all the input coming from a Laszlo page. Tomcat sits in between.

First scenario: Originally, I was unable to display any foreign characters correctly after they were saved to the database. The characters are saved correctly in the database. However, they are displayed as unreadable characters for single-byte and as empty space for double-byte. For instance, enter the character "é" in the Laszlo edittext, pass it to the controller, and save it to the database. The character "é" gets saved in the database. However, when the character is retrieved from the database and displayed on the Laszlo page, I get "Ã©".

Second scenario: Then, following the suggestions posted here, I added the line "-Dfile.encoding...." to catalina.bat and fixed my server code according to the suggestions on this page, http://wiki.openlaszlo.org/JSPUTF8. I finally got my Laszlo page to display the foreign characters (both SBCS and DBCS) correctly. However, the characters that get saved to the database are now incorrect. For instance, enter the character "é" in the Laszlo edittext, pass it to the controller, and save it to the database. Instead of "é", "Ã©" gets saved into the database. But, surprisingly, when the character is retrieved from the database, it is displayed correctly as "é" on the Laszlo page.

Third scenario: I also tried this. Keeping the server code the same as in the second scenario and only removing the line "-Dfile.encoding..." from catalina.bat, the character "é" gets saved correctly into the database. However, when the character is retrieved from the database, unlike in the first scenario, I don't get the characters "Ã©" but a square, "�", just like 3wi described.

So, is it possible to use the GET method to accomplish BOTH saving foreign characters correctly into a database AND displaying them correctly on a Laszlo page?
Also, I have recently upgraded to Tomcat 5 and I can't find the catalina.bat file under the bin directory anymore. Now there are two executables, tomcat5.exe and tomcat5w.exe, for starting the service. So, where do I add the "-Dfile.encoding..." option now that there is no catalina.bat?

Appreciate any help.

illogic_code
09-04-2006, 06:35 PM
well... it appears that this will not be that "definitive" thread...

d~l
09-05-2006, 01:17 AM
more here .. (http://search.gmane.org/?query=UTF-8&group=gmane.comp.java.openlaszlo.user) .. and here .. (http://search.gmane.org/?query=UTF-8&group=gmane.comp.java.openlaszlo.devel)

illogic_code
09-07-2006, 12:57 PM
d~l, thank you very, very much for these tips and links.

I already signed up on both lists and now have tons of Laszlo info coming in by email. Cool! :D

By the way, I now have everything working fine here. You gave me a link earlier (the Laszlo user list on the web) that saved me. It is a function that translates the incoming string. I changed it a little to make it more generic and put it in a utils class. Like a friend said: "working like coconuts!!!" lol.

My configuration here is:
SOLO app (proxied = false)
using datasets (not javarpc, but I think this will work there too)
Firebird 1.5 database
only servlets (no JSPs)
LPS 3.3.3


First thing: I saw on a mailing list that there is a bug in setQueryType("POST") that doesn't change the request type. I use querytype="POST" in the declaration of the dataset and this works fine. (The guy on the mailing list said that it works this way for now.)
Now that we have a POST for sure...


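import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Re-decodes a request parameter that the container decoded with the wrong charset.
public static String String2Charset(String raw, String szCharsetDst) {
    // Recover the bytes as they arrived on the wire. This assumes the container
    // decoded them as ISO-8859-1 (the servlet default); the version first posted
    // used raw.getBytes(), which depends on the platform default charset.
    ByteBuffer myBytes = Charset.forName("ISO-8859-1").encode(raw);
    // Decoder for the charset the client really used (szCharsetDst)
    CharsetDecoder dec = Charset.forName(szCharsetDst).newDecoder();
    try {
        // Decode the recovered bytes and build the corrected string
        return dec.decode(myBytes).toString();
    } catch (CharacterCodingException e) {
        // The bytes are not valid in szCharsetDst: log it and signal failure with null
        e.printStackTrace();
        return null;
    }
}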


Very sorry for not crediting whoever wrote this function first, because I really don't know who it was. By the way, I modified it a bit, so I get 20% of the credit. lol.

I use it in my code this way:


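// "dsRQ" is the request parameter that carries the dataset XML
String xml = request.getParameter("dsRQ");
if (xml != null) {
    // Re-decode the parameter as UTF-8
    xml = Util.String2Charset(xml, "UTF-8");
} else {
    logger.info("error. null XML.");
}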
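
As a quick self-contained check of the re-decoding idea, the sketch below starts from the garbled form "Ã©" and recovers "é". It assumes the container decoded the request bytes as ISO-8859-1; RedecodeCheck and redecode are names invented for this example.

```java
import java.nio.charset.Charset;

public class RedecodeCheck {
    // Hypothetical helper: undo an ISO-8859-1 decode of bytes that were
    // really UTF-8 by turning the string back into bytes and decoding again.
    static String redecode(String misdecoded) {
        byte[] wireBytes = misdecoded.getBytes(Charset.forName("ISO-8859-1"));
        return new String(wireBytes, Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        String garbled = "\u00c3\u00a9"; // "Ã©", the mis-decoded form of "é"
        String fixed = redecode(garbled);
        System.out.println(fixed.equals("\u00e9")); // prints true
    }
}
```

The conversion should only be applied to strings that really were mis-decoded. A string that is already correct does not survive it: the lone byte E9 is not valid UTF-8, so the result contains the replacement character U+FFFD.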


k07032: I think this approach works OK with the GET method too. Could you give it a try there and report back to us?

A strange thing happened here too: after I converted the string, all was fine, but when I tried to save to the database I got an error like "cannot transliterate between characterset". I think perhaps it is my Hibernate configuration, dunno.. but i needed to put all the dataset as "NONE" for the charset, and then all work fine.

(phew... what a huge text)

Hope it helps somebody.

[]'s

Luís.

illogic_code
09-07-2006, 01:05 PM
ooops.
with this:

"but i needed to put all the dataset as "NONE" for the charset, and then all work fine."

i mean: the dataBASE (not dataset) on firebird.


sry