[Laszlo-dev] UPDATED For Review: Change 20090105-hqm-R Summary: fix url encoding problem
P T Withington
ptw at laszlosystems.com
Tue Jan 6 10:19:37 PST 2009
I think there are several points of confusion here:
a) as2 did not have EncodeURIComponent. Its version of `escape` was
enough like EncodeURIComponent that I made an alias, for _only_ as2.
b) In the debugger, where I use EncodeURIComponent, I really want that
primitive, because it needs to work early on and cannot depend on other
LFC bits like browser.
c) Browser.urlEscape is just plain sloppy, because it does not make
the distinction that ES3 does between encodeURI (where the URI
reserved characters are _not_ escaped) and encodeURIComponent (where
the URI reserved characters _are_ escaped).
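
To make the distinction in c) concrete, here is a minimal sketch using
the two standard ES3 globals. The sample URL is made up for
illustration; it is not taken from the bug report.

```javascript
// Illustrative only: the sample URL is hypothetical.
// "\u00e9" is e-acute, a non-ASCII character that both functions
// encode as its UTF-8 bytes (%C3%A9).
var sample = "http://example.com/a b?q=x&y=\u00e9";

// encodeURI leaves the URI reserved characters (: / ? & =) alone, so
// the result is still a usable URI.
var asURI = encodeURI(sample);

// encodeURIComponent escapes the reserved characters too, so the
// result can be embedded as a single query value.
var asComponent = encodeURIComponent(sample);
```

A single urlEscape entry point cannot serve both uses, since a caller
escaping a whole URI and a caller escaping one query value need
different treatment of the reserved characters.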
I believe a) and b) should be left as is; they do not affect the LZX
programmer.
c) needs sorting out.
1) We should deprecate the name urlEscape. It is non-standard.
2) We should have both encodeURI and encodeURIComponent. They are
both necessary.
3) We should use the platform implementation when it works, and work
around it when it does not.
For DHTML, we just use the platform.
For swf8, my escape mapping, which works for the debugger, is not
sufficient for LZX. Perhaps we can use the same mechanism we used for
RegEx to invoke JS, which is known to work?
For swf9, I can't tell if there is a bug in the platform
implementation or not. It sounds like the problem was simply that we
weren't calling the correct platform function (we were calling escape,
as we did in swf8, when we should have been calling encodeURI).
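
As a rough sketch of point 3 for the platforms where the built-in
works (browser JavaScript), and of the option raised below of also
encoding "- _ . ! ~ * ' ( )": the helper name
encodeURIComponentStrict is hypothetical and is not part of the LFC.
It relies on String.replace with a regular expression, so it would not
apply as-is to swf8.

```javascript
// Hypothetical helper, not an LFC API.
// Delegates to the platform encodeURIComponent, then additionally
// escapes the marks that ES3 leaves unencoded: - _ . ! ~ * ' ( )
function encodeURIComponentStrict(s) {
  return encodeURIComponent(s).replace(/[-_.!~*'()]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    // Pad to two hex digits so every escape has the form %XX.
    return "%" + (hex.length < 2 ? "0" + hex : hex);
  });
}
```

Whether the extra marks should be encoded at all is the open question
in the thread; leaving them alone would simply mean calling
encodeURIComponent directly.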
On 2009-01-06, at 13:04EST, André Bargull wrote:
> "encodeURIComponent" is described in detail in "15.1.3 URI Handling
> Function Properties" of ECMA-262 3rd edt.
>
>
> On 1/6/2009 7:01 PM, André Bargull wrote:
>> The built-in "encodeURIComponent" will perform properly on chars >
>> 256; the actual problem was that "escape" was used. ("escape" in
>> swf8 performs utf8-encoding, much like "encodeURIComponent" in
>> browser javascript. But there is one difference for swf8: it leaves
>> only alphanum chars unencoded, whereas javascript's
>> "encodeURIComponent" leaves alphanum _and_ "- _ . ! ~ * ' ( )"
>> unencoded.)
>> So the main thing is to use "encodeURIComponent" instead of
>> "escape" for url-escaping. Next thing, on a lower priority: how to
>> handle "- _ . ! ~ * ' ( )"? Do it like swf8 and encode them too?
>> Or be compatible with the ECMAScript specification and don't encode
>> them?
>>
>>
>>
>> On 1/6/2009 6:46 PM, Henry Minsky wrote:
>>> OK, I'd be happy to use
>>>
>>> - in swf8: escape
>>> - in dhtml+swf9: encodeURIComponent [1] with a few modifications
>>> so it
>>> also encodes "- _ . ! ~ * ' ( )" into the appropriate UTF-8 encoding
>>>
>>> The original bug report was complaining about a failure of unicode
>>> encoding for foreign languages (Arabic? Hebrew? I am not sure), in
>>> swf9 so we need to make sure that the built-in encodeURIComponent in
>>> swf9 (and dhtml)
>>> are encoding unicode chars with values > 256 properly.
>>>
>>>
>>>
>>>
>>> On Tue, Jan 6, 2009 at 12:30 PM, André Bargull <andre.bargull at udo.edu
>>> > wrote:
>>>> Even with a constant table, it's still about 48 times slower on
>>>> my machine.
>>>> And Tucker's quote doesn't imply you shouldn't take an obvious,
>>>> more
>>>> effective alternative.
>>>>
>>>>
>>>> On 1/6/2009 6:13 PM, Henry Minsky wrote:
>>>>> That is a good question. I really take Tucker's quotation to
>>>>> heart
>>>>> though,
>>>>>
>>>>> Rules of Optimization:
>>>>> Rule 1: Don't do it.
>>>>> Rule 2 (for experts only): Don't do it yet.
>>>>> — M.A. Jackson
>>>>> "More computing sins are committed in the name of efficiency
>>>>> (without
>>>>> necessarily
>>>>> achieving it) than for any other single reason - including blind
>>>>> stupidity."
>>>>> — W.A. Wulf
>>>>> "We should forget about small efficiencies, say about 97% of the
>>>>> time:
>>>>> premature
>>>>> optimization is the root of all evil."
>>>>> — Donald Knuth
>>>>> "The best is the enemy of the good."
>>>>> — Voltaire
>>>>>
>>>>>
>>>>> Even if it is slow, it doesn't seem like a typical application
>>>>> would spend
>>>>> that much absolute time encoding data. If it is a noticeable
>>>>> performance issue, I bet we could
>>>>> speed it up a lot with a lookup table hex[value]
>>>>> to encode the two digit sequences instead of a function call to
>>>>> value.toString(16).toUpperCase().
>>>>>
>>>>>
>>>>> var hex = [
>>>>> "%00", "%01", "%02", "%03", "%04", "%05", "%06", "%07",
>>>>> "%08", "%09", "%0a", "%0b", "%0c", "%0d", "%0e", "%0f",
>>>>> "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
>>>>> "%18", "%19", "%1a", "%1b", "%1c", "%1d", "%1e", "%1f",
>>>>> "%20", "%21", "%22", "%23", "%24", "%25", "%26", "%27",
>>>>> "%28", "%29", "%2a", "%2b", "%2c", "%2d", "%2e", "%2f",
>>>>> "%30", "%31", "%32", "%33", "%34", "%35", "%36", "%37",
>>>>> "%38", "%39", "%3a", "%3b", "%3c", "%3d", "%3e", "%3f",
>>>>> "%40", "%41", "%42", "%43", "%44", "%45", "%46", "%47",
>>>>> "%48", "%49", "%4a", "%4b", "%4c", "%4d", "%4e", "%4f",
>>>>> "%50", "%51", "%52", "%53", "%54", "%55", "%56", "%57",
>>>>> "%58", "%59", "%5a", "%5b", "%5c", "%5d", "%5e", "%5f",
>>>>> "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67",
>>>>> "%68", "%69", "%6a", "%6b", "%6c", "%6d", "%6e", "%6f",
>>>>> "%70", "%71", "%72", "%73", "%74", "%75", "%76", "%77",
>>>>> "%78", "%79", "%7a", "%7b", "%7c", "%7d", "%7e", "%7f",
>>>>> "%80", "%81", "%82", "%83", "%84", "%85", "%86", "%87",
>>>>> "%88", "%89", "%8a", "%8b", "%8c", "%8d", "%8e", "%8f",
>>>>> "%90", "%91", "%92", "%93", "%94", "%95", "%96", "%97",
>>>>> "%98", "%99", "%9a", "%9b", "%9c", "%9d", "%9e", "%9f",
>>>>> "%a0", "%a1", "%a2", "%a3", "%a4", "%a5", "%a6", "%a7",
>>>>> "%a8", "%a9", "%aa", "%ab", "%ac", "%ad", "%ae", "%af",
>>>>> "%b0", "%b1", "%b2", "%b3", "%b4", "%b5", "%b6", "%b7",
>>>>> "%b8", "%b9", "%ba", "%bb", "%bc", "%bd", "%be", "%bf",
>>>>> "%c0", "%c1", "%c2", "%c3", "%c4", "%c5", "%c6", "%c7",
>>>>> "%c8", "%c9", "%ca", "%cb", "%cc", "%cd", "%ce", "%cf",
>>>>> "%d0", "%d1", "%d2", "%d3", "%d4", "%d5", "%d6", "%d7",
>>>>> "%d8", "%d9", "%da", "%db", "%dc", "%dd", "%de", "%df",
>>>>> "%e0", "%e1", "%e2", "%e3", "%e4", "%e5", "%e6", "%e7",
>>>>> "%e8", "%e9", "%ea", "%eb", "%ec", "%ed", "%ee", "%ef",
>>>>> "%f0", "%f1", "%f2", "%f3", "%f4", "%f5", "%f6", "%f7",
>>>>> "%f8", "%f9", "%fa", "%fb", "%fc", "%fd", "%fe", "%ff"
>>>>> ];
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 6, 2009 at 12:07 PM, André Bargull <andre.bargull at udo.edu
>>>>> >
>>>>> wrote:
>>>>>> On 1/6/2009 5:49 PM, Henry Minsky wrote:
>>>>>>> I like having a single common url escape routine across all
>>>>>>> runtimes,
>>>>>>> it makes debugging simpler.
>>>>>> Even if it is _much_ slower? The following testcase (in swf8)
>>>>>> was about
>>>>>> 50
>>>>>> times slower when "escape_utf8" was used instead of "escape"...
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> <canvas bgcolor="0x898989" debug="true">
>>>>>> <button text="escape" >
>>>>>> <handler name="onclick" ><![CDATA[
>>>>>> var d = new Date();
>>>>>> for (var i=0; i<4000; ++i)escape("encode me éêè");
>>>>>> Debug.write(new Date()-d)
>>>>>> ]]></handler>
>>>>>> </button>
>>>>>>
>>>>>> <button x="100" text="escape_utf8" >
>>>>>> <handler name="onclick" ><![CDATA[
>>>>>> function escape_utf8 (s) {
>>>>>> var utf8 = "";
>>>>>> for (var i = 0, len = s.length; i < len; ++i) {
>>>>>> var c = s.charCodeAt(i);
>>>>>> if ((c >= 0x30 && c <= 0x39) // 0-9
>>>>>> || (c >= 0x41 && c <= 0x5A) // A-Z
>>>>>> || (c >= 0x61 && c <= 0x7A)) {// a-z
>>>>>> utf8 += s.charAt(i);
>>>>>> } else if (c <= 0x7F) {
>>>>>> // 0xxxxxxx
>>>>>> utf8 += "%" + (c).toString(16).toUpperCase();
>>>>>> } else if (c <= 0x7FF) {
>>>>>> // 110xxxxx 10xxxxxx
>>>>>> utf8 += "%" + ((c >> 6) |
>>>>>> 0xC0).toString(16).toUpperCase();
>>>>>> utf8 += "%" + ((c & 0x3F) |
>>>>>> 0x80).toString(16).toUpperCase();
>>>>>> } else if (c <= 0xFFFF) {
>>>>>> // 1110xxxx 10xxxxxx 10xxxxxx
>>>>>> utf8 += "%" + ((c >> 12) |
>>>>>> 0xE0).toString(16).toUpperCase();
>>>>>> utf8 += "%" + (((c >> 6) & 0x3F) |
>>>>>> 0x80).toString(16).toUpperCase();
>>>>>> utf8 += "%" + ((c & 0x3F) |
>>>>>> 0x80).toString(16).toUpperCase();
>>>>>> } else {
>>>>>> // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
>>>>>> utf8 += "%" + ((c >> 18) |
>>>>>> 0xF0).toString(16).toUpperCase();
>>>>>> utf8 += "%" + (((c >> 12) & 0x3F) |
>>>>>> 0x80).toString(16).toUpperCase();
>>>>>> utf8 += "%" + (((c >> 6) & 0x3F) |
>>>>>> 0x80).toString(16).toUpperCase();
>>>>>> utf8 += "%" + ((c & 0x3F) |
>>>>>> 0x80).toString(16).toUpperCase();
>>>>>> }
>>>>>> }
>>>>>> return utf8;
>>>>>> }
>>>>>>
>>>>>> var d = new Date();
>>>>>> for (var i=0; i<4000; ++i)escape_utf8("encode me éêè")
>>>>>> Debug.write(new Date()-d)
>>>>>> ]]></handler>
>>>>>> </button>
>>>>>> </canvas>
>>>>>> ---
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I just modified your utf8 escape routine to pad the extra zero
>>>>>>> when
>>>>>>> needed
>>>>>>>
>>>>>>>
>>>>>>> var escape_utf8 = function (s:String):String {
>>>>>>> var utf8 = "";
>>>>>>> for (var i = 0, len = s.length; i < len; ++i) {
>>>>>>> var c = s.charCodeAt(i);
>>>>>>> if ((c >= 0x30 && c <= 0x39) // 0-9
>>>>>>> || (c >= 0x41 && c <= 0x5A) // A-Z
>>>>>>> || (c >= 0x61 && c <= 0x7A)) {// a-z
>>>>>>> utf8 += s.charAt(i);
>>>>>>> } else if (c < 0x10) {
>>>>>>> // 0xxxxxxx
>>>>>>> utf8 += "%0" + (c).toString(16).toUpperCase();
>>>>>>> } else if (c <= 0x7F) {
>>>>>>> // 0xxxxxxx
>>>>>>> utf8 += "%" + (c).toString(16).toUpperCase();
>>>>>>> ...
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 6, 2009 at 11:49 AM, André Bargull <andre.bargull at udo.edu
>>>>>>> >
>>>>>>> wrote:
>>>>>>>> As an alternative, we could (maybe should?) use:
>>>>>>>> - in swf8: escape
>>>>>>>> - in dhtml+swf9: encodeURIComponent [1] with a few
>>>>>>>> modifications so it
>>>>>>>> also
>>>>>>>> encodes "- _ . ! ~ * ' ( )" into the appropriate UTF-8 encoding
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>>
>>>>>>>> "https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Functions/encodeURIComponent
>>>>>>>> "
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/6/2009 5:29 PM, Henry Minsky wrote:
>>>>>>>>> Hang on, there's a bug in the escape_utf8 routine, it's
>>>>>>>>> encoding
>>>>>>>>> newline as "%A" instead of "%0A", I need
>>>>>>>>> to fix that.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 6, 2009 at 2:19 AM, Henry Minsky <henry.minsky at gmail.com
>>>>>>>>> >
>>>>>>>>> wrote:
>>>>>>>>>> Change 20090105-hqm-R by hqm at badtzmaru.home on 2009-01-05
>>>>>>>>>> 19:40:54
>>>>>>>>>> EST
>>>>>>>>>> in /Users/hqm/openlaszlo/trunk3/WEB-INF/lps/lfc
>>>>>>>>>> for http://svn.openlaszlo.org/openlaszlo/trunk/WEB-INF/lps/
>>>>>>>>>> lfc
>>>>>>>>>>
>>>>>>>>>> Summary: fix url encoding problem
>>>>>>>>>>
>>>>>>>>>> New Features:
>>>>>>>>>>
>>>>>>>>>> Bugs Fixed: LPP-7532
>>>>>>>>>>
>>>>>>>>>> Technical Reviewer: andre
>>>>>>>>>> QA Reviewer: ptw
>>>>>>>>>> Doc Reviewer: (pending)
>>>>>>>>>>
>>>>>>>>>> Documentation:
>>>>>>>>>>
>>>>>>>>>> Release Notes:
>>>>>>>>>>
>>>>>>>>>> The recommended way to url-escape strings is to call
>>>>>>>>>> lz.Browser.urlEscape.
>>>>>>>>>> This works similarly to the Javascript encodeURIComponent
>>>>>>>>>> function, but is preferable because there are some known
>>>>>>>>>> bugs with encodeURIComponent on some platforms.
>>>>>>>>>>
>>>>>>>>>> Details:
>>>>>>>>>>
>>>>>>>>>> Use Andre's utf-8 clean implementation of encodeURIComponent.
>>>>>>>>>>
>>>>>>>>>> Tests:
>>>>>>>>>>
>>>>>>>>>> demos/amazon/amazon.lzx in swf8,swf9,dhtml
>>>>>>>>>> test/lfc/data/alldata.lzx (alldata.lzx has bugs, but there
>>>>>>>>>> should be
>>>>>>>>>> no
>>>>>>>>>> regressions from behavior in trunk)
>>>>>>>>>> demos/lzpix/app.lzx in swf8,swf9,dhtml
>>>>>>>>>> demos/calendar/calendar.lzx in swf8,swf9,dhtml
>>>>>>>>>>
>>>>>>>>>> Files:
>>>>>>>>>> M kernel/swf/LzLoadQueue.as
>>>>>>>>>> M services/LzBrowser.lzs
>>>>>>>>>> M debugger/platform/swf9/LzFlashRemote.as
>>>>>>>>>> M data/LzParam.lzs
>>>>>>>>>> M compiler/LzRuntime.lzs
>>>>>>>>>> M compiler/LzBootstrapDebugService.lzs
>>>>>>>>>>
>>>>>>>>>> Changeset:
>>>>>>>>>> http://svn.openlaszlo.org/openlaszlo/patches/20090105-hqm-R.tar
>>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>