[Laszlo-dev] UPDATED For Review: Change 20090105-hqm-R Summary: fix url encoding problem
André Bargull
andre.bargull at udo.edu
Tue Jan 6 10:04:31 PST 2009
"encodeURIComponent" is described in detail in "15.1.3 URI Handling
Function Properties" of ECMA-262 3rd edt.
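
As a quick illustration of the behaviour that section specifies, here is a minimal sketch comparing the two built-ins in a browser-style JavaScript engine (not swf8, where "escape" behaves differently, as discussed in the quoted mail below). The sample strings and variable names are arbitrary choices for this sketch.

```javascript
// encodeURIComponent percent-encodes the UTF-8 bytes of each character.
var latin = encodeURIComponent("\u00E9");   // e with acute accent, U+00E9
var hebrew = encodeURIComponent("\u05D0");  // Hebrew letter alef, U+05D0

// Alphanumerics and the marks - _ . ! ~ * ' ( ) pass through unchanged.
var marks = encodeURIComponent("abc-_.!~*'()");

// The legacy escape() does not produce UTF-8 here: it emits %XX for code
// units below 256 and %uXXXX for anything above.
var legacyLatin = escape("\u00E9");
var legacyHebrew = escape("\u05D0");

console.log(latin, hebrew, marks, legacyLatin, legacyHebrew);
```

The %uXXXX form in the last line is not a valid percent-encoding for a URL, which is consistent with the failure described for non-Latin text.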
On 1/6/2009 7:01 PM, André Bargull wrote:
> The built-in "encodeURIComponent" will perform properly on chars
> above 255; the actual problem was that "escape" was used. ("escape"
> in swf8 performs utf8-encoding, much like "encodeURIComponent" in
> browser javascript, with one difference: swf8 leaves only alphanum
> chars unencoded, whereas javascript's "encodeURIComponent" leaves
> alphanum _and_ "- _ . ! ~ * ' ( )" unencoded.)
> So the main thing is to use "encodeURIComponent" instead of "escape"
> for url-escaping. Next thing, at a lower priority: how to handle
> "- _ . ! ~ * ' ( )"? Do it like swf8 and encode them too? Or be
> compatible with the ECMAScript specification and not encode them?
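
For reference, the "encode them too" option could be sketched as a thin wrapper around the built-in. This is only an illustration under the assumption that the goal is to leave nothing but alphanumerics unencoded; the name urlEscapeStrict is hypothetical and is not the LFC implementation.

```javascript
// Hypothetical wrapper: let encodeURIComponent do the UTF-8 work, then
// percent-encode the nine marks it leaves alone: - _ . ! ~ * ' ( )
function urlEscapeStrict(s) {
  return encodeURIComponent(s).replace(/[-_.!~*'()]/g, function (ch) {
    // All nine marks have code points >= 0x21, so toString(16) always
    // yields two hex digits and no zero padding is needed here.
    return "%" + ch.charCodeAt(0).toString(16).toUpperCase();
  });
}

console.log(urlEscapeStrict("a-b_c.d!"));
console.log(urlEscapeStrict("(\u00E9)"));
```

Because the replacement runs on the already-encoded string, the "%" characters and hex digits produced by encodeURIComponent are never touched by the second pass.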
>
>
>
> On 1/6/2009 6:46 PM, Henry Minsky wrote:
>> OK, I'd be happy to use
>>
>> - in swf8: escape
>> - in dhtml+swf9: encodeURIComponent [1] with a few modifications so it
>> also encodes "- _ . ! ~ * ' ( )" into the appropriate UTF-8 encoding
>>
>> The original bug report complained about a failure of unicode
>> encoding for foreign languages (Arabic? Hebrew? I am not sure) in
>> swf9, so we need to make sure that the built-in encodeURIComponent in
>> swf9 (and dhtml) encodes unicode chars with values above 255 properly.
>>
>>
>>
>>
>> On Tue, Jan 6, 2009 at 12:30 PM, André Bargull
>> <andre.bargull at udo.edu> wrote:
>>> Even with a constant table, it's still about 48 times slower on my
>>> machine.
>>> And Tucker's quote doesn't imply you shouldn't take an obvious, more
>>> effective alternative.
>>>
>>>
>>> On 1/6/2009 6:13 PM, Henry Minsky wrote:
>>>> That is a good question. I really take Tucker's quotation to heart,
>>>> though:
>>>>
>>>> Rules of Optimization:
>>>> Rule 1: Don't do it.
>>>> Rule 2 (for experts only): Don't do it yet.
>>>> — M.A. Jackson
>>>> "More computing sins are committed in the name of efficiency (without
>>>> necessarily achieving it) than for any other single reason - including
>>>> blind stupidity."
>>>> — W.A. Wulf
>>>> "We should forget about small efficiencies, say about 97% of the time:
>>>> premature optimization is the root of all evil."
>>>> — Donald Knuth
>>>> "The best is the enemy of the good."
>>>> — Voltaire
>>>>
>>>>
>>>> Even if it is slow, it doesn't seem like a typical application
>>>> would spend that much absolute time encoding data. If it is a
>>>> noticeable performance issue, I bet we could speed it up a lot with
>>>> a lookup table hex[value] to encode the two-digit sequences instead
>>>> of a function call to value.toString(16).toUpperCase().
>>>>
>>>>
>>>> var hex = [
>>>> "%00", "%01", "%02", "%03", "%04", "%05", "%06", "%07",
>>>> "%08", "%09", "%0a", "%0b", "%0c", "%0d", "%0e", "%0f",
>>>> "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
>>>> "%18", "%19", "%1a", "%1b", "%1c", "%1d", "%1e", "%1f",
>>>> "%20", "%21", "%22", "%23", "%24", "%25", "%26", "%27",
>>>> "%28", "%29", "%2a", "%2b", "%2c", "%2d", "%2e", "%2f",
>>>> "%30", "%31", "%32", "%33", "%34", "%35", "%36", "%37",
>>>> "%38", "%39", "%3a", "%3b", "%3c", "%3d", "%3e", "%3f",
>>>> "%40", "%41", "%42", "%43", "%44", "%45", "%46", "%47",
>>>> "%48", "%49", "%4a", "%4b", "%4c", "%4d", "%4e", "%4f",
>>>> "%50", "%51", "%52", "%53", "%54", "%55", "%56", "%57",
>>>> "%58", "%59", "%5a", "%5b", "%5c", "%5d", "%5e", "%5f",
>>>> "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67",
>>>> "%68", "%69", "%6a", "%6b", "%6c", "%6d", "%6e", "%6f",
>>>> "%70", "%71", "%72", "%73", "%74", "%75", "%76", "%77",
>>>> "%78", "%79", "%7a", "%7b", "%7c", "%7d", "%7e", "%7f",
>>>> "%80", "%81", "%82", "%83", "%84", "%85", "%86", "%87",
>>>> "%88", "%89", "%8a", "%8b", "%8c", "%8d", "%8e", "%8f",
>>>> "%90", "%91", "%92", "%93", "%94", "%95", "%96", "%97",
>>>> "%98", "%99", "%9a", "%9b", "%9c", "%9d", "%9e", "%9f",
>>>> "%a0", "%a1", "%a2", "%a3", "%a4", "%a5", "%a6", "%a7",
>>>> "%a8", "%a9", "%aa", "%ab", "%ac", "%ad", "%ae", "%af",
>>>> "%b0", "%b1", "%b2", "%b3", "%b4", "%b5", "%b6", "%b7",
>>>> "%b8", "%b9", "%ba", "%bb", "%bc", "%bd", "%be", "%bf",
>>>> "%c0", "%c1", "%c2", "%c3", "%c4", "%c5", "%c6", "%c7",
>>>> "%c8", "%c9", "%ca", "%cb", "%cc", "%cd", "%ce", "%cf",
>>>> "%d0", "%d1", "%d2", "%d3", "%d4", "%d5", "%d6", "%d7",
>>>> "%d8", "%d9", "%da", "%db", "%dc", "%dd", "%de", "%df",
>>>> "%e0", "%e1", "%e2", "%e3", "%e4", "%e5", "%e6", "%e7",
>>>> "%e8", "%e9", "%ea", "%eb", "%ec", "%ed", "%ee", "%ef",
>>>> "%f0", "%f1", "%f2", "%f3", "%f4", "%f5", "%f6", "%f7",
>>>> "%f8", "%f9", "%fa", "%fb", "%fc", "%fd", "%fe", "%ff"
>>>> ];
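
A sketch of how such a table might be wired into the routine, for illustration only. It assumes the table is generated once at load time rather than written out by hand, and it uses uppercase hex digits to match what the toString(16).toUpperCase() version produces. The names HEX_TABLE and escapeWithTable are hypothetical, and whether this closes the speed gap in swf8 has not been measured here.

```javascript
// Build the 256-entry table once; entries run from "%00" to "%FF".
var HEX_TABLE = [];
for (var b = 0; b < 256; ++b) {
  HEX_TABLE[b] = "%" + (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
}

// Hypothetical table-driven encoder. Like the quoted routine, it works on
// UTF-16 code units (0..0xFFFF) and does not combine surrogate pairs.
function escapeWithTable(s) {
  var out = "";
  for (var i = 0, len = s.length; i < len; ++i) {
    var c = s.charCodeAt(i);
    if ((c >= 0x30 && c <= 0x39)        // 0-9
        || (c >= 0x41 && c <= 0x5A)     // A-Z
        || (c >= 0x61 && c <= 0x7A)) {  // a-z
      out += s.charAt(i);
    } else if (c <= 0x7F) {
      // 0xxxxxxx
      out += HEX_TABLE[c];
    } else if (c <= 0x7FF) {
      // 110xxxxx 10xxxxxx
      out += HEX_TABLE[(c >> 6) | 0xC0] + HEX_TABLE[(c & 0x3F) | 0x80];
    } else {
      // 1110xxxx 10xxxxxx 10xxxxxx
      out += HEX_TABLE[(c >> 12) | 0xE0]
           + HEX_TABLE[((c >> 6) & 0x3F) | 0x80]
           + HEX_TABLE[(c & 0x3F) | 0x80];
    }
  }
  return out;
}

console.log(escapeWithTable("encode me \u00E9\u00EA\u00E8"));
```

A side effect of the table is that the zero-padding question goes away, since every entry already has two hex digits.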
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jan 6, 2009 at 12:07 PM, André Bargull <andre.bargull at udo.edu>
>>>> wrote:
>>>>> On 1/6/2009 5:49 PM, Henry Minsky wrote:
>>>>>> I like having a single common url escape routine across all
>>>>>> runtimes,
>>>>>> it makes debugging simpler.
>>>>> Even if it is _much_ slower? The following testcase (in swf8) was
>>>>> about 50 times slower when "escape_utf8" was used instead of
>>>>> "escape"...
>>>>>
>>>>>
>>>>> ---
>>>>> <canvas bgcolor="0x898989" debug="true">
>>>>> <button text="escape" >
>>>>> <handler name="onclick" ><![CDATA[
>>>>> var d = new Date();
>>>>> for (var i=0; i<4000; ++i)escape("encode me éêè");
>>>>> Debug.write(new Date()-d)
>>>>> ]]></handler>
>>>>> </button>
>>>>>
>>>>> <button x="100" text="escape_utf8" >
>>>>> <handler name="onclick" ><![CDATA[
>>>>> function escape_utf8 (s) {
>>>>> var utf8 = "";
>>>>> for (var i = 0, len = s.length; i < len; ++i) {
>>>>> var c = s.charCodeAt(i);
>>>>> if ((c >= 0x30 && c <= 0x39) // 0-9
>>>>> || (c >= 0x41 && c <= 0x5A) // A-Z
>>>>> || (c >= 0x61 && c <= 0x7A)) {// a-z
>>>>> utf8 += s.charAt(i);
>>>>> } else if (c <= 0x7F) {
>>>>> // 0xxxxxxx
>>>>> utf8 += "%" + (c).toString(16).toUpperCase();
>>>>> } else if (c <= 0x7FF) {
>>>>> // 110xxxxx 10xxxxxx
>>>>> utf8 += "%" + ((c >> 6) |
>>>>> 0xC0).toString(16).toUpperCase();
>>>>> utf8 += "%" + ((c & 0x3F) |
>>>>> 0x80).toString(16).toUpperCase();
>>>>> } else if (c <= 0xFFFF) {
>>>>> // 1110xxxx 10xxxxxx 10xxxxxx
>>>>> utf8 += "%" + ((c >> 12) |
>>>>> 0xE0).toString(16).toUpperCase();
>>>>> utf8 += "%" + (((c >> 6) & 0x3F) |
>>>>> 0x80).toString(16).toUpperCase();
>>>>> utf8 += "%" + ((c & 0x3F) |
>>>>> 0x80).toString(16).toUpperCase();
>>>>> } else {
>>>>> // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
>>>>> utf8 += "%" + ((c >> 18) |
>>>>> 0xF0).toString(16).toUpperCase();
>>>>> utf8 += "%" + (((c >> 12) & 0x3F) |
>>>>> 0x80).toString(16).toUpperCase();
>>>>> utf8 += "%" + (((c >> 6) & 0x3F) |
>>>>> 0x80).toString(16).toUpperCase();
>>>>> utf8 += "%" + ((c & 0x3F) |
>>>>> 0x80).toString(16).toUpperCase();
>>>>> }
>>>>> }
>>>>> return utf8;
>>>>> }
>>>>>
>>>>> var d = new Date();
>>>>> for (var i=0; i<4000; ++i)escape_utf8("encode me éêè")
>>>>> Debug.write(new Date()-d)
>>>>> ]]></handler>
>>>>> </button>
>>>>> </canvas>
>>>>> ---
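
One detail of the routine in this testcase: charCodeAt returns UTF-16 code units, which never exceed 0xFFFF, so the final four-byte branch cannot be reached and a character outside the BMP would be emitted as two three-byte sequences. A minimal sketch of one way to handle that, assuming surrogate pairs should be combined before encoding; the names pct and escapeUtf8Full are hypothetical, and this is not the routine from the changeset.

```javascript
// Percent-encode one byte with two uppercase hex digits.
function pct(b) {
  return "%" + (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
}

// Hypothetical variant of escape_utf8 that combines surrogate pairs.
function escapeUtf8Full(s) {
  var out = "";
  for (var i = 0, len = s.length; i < len; ++i) {
    var c = s.charCodeAt(i);
    // High surrogate followed by a low surrogate: merge into one code point.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len) {
      var lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        ++i; // the low surrogate has been consumed
      }
    }
    if ((c >= 0x30 && c <= 0x39) || (c >= 0x41 && c <= 0x5A)
        || (c >= 0x61 && c <= 0x7A)) {
      out += String.fromCharCode(c);          // 0-9, A-Z, a-z
    } else if (c <= 0x7F) {
      out += pct(c);                          // 0xxxxxxx
    } else if (c <= 0x7FF) {
      out += pct((c >> 6) | 0xC0)             // 110xxxxx 10xxxxxx
           + pct((c & 0x3F) | 0x80);
    } else if (c <= 0xFFFF) {
      out += pct((c >> 12) | 0xE0)            // 1110xxxx 10xxxxxx 10xxxxxx
           + pct(((c >> 6) & 0x3F) | 0x80)
           + pct((c & 0x3F) | 0x80);
    } else {
      out += pct((c >> 18) | 0xF0)            // 11110xxx + three 10xxxxxx
           + pct(((c >> 12) & 0x3F) | 0x80)
           + pct(((c >> 6) & 0x3F) | 0x80)
           + pct((c & 0x3F) | 0x80);
    }
  }
  return out;
}

console.log(escapeUtf8Full("\uD83D\uDE00"));
```

An unpaired surrogate is still encoded as three bytes in this sketch, whereas the built-in encodeURIComponent throws a URIError for such input.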
>>>>>
>>>>>
>>>>>
>>>>>> I just modified your utf8 escape routine to pad the extra zero when
>>>>>> needed:
>>>>>>
>>>>>>
>>>>>> var escape_utf8 = function (s:String):String {
>>>>>> var utf8 = "";
>>>>>> for (var i = 0, len = s.length; i < len; ++i) {
>>>>>> var c = s.charCodeAt(i);
>>>>>> if ((c >= 0x30 && c <= 0x39) // 0-9
>>>>>> || (c >= 0x41 && c <= 0x5A) // A-Z
>>>>>> || (c >= 0x61 && c <= 0x7A)) {// a-z
>>>>>> utf8 += s.charAt(i);
>>>>>> } else if (c < 0x10) {
>>>>>> // 0xxxxxxx
>>>>>> utf8 += "%0" + (c).toString(16).toUpperCase();
>>>>>> } else if (c <= 0x7F) {
>>>>>> // 0xxxxxxx
>>>>>> utf8 += "%" + (c).toString(16).toUpperCase();
>>>>>> ...
>>>>>> ...
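
To make the padding issue concrete, here is a small sketch contrasting the two forms. The helper names hexUnpadded and hexPadded are hypothetical and exist only for this illustration.

```javascript
// Unpadded form, as in the first version of the routine: values below
// 0x10 come out with a single hex digit.
function hexUnpadded(b) {
  return "%" + b.toString(16).toUpperCase();
}

// Padded form: always two hex digits after the "%".
function hexPadded(b) {
  return "%" + (b < 0x10 ? "0" : "") + b.toString(16).toUpperCase();
}

console.log(hexUnpadded(0x0A)); // newline without padding
console.log(hexPadded(0x0A));   // newline with padding
```

Folding the padding into one helper would let every branch of the routine share it, instead of adding a separate "c < 0x10" case.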
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 6, 2009 at 11:49 AM, André Bargull
>>>>>> <andre.bargull at udo.edu>
>>>>>> wrote:
>>>>>>> As an alternative, we could (maybe should?) use:
>>>>>>> - in swf8: escape
>>>>>>> - in dhtml+swf9: encodeURIComponent [1] with a few modifications
>>>>>>> so it also encodes "- _ . ! ~ * ' ( )" into the appropriate
>>>>>>> UTF-8 encoding
>>>>>>>
>>>>>>> [1] https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Functions/encodeURIComponent
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 1/6/2009 5:29 PM, Henry Minsky wrote:
>>>>>>>> Hang on, there's a bug in the escape_utf8 routine: it's encoding
>>>>>>>> newline as "%A" instead of "%0A". I need to fix that.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 6, 2009 at 2:19 AM, Henry Minsky
>>>>>>>> <henry.minsky at gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Change 20090105-hqm-R by hqm at badtzmaru.home on 2009-01-05 19:40:54 EST
>>>>>>>>> in /Users/hqm/openlaszlo/trunk3/WEB-INF/lps/lfc
>>>>>>>>> for http://svn.openlaszlo.org/openlaszlo/trunk/WEB-INF/lps/lfc
>>>>>>>>>
>>>>>>>>> Summary: fix url encoding problem
>>>>>>>>>
>>>>>>>>> New Features:
>>>>>>>>>
>>>>>>>>> Bugs Fixed: LPP-7532
>>>>>>>>>
>>>>>>>>> Technical Reviewer: andre
>>>>>>>>> QA Reviewer: ptw
>>>>>>>>> Doc Reviewer: (pending)
>>>>>>>>>
>>>>>>>>> Documentation:
>>>>>>>>>
>>>>>>>>> Release Notes:
>>>>>>>>>
>>>>>>>>> The recommended way to url-escape strings is to call
>>>>>>>>> lz.Browser.urlEscape. This works similarly to the Javascript
>>>>>>>>> encodeURIComponent function, but is preferable because there
>>>>>>>>> are some known bugs with encodeURIComponent on some platforms.
>>>>>>>>>
>>>>>>>>> Details:
>>>>>>>>>
>>>>>>>>> Use Andre's utf-8 clean implementation of encodeURIComponent.
>>>>>>>>>
>>>>>>>>> Tests:
>>>>>>>>>
>>>>>>>>> demos/amazon/amazon.lzx in swf8,swf9,dhtml
>>>>>>>>> test/lfc/data/alldata.lzx (alldata.lzx has bugs, but there
>>>>>>>>> should be no regressions from behavior in trunk)
>>>>>>>>> demos/lzpix/app.lzx in swf8,swf9,dhtml
>>>>>>>>> demos/calendar/calendar.lzx in swf8,swf9,dhtml
>>>>>>>>>
>>>>>>>>> Files:
>>>>>>>>> M kernel/swf/LzLoadQueue.as
>>>>>>>>> M services/LzBrowser.lzs
>>>>>>>>> M debugger/platform/swf9/LzFlashRemote.as
>>>>>>>>> M data/LzParam.lzs
>>>>>>>>> M compiler/LzRuntime.lzs
>>>>>>>>> M compiler/LzBootstrapDebugService.lzs
>>>>>>>>>
>>>>>>>>> Changeset:
>>>>>>>>> http://svn.openlaszlo.org/openlaszlo/patches/20090105-hqm-R.tar
>>>>>>>>>