[Laszlo-checkins] r14096 - in openlaszlo/branches/4.4: . WEB-INF/lps/server/src/org/openlaszlo/compiler WEB-INF/lps/server/src/org/openlaszlo/css WEB-INF/lps/server/src/org/openlaszlo/utils test/css test/css/encoding

ptw@openlaszlo.org ptw at openlaszlo.org
Tue Jun 9 10:47:30 PDT 2009


Author: ptw
Date: 2009-06-09 10:47:22 -0700 (Tue, 09 Jun 2009)
New Revision: 14096

Added:
   openlaszlo/branches/4.4/test/css/encoding/
   openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css
   openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css
   openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx
Removed:
   openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css
   openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css
   openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css
   openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx
Modified:
   openlaszlo/branches/4.4/
   openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java
   openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
   openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
Log:
Merged revisions 14091-14095 via svnmerge from 
http://svn.openlaszlo.org/openlaszlo/trunk

.......
  r14091 | raju | 2009-06-09 12:25:06 -0400 (Tue, 09 Jun 2009) | 52 lines
  
  Change 20090608-raju-Y by raju at ip-90-186-160-71.web.vodafone.de on 2009-06-08 20:09:02 CEST
      in /Users/rajubitter/src/svn/openlaszlo/trunk-cssunicode
      for http://rajubitter@svn.openlaszlo.org/openlaszlo/trunk
  
  Summary: Fix for the CSS parser using an incorrect file encoding
  
  New Features: Adds an optional @charset attribute to the stylesheet tag, in case the user wants to use a CSS file in an encoding other than UTF-8
  
  Bugs Fixed: LPP-8045
  
  Technical Reviewer: ptw
  QA Reviewer: (pending)
  Doc Reviewer: (pending)
  
  Documentation:
  
  Release Notes:
  
  Details:
  + StyleSheetCompiler.java: Added handling for @charset on the stylesheet tag. The
  CSSHandler.parse() call now passes the encoding value as a 2nd parameter.
  
  + CSSHandler.java:
  parse() method takes the encoding from the LZX stylesheet tag as a 2nd parameter.
  getInputSource() method does a few more things now:
    1) checks for a possible BOM on the CSS file
    2) checks if a possible BOM conflicts with the value of the stylesheet tag's @charset
    3) if there's a BOM, the BOM bytes are removed from the input stream
  
  + FileUtils.java:
  Added method public static String detectBOMEncoding(BufferedInputStream in)
  The method returns the encoding indicated by the BOM as one of the following strings, or null if no BOM is found:
    UTF-8
    UTF-16LE
    UTF-16BE
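
  The BOM check described above can be sketched roughly as follows. This is a
  minimal illustration under stated assumptions, not the committed method: the
  class BomDetectorSketch is a hypothetical name, and unlike the committed code
  this sketch rewinds the stream after peeking and treats an empty stream as
  "no BOM" rather than throwing.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical helper for illustration only; not part of FileUtils.
class BomDetectorSketch {

    private static final byte[][] BOMS = {
        { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }, // UTF-8
        { (byte) 0xFE, (byte) 0xFF },              // UTF-16BE
        { (byte) 0xFF, (byte) 0xFE }               // UTF-16LE
    };
    private static final String[] ENCODINGS = { "UTF-8", "UTF-16BE", "UTF-16LE" };

    /**
     * Peeks at the first bytes of the stream and returns the encoding named
     * by a BOM, or null if none is present. The stream position is restored.
     */
    static String detectBomEncoding(BufferedInputStream in) throws IOException {
        byte[] head = new byte[3];
        in.mark(head.length);
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        in.reset();
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(head, n, BOMS[i])) {
                return ENCODINGS[i];
            }
        }
        return null;
    }

    /** True if the first len bytes of data begin with prefix. */
    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int j = 0; j < prefix.length; j++) {
            if (data[j] != prefix[j]) {
                return false;
            }
        }
        return true;
    }
}
```

  Because the sketch resets the stream, the caller still sees the BOM bytes and
  has to skip them itself before decoding.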
  
  Tests:
  + test files in folder test/css/encoding
  The following tests exist:
    1) iso8859-1_with_charset_attr.lzx
       Reading an iso-8859-1 encoded CSS file with some German special chars
    2) utf16BE_with_BOM.lzx
       Reading a UTF-16 BE CSS file with a BOM
    3) utf16LE_with_BOM.lzx
       Reading a UTF-16 LE CSS file with a BOM
    4) utf8_with_BOM_no_charset_attr.lzx
       Reading a UTF-8 CSS file with a BOM and no charset attribute on the stylesheet tag
    5) utf8_with_BOM_conflicting_charset_attr.lzx
       Reading a CSS file with a UTF-8 BOM while the stylesheet tag has a @charset value
       of utf-16 throws a compile error
.......
  r14095 | raju | 2009-06-09 12:51:45 -0400 (Tue, 09 Jun 2009) | 25 lines
  
  Change 20090609-raju-a by raju at Atlantia.local on 2009-06-09 18:42:33 CEST
      in /Users/rajubitter/src/svn/openlaszlo/trunk-cssunicode
      for http://rajubitter@svn.openlaszlo.org/openlaszlo/trunk
  
  Summary: 
  
  New Features: These 2 css files have to be committed with skip-pre-commit-checks
  
  Bugs Fixed: LPP-8045
  
  Technical Reviewer: ptw
  QA Reviewer: ptw
  Doc Reviewer: (pending)
  
  Documentation:
  
  Release Notes:
  
  Details:
  Addition to commit 14091, missing CSS test files.
  
  
  Tests:
.......



Property changes on: openlaszlo/branches/4.4
___________________________________________________________________
Name: svnmerge-integrated
   - /openlaszlo/branches/4.1:1-10153 /openlaszlo/branches/4.2:1-12154,12181,13205,13778 /openlaszlo/branches/devildog:1-8432 /openlaszlo/branches/pagan-deities:1-7955,8825,10756-10920,10922-10928,10930-10935,11151,11207,11554,13476,13629 /openlaszlo/branches/paperpie:1-6504,6506-6574,6576-7135,7137-7235 /openlaszlo/branches/wafflecone:1-5746,5818-6068,6070-6205,6207-6213,6216-6265,6267-6368,6370-6431,6433-6450,6497,6509,6661,7097,7872 /openlaszlo/trunk:1-13937,13948-13952,13954-13968,13970,13972-13980,13982-13985,13987-14002,14021-14032,14034,14036-14067,14087
   + /openlaszlo/branches/4.1:1-10153 /openlaszlo/branches/4.2:1-12154,12181,13205,13778 /openlaszlo/branches/devildog:1-8432 /openlaszlo/branches/pagan-deities:1-7955,8825,10756-10920,10922-10928,10930-10935,11151,11207,11554,13476,13629 /openlaszlo/branches/paperpie:1-6504,6506-6574,6576-7135,7137-7235 /openlaszlo/branches/wafflecone:1-5746,5818-6068,6070-6205,6207-6213,6216-6265,6267-6368,6370-6431,6433-6450,6497,6509,6661,7097,7872 /openlaszlo/trunk:1-13937,13948-13952,13954-13968,13970,13972-13980,13982-13985,13987-14002,14021-14032,14034,14036-14067,14087,14091-14095

Modified: openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java
===================================================================
--- openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java	2009-06-09 16:51:45 UTC (rev 14095)
+++ openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java	2009-06-09 17:47:22 UTC (rev 14096)
@@ -1,9 +1,9 @@
 /* *****************************************************************************
- * StyleSheetCompiler.java
+* StyleSheetCompiler.java
 * ****************************************************************************/
 
 /* J_LZ_COPYRIGHT_BEGIN *******************************************************
-* Copyright 2001-2008 Laszlo Systems, Inc.  All Rights Reserved.              *
+* Copyright 2001-2009 Laszlo Systems, Inc.  All Rights Reserved.              *
 * Use is subject to license terms.                                            *
 * J_LZ_COPYRIGHT_END *********************************************************/
 
@@ -29,6 +29,7 @@
     private static Logger mLogger = Logger.getLogger(StyleSheetCompiler.class);
 
     private static final String SRC_ATTR_NAME = "src";
+    private static final String CHARSET_ATTR_NAME = "charset";
 
     StyleSheetCompiler(CompilationEnvironment env) {
         super(env);
@@ -44,9 +45,9 @@
 
     public void compile(Element element) {
         try {
-        	if (mLogger.isInfoEnabled()) {
+            if (mLogger.isInfoEnabled()) {
             mLogger.info("StyleSheetCompiler.compile called!");
-        	}
+            }
 
             if (!element.getChildren().isEmpty()) {
                 throw new CompilationError("<stylesheet> elements can't have children",
@@ -56,11 +57,21 @@
             String pathname = null;
             String stylesheetText = element.getText();
             String src = element.getAttributeValue(SRC_ATTR_NAME);
+            String encoding = element.getAttributeValue(CHARSET_ATTR_NAME);
+            if (encoding != null) {
+                if (mLogger.isDebugEnabled()) {
+                mLogger.info("@charset=" + encoding + " found on stylesheet tag");
+                }
+            } else {
+                if (mLogger.isDebugEnabled()) {
+                mLogger.info("no attribute @charset found on stylesheet tag, using default value " + encoding);
+                }
+           }
 
             if (src != null) {
-            	if (mLogger.isInfoEnabled()) {
-                mLogger.info("reading in stylesheet from src=" + src);
-            	}
+                if (mLogger.isInfoEnabled()) {
+                mLogger.info("reading in stylesheet from src=\"" + src + "\"");
+                }
                 // Find the css file
                 // Using the FileResolver accomplishes two nice things:
                 // 1, it searches the standard directory include paths
@@ -83,9 +94,9 @@
                 if (! resolvedFile.exists() ) {
                     resolvedFile = mEnv.resolve(src, base);
                     if (resolvedFile.exists()) {
-                    	if (mLogger.isInfoEnabled()) {
+                        if (mLogger.isInfoEnabled()) {
                         mLogger.info("Resolved css file to a file that exists!");
-                    	}
+                        }
                     } else {
                         mLogger.error("Could not resolve css file to a file that exists.");
                         throw new CompilationError("Could not find css file " + src);
@@ -93,14 +104,14 @@
                 }
 
                 // Actually parse and compile the stylesheet! W00t!
-                CSSHandler fileHandler = CSSHandler.parse( resolvedFile );
+                CSSHandler fileHandler = CSSHandler.parse( resolvedFile, encoding );
                 this.compile(fileHandler, element);
 
 
             } else if (stylesheetText != null && (!"".equals(stylesheetText))) {
-            	if (mLogger.isInfoEnabled()) {
+                if (mLogger.isInfoEnabled()) {
                 mLogger.info("inline stylesheet");
-            	}
+                }
                 CSSHandler inlineHandler = CSSHandler.parse(stylesheetText);
                 this.compile(inlineHandler, element);
                 //
@@ -156,9 +167,9 @@
     }
 
     void compile(CSSHandler handler, Element element) throws CompilationError {
-    	if (mLogger.isDebugEnabled()) {
+        if (mLogger.isDebugEnabled()) {
         mLogger.debug("compiling CSSHandler using new unique names");
-    	}
+        }
         String script = "";
         for (int i=0; i < handler.mRuleList.size(); i++) {
             Rule rule = (Rule)handler.mRuleList.get(i);
@@ -223,9 +234,9 @@
     }
 
     String buildConditionalSelectorJS(Condition cond, SimpleSelector simpleSelector) {
-    	if (mLogger.isDebugEnabled()) {
+        if (mLogger.isDebugEnabled()) {
         mLogger.debug("Conditional selector: " + cond.toString());
-    	}
+        }
         String condString = "no_match";
         switch (cond.getConditionType()) {
             case Condition.SAC_ID_CONDITION: /* #id */
@@ -234,9 +245,9 @@
                 break;
 
              case Condition.SAC_ATTRIBUTE_CONDITION: // [attr] or [attr="val"] or elem[attr="val"]
-            	 if (mLogger.isDebugEnabled()) {
+                 if (mLogger.isDebugEnabled()) {
                 mLogger.debug("Attribute condition");
-            	 }
+                 }
                 AttributeCondition attrCond = (AttributeCondition) cond;
                 String name  = attrCond.getLocalName();
                 String value = attrCond.getValue();
@@ -247,9 +258,9 @@
                 // localName of the null string. We don't write out the simple selector if
                 // it's not specified.
                 if (simpleSelector != null) {
-                	if (mLogger.isDebugEnabled()) {
+                    if (mLogger.isDebugEnabled()) {
                     mLogger.debug("simple selector:" + simpleSelector.toString());
-                	}
+                    }
                     if (simpleSelector.getSelectorType() == Selector.SAC_ELEMENT_NODE_SELECTOR) {
 
                         ElementSelector es = (ElementSelector)simpleSelector;

Modified: openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
===================================================================
--- openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java	2009-06-09 16:51:45 UTC (rev 14095)
+++ openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java	2009-06-09 17:47:22 UTC (rev 14096)
@@ -3,19 +3,23 @@
 * ****************************************************************************/
 
 /* J_LZ_COPYRIGHT_BEGIN *******************************************************
-* Copyright 2001-2008 Laszlo Systems, Inc.  All Rights Reserved.              *
+* Copyright 2001-2009 Laszlo Systems, Inc.  All Rights Reserved.              *
 * Use is subject to license terms.                                            *
 * J_LZ_COPYRIGHT_END *********************************************************/
 
 package org.openlaszlo.css;
 
 import java.io.*;
+import java.nio.CharBuffer;
 import java.util.*;
 import java.util.regex.*;
 import org.w3c.css.sac.*;
 import org.apache.log4j.*;
 import org.jdom.*;
 
+import org.openlaszlo.utils.FileUtils;
+import org.openlaszlo.compiler.CompilationError;
+
 /**
  * Handler used to parse CSS file and process style rules on a document element.
  *
@@ -30,9 +34,9 @@
     /** Logger. */
     private static Logger mLogger = Logger.getLogger(CSSHandler.class);
 
-    /** CSS parser factory. */ 
+    /** CSS parser factory. */
     private static org.w3c.css.sac.helpers.ParserFactory mCSSParserFactory = null;
-    
+
     static {
         // This system property is required for the SAC ParserFactory.
         if (System.getProperty("org.w3c.css.sac.parser") == null) {
@@ -45,10 +49,10 @@
    /**
      * Entry point to creating a CSSHandler to read from an external
      *    stylesheet file
-     * @param rootDir the directory where cssFile exists.
-     * @param cssFile the css file to read.
+     * @param file the css file to read.
+     * @param String value of an optional charset attribute on the stylesheet tag
      */
-    public static CSSHandler parse(File file)
+    public static CSSHandler parse(File file, String charsetAttrValue)
            throws CSSException {
        try {
            mLogger.info("creating CSSHandler");
@@ -56,7 +60,8 @@
            Parser parser = mCSSParserFactory.makeParser();
            parser.setDocumentHandler(handler);
            parser.setErrorHandler(handler);
-           parser.parseStyleSheet(handler.getInputSource());
+           mLogger.info("Trying to parse CSS with charset setting of " + charsetAttrValue);
+           parser.parseStyleSheet(handler.getInputSource(charsetAttrValue,file.getPath()));
            return handler;
        } catch (CSSParseException e) {
            mLogger.error("got css parse exception");
@@ -123,15 +128,15 @@
         mRuleList = new Vector();
         mFileDependencies = getFullPath();
     }
-    
+
     /** protected constructor */
     CSSHandler(String cssString) {
         mFile = null; // No file associated with inline css
         mRuleList = new Vector();
         mFileDependencies = ""; // inline css doesn't add any file dependencies
     }
-    
 
+
     /** Helper function to log and throw an error. */
     void throwCSSException(String errMsg) throws CSSException {
         mLogger.error(errMsg);
@@ -144,19 +149,93 @@
             return mFile.getCanonicalPath();
         } catch (IOException e) {
             mLogger.error("Exception getting canonical path of: " + mFile + ", " + e.getMessage());
-            return ""; 
+            return "";
         }
     }
 
-    /** @return InputSource object pointing to the CSS file. */
-    InputSource getInputSource() throws FileNotFoundException {
-        InputSource is =
-            new InputSource(new FileReader(mFile));
-//         is.setEncoding("ISO-8859-1");
-        return is;                                         
+    /** @param charsetAttrValue charset value from the stylesheet tag in LZX
+     *  @param the name of the CSS file we need to parse
+     *  @return InputSource object pointing to the CSS file. */
+    InputSource getInputSource(String charsetAttrValue, String fileName) throws FileNotFoundException  {
+        // Detect if there's a BOM with encoding information on the file.
+        // If there's a BOM that shouldn't conflict with a possible @charset
+        // attribute of the stylesheet tag in LZX, e.g. <stylesheet charset="iso-8859-15" />
+        BufferedInputStream bis = null;
+        InputSource inputSource = null;
+        InputStreamReader isr = null;
+        // Encoding read from BOM in CSS file
+        String bomEncoding = null;
+        // Encoding used for opening CSS file
+        String encoding = "utf-8";
+        try {
+            bis = new BufferedInputStream(new FileInputStream(mFile));
+            bomEncoding =  FileUtils.detectBOMEncoding(bis);
+        } catch (IOException e) {
+            mLogger.error("IOException during BOM detection:\n" + e.getMessage());
+            throw new CompilationError("IO Exception while trying to open file");
+        }
+
+        // If we got a BOM encoding value, check if there no conflicting declaration
+        // on the stylesheet tag
+        if (bomEncoding != null) {
+            if (charsetAttrValue != null && !charsetAttrValue.toUpperCase().equals(bomEncoding.toUpperCase())) {
+                throw new CompilationError("<stylesheet charset=\"" + charsetAttrValue + "\"> conflicts with BOM "
+                        + bomEncoding + " for CSS file " + fileName + ".");
+            }
+            encoding = bomEncoding;
+        } else if (charsetAttrValue != null) {
+            encoding = charsetAttrValue;
+            if (mLogger.isDebugEnabled()) {
+            mLogger.debug("Using encoding from LZX <stylesheet charset=\"" + encoding + "\">");
+            }
+        }
+
+        // Parse CSS file now
+        /* New code for reading stream */
+
+        try {
+            if (mLogger.isDebugEnabled()) {
+             mLogger.debug("Opening CSS file " + fileName + " using encoding " + encoding);
+            }
+            if (bomEncoding != null && bomEncoding.toUpperCase().equals("UTF-8")) {
+                // TODO: Check what the 2nd parameter on the PushBackInputStream constructor means
+                PushbackInputStream internalIn = new PushbackInputStream(new FileInputStream(mFile), 3);
+                // skip the first 3 bytes
+                internalIn.skip(3);
+                isr = new InputStreamReader(internalIn, encoding);
+                inputSource = new InputSource(isr);
+                if (mLogger.isDebugEnabled()) {
+                mLogger.debug("Skip first 3 bytes containing UTF-8 BOM");
+                }
+            } else if (bomEncoding != null &&
+                       (bomEncoding.toUpperCase().equals("UTF-16LE") || bomEncoding.toUpperCase().equals("UTF-16BE"))) {
+                // BOM for UTF-16
+                PushbackInputStream internalIn = new PushbackInputStream(new FileInputStream(mFile), 3);
+                // skip the first 2 bytes
+                internalIn.skip(2);
+                isr = new InputStreamReader(internalIn, encoding);
+                inputSource = new InputSource(isr);
+                if (mLogger.isDebugEnabled()) {
+                mLogger.debug("Skip first 3 bytes containing UTF-16 BOM");
+                }
+            } else {
+                if (mLogger.isDebugEnabled()) {
+                mLogger.debug("No need to skip bytes");
+                }
+                // no BOM, just use the normal InputStreamStreader
+                inputSource = new InputSource(new InputStreamReader(new FileInputStream(mFile), encoding));
+            }
+
+        } catch (UnsupportedEncodingException e) {
+            mLogger.error("Unsupported encoding in file: " + mFile + ", " + e.getMessage());
+        } catch (IOException e) {
+            mLogger.error("Error skipping BOM for InputStreamReader: " + e.getMessage());
+        }
+
+        return inputSource;
     }
 
-    /** 
+    /**
      * Get a string containing a list CSS files required by the parse. Includes
      * imported CSS files.
      * @return a list of CSS files separated by two file separators characters.
@@ -219,7 +298,7 @@
             CSSHandler handler = new CSSHandler(new File(uri));
             Parser parser = mCSSParserFactory.makeParser();
             parser.setDocumentHandler(handler);
-            parser.parseStyleSheet(handler.getInputSource());
+            parser.parseStyleSheet(handler.getInputSource(null,uri));
         } catch (Exception e) {
             mLogger.error("Exception", e);
             throw new CSSException(e.getMessage());
@@ -268,7 +347,7 @@
     //--------------------------------------------------------------------------
     // helper methods
     //--------------------------------------------------------------------------
-    
+
     /** @return an RGB formatted hex string like #FFFFFF. */
     String getRGBString(LexicalUnit lu) {
       int rr = lu.getLexicalUnitType() == LexicalUnit.SAC_PERCENTAGE ?
@@ -288,7 +367,7 @@
         + (gg < 16 ? "0" : "") + Integer.toHexString(gg).toUpperCase()
         + (bb < 16 ? "0" : "") + Integer.toHexString(bb).toUpperCase();
     }
-    
+
     /**
       * Convert LexicalUnit to a Javascript value (represented
       * as a String).

Modified: openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
===================================================================
--- openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java	2009-06-09 16:51:45 UTC (rev 14095)
+++ openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java	2009-06-09 17:47:22 UTC (rev 14096)
@@ -151,8 +151,8 @@
         }
         pattern = tmp;
     }
-        
 
+
     /** Attempt to deduce the encoding of an XML file, by looking for the "encoding" attribute in the
      * XML declaration.
      * Default is to return "UTF-8"
@@ -165,7 +165,7 @@
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         send(input, bout);
         Perl5Matcher matcher = new Perl5Matcher();
-        
+
         byte [] array = bout.toByteArray();
         // We will ignore the byte order mark encoding for now,
         // hopefully no one is going to be using UTF16. I don't want
@@ -173,17 +173,82 @@
         // directive conflicts with the byte order mark.
         int skip = stripByteOrderMark( array );
         ByteArrayInputStream bais = new ByteArrayInputStream( array, skip, array.length );
-        
+
         if (matcher.contains(new String(array, 0, Math.min( 1024, array.length )), pattern)) {
             MatchResult result = matcher.getMatch();
             String encoding = result.group(1);
-            return new InputStreamReader( bais, encoding ); 
+            return new InputStreamReader( bais, encoding );
         } else {
-            return new InputStreamReader( bais, defaultEncoding ); 
+            return new InputStreamReader( bais, defaultEncoding );
         }
     }
 
     /**
+     * Retrieve the encoding of a text file based on a possibly existing
+     * Byte Order Marker (BOM) within the leading bytes of the file.
+     *
+     * see http://www.w3.org/TR/CSS21/syndata.html#charset.
+     *
+     * @param in BufferedInputStream, must be positioned at the first byte of the file
+     * @return the encoding if a BOM is found (UTF-8,UTF-16BE,UTF-16LE)
+     * @throws IOException
+     */
+    public static String detectBOMEncoding(BufferedInputStream in) throws IOException {
+        // The Byte Order Marker (BOM) is contained within the leading bytes of a
+        // if present. For UTF encoding there are 3 BOMs we need to check.
+        byte[][] utfBOMList = {
+                { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF },     // UTF-8
+                { (byte) 0xFE, (byte) 0xFF },                 // UTF-16BE
+                { (byte) 0xFF, (byte) 0xFE }                // UTF-16LE
+            };
+        String[] encodings = { "UTF-8", "UTF-16BE", "UTF-16LE" };
+
+        int maxBytesToRead = 100; // maximum number of bytes which need to be read
+        in.mark(maxBytesToRead + 1);
+        int found = -1; // index into leadingBytes if there is a match
+        byte[] buffer = new byte[maxBytesToRead];
+
+        // Read bytes that might contain BOM
+        int results = in.read(buffer); // max number of bytes read to determine
+        if (results == -1) {
+            mLogger.error("Reading bytes was unsuccessful!");
+            throw new IOException();
+        } else if (mLogger.isDebugEnabled()) {
+            mLogger.debug("Read the following bytes: " + new String(buffer));
+        }
+
+        // find a match
+        for (int i = 0; i < utfBOMList.length; i++) {
+            byte[] bytes = utfBOMList[i];
+            found = i;
+            if (mLogger.isDebugEnabled()) {
+            mLogger.debug("Testing for " + encodings[i] + " BOM!");
+            }
+            for (int j = 0; j < bytes.length; j++) {
+                if (bytes[j] != buffer[j]) {
+                    found = -1;
+                    break;
+                }
+            }
+            if (found != -1) {
+                if (mLogger.isDebugEnabled()) {
+                mLogger.debug("Found BOM on file, encoding is " + encodings[found]);
+                }
+                break;
+            }
+        }
+
+        if (found != -1) {
+            return encodings[found];
+        } else {
+            if (mLogger.isDebugEnabled()) {
+            mLogger.debug("no BOM found in file!");
+            }
+            return null;
+        }
+    }
+
+    /**
      * Set up a reader for an XML file with the correct charset encoding, and strip
      * out any Unicode Byte Order Mark if there is one present. We need to scan the file
      * once to try to parse the charset encoding in the XML declaration.
@@ -243,7 +308,7 @@
         try {
             // We need to peek at the stream and if the first three chars
             // are a UTF-8 or UTF-16 encoded BOM (byte order mark) we will
-            // discard them. 
+            // discard them.
             int c1 = ((int) raw[0]) & 0xff;
             int c2 = ((int) raw[1]) & 0xff;
             int c3 = ((int) raw[2]) & 0xff;
@@ -736,8 +801,8 @@
         }
     }
 
-    
 
+
     /**
        Find maximum common prefix of path1 and path2
      */
@@ -756,7 +821,7 @@
         } else {
             return path1.substring(0, i);
         }
-        
+
     }
 
 

Copied: openlaszlo/branches/4.4/test/css/encoding (from rev 14095, openlaszlo/trunk/test/css/encoding)

Deleted: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css

Copied: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/iso8859-1_with_charset_attr.css)

Deleted: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx

Copied: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/iso8859-1_with_charset_attr.lzx)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css

Copied: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16BE_with_BOM.css)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx

Copied: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16BE_with_BOM.lzx)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css

Copied: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16LE_with_BOM.css)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx

Copied: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16LE_with_BOM.lzx)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css

Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_no_BOM_no_charset_attr.css)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx

Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css

Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx

Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css

Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_no_charset_attr.css)

Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx

Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx)


