[Laszlo-checkins] r14096 - in openlaszlo/branches/4.4: . WEB-INF/lps/server/src/org/openlaszlo/compiler WEB-INF/lps/server/src/org/openlaszlo/css WEB-INF/lps/server/src/org/openlaszlo/utils test/css test/css/encoding
ptw@openlaszlo.org
Tue Jun 9 10:47:30 PDT 2009
Author: ptw
Date: 2009-06-09 10:47:22 -0700 (Tue, 09 Jun 2009)
New Revision: 14096
Added:
openlaszlo/branches/4.4/test/css/encoding/
openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx
openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css
openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx
openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css
openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx
openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx
Removed:
openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx
openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css
openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx
openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css
openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx
openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css
openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx
Modified:
openlaszlo/branches/4.4/
openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java
openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
Log:
Merged revisions 14091-14095 via svnmerge from
http://svn.openlaszlo.org/openlaszlo/trunk
.......
r14091 | raju | 2009-06-09 12:25:06 -0400 (Tue, 09 Jun 2009) | 52 lines
Change 20090608-raju-Y by raju at ip-90-186-160-71.web.vodafone.de on 2009-06-08 20:09:02 CEST
in /Users/rajubitter/src/svn/openlaszlo/trunk-cssunicode
for http://rajubitter@svn.openlaszlo.org/openlaszlo/trunk
Summary: Fix for CSS parser uses incorrect file encoding
New Features: Adds an optional @charset to the stylesheet tag, in case the user wants to use a CSS file in an encoding other than UTF-8
Bugs Fixed: LPP-8045
Technical Reviewer: ptw
QA Reviewer: (pending)
Doc Reviewer: (pending)
Documentation:
Release Notes:
Details:
+ StyleSheetCompiler.java: Handling for @charset on stylesheet tag added. Added a
2nd parameter with the encoding value to the CSSHandler.parse() call.
+ CSSHandler.java:
parse() method takes the encoding from the LZX stylesheet tag as a 2nd parameter.
getInputSource() method does a few more things now:
1) checks for a possible BOM on the CSS file
2) checks if a possible BOM conflicts with the value of the stylesheet tag's @charset
3) if there's a BOM, the BOM bytes are removed from the input stream
+ FileUtils.java:
Added method public static String detectBOMEncoding(BufferedInputStream in)
The method returns the encoding indicated by the BOM as one of the following strings (or null when no BOM is found):
UTF-8
UTF-16LE
UTF-16BE
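As an illustration of the BOM check described for FileUtils.java, here is a minimal sketch. It is not the committed detectBOMEncoding(): the class name BomSniffer and the methods fromBytes and detect are hypothetical. It assumes only the three BOMs listed above, reads at most three bytes, and resets the stream so that the caller still sees the BOM bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper for illustration; not the committed FileUtils code. */
class BomSniffer {
    // The three BOMs named in the log message, as unsigned byte values.
    private static final int[][] BOMS = {
        {0xEF, 0xBB, 0xBF}, // UTF-8
        {0xFE, 0xFF},       // UTF-16BE
        {0xFF, 0xFE}        // UTF-16LE
    };
    private static final String[] NAMES = {"UTF-8", "UTF-16BE", "UTF-16LE"};

    /** Returns the encoding named by a BOM at the start of head, or null. */
    static String fromBytes(byte[] head) {
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(head, BOMS[i])) {
                return NAMES[i];
            }
        }
        return null;
    }

    /**
     * Peeks at up to three bytes of a mark-capable stream and resets it,
     * so the stream is left positioned at the first byte again.
     */
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may deliver fewer bytes than requested; loop until 3 or EOF.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        in.reset();
        return fromBytes(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] head, int[] bom) {
        if (head.length < bom.length) {
            return false; // file shorter than this BOM
        }
        for (int j = 0; j < bom.length; j++) {
            if ((head[j] & 0xFF) != bom[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Unlike the committed method, which throws an IOException when the read returns -1, this sketch treats an empty stream as "no BOM" and returns null. That is a choice made for the example, not a statement about the 4.4 branch.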
Tests:
+ test files in folder test/css/encoding
The following tests exist:
1) iso8859-1_with_charset_attr.lzx
Reading an iso-8859-2 encoded CSS file with some German special chars
2) utf16BE_with_BOM.lzx
Reading a UTF-16BE CSS file with a BOM
3) utf16LE_with_BOM.lzx
Reading a UTF-16LE CSS file with a BOM
4) utf8_with_BOM_no_charset_attr.lzx
Reading a UTF-8 CSS file with a BOM and no charset attribute on the stylesheet tag
5) utf8_with_BOM_conflicting_charset_attr.lzx
Reading a CSS file whose stylesheet tag has a @charset value of utf-16 while the file
itself carries a UTF-8 BOM throws a compile error
.......
r14095 | raju | 2009-06-09 12:51:45 -0400 (Tue, 09 Jun 2009) | 25 lines
Change 20090609-raju-a by raju at Atlantia.local on 2009-06-09 18:42:33 CEST
in /Users/rajubitter/src/svn/openlaszlo/trunk-cssunicode
for http://rajubitter@svn.openlaszlo.org/openlaszlo/trunk
Summary:
New Features: These 2 css files have to be committed with skip-pre-commit-checks
Bugs Fixed: LPP-8045
Technical Reviewer: ptw
QA Reviewer: ptw
Doc Reviewer: (pending)
Documentation:
Release Notes:
Details:
Addition to commit 14091, missing CSS test files.
Tests:
.......
Property changes on: openlaszlo/branches/4.4
___________________________________________________________________
Name: svnmerge-integrated
- /openlaszlo/branches/4.1:1-10153 /openlaszlo/branches/4.2:1-12154,12181,13205,13778 /openlaszlo/branches/devildog:1-8432 /openlaszlo/branches/pagan-deities:1-7955,8825,10756-10920,10922-10928,10930-10935,11151,11207,11554,13476,13629 /openlaszlo/branches/paperpie:1-6504,6506-6574,6576-7135,7137-7235 /openlaszlo/branches/wafflecone:1-5746,5818-6068,6070-6205,6207-6213,6216-6265,6267-6368,6370-6431,6433-6450,6497,6509,6661,7097,7872 /openlaszlo/trunk:1-13937,13948-13952,13954-13968,13970,13972-13980,13982-13985,13987-14002,14021-14032,14034,14036-14067,14087
+ /openlaszlo/branches/4.1:1-10153 /openlaszlo/branches/4.2:1-12154,12181,13205,13778 /openlaszlo/branches/devildog:1-8432 /openlaszlo/branches/pagan-deities:1-7955,8825,10756-10920,10922-10928,10930-10935,11151,11207,11554,13476,13629 /openlaszlo/branches/paperpie:1-6504,6506-6574,6576-7135,7137-7235 /openlaszlo/branches/wafflecone:1-5746,5818-6068,6070-6205,6207-6213,6216-6265,6267-6368,6370-6431,6433-6450,6497,6509,6661,7097,7872 /openlaszlo/trunk:1-13937,13948-13952,13954-13968,13970,13972-13980,13982-13985,13987-14002,14021-14032,14034,14036-14067,14087,14091-14095
Modified: openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java
===================================================================
--- openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java 2009-06-09 16:51:45 UTC (rev 14095)
+++ openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/compiler/StyleSheetCompiler.java 2009-06-09 17:47:22 UTC (rev 14096)
@@ -1,9 +1,9 @@
/* *****************************************************************************
- * StyleSheetCompiler.java
+* StyleSheetCompiler.java
* ****************************************************************************/
/* J_LZ_COPYRIGHT_BEGIN *******************************************************
-* Copyright 2001-2008 Laszlo Systems, Inc. All Rights Reserved. *
+* Copyright 2001-2009 Laszlo Systems, Inc. All Rights Reserved. *
* Use is subject to license terms. *
* J_LZ_COPYRIGHT_END *********************************************************/
@@ -29,6 +29,7 @@
private static Logger mLogger = Logger.getLogger(StyleSheetCompiler.class);
private static final String SRC_ATTR_NAME = "src";
+ private static final String CHARSET_ATTR_NAME = "charset";
StyleSheetCompiler(CompilationEnvironment env) {
super(env);
@@ -44,9 +45,9 @@
public void compile(Element element) {
try {
- if (mLogger.isInfoEnabled()) {
+ if (mLogger.isInfoEnabled()) {
mLogger.info("StyleSheetCompiler.compile called!");
- }
+ }
if (!element.getChildren().isEmpty()) {
throw new CompilationError("<stylesheet> elements can't have children",
@@ -56,11 +57,21 @@
String pathname = null;
String stylesheetText = element.getText();
String src = element.getAttributeValue(SRC_ATTR_NAME);
+ String encoding = element.getAttributeValue(CHARSET_ATTR_NAME);
+ if (encoding != null) {
+ if (mLogger.isDebugEnabled()) {
+ mLogger.info("@charset=" + encoding + " found on stylesheet tag");
+ }
+ } else {
+ if (mLogger.isDebugEnabled()) {
+ mLogger.info("no attribute @charset found on stylesheet tag, using default value " + encoding);
+ }
+ }
if (src != null) {
- if (mLogger.isInfoEnabled()) {
- mLogger.info("reading in stylesheet from src=" + src);
- }
+ if (mLogger.isInfoEnabled()) {
+ mLogger.info("reading in stylesheet from src=\"" + src + "\"");
+ }
// Find the css file
// Using the FileResolver accomplishes two nice things:
// 1, it searches the standard directory include paths
@@ -83,9 +94,9 @@
if (! resolvedFile.exists() ) {
resolvedFile = mEnv.resolve(src, base);
if (resolvedFile.exists()) {
- if (mLogger.isInfoEnabled()) {
+ if (mLogger.isInfoEnabled()) {
mLogger.info("Resolved css file to a file that exists!");
- }
+ }
} else {
mLogger.error("Could not resolve css file to a file that exists.");
throw new CompilationError("Could not find css file " + src);
@@ -93,14 +104,14 @@
}
// Actually parse and compile the stylesheet! W00t!
- CSSHandler fileHandler = CSSHandler.parse( resolvedFile );
+ CSSHandler fileHandler = CSSHandler.parse( resolvedFile, encoding );
this.compile(fileHandler, element);
} else if (stylesheetText != null && (!"".equals(stylesheetText))) {
- if (mLogger.isInfoEnabled()) {
+ if (mLogger.isInfoEnabled()) {
mLogger.info("inline stylesheet");
- }
+ }
CSSHandler inlineHandler = CSSHandler.parse(stylesheetText);
this.compile(inlineHandler, element);
//
@@ -156,9 +167,9 @@
}
void compile(CSSHandler handler, Element element) throws CompilationError {
- if (mLogger.isDebugEnabled()) {
+ if (mLogger.isDebugEnabled()) {
mLogger.debug("compiling CSSHandler using new unique names");
- }
+ }
String script = "";
for (int i=0; i < handler.mRuleList.size(); i++) {
Rule rule = (Rule)handler.mRuleList.get(i);
@@ -223,9 +234,9 @@
}
String buildConditionalSelectorJS(Condition cond, SimpleSelector simpleSelector) {
- if (mLogger.isDebugEnabled()) {
+ if (mLogger.isDebugEnabled()) {
mLogger.debug("Conditional selector: " + cond.toString());
- }
+ }
String condString = "no_match";
switch (cond.getConditionType()) {
case Condition.SAC_ID_CONDITION: /* #id */
@@ -234,9 +245,9 @@
break;
case Condition.SAC_ATTRIBUTE_CONDITION: // [attr] or [attr="val"] or elem[attr="val"]
- if (mLogger.isDebugEnabled()) {
+ if (mLogger.isDebugEnabled()) {
mLogger.debug("Attribute condition");
- }
+ }
AttributeCondition attrCond = (AttributeCondition) cond;
String name = attrCond.getLocalName();
String value = attrCond.getValue();
@@ -247,9 +258,9 @@
// localName of the null string. We don't write out the simple selector if
// it's not specified.
if (simpleSelector != null) {
- if (mLogger.isDebugEnabled()) {
+ if (mLogger.isDebugEnabled()) {
mLogger.debug("simple selector:" + simpleSelector.toString());
- }
+ }
if (simpleSelector.getSelectorType() == Selector.SAC_ELEMENT_NODE_SELECTOR) {
ElementSelector es = (ElementSelector)simpleSelector;
Modified: openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
===================================================================
--- openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java 2009-06-09 16:51:45 UTC (rev 14095)
+++ openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java 2009-06-09 17:47:22 UTC (rev 14096)
@@ -3,19 +3,23 @@
* ****************************************************************************/
/* J_LZ_COPYRIGHT_BEGIN *******************************************************
-* Copyright 2001-2008 Laszlo Systems, Inc. All Rights Reserved. *
+* Copyright 2001-2009 Laszlo Systems, Inc. All Rights Reserved. *
* Use is subject to license terms. *
* J_LZ_COPYRIGHT_END *********************************************************/
package org.openlaszlo.css;
import java.io.*;
+import java.nio.CharBuffer;
import java.util.*;
import java.util.regex.*;
import org.w3c.css.sac.*;
import org.apache.log4j.*;
import org.jdom.*;
+import org.openlaszlo.utils.FileUtils;
+import org.openlaszlo.compiler.CompilationError;
+
/**
* Handler used to parse CSS file and process style rules on a document element.
*
@@ -30,9 +34,9 @@
/** Logger. */
private static Logger mLogger = Logger.getLogger(CSSHandler.class);
- /** CSS parser factory. */
+ /** CSS parser factory. */
private static org.w3c.css.sac.helpers.ParserFactory mCSSParserFactory = null;
-
+
static {
// This system property is required for the SAC ParserFactory.
if (System.getProperty("org.w3c.css.sac.parser") == null) {
@@ -45,10 +49,10 @@
/**
* Entry point to creating a CSSHandler to read from an external
* stylesheet file
- * @param rootDir the directory where cssFile exists.
- * @param cssFile the css file to read.
+ * @param file the css file to read.
+ * @param String value of an optional charset attribute on the stylesheet tag
*/
- public static CSSHandler parse(File file)
+ public static CSSHandler parse(File file, String charsetAttrValue)
throws CSSException {
try {
mLogger.info("creating CSSHandler");
@@ -56,7 +60,8 @@
Parser parser = mCSSParserFactory.makeParser();
parser.setDocumentHandler(handler);
parser.setErrorHandler(handler);
- parser.parseStyleSheet(handler.getInputSource());
+ mLogger.info("Trying to parse CSS with charset setting of " + charsetAttrValue);
+ parser.parseStyleSheet(handler.getInputSource(charsetAttrValue,file.getPath()));
return handler;
} catch (CSSParseException e) {
mLogger.error("got css parse exception");
@@ -123,15 +128,15 @@
mRuleList = new Vector();
mFileDependencies = getFullPath();
}
-
+
/** protected constructor */
CSSHandler(String cssString) {
mFile = null; // No file associated with inline css
mRuleList = new Vector();
mFileDependencies = ""; // inline css doesn't add any file dependencies
}
-
+
/** Helper function to log and throw an error. */
void throwCSSException(String errMsg) throws CSSException {
mLogger.error(errMsg);
@@ -144,19 +149,93 @@
return mFile.getCanonicalPath();
} catch (IOException e) {
mLogger.error("Exception getting canonical path of: " + mFile + ", " + e.getMessage());
- return "";
+ return "";
}
}
- /** @return InputSource object pointing to the CSS file. */
- InputSource getInputSource() throws FileNotFoundException {
- InputSource is =
- new InputSource(new FileReader(mFile));
-// is.setEncoding("ISO-8859-1");
- return is;
+ /** @param charsetAttrValue charset value from the stylesheet tag in LZX
+ * @param the name of the CSS file we need to parse
+ * @return InputSource object pointing to the CSS file. */
+ InputSource getInputSource(String charsetAttrValue, String fileName) throws FileNotFoundException {
+ // Detect if there's a BOM with encoding information on the file.
+ // If there's a BOM that shouldn't conflict with a possible @charset
+ // attribute of the stylesheet tag in LZX, e.g. <stylesheet charset="iso-8859-15" />
+ BufferedInputStream bis = null;
+ InputSource inputSource = null;
+ InputStreamReader isr = null;
+ // Encoding read from BOM in CSS file
+ String bomEncoding = null;
+ // Encoding used for opening CSS file
+ String encoding = "utf-8";
+ try {
+ bis = new BufferedInputStream(new FileInputStream(mFile));
+ bomEncoding = FileUtils.detectBOMEncoding(bis);
+ } catch (IOException e) {
+ mLogger.error("IOException during BOM detection:\n" + e.getMessage());
+ throw new CompilationError("IO Exception while trying to open file");
+ }
+
+ // If we got a BOM encoding value, check if there no conflicting declaration
+ // on the stylesheet tag
+ if (bomEncoding != null) {
+ if (charsetAttrValue != null && !charsetAttrValue.toUpperCase().equals(bomEncoding.toUpperCase())) {
+ throw new CompilationError("<stylesheet charset=\"" + charsetAttrValue + "\"> conflicts with BOM "
+ + bomEncoding + " for CSS file " + fileName + ".");
+ }
+ encoding = bomEncoding;
+ } else if (charsetAttrValue != null) {
+ encoding = charsetAttrValue;
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Using encoding from LZX <stylesheet charset=\"" + encoding + "\">");
+ }
+ }
+
+ // Parse CSS file now
+ /* New code for reading stream */
+
+ try {
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Opening CSS file " + fileName + " using encoding " + encoding);
+ }
+ if (bomEncoding != null && bomEncoding.toUpperCase().equals("UTF-8")) {
+ // TODO: Check what the 2nd parameter on the PushBackInputStream constructor means
+ PushbackInputStream internalIn = new PushbackInputStream(new FileInputStream(mFile), 3);
+ // skip the first 3 bytes
+ internalIn.skip(3);
+ isr = new InputStreamReader(internalIn, encoding);
+ inputSource = new InputSource(isr);
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Skip first 3 bytes containing UTF-8 BOM");
+ }
+ } else if (bomEncoding != null &&
+ (bomEncoding.toUpperCase().equals("UTF-16LE") || bomEncoding.toUpperCase().equals("UTF-16BE"))) {
+ // BOM for UTF-16
+ PushbackInputStream internalIn = new PushbackInputStream(new FileInputStream(mFile), 3);
+ // skip the first 2 bytes
+ internalIn.skip(2);
+ isr = new InputStreamReader(internalIn, encoding);
+ inputSource = new InputSource(isr);
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Skip first 3 bytes containing UTF-16 BOM");
+ }
+ } else {
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("No need to skip bytes");
+ }
+ // no BOM, just use the normal InputStreamStreader
+ inputSource = new InputSource(new InputStreamReader(new FileInputStream(mFile), encoding));
+ }
+
+ } catch (UnsupportedEncodingException e) {
+ mLogger.error("Unsupported encoding in file: " + mFile + ", " + e.getMessage());
+ } catch (IOException e) {
+ mLogger.error("Error skipping BOM for InputStreamReader: " + e.getMessage());
+ }
+
+ return inputSource;
}
- /**
+ /**
* Get a string containing a list CSS files required by the parse. Includes
* imported CSS files.
* @return a list of CSS files separated by two file separators characters.
@@ -219,7 +298,7 @@
CSSHandler handler = new CSSHandler(new File(uri));
Parser parser = mCSSParserFactory.makeParser();
parser.setDocumentHandler(handler);
- parser.parseStyleSheet(handler.getInputSource());
+ parser.parseStyleSheet(handler.getInputSource(null,uri));
} catch (Exception e) {
mLogger.error("Exception", e);
throw new CSSException(e.getMessage());
@@ -268,7 +347,7 @@
//--------------------------------------------------------------------------
// helper methods
//--------------------------------------------------------------------------
-
+
/** @return an RGB formatted hex string like #FFFFFF. */
String getRGBString(LexicalUnit lu) {
int rr = lu.getLexicalUnitType() == LexicalUnit.SAC_PERCENTAGE ?
@@ -288,7 +367,7 @@
+ (gg < 16 ? "0" : "") + Integer.toHexString(gg).toUpperCase()
+ (bb < 16 ? "0" : "") + Integer.toHexString(bb).toUpperCase();
}
-
+
/**
* Convert LexicalUnit to a Javascript value (represented
* as a String).
Modified: openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
===================================================================
--- openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java 2009-06-09 16:51:45 UTC (rev 14095)
+++ openlaszlo/branches/4.4/WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java 2009-06-09 17:47:22 UTC (rev 14096)
@@ -151,8 +151,8 @@
}
pattern = tmp;
}
-
+
/** Attempt to deduce the encoding of an XML file, by looking for the "encoding" attribute in the
* XML declaration.
* Default is to return "UTF-8"
@@ -165,7 +165,7 @@
ByteArrayOutputStream bout = new ByteArrayOutputStream();
send(input, bout);
Perl5Matcher matcher = new Perl5Matcher();
-
+
byte [] array = bout.toByteArray();
// We will ignore the byte order mark encoding for now,
// hopefully no one is going to be using UTF16. I don't want
@@ -173,17 +173,82 @@
// directive conflicts with the byte order mark.
int skip = stripByteOrderMark( array );
ByteArrayInputStream bais = new ByteArrayInputStream( array, skip, array.length );
-
+
if (matcher.contains(new String(array, 0, Math.min( 1024, array.length )), pattern)) {
MatchResult result = matcher.getMatch();
String encoding = result.group(1);
- return new InputStreamReader( bais, encoding );
+ return new InputStreamReader( bais, encoding );
} else {
- return new InputStreamReader( bais, defaultEncoding );
+ return new InputStreamReader( bais, defaultEncoding );
}
}
/**
+ * Retrieve the encoding of a text file based on a possibly existing
+ * Byte Order Marker (BOM) within the leading bytes of the file.
+ *
+ * see http://www.w3.org/TR/CSS21/syndata.html#charset.
+ *
+ * @param in BufferedInputStream, must be positioned at the first byte of the file
+ * @return the encoding if a BOM is found (UTF-8,UTF-16BE,UTF-16LE)
+ * @throws IOException
+ */
+ public static String detectBOMEncoding(BufferedInputStream in) throws IOException {
+ // The Byte Order Marker (BOM) is contained within the leading bytes of a
+ // if present. For UTF encoding there are 3 BOMs we need to check.
+ byte[][] utfBOMList = {
+ { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }, // UTF-8
+ { (byte) 0xFE, (byte) 0xFF }, // UTF-16BE
+ { (byte) 0xFF, (byte) 0xFE } // UTF-16LE
+ };
+ String[] encodings = { "UTF-8", "UTF-16BE", "UTF-16LE" };
+
+ int maxBytesToRead = 100; // maximum number of bytes which need to be read
+ in.mark(maxBytesToRead + 1);
+ int found = -1; // index into leadingBytes if there is a match
+ byte[] buffer = new byte[maxBytesToRead];
+
+ // Read bytes that might contain BOM
+ int results = in.read(buffer); // max number of bytes read to determine
+ if (results == -1) {
+ mLogger.error("Reading bytes was unsuccessful!");
+ throw new IOException();
+ } else if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Read the following bytes: " + new String(buffer));
+ }
+
+ // find a match
+ for (int i = 0; i < utfBOMList.length; i++) {
+ byte[] bytes = utfBOMList[i];
+ found = i;
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Testing for " + encodings[i] + " BOM!");
+ }
+ for (int j = 0; j < bytes.length; j++) {
+ if (bytes[j] != buffer[j]) {
+ found = -1;
+ break;
+ }
+ }
+ if (found != -1) {
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("Found BOM on file, encoding is " + encodings[found]);
+ }
+ break;
+ }
+ }
+
+ if (found != -1) {
+ return encodings[found];
+ } else {
+ if (mLogger.isDebugEnabled()) {
+ mLogger.debug("no BOM found in file!");
+ }
+ return null;
+ }
+ }
+
+ /**
* Set up a reader for an XML file with the correct charset encoding, and strip
* out any Unicode Byte Order Mark if there is one present. We need to scan the file
* once to try to parse the charset encoding in the XML declaration.
@@ -243,7 +308,7 @@
try {
// We need to peek at the stream and if the first three chars
// are a UTF-8 or UTF-16 encoded BOM (byte order mark) we will
- // discard them.
+ // discard them.
int c1 = ((int) raw[0]) & 0xff;
int c2 = ((int) raw[1]) & 0xff;
int c3 = ((int) raw[2]) & 0xff;
@@ -736,8 +801,8 @@
}
}
-
+
/**
Find maximum common prefix of path1 and path2
*/
@@ -756,7 +821,7 @@
} else {
return path1.substring(0, i);
}
-
+
}
Copied: openlaszlo/branches/4.4/test/css/encoding (from rev 14095, openlaszlo/trunk/test/css/encoding)
Deleted: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css
Copied: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/iso8859-1_with_charset_attr.css)
Deleted: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx
Copied: openlaszlo/branches/4.4/test/css/encoding/iso8859-1_with_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/iso8859-1_with_charset_attr.lzx)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css
Copied: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16BE_with_BOM.css)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx
Copied: openlaszlo/branches/4.4/test/css/encoding/utf16BE_with_BOM.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16BE_with_BOM.lzx)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css
Copied: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16LE_with_BOM.css)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx
Copied: openlaszlo/branches/4.4/test/css/encoding/utf16LE_with_BOM.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf16LE_with_BOM.lzx)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css
Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_no_BOM_no_charset_attr.css)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx
Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_no_BOM_no_charset_attr.lzx)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css
Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.css)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx
Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_conflicting_charset_attr.lzx)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css
Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.css (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_no_charset_attr.css)
Deleted: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx
Copied: openlaszlo/branches/4.4/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx (from rev 14095, openlaszlo/trunk/test/css/encoding/utf8_with_BOM_no_charset_attr.lzx)
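The encoding choice made in getInputSource() above (BOM first, then the charset attribute, then utf-8, with a compile error on conflict) can be sketched on its own. This is a sketch under assumptions, not the committed code: CssEncodingResolver, resolve, bomLength and open are hypothetical names, IllegalArgumentException stands in for CompilationError, and the BOM is consumed on the same stream instead of opening a second FileInputStream. On the TODO in the diff: the second argument of the PushbackInputStream constructor is the size of the pushback buffer in bytes. The sketch needs no pushback because it never unreads anything.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

/** Hypothetical helper for illustration; not the committed CSSHandler code. */
class CssEncodingResolver {

    /**
     * Precedence: BOM encoding, then the charset attribute, then UTF-8.
     * A charset attribute that differs from the BOM encoding is an error.
     */
    static String resolve(String bomEncoding, String charsetAttr) {
        if (bomEncoding != null) {
            if (charsetAttr != null && !charsetAttr.equalsIgnoreCase(bomEncoding)) {
                throw new IllegalArgumentException(
                    "<stylesheet charset=\"" + charsetAttr
                    + "\"> conflicts with BOM " + bomEncoding);
            }
            return bomEncoding;
        }
        return charsetAttr != null ? charsetAttr : "UTF-8";
    }

    /** Number of BOM bytes to skip: 3 for UTF-8, 2 for UTF-16, 0 without BOM. */
    static int bomLength(String bomEncoding) {
        if (bomEncoding == null) {
            return 0;
        }
        return bomEncoding.equalsIgnoreCase("UTF-8") ? 3 : 2;
    }

    /**
     * Consumes the BOM (if any) from raw, which must be positioned at the
     * first byte of the file, and wraps the rest in a decoding Reader.
     */
    static Reader open(InputStream raw, String bomEncoding, String charsetAttr)
            throws IOException {
        String encoding = resolve(bomEncoding, charsetAttr);
        int toSkip = bomLength(bomEncoding);
        for (int i = 0; i < toSkip; i++) {
            if (raw.read() < 0) {
                throw new EOFException("stream ended inside the BOM");
            }
        }
        // Throws UnsupportedEncodingException for an unknown charset name.
        return new InputStreamReader(raw, encoding);
    }
}
```

As in the diff, the comparison is a case-insensitive exact match, so charset="utf-16" is treated as conflicting with a UTF-16LE or UTF-16BE BOM. Note also that the UTF-16 branch in the diff logs "Skip first 3 bytes" although it skips 2; bomLength() above returns 2 for that case.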