--- lib/HTML/Encoding.pm.orig	2011-03-27 14:53:03.000000000 -0700
+++ lib/HTML/Encoding.pm	2011-03-27 15:28:44.000000000 -0700
@@ -523,20 +523,20 @@
 
 =head1 WARNING
 
-The interface and implementation are guranteed to change before this
+The interface and implementation are guaranteed to change before this
 module reaches version 1.00! Please send feedback to the author of
 this module.
 
 =head1 DESCRIPTION
 
 HTML::Encoding helps to determine the encoding of HTML and XML/XHTML
-documents...
+documents.
 
 =head1 DEFAULT ENCODINGS
 
-Most routines need to know some suspected character encodings which
+Most routines need to know some suspected character encodings; these
 can be provided through the C<encodings> option. This option always
-defaults to the $HTML::Encoding::DEFAULT_ENCODINGS array reference
+defaults to the $HTML::Encoding::DEFAULT_ENCODINGS array reference,
 which means the following encodings are considered by default:
 
   * ISO-8859-1
@@ -546,7 +546,7 @@
   * UTF-32BE
   * UTF-8
 
-If you change the values or pass custom values to the routines note
+If you change the values or pass custom values to the routines, note
 that L<Encode> must support them in order for this module to work
 correctly.
 
@@ -554,7 +554,7 @@
 
 C<encoding_from_xml_document>, C<encoding_from_html_document>, and
 C<encoding_from_http_message> return in list context the encoding
-source and the encoding name, possible encoding sources are
+source and the encoding name. Possible encoding sources are:
 
   * protocol         (Content-Type: text/html;charset=encoding)
   * bom              (leading U+FEFF)
@@ -565,21 +565,21 @@
 
 =head1 ROUTINES
 
-Routines exported by this module at user option. By default, nothing
-is exported.
+Routines may be exported by this module at the user's option. By
+default, nothing is exported.
 
 =over 2
 
 =item encoding_from_content_type($content_type)
 
 Takes a byte string and uses L<HTTP::Headers::Util> to extract the
-charset parameter from the C<Content-Type> header value and returns
+charset parameter from the C<Content-Type> header value. Returns
 its value or C<undef> (or an empty list in list context) if there
 is no such value. Only the first component will be examined
 (HTTP/1.1 only allows for one component), any backslash escapes in
 strings will be unescaped, all leading and trailing quote marks
 and white-space characters will be removed, all white-space will be
-collapsed to a single space, empty charset values will be ignored
+collapsed to a single space, empty charset values will be ignored,
 and no case folding is performed.
 
 Examples:
@@ -596,28 +596,28 @@
   | "text/html;charset=\" UTF-8 \""         | 'UTF-8'   |
   +-----------------------------------------+-----------+
 
-If you pass a string with the UTF-8 flag turned on the string will
+If you pass a string with the UTF-8 flag turned on, the string will
 be converted to bytes before it is passed to L<HTTP::Headers::Util>.
-The return value will thus never have the UTF-8 flag turned on (this
-might change in future versions).
+The return value will thus never have the UTF-8 flag turned on. (This
+might change in future versions.)
 
 =item encoding_from_byte_order_mark($octets [, %options])
 
-Takes a sequence of octets and attempts to read a byte order mark
-at the beginning of the octet sequence. It will go through the list
-of $options{encodings} or the list of default encodings if no
-encodings are specified and match the beginning of the string against
-any byte order mark octet sequence found.
-
-The result can be ambiguous, for example qq(\xFF\xFE\x00\x00) could
-be both, a complete BOM in UTF-32LE or a UTF-16LE BOM followed by a
-U+0000 character. It is also possible that C<$octets> starts with
+Takes a sequence of octets and attempts to read a byte order mark at the
+beginning of the octet sequence. It will go through the list of
+$options{encodings} (or the list of default encodings if no encodings
+are specified) and match the beginning of the string against any byte
+order mark octet sequence found.
+
+The result can be ambiguous. For example, qq(\xFF\xFE\x00\x00) could
+be either a complete BOM in UTF-32LE or a UTF-16LE BOM followed by a
+U+0000 character. It is also possible for C<$octets> to start with
 something that looks like a byte order mark but actually is not.
 
-encoding_from_byte_order_mark sorts the list of possible encodings
-by the length of their BOM octet sequence and returns in scalar
-context only the encoding with the longest match, and all encodings
-ordered by length of their BOM octet sequence in list context.
+encoding_from_byte_order_mark sorts the list of possible encodings by
+the length of their BOM octet sequence. In scalar context, it returns
+only the encoding with the longest match. In list context, it returns
+all encodings ordered by length of their BOM octet sequence.
 
 Examples:
 
@@ -634,9 +634,9 @@
   | "\x2B\x2F\x76\x38\x2D"  | UTF-7      | qw(UTF-7)             |
   +-------------------------+------------+-----------------------+
 
-Note however that for UTF-7 it is in theory possible that the U+FEFF
-combines with other characters in which case such detection would fail,
-for example consider:
+Note, however, that for UTF-7 it is theoretically possible for U+FEFF to
+combine with other characters, in which case such detection would fail.
+For example, consider:
 
   +--------------------------------------+-----------+-----------+
   | Input                                | Encodings | Result    |
@@ -649,15 +649,17 @@
 relevant for most applications as there should never be need to use
 UTF-7 in the encoding list for existing documents.
 
-If no BOM can be found it returns C<undef> in scalar context and an
-empty list in list context. This routine should not be used with
-strings with the UTF-8 flag turned on. 
+If no BOM can be found, it returns C<undef> in scalar context or an
+empty list in list context.
+
+This routine should not be used with strings with the UTF-8 flag turned
+on. 
 
 =item encoding_from_xml_declaration($declaration)
 
 Attempts to extract the value of the encoding pseudo-attribute in an XML
 declaration or text declaration in the character string $declaration. If
-there does not appear to be such a value it returns nothing. This would
+there does not appear to be such a value, it returns nothing. This would
 typically be used with the return values of xml_declaration_from_octets.
 Normalizes whitespaces like encoding_from_content_type.
 
@@ -688,12 +690,14 @@
 =item xml_declaration_from_octets($octets [, %options])
 
 Attempts to find a ">" character in the byte string $octets using the
-encodings in $encodings and upon success attempts to find a preceding
-"<" character. Returns all the strings found this way in the order of
-number of successful matches in list context and the best match in
-scalar context. Should probably be combined with the only user of this
-routine, encoding_from_xml_declaration... You can modify the list of
-suspected encodings using $options{encodings};
+encodings in $encodings, and upon success attempts to find a preceding
+"<" character. In list context, returns all the strings found this way
+in the order of number of successful matches; or in scalar context,
+returns the best match. You can modify the list of suspected encodings
+using $options{encodings};
+
+(Should probably be combined with the only user of this routine,
+encoding_from_xml_declaration...)
 
 =item encoding_from_first_chars($octets [, %options])
 
@@ -707,9 +711,9 @@
 document is a HTML document) to get at least a base encoding which can be
 used to decode enough of the document to find <meta> elements using
 encoding_from_meta_element. $options{whitespace} defaults to qw/CR LF SP TB/.
-Returns nothing if unsuccessful. Returns the matching encodings in order
-of the number of octets matched in list context and the best match in
-scalar context.
+Returns nothing if unsuccessful. In list context, returns the matching
+encodings in order of the number of octets matched. In scalar context,
+returns the best match.
 
 Examples:
 
@@ -742,14 +746,14 @@
   <meta http-equiv=Content-Type content='...'>
   
 are found, uses encoding_from_content_type to extract the charset
-parameter. It returns all such encodings it could find in document
-order in list context or the first encoding in scalar context (it
-will currently look for others regardless of calling context) or
-nothing if that fails for some reason.
+parameter. It returns (in list context) all such encodings it could find
+in document order, or (in scalar context) the first encoding, or nothing
+if that fails for some reason. (Currently it will look for any and all
+encodings even when called in scalar context.)
 
-Note that there are many edge cases where this does not yield in
+Note that there are many edge cases where this does not yield
 "proper" results depending on the capabilities of the HTML::Parser
-version and the options you pass for it, for example,
+version and the options you pass for it. For example:
 
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" [
     <!ENTITY content_type "text/html;charset=utf-8">
@@ -759,19 +763,19 @@
   <p>...</p>
 
 This would likely not detect the C<utf-8> value if HTML::Parser
-does not resolve the entity. This should however only be a concern
+does not resolve the entity. This should, however, only be a concern
 for documents specifically crafted to break the encoding detection.
 
 =item encoding_from_xml_document($octets, [, %options])
 
-Uses encoding_from_byte_order_mark to detect the encoding using a
-byte order mark in the byte string and returns the return value of
-that routine if it succeeds. Uses xml_declaration_from_octets and
-encoding_from_xml_declaration and returns the encoding for which
-the latter routine found most matches in scalar context, and all
-encodings ordered by number of occurences in list context. It
-does not return a value of neither byte order mark not inbound
-declarations declare a character encoding.
+Uses encoding_from_byte_order_mark to detect the encoding using a byte
+order mark in the byte string. Returns the return value of that routine
+if it succeeds. Uses xml_declaration_from_octets and
+encoding_from_xml_declaration, and (in scalar context) returns the
+encoding for which the latter routine found most matches, or (in list
+context) all encodings ordered by number of occurences. It does not
+return a value of neither byte order mark not inbound declarations
+declare a character encoding.
 
 Examples:
 
@@ -787,12 +791,12 @@
   +----------------------------+----------+-----------+----------+
 
 Lacking a return value from this routine and higher-level protocol
-information (such as protocol encoding defaults) processors would
+information (such as protocol encoding defaults), processors would
 be required to assume that the document is UTF-8 encoded.
 
-Note however that the return value depends on the set of suspected
+Note, however, that the return value depends on the set of suspected
 encodings you pass to it. For example, by default, EBCDIC encodings
-would not be considered and thus for
+would not be considered, and thus for
 
   <?xml version='1.0' encoding='cp37'?>
   
@@ -803,7 +807,7 @@
 
 Uses encoding_from_xml_document and encoding_from_meta_element to
 determine the encoding of HTML documents. If $options{xhtml} is
-set to a false value uses encoding_from_byte_order_mark and 
+set to a false value, uses encoding_from_byte_order_mark and 
 encoding_from_meta_element to determine the encoding. The xhtml
 option is on by default. The $options{encodings} can be used to
 modify the suspected encodings and $options{parser_options} can
@@ -811,13 +815,13 @@
 encoding_from_meta_element (see the relevant documentation).
 
 Returns nothing if no declaration could be found, the winning
-declaration in scalar context and a list of encoding source
-and encoding name in list context, see ENCODING SOURCES.
+declaration in scalar context, or a list of encoding source
+and encoding name in list context. See L</"ENCODING SOURCES">.
 
 ...
 
 Other problems arise from differences between HTML and XHTML syntax
-and encoding detection rules, for example, the input could be
+and encoding detection rules. For example, the input could be:
 
   Content-Type: text/html
 
@@ -829,14 +833,14 @@
   <title></title>
   <p>...</p>
 
-This is a perfectly legal HTML 4.01 document and implementations
-might be expected to consider the document ISO-8859-2 encoded as
-XML rules for encoding detection do not apply to HTML documents.
-This module attempts to avoid making decisions which rules apply
-for a specific document and would thus by default return 'utf-8'
-for this input.
+This is a perfectly legal HTML 4.01 document and implementations might
+be expected to consider the document to have ISO-8859-2 encoding, as XML
+rules for encoding detection do not apply to HTML documents.  This
+module attempts to avoid making decisions on which rules apply for a
+specific document, and would thus by default return 'utf-8' for this
+input.
 
-On the other hand, if the input omits the encoding declaration,
+On the other hand, if the input omits the encoding declaration, thus:
 
   Content-Type: text/html
 
@@ -848,8 +852,10 @@
   <title></title>
   <p>...</p>
 
-It would return 'iso-8859-2'. Similar problems would arise from
-other differences between HTML and XHTML, for example consider
+it would return 'iso-8859-2'.
+
+Similar problems would arise from other differences between HTML and
+XHTML. For example, consider:
 
   Content-Type: text/html
 
@@ -864,69 +870,70 @@
   
 If this is processed using HTML rules, the first > will end the
 processing instruction and the XHTML document type declaration
-would be the relevant declaration for the document, if it is
+would be the relevant declaration for the document. If it is
 processed using XHTML rules, the ?> will end the processing
 instruction and the HTML document type declaration would be the
 relevant declaration.
 
-IOW, an application would need to assume a certain character
-encoding (family) to process enough of the document to determine
-whether it is XHTML or HTML and the result of this detection would
-depend on which processing rules are assumed in order to process it.
-It is thus in essence not possible to write a "perfect" detection
-algorithm, which is why this routine attempts to avoid making any
-decisions on this matter.
+In other words, an application would need to assume a certain character
+encoding (family) to process enough of the document to determine whether
+it is XHTML or HTML, and the result of this detection would depend on
+which processing rules are assumed in order to process it.  It is thus
+in essence not possible to write a "perfect" detection algorithm, which
+is why this routine attempts to avoid making any decisions on this
+matter.
 
 =item encoding_from_http_message($message [, %options])
 
-Determines the encoding of HTML / XML / XHTML documents enclosed
-in HTTP message. $message is an object compatible to L<HTTP::Message>,
-e.g. a L<HTTP::Response> object. %options is a hash with the following
-possible entries:
+Determines the encoding of HTML/XML/XHTML documents enclosed in an HTTP
+message. $message is an object compatible withL<HTTP::Message>, e.g. a
+L<HTTP::Response> object. %options is a hash with the following possible
+entries:
 
 =over 2
 
 =item encodings
 
-array references of suspected character encodings, defaults to
+Array references of suspected character encodings; defaults to
 C<$HTML::Encoding::DEFAULT_ENCODINGS>.
 
 =item is_html
 
 Regular expression matched against the content_type of the message
-to determine whether to use HTML rules for the entity body, defaults
+to determine whether to use HTML rules for the entity body; defaults
 to C<qr{^text/html$}i>.
 
 =item is_xml
 
 Regular expression matched against the content_type of the message
-to determine whether to use XML rules for the entity body, defaults
+to determine whether to use XML rules for the entity body; defaults
 to C<qr{^.+/(?:.+\+)?xml$}i>.
 
 =item is_text_xml
 
 Regular expression matched against the content_type of the message
-to determine whether to use text/html rules for the message, defaults
+to determine whether to use text/html rules for the message; defaults
 to C<qr{^text/(?:.+\+)?xml$}i>. This will only be checked if is_xml
-matches aswell.
+matches as well.
 
 =item html_default
 
-Default encoding for documents determined (by is_html) as HTML,
+Default encoding for documents determined (by is_html) as HTML;
 defaults to C<ISO-8859-1>.
 
 =item xml_default
 
-Default encoding for documents determined (by is_xml) as XML,
+Default encoding for documents determined (by is_xml) as XML;
 defaults to C<UTF-8>.
 
 =item text_xml_default
 
-Default encoding for documents determined (by is_text_xml) as text/xml,
-defaults to C<undef> in which case the default is ignored. This should
-be set to C<US-ASCII> if desired as this module is by default
-inconsistent with RFC 3023 which requires that for text/xml documents
-without a charset parameter in the HTTP header C<US-ASCII> is assumed.
+Default encoding for documents determined (by is_text_xml) as text/xml;
+defaults to C<undef>, in which case the default is ignored. This should
+be set to C<US-ASCII> if desired, as this module is by default
+inconsistent with RFC 3023; that RFC requires that for text/xml
+documents without a charset parameter in the HTTP header, C<US-ASCII> is
+assumed.
 
 This requirement is inconsistent with RFC 2616 (HTTP/1.1) which requires
 to assume C<ISO-8859-1>, has been widely ignored and is thus disabled by
@@ -935,18 +942,18 @@
 =item xhtml
 
 Whether the routine should look for an encoding declaration in the
-XML declaration of the document (if any), defaults to C<1>.
+XML declaration of the document (if any); defaults to C<1>.
 
 =item default
 
 Whether the relevant default value should be returned when no other
-information can be determined, defaults to C<1>.
+information can be determined; defaults to C<1>.
 
 =back
 
-This is furhter possibly inconsistent with XML MIME types that differ
-in other ways from application/xml, for example if the MIME Type does
-not allow for a charset parameter in which case applications might be
+This is possibly further inconsistent with XML MIME types that differ
+in other ways from application/xml (for example, if the MIME type does
+not allow for a charset parameter), in which case applications might be
 expected to ignore the charset parameter if erroneously provided.
 
 =back
@@ -954,17 +961,17 @@
 =head1 EBCDIC SUPPORT
 
 By default, this module does not support EBCDIC encodings. To enable
-support for EBCDIC encodings you can either change the
+support for EBCDIC encodings, you can either change the
 $HTML::Encodings::DEFAULT_ENCODINGS array reference or pass the
-encodings to the routines you use using the encodings option, for
-example
+encodings to the routines you use using the encodings option; for
+example:
 
   my @try = qw/UTF-8 UTF-16LE cp500 posix-bc .../;
   my $enc = encoding_from_xml_document($doc, encodings => \@try);
 
 Note that there are some subtle differences between various EBCDIC
-encodings, for example C<!> is mapped to 0x5A in C<posix-bc> and
-to 0x4F in C<cp500>; these differences might affect processing in
+encodings. For example, C<!> is mapped to 0x5A in C<posix-bc> and
+to 0x4F in C<cp500>. These differences might affect processing in
 yet undetermined ways.
 
 =head1 TODO
@@ -994,4 +1001,8 @@
   Copyright (c) 2004-2008 Bjoern Hoehrmann <bjoern@hoehrmann.de>.
   This module is licensed under the same terms as Perl itself.
 
+  This document has been edited for grammar, spelling, and clarity by
+  Larry Gilbert <l2g@macports.org> for the MacPorts Project. (Some
+  especially opaque passages have been left alone.)
+
 =cut