The new NCS feature in WASD 8.0 has been intensively beta-tested, works
well and appears to be very useful. A number of charset conversion
functions were prepared during beta-testing.
* The distribution includes:
- the NCS_CONVERT utility for file, stream and DCL symbol conversion;
- NCS_ALIAS.COM to build a system-wide charset alias name table;
and a number of charset declarations and conversion functions for NCS:
* $MISC section - miscellaneous useful conversions:
- CRLF_TO_CR, CRLF_TO_LF, CR_TO_CRLF, CR_TO_LF, LF_TO_CR, LF_TO_CRLF -
line termination conversion functions;
- DBLQUOTE - quote doubling function for DCL;
- URLENCODE, URLDECODE - URL encoding/decoding functions;
- HTMLENCODE, HTMLDECODE - HTML encoding/decoding functions;
- TEXT_TO_HTML - simple text-to-HTML conversion support function;
- URL_TO_VMS - simple URL to VMS filename conversion support function.
* EN section - English-related charsets:
- ISO88591 - ISO-8859-1 declaration;
- ISO88591_TO_MULTI, MULTI_TO_ISO88591 - ISO-8859-1/DEC_multinational
conversion functions (they do differ a little);
- ISO88591_TO_UTF8 - ISO-8859-1 to UTF-8 conversion function (there are
small differences here too).
* RU section - Cyrillic-related charsets:
- KOI8R, KOI8, ISO88595, CP1251, CP866, MACCYR - KOI8-R, KOI8,
ISO-8859-5, Windows-1251, ALT (DOS) and Mac Cyrillic declarations;
- *_TO_* - a number of corresponding inter-charset conversion
functions;
- *_TO_UTF8 - the above to UTF-8 conversion functions;
- *_TO_ISO88591 - transliteration functions from the above to Latin
ISO-8859-1.
Any of these functions can be installed on its own and used with WASD.
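To illustrate what the charset conversion functions listed above do to
the data, here is a minimal Python sketch. It does not use NCS at all: it
relies on Python's built-in codecs (koi8_r, cp1251, iso8859_1, utf_8).
The helper names convert and translit and the tiny transliteration table
are assumptions made for illustration only; the real *_TO_ISO88591
functions may transliterate differently.

```python
# Illustration only: Python codecs stand in for the NCS conversion functions.

def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one charset to another."""
    return data.decode(source).encode(target)

# Hypothetical, deliberately incomplete transliteration table.
TRANSLIT = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"}

def translit(data: bytes, source: str) -> bytes:
    """Replace Cyrillic letters with Latin ones and emit ISO-8859-1 bytes.

    Letters missing from the table cannot be encoded and come out as "?".
    """
    text = data.decode(source)
    latin = "".join(TRANSLIT.get(ch, ch) for ch in text)
    return latin.encode("iso8859_1", "replace")

koi = "привет".encode("koi8_r")           # six bytes of KOI8-R text
win = convert(koi, "koi8_r", "cp1251")    # same letters, Windows-1251 codes
utf = convert(koi, "koi8_r", "utf_8")     # two bytes per Cyrillic letter
lat = translit(koi, "koi8_r")             # b"privet"
```

Conversions between the single-byte Cyrillic charsets keep the data the
same length, while a conversion to UTF-8 produces more bytes than it
reads for any character outside ASCII.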
Here is a sample WASD configuration (Cyrillic):
HTTPD$CONFIG.CONF
----------------------------------------------------------------
[CharsetConvert]
# Latin charset configuration (ignores the DEC-multinational/iso-8859-1 diffs)
iso-8859-1 iso-8859-1,latin-1
iso-8859-1 utf-8 ISO88591_to_UTF8=2
# Cyrillic charset configuration (including transliteration to iso-8859-1)
# The "translit" charset alias is used. Don't use iso-8859-1 here: it would
# cause the Russian documents to be transliterated, because most browsers
# report iso-8859-1 as the first Accept-Charset: for any language.
koi8-r koi8-r,koi8
koi8-r cp-1251,windows-1251,win-1251 KOI8r_to_CP1251=2
koi8-r iso-8859-5 KOI8r_to_ISO88595=2
koi8-r cp-866,CP866 KOI8r_to_CP866=2
koi8-r x-mac-cyrillic,mac-cyrillic,mac-cyr KOI8r_to_MacCyr=2
koi8-r utf-8 KOI8r_to_UTF8=2
koi8-r translit KOI8r_to_ISO88591=2
cp-1251 cp-1251,windows-1251,win-1251
cp-1251 koi8-r,koi8 CP1251_to_KOI8r=2
cp-1251 iso-8859-5 CP1251_to_ISO88595=2
cp-1251 cp-866,CP866 CP1251_to_CP866=2
cp-1251 x-mac-cyrillic,mac-cyrillic,mac-cyr CP1251_to_MacCyr=2
cp-1251 utf-8 CP1251_to_UTF8=2
cp-1251 translit CP1251_to_ISO88591=2
windows-1251 cp-1251,windows-1251,win-1251
windows-1251 koi8-r,koi8 CP1251_to_KOI8r=2
windows-1251 iso-8859-5 CP1251_to_ISO88595=2
windows-1251 cp-866,CP866 CP1251_to_CP866=2
windows-1251 x-mac-cyrillic,mac-cyrillic,mac-cyr CP1251_to_MacCyr=2
windows-1251 utf-8 CP1251_to_UTF8=2
windows-1251 translit CP1251_to_ISO88591=2
iso-8859-5 iso-8859-5
iso-8859-5 koi8-r,koi8 ISO88595_to_KOI8r=2
iso-8859-5 cp-1251,windows-1251,win-1251 ISO88595_to_CP1251=2
iso-8859-5 cp-866,CP866 ISO88595_to_CP866=2
iso-8859-5 x-mac-cyrillic,mac-cyrillic,mac-cyr ISO88595_to_MacCyr=2
iso-8859-5 utf-8 ISO88595_to_UTF8=2
iso-8859-5 translit ISO88595_to_ISO88591=2
cp-866 cp-866,CP866
cp-866 koi8-r,koi8 CP866_to_KOI8r=2
cp-866 iso-8859-5 CP866_to_ISO88595=2
cp-866 cp-1251,windows-1251,win-1251 CP866_to_CP1251=2
cp-866 x-mac-cyrillic,mac-cyrillic,mac-cyr CP866_to_MacCyr=2
cp-866 utf-8 CP866_to_UTF8=2
cp-866 translit CP866_to_ISO88591=2
x-mac-cyrillic x-mac-cyrillic,mac-cyrillic,mac-cyr
x-mac-cyrillic koi8-r,koi8 MacCyr_to_KOI8r=2
x-mac-cyrillic iso-8859-5 MacCyr_to_ISO88595=2
x-mac-cyrillic cp-1251,windows-1251,win-1251 MacCyr_to_CP1251=2
x-mac-cyrillic cp-866,CP866 MacCyr_to_CP866=2
x-mac-cyrillic utf-8 MacCyr_to_UTF8=2
x-mac-cyrillic translit MacCyr_to_ISO88591=2
----------------------------------------------------------------
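The following Python sketch shows one way such a table could be
consulted. It is a guess for illustration, not WASD's actual algorithm:
it assumes that each line holds the document charset, a comma-separated
list of accepted charset names and an optional NCS function, and that
the client's Accept-Charset entries are tried in order. The names
parse_rules and select are hypothetical, and the "=2" suffix is carried
along uninterpreted.

```python
# Sketch only: a guess at how a [CharsetConvert] table could be consulted.
RULES = """
# a few lines copied from the sample configuration
koi8-r koi8-r,koi8
koi8-r cp-1251,windows-1251,win-1251 KOI8r_to_CP1251=2
koi8-r utf-8 KOI8r_to_UTF8=2
koi8-r translit KOI8r_to_ISO88591=2
"""

def parse_rules(text):
    """Return (document charset, accepted names, function or None) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        doc = fields[0].lower()
        names = [name.lower() for name in fields[1].split(",")]
        function = fields[2] if len(fields) > 2 else None
        rules.append((doc, names, function))
    return rules

def select(rules, doc_charset, accepted):
    """Try the accepted charsets in order; return (charset, function) or None."""
    for wanted in accepted:
        for doc, names, function in rules:
            if doc == doc_charset.lower() and wanted.lower() in names:
                return wanted, function
    return None

rules = parse_rules(RULES)
choice = select(rules, "koi8-r", ["iso-8859-1", "windows-1251", "koi8-r"])
```

Under these assumptions a browser that lists iso-8859-1 first skips it
(no koi8-r line accepts that name) and gets windows-1251. Had the
transliteration line been registered under iso-8859-1 instead of
translit, the same browser would have received transliterated text,
which is the problem the comment in the configuration warns about.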
HTTPD$MAP.CONF (for manual charset selection)
These rules give the client a way to select the charset manually.
Sometimes this is the only way to override a "strange"
browser configuration.
It is assumed that the English documents are located in
the /en/* tree and the Russian ones in the /ru/* tree.
----------------------------------------------------------------
redirect /koi/* /* http=accept-charset=koi8-r
redirect /win/* /* http=accept-charset=windows-1251
redirect /iso/* /* http=accept-charset=iso-8859-5
redirect /alt/* /* http=accept-charset=CP866
redirect /mac/* /* http=accept-charset=x-mac-cyrillic
redirect /utf/* /* http=accept-charset=utf-8
redirect /lat/ru/* /ru/* http=accept-charset=translit
redirect /lat/* /* http=accept-charset=iso-8859-1
----------------------------------------------------------------
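The effect of these rules on a request path can be modelled with a short
Python sketch. It is only a model under stated assumptions: it treats
each rule as "strip the prefix, then handle the request as if it carried
the given Accept-Charset" and applies the first rule that matches. The
function name apply_rules is hypothetical.

```python
# Sketch only: models the redirect rules above as ordered prefix rewrites.
RULES = [
    ("/koi/", "/", "koi8-r"),
    ("/win/", "/", "windows-1251"),
    ("/iso/", "/", "iso-8859-5"),
    ("/alt/", "/", "CP866"),
    ("/mac/", "/", "x-mac-cyrillic"),
    ("/utf/", "/", "utf-8"),
    ("/lat/ru/", "/ru/", "translit"),   # must precede the generic /lat/ rule
    ("/lat/", "/", "iso-8859-1"),
]

def apply_rules(path):
    """Return (rewritten path, forced charset) for the first matching rule."""
    for prefix, replacement, charset in RULES:
        if path.startswith(prefix):
            return replacement + path[len(prefix):], charset
    return path, None   # no rule matched: the path is left alone
```

In this model the order matters: /lat/ru/* has to come before /lat/* so
that Russian documents requested in Latin script are mapped to the
translit alias rather than to plain iso-8859-1.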
The NCS_CONVERT utility can be useful in DCL scripting for various
tasks, including (but not limited to) charset conversion of POST/PUT
request data, which WASD itself does not provide yet.
Some documents can be pre-converted with NCS_CONVERT to the most
commonly used charset before exporting.
A number of the conversion functions are also useful in scripts when
applied to DCL symbols with the /SYMBOL option.
Some useful examples:
$ NCS_CONVERT/CONV=ISO88591_TO_UTF8 FILE.TXT FILE_UTF.TXT
converts FILE.TXT from ISO-8859-1 to UTF-8 and writes it into FILE_UTF.TXT
$ NCS_CONVERT/CONV=ISO88591_TO_UTF8 FILE.TXT
converts FILE.TXT to UTF-8 and writes it to SYS$OUTPUT
$ NCS_CONVERT/CONV=UTF8_TO_ISO88591 HTTP$INPUT FILE.TXT
converts the PUT data from UTF-8 to ISO-8859-1 and writes it into FILE.TXT
$ NCS_CONVERT/CONV=UTF8_TO_ISO88591/SYMBOL=WWW_FORM_TEXT
converts the WWW_FORM_TEXT DCL symbol (the text
field of the UTF-8 charset HTML form) to ISO-8859-1
$ NCS_CONVERT/CONV=HTMLENCODE
HTML-encodes the text and writes it to SYS$OUTPUT
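For readers who want to see what kind of output the $MISC functions are
meant to produce, here is a rough Python approximation using only the
standard library. It is an analogy, not the NCS implementation: the
exact set of characters that HTMLENCODE and URLENCODE escape may differ,
and the helper names crlf_to_lf, lf_to_crlf and dblquote are made up
for this sketch.

```python
# Illustration only: standard-library stand-ins for some $MISC functions.
import html
from urllib.parse import quote, unquote

def crlf_to_lf(data: bytes) -> bytes:
    """Turn every CR LF pair into a single LF."""
    return data.replace(b"\r\n", b"\n")

def lf_to_crlf(data: bytes) -> bytes:
    """Turn every LF into CR LF."""
    # Normalize first so existing CR LF pairs do not become CR CR LF.
    return crlf_to_lf(data).replace(b"\n", b"\r\n")

def dblquote(text: str) -> str:
    """Double each quotation mark, as needed inside a DCL string literal."""
    return text.replace('"', '""')

encoded_html = html.escape('<a href="x">')   # markup characters become entities
encoded_url = quote("a b/c?", safe="")       # reserved characters become %XX
decoded_url = unquote(encoded_url)           # and back again
```

Normalizing before adding CR characters is a choice made in this sketch
so that mixed input is handled predictably; whether LF_TO_CRLF behaves
the same way on mixed input is not stated here.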
NCS_CONVERT is available from the S&B download page:
http://www.S-and-B.ru/en/download/