Mazovia encoding

Kermit	MAZOVIA
Alias(es)	cp667, cp790, cp991, MAZ
Language(s)	Polish
Classification	Extended ASCII, OEM code page
Based on	OEM-US
Other related encoding(s)	Fidonet Mazovia (MFD), Mazovia 157, FreeDOS-991

Last updated May 27, 2024

Mazovia encoding is a character set used under DOS to represent Polish text. The character set derives from code page 437, with specific positions modified to accommodate Polish letters. Notably, the Mazovia encoding maintains the block graphic characters from code page 437, distinguishing it from IBM's later official Central European code page 852, which failed to preserve all block graphics, leading to incorrect display in programs such as Norton Commander.

The Mazovia encoding was designed in 1984 by Jan Klimowicz of IMM as part of a project to develop and produce a Polish IBM PC clone codenamed "Mazovia 1016". The code page was optimized for the peripheral devices commonly used with the Mazovia 1016 computer, including a graphics card with dual switchable graphics, a keyboard with US English and Russian layouts, and printers with Polish fonts. The encoding gained widespread acceptance in Poland after the Polish National Bank (NBP) adopted it as a standard in 1986. The NBP also played a significant role in facilitating the production of compatible computers by Ipaco, which used Taiwanese components under the guidance of Zbigniew Jakubas and Krzysztof Sochacki.

Some ambiguity exists in the official code page assignment for the Mazovia encoding:

PTS-DOS and S/DOS support this encoding under code page 667 (CP667).^[1] Some Polish software also called the same encoding code page 991 (CP991);^[nb 1] the FreeDOS implementation of code page 991, however, does not appear to be identical to this original encoding. The DOS code page switching file NECPINW.CPI for NEC Pinwriters supports the Mazovia encoding under both code pages 667 and 991.^[1] FreeDOS has since added support for a variant of the Mazovia encoding under code page 790 (CP790). The Fujitsu DL6400 (Pro) / DL6600 (Pro) printers also support the Mazovia encoding,^[2] and Star printers know it as code page 3843.

Character set

Each character is shown with its equivalent Unicode code point.^[3] Only the second half of the table (128–255) is shown, all of the first half (0–127) being the same as ASCII and code page 437.

Several variants of this encoding exist:

Mazovia with curly quotation marks („ is at 0x9D and ” is at 0xA9). FreeDOS supports this variant under code page 790.
Mazovia 157 (ś is at 0x9D instead of 0x9E)
Fido Mazovia (ć is at 0x87 instead of 0x8D and Ć is at 0x80 instead of 0x95)
zł Mazovia (złoty sign at 0x9B, as in the original ROM of the Mazovia 1016 computer). This variant was also supported by EGAPL v3.2, a DOS TSR providing Polish glyphs that was popular in Poland in the 1990s. FreeDOS supports this variant under code page 991 (which also has § (section sign) at 0xA8), although the original definition of code page 991, which pre-dates FreeDOS, appears to have been identical to code page 667.

These variants are not fully compliant with the definition of code page 667 and should therefore not be associated with this number.
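
The letter moves named in the variant list can be expressed as byte translations back to standard Mazovia. The following is a minimal Python sketch under stated assumptions, not a complete converter: the function names are hypothetical, only the letters named in the list are relocated, and all other bytes pass through unchanged because the list does not say what each variant stores at the vacated positions. The curly-quote and zł variants are left out, since their extra characters have no position in code page 667.

```python
# Byte moves taken from the variant list; everything else is an assumption.
# Only the named letters are relocated. Bytes that already hold another
# character in standard Mazovia (for example 0x87 and 0x80) are overwritten,
# so this is only meaningful for text known to be in the variant encoding.
MAZOVIA_157_TO_STANDARD = bytes.maketrans(b"\x9d", b"\x9e")    # ś
FIDO_TO_STANDARD = bytes.maketrans(b"\x87\x80", b"\x8d\x95")   # ć, Ć


def mazovia_157_to_standard(data: bytes) -> bytes:
    """Move ś from 0x9D (Mazovia 157) to 0x9E (standard Mazovia)."""
    return data.translate(MAZOVIA_157_TO_STANDARD)


def fido_to_standard(data: bytes) -> bytes:
    """Move ć and Ć from 0x87/0x80 (Fido Mazovia) to 0x8D/0x95."""
    return data.translate(FIDO_TO_STANDARD)


print(fido_to_standard(b"\x87ma").hex(" "))  # 8d 6d 61
```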

Code page 667
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
8x 128	Ç 00C7	ü 00FC	é 00E9	â 00E2	ä 00E4	à 00E0	ą 0105	ç 00E7	ê 00EA	ë 00EB	è 00E8	ï 00EF	î 00EE	ć 0107	Ä 00C4	Ą 0104
9x 144	Ę 0118	ę 0119	ł 0142	ô 00F4	ö 00F6	Ć 0106	û 00FB	ù 00F9	Ś 015A	Ö 00D6	Ü 00DC	¢ 00A2	Ł 0141	¥ 00A5	ś 015B	ƒ 0192
Ax 160	Ź 0179	Ż 017B	ó 00F3	Ó 00D3	ń 0144	Ń 0143	ź 017A	ż 017C	¿ 00BF	⌐ 2310	¬ 00AC	½ 00BD	¼ 00BC	¡ 00A1	« 00AB	» 00BB
Bx 176	░ 2591	▒ 2592	▓ 2593	│ 2502	┤ 2524	╡ 2561	╢ 2562	╖ 2556	╕ 2555	╣ 2563	║ 2551	╗ 2557	╝ 255D	╜ 255C	╛ 255B	┐ 2510
Cx 192	└ 2514	┴ 2534	┬ 252C	├ 251C	─ 2500	┼ 253C	╞ 255E	╟ 255F	╚ 255A	╔ 2554	╩ 2569	╦ 2566	╠ 2560	═ 2550	╬ 256C	╧ 2567
Dx 208	╨ 2568	╤ 2564	╥ 2565	╙ 2559	╘ 2558	╒ 2552	╓ 2553	╫ 256B	╪ 256A	┘ 2518	┌ 250C	█ 2588	▄ 2584	▌ 258C	▐ 2590	▀ 2580
Ex 224	α 03B1	ß 00DF	Γ 0393	π 03C0	Σ 03A3	σ 03C3	µ 00B5	τ 03C4	Φ 03A6	Θ 0398	Ω 03A9	δ 03B4	∞ 221E	φ 03C6	ε 03B5	∩ 2229
Fx 240	≡ 2261	± 00B1	≥ 2265	≤ 2264	⌠ 2320	⌡ 2321	÷ 00F7	≈ 2248	° 00B0	∙ 2219	· 00B7	√ 221A	ⁿ 207F	² 00B2	■ 25A0	NBSP 00A0
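
The chart translates directly into a lookup table. The following Python sketch is one possible way to decode and encode code page 667 using the code points listed above. It assumes that NBSP at 0xFF is U+00A0 (as in code page 437) and that characters outside the table should raise an error; `mazovia_decode` and `mazovia_encode` are hypothetical names, not part of any library.

```python
# Upper half (0x80-0xFF) of code page 667 as Unicode code points,
# copied row by row from the chart.
CP667_HIGH_HEX = """
00C7 00FC 00E9 00E2 00E4 00E0 0105 00E7 00EA 00EB 00E8 00EF 00EE 0107 00C4 0104
0118 0119 0142 00F4 00F6 0106 00FB 00F9 015A 00D6 00DC 00A2 0141 00A5 015B 0192
0179 017B 00F3 00D3 0144 0143 017A 017C 00BF 2310 00AC 00BD 00BC 00A1 00AB 00BB
2591 2592 2593 2502 2524 2561 2562 2556 2555 2563 2551 2557 255D 255C 255B 2510
2514 2534 252C 251C 2500 253C 255E 255F 255A 2554 2569 2566 2560 2550 256C 2567
2568 2564 2565 2559 2558 2552 2553 256B 256A 2518 250C 2588 2584 258C 2590 2580
03B1 00DF 0393 03C0 03A3 03C3 00B5 03C4 03A6 0398 03A9 03B4 221E 03C6 03B5 2229
2261 00B1 2265 2264 2320 2321 00F7 2248 00B0 2219 00B7 221A 207F 00B2 25A0 00A0
"""

# Bytes 0x00-0x7F are ASCII; bytes 0x80-0xFF come from the chart.
DECODE = [chr(i) for i in range(128)]
DECODE += [chr(int(cp, 16)) for cp in CP667_HIGH_HEX.split()]
assert len(DECODE) == 256

ENCODE = {ch: byte for byte, ch in enumerate(DECODE)}


def mazovia_decode(data: bytes) -> str:
    """Decode code page 667 bytes to a Unicode string."""
    return "".join(DECODE[b] for b in data)


def mazovia_encode(text: str) -> bytes:
    """Encode a string as code page 667; raises KeyError if a character is missing."""
    return bytes(ENCODE[ch] for ch in text)


sample = "\u0141\u00f3d\u017a"  # "Łódź"
encoded = mazovia_encode(sample)
print(encoded.hex(" "))                   # 9c a2 64 a6
print(mazovia_decode(encoded) == sample)  # True
```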

Differences from code page 437
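
The positions that differ from code page 437 can be derived by comparing the chart for code page 667 with code page 437. The following is a minimal Python sketch, assuming Python's built-in `cp437` codec as the reference for code page 437. Only rows 8x to Ax are compared, because rows Bx to Fx of the chart (block graphics, Greek letters and mathematical symbols) match code page 437; the function name is hypothetical.

```python
# Rows 8x, 9x and Ax of code page 667 as Unicode code points,
# copied from the code page 667 chart.
CP667_ROWS_8_TO_A = """
00C7 00FC 00E9 00E2 00E4 00E0 0105 00E7 00EA 00EB 00E8 00EF 00EE 0107 00C4 0104
0118 0119 0142 00F4 00F6 0106 00FB 00F9 015A 00D6 00DC 00A2 0141 00A5 015B 0192
0179 017B 00F3 00D3 0144 0143 017A 017C 00BF 2310 00AC 00BD 00BC 00A1 00AB 00BB
"""


def differences_from_cp437():
    """Return (byte, cp437 character, Mazovia character) for each differing position."""
    mazovia = [chr(int(cp, 16)) for cp in CP667_ROWS_8_TO_A.split()]
    cp437 = bytes(range(0x80, 0xB0)).decode("cp437")
    return [
        (0x80 + offset, old, new)
        for offset, (old, new) in enumerate(zip(cp437, mazovia))
        if old != new
    ]


# Print the differing positions as code points (ASCII-only output).
for byte, old, new in differences_from_cp437():
    print(f"0x{byte:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

Run as written, this reports 17 positions: all Polish letters in both cases except ó, which code page 437 already has at 0xA2.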

Notes

[nb 1] The Polish text converter PLC, developed by Marcin Gryszkalis between 1997 and 1999, supports the standard Mazovia encoding under code page 991 as well as under the symbolic handle MAZ. The Fidonet Mazovia encoding is supported under the symbolic handle MFD instead.

References

[1] Paul, Matthias R. (2001) [1996]. "Specification and reference documentation for NECPINW". NECPINW.CPI - DOS code page switching driver for NEC Pinwriters (2.08 ed.). FILESPEC.TXT from NECPI208.ZIP. Archived from the original on 2017-09-10. Retrieved 2013-04-22.
[2] Fujitsu DL6400/DL6600 Dot Matrix Printer User's Manual (PDF). Fujitsu Limited. April 1994. C147-E015-01EN. Archived (PDF) from the original on 2016-06-14. Retrieved 2016-06-14.
[3] Pinwriter Familie - Pinwriter - Epromsockel - Zusätzliche Zeichensätze / Schriftarten (Printed reference manual for optional font and codepage EPROMs for NEC Pinwriters, including custom variants) (in German) (00 3/93 ed.). NEC Deutschland GmbH. 1993. (NB. Some dot matrix printers of the NEC Pinwriter series, namely the P3200/P3300 (P20/P30), P6200/P6300 (P60/P70), P9300 (P90), P7200/P7300 (P62/P72), P22Q/P32Q, P3800/P3900 (P42Q/P52Q), P1200/P1300 (P2Q/P3Q), P2000 (P2X) and P8000 (P72X), supported the installation of optional font EPROMs, where this encoding was included in ROM #8 "Polish". It could be invoked via escape sequence ESC R (n) with (n) = 21.)

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
