• boxes.utf8

    From Rob Swindell to All on Sunday, July 07, 2019 15:48:37
    UTF-8 encoded sample plain-text file ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

    Markus Kuhn [ˈmaʳkʊs kuːn] <mkuhn@acm.org> — 1999-08-20


    The ASCII compatible UTF-8 encoding of ISO 10646 and Unicode
    plain-text files is defined in RFC 2279 and in ISO 10646-1 Annex R.


    Using Unicode/UTF-8, you can write in emails and source code things such as

    Mathematics and Sciences:

    ∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),

    ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),

    2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm

    Linguistics and dictionaries:

    ði ıntəˈnæʃənəl fəˈnɛtık əsoʊsiˈeıʃn
    Y [ˈʏpsilɔn], Yen [jɛn], Yoga [ˈjoːgɑ]

    APL:

    ((V⍳V)=⍳⍴V)/V←,V ⌷←⍳→⍴∆∇⊃‾⍎⍕⌈

    Nicer typography in plain text files:

    ╔══════════════════════════════════════════╗
    ║ ║
    ║ • ‘single’ and “double” quotes ║
    ║ ║
    ║ • Curly apostrophes: “We’ve been here” ║
    ║ ║
    ║ • Latin-1 apostrophe and accents: '´` ║
    ║ ║
    ║ • ‚deutsche‘ „Anführungszeichen“ ║
    ║ ║
    ║ • †, ‡, ‰, •, 3–4, —, −5/+5, ™, … ║
    ║ ║
    ║ • ASCII safety test: 1lI|, 0OD, 8B ║
    ║ ╭─────────╮ ║
    ║ • the euro symbol: │ 14.95 € │ ║
    ║ ╰─────────╯ ║
    ╚══════════════════════════════════════════╝

    Greek (in Polytonic):

    The Greek anthem:

    Σὲ γνωρίζω ἀπὸ τὴν κόψη
    τοῦ σπαθιοῦ τὴν τρομερή,
    σὲ γνωρίζω ἀπὸ τὴν ὄψη
    ποὺ μὲ βία μετράει τὴ γῆ.

    ᾿Απ᾿ τὰ κόκκαλα βγαλμένη
    τῶν ῾Ελλήνων τὰ ἱερά
    καὶ σὰν πρῶτα ἀνδρειωμένη
    χαῖρε, ὦ χαῖρε, ᾿Ελευθεριά!

    From a speech of Demosthenes in the 4th century BC:

    Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι,
    ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς
    λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ
    τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿
    εἰς τοῦτο προήκοντα, ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ
    πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν
    οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι,
    οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. ἐγὼ δέ, ὅτι μέν
    ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον
    τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι
    γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν
    προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους
    σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ
    τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ
    τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς
    τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον.

    Δημοσθένους, Γ´ ᾿Ολυνθιακὸς

    Georgian:

    From a Unicode conference invitation:

    გთხოვთ ახლავე გაიაროთ რეგისტრაცია Unicode-ის მეათე საერთაშორისო
    კონფერენციაზე დასასწრებად, რომელიც გაიმართება 10-12 მარტს,
    ქ. მაინცში, გერმანიაში. კონფერენცია შეჰკრებს ერთად მსოფლიოს
    ექსპერტებს ისეთ დარგებში როგორიცაა ინტერნეტი და Unicode-ი,
    ინტერნაციონალიზაცია და ლოკალიზაცია, Unicode-ის გამოყენება
    ოპერაციულ სისტემებსა, და გამოყენებით პროგრამებში, შრიფტებში,
    ტექსტების დამუშავებასა და მრავალენოვან კომპიუტერულ სისტემებში.

    Russian:

    From a Unicode conference invitation:

    Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
    Unicode, которая состоится 10-12 марта 1997 года в Майнце в Германии.
    Конференция соберет широкий круг экспертов по вопросам глобального
    Интернета и Unicode, локализации и интернационализации, воплощению и
    применению Unicode в различных операционных системах и программных
    приложениях, шрифтах, верстке и многоязычных компьютерных системах.

    Thai (UCS Level 2):

    Excerpt from a poetry on The Romance of The Three Kingdoms (a Chinese
    classic 'San Gua'):

    [----------------------------|------------------------]
    ๏ แผ่นดินฮั่นเสื่อมโทรมแสนสังเวช พระปกเกศกองบู๊กู้ขึ้นใหม่
    สิบสองกษัตริย์ก่อนหน้าแลถัดไป สององค์ไซร้โง่เขลาเบาปัญญา
    ทรงนับถือขันทีเป็นที่พึ่ง บ้านเมืองจึงวิปริตเป็นนักหนา
    โฮจิ๋นเรียกทัพทั่วหัวเมืองมา หมายจะฆ่ามดชั่วตัวสำคัญ
    เหมือนขับไสไล่เสือจากเคหา รับหมาป่าเข้ามาเลยอาสัญ
    ฝ่ายอ้องอุ้นยุแยกให้แตกกัน ใช้สาวนั้นเป็นชนวนชื่นชวนใจ
    พลันลิฉุยกุยกีกลับก่อเหตุ ช่างอาเพศจริงหนาฟ้าร้องไห้
    ต้องรบราฆ่าฟันจนบรรลัย ฤๅหาใครค้ำชูกู้บรรลังก์ ฯ

    (The above is a two-column text. If combining characters are handled
    correctly, the lines of the second column should be aligned with the
    | character above.)

    Ethiopian:

    Proverbs in the Amharic language:

    ሰማይ አይታረስ ንጉሥ አይከሰስ።
    ብላ ካለኝ እንደአባቴ በቆመጠኝ።
    ጌጥ ያለቤቱ ቁምጥና ነው።
    ደሀ በሕልሙ ቅቤ ባይጠጣ ንጣት በገደለው።
    የአፍ ወለምታ በቅቤ አይታሽም።
    አይጥ በበላ ዳዋ ተመታ።
    ሲተረጉሙ ይደረግሙ።
    ቀስ በቀስ፥ ዕንቁላል በእግሩ ይሄዳል።
    ድር ቢያብር አንበሳ ያስር።
    ሰው እንደቤቱ እንጅ እንደ ጉረቤቱ አይተዳደርም።
    እግዜር የከፈተውን ጉሮሮ ሳይዘጋው አይድርም።
    የጎረቤት ሌባ፥ ቢያዩት ይስቅ ባያዩት ያጠልቅ።
    ሥራ ከመፍታት ልጄን ላፋታት።
    ዓባይ ማደሪያ የለው፥ ግንድ ይዞ ይዞራል።
    የእስላም አገሩ መካ የአሞራ አገሩ ዋርካ።
    ተንጋሎ ቢተፉ ተመልሶ ባፉ።
    ወዳጅህ ማር ቢሆን ጨርስህ አትላሰው።
    እግርህን በፍራሽህ ልክ ዘርጋ።

    Runes:

    ᚻᛖ ᚳᚹᚫᚦ ᚦᚫᛏ ᚻᛖ ᛒᚢᛞᛖ ᚩᚾ ᚦᚫᛗ ᛚᚪᚾᛞᛖ ᚾᚩᚱᚦᚹᛖᚪᚱᛞᚢᛗ ᚹᛁᚦ ᚦᚪ ᚹᛖᛥᚫ

    (Old English, which transcribed into Latin reads 'He cwaeth that he
    bude thaem lande northweardum with tha Westsae.' and means 'He said
    that he lived in the northern land near the Western Sea.')

    Braille:

    ⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌

    ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠙⠑⠁⠙⠒ ⠞⠕ ⠃⠑⠛⠔ ⠺⠊⠹⠲ ⡹⠻⠑ ⠊⠎ ⠝⠕ ⠙⠳⠃⠞
    ⠱⠁⠞⠑⠧⠻ ⠁⠃⠳⠞ ⠹⠁⠞⠲ ⡹⠑ ⠗⠑⠛⠊⠌⠻ ⠕⠋ ⠙⠊⠎ ⠃⠥⠗⠊⠁⠇ ⠺⠁⠎
    ⠎⠊⠛⠝⠫ ⠃⠹ ⠹⠑ ⠊⠇⠻⠛⠹⠍⠁⠝⠂ ⠹⠑ ⠊⠇⠻⠅⠂ ⠹⠑ ⠥⠝⠙⠻⠞⠁⠅⠻⠂
    ⠁⠝⠙ ⠹⠑ ⠡⠊⠑⠋ ⠍⠳⠗⠝⠻⠲ ⡎⠊⠗⠕⠕⠛⠑ ⠎⠊⠛⠝⠫ ⠊⠞⠲ ⡁⠝⠙
    ⡎⠊⠗⠕⠕⠛⠑⠰⠎ ⠝⠁⠍⠑ ⠺⠁⠎ ⠛⠕⠕⠙ ⠥⠏⠕⠝ ⠰⡡⠁⠝⠛⠑⠂ ⠋⠕⠗ ⠁⠝⠹⠹⠔⠛ ⠙⠑
    ⠡⠕⠎⠑ ⠞⠕ ⠏⠥⠞ ⠙⠊⠎ ⠙⠁⠝⠙ ⠞⠕⠲

    ⡕⠇⠙ ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲

    ⡍⠔⠙⠖ ⡊ ⠙⠕⠝⠰⠞ ⠍⠑⠁⠝ ⠞⠕ ⠎⠁⠹ ⠹⠁⠞ ⡊ ⠅⠝⠪⠂ ⠕⠋ ⠍⠹
    ⠪⠝ ⠅⠝⠪⠇⠫⠛⠑⠂ ⠱⠁⠞ ⠹⠻⠑ ⠊⠎ ⠏⠜⠞⠊⠊⠥⠇⠜⠇⠹ ⠙⠑⠁⠙ ⠁⠃⠳⠞
    ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ ⡊ ⠍⠊⠣⠞ ⠙⠁⠧⠑ ⠃⠑⠲ ⠔⠊⠇⠔⠫⠂ ⠍⠹⠎⠑⠇⠋⠂ ⠞⠕
    ⠗⠑⠛⠜⠙ ⠁ ⠊⠕⠋⠋⠔⠤⠝⠁⠊⠇ ⠁⠎ ⠹⠑ ⠙⠑⠁⠙⠑⠌ ⠏⠊⠑⠊⠑ ⠕⠋ ⠊⠗⠕⠝⠍⠕⠝⠛⠻⠹
    ⠔ ⠹⠑ ⠞⠗⠁⠙⠑⠲ ⡃⠥⠞ ⠹⠑ ⠺⠊⠎⠙⠕⠍ ⠕⠋ ⠳⠗ ⠁⠝⠊⠑⠌⠕⠗⠎
    ⠊⠎ ⠔ ⠹⠑ ⠎⠊⠍⠊⠇⠑⠆ ⠁⠝⠙ ⠍⠹ ⠥⠝⠙⠁⠇⠇⠪⠫ ⠙⠁⠝⠙⠎
    ⠩⠁⠇⠇ ⠝⠕⠞ ⠙⠊⠌⠥⠗⠃ ⠊⠞⠂ ⠕⠗ ⠹⠑ ⡊⠳⠝⠞⠗⠹⠰⠎ ⠙⠕⠝⠑ ⠋⠕⠗⠲ ⡹⠳
    ⠺⠊⠇⠇ ⠹⠻⠑⠋⠕⠗⠑ ⠏⠻⠍⠊⠞ ⠍⠑ ⠞⠕ ⠗⠑⠏⠑⠁⠞⠂ ⠑⠍⠏⠙⠁⠞⠊⠊⠁⠇⠇⠹⠂ ⠹⠁⠞
    ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲

    (The first couple of paragraphs of "A Christmas Carol" by Dickens)

    Compact font selection example text:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ /0123456789
    abcdefghijklmnopqrstuvwxyz £©µÀÆÖÞßéöÿ
    –—‘“”„†•…‰™œŠŸž€ ΑΒΓΔΩαβγδω АБВГДабвгд
    ∀∂∈ℝ∧∪≡∞ ↑↗↨↻⇣ ┐┼╔╘░►☺♀ fi�⑀₂ἠḂӥẄɐː⍎אԱა

    Greetings in various languages:

    Hello world, Καλημέρα κόσμε, コンニチハ

    Box drawing alignment tests: █

    ╔══╦══╗ ┌──┬──┐ ╭──┬──╮ ╭──┬──╮ ┏━━┳━━┓ ┎┒┏┑ ╷ ╻ ┏┯┓ ┌┰┐ ▊ ╱╲╱╲╳╳╳
    ║┌─╨─┐║ │╔═╧═╗│ │╒═╪═╕│ │╓─╁─╖│ ┃┌─╂─┐┃ ┗╃╄┙ ╶┼╴╺╋╸┠┼┨ ┝╋┥ ▋ ╲╱╲╱╳╳╳
    ║│╲ ╱│║ │║ ║│ ││ │ ││ │║ ┃ ║│ ┃│ ╿ │┃ ┍╅╆┓ ╵ ╹ ┗┷┛ └┸┘ ▌ ╱╲╱╲╳╳╳
    ╠╡ ╳ ╞╣ ├╢ ╟┤ ├┼─┼─┼┤ ├╫─╂─╫┤ ┣┿╾┼╼┿┫ ┕┛┖┚ ┌┄┄┐ ╎ ┏┅┅┓ ┋ ▍ ╲╱╲╱╳╳╳
    ║│╱ ╲│║ │║ ║│ ││ │ ││ │║ ┃ ║│ ┃│ ╽ │┃ ░░▒▒▓▓██ ┊ ┆ ╎ ╏ ┇ ┋ ▎
    ║└─╥─┘║ │╚═╤═╝│ │╘═╪═╛│ │╙─╀─╜│ ┃└─╂─┘┃ ░░▒▒▓▓██ ┊ ┆ ╎ ╏ ┇ ┋ ▏
    ╚══╩══╝ └──┴──┘ ╰──┴──╯ ╰──┴──╯ ┗━━┻━━┛ └╌╌┘ ╎ ┗╍╍┛ ┋ ▁▂▃▄▅▆▇█


    digital man

    Synchronet/BBS Terminology Definition #64:
    SMTP = Simple Message Transfer Protocol
    Norco, CA WX: 79.0F, 54.0% humidity, 14 mph ESE wind, 0.00 inches rain/24hrs
  • From Rob Swindell to All on Sunday, July 07, 2019 18:03:32
    Re: boxes.utf8
    By: Rob Swindell to All on Sun Jul 07 2019 03:48 pm

    UTF-8 encoded sample plain-text file
    ────────────────────────────────────

    Markus Kuhn [¿ma¿k¿s ku¿n] <mkuhn@acm.org> ─ 1999-08-20

    Test reply from a CP437 terminal.

    digital man

    Synchronet/BBS Terminology Definition #61:
    SEXPOTS = Synchronet External Plain Old Telephone System (POTS) service
    Norco, CA WX: 73.9°F, 61.0% humidity, 14 mph ENE wind, 0.00 inches rain/24hrs
  • From Maurice Kinal@1:153/7001 to Rob Swindell on Monday, July 08, 2019 02:19:46
    Hey Rob!

    Is this what was supposed to happen?

    UTF-8 encoded sample plain-text file ────────────────────────────────────

    Markus Kuhn [¿ma¿k¿s ku¿n] <mkuhn@acm.org> ─ 1999-08-20

    All I did was to convert the above from cp437 to utf-8 using iconv.

    Life is good,
    Maurice

    ... Don't cry for me I have vi.
    --- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)
  • From Rob Swindell to Maurice Kinal on Sunday, July 07, 2019 20:07:29
    Re: boxes.utf8
    By: Maurice Kinal to Rob Swindell on Mon Jul 08 2019 02:19 am

    Hey Rob!

    Is this what was supposed to happen?

    UTF-8 encoded sample plain-text file
    ────────────────────────────────────

    Markus Kuhn [¿ma¿k¿s ku¿n] <mkuhn@acm.org> ─ 1999-08-20

    All I did was to convert the above from cp437 to utf-8 using iconv.

    Yes, looks correct. You have to do that manually?

    digital man

    This Is Spinal Tap quote #43:
    I feel my role in the band is ... kind of like lukewarm water.
    Norco, CA WX: 66.3°F, 75.0% humidity, 9 mph ENE wind, 0.00 inches rain/24hrs
  • From Maurice Kinal@1:153/7001 to Rob Swindell on Monday, July 08, 2019 05:07:25
    Hey Rob!

    Yes, looks correct. You have to do that manually?

    Yes. How else could it be done? It would have been better if the original had
    already done the conversion if that is what they are after. Also mixing regular 8-bit encodings with utf8 should be avoided despite the fact that it works ... sort of ... not really.

    Life is good,
    Maurice

    ... Don't cry for me I have vi.
    --- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)
  • From Rob Swindell to Maurice Kinal on Sunday, July 07, 2019 23:36:11
    Re: boxes.utf8
    By: Maurice Kinal to Rob Swindell on Mon Jul 08 2019 05:07 am

    Hey Rob!

    Yes, looks correct. You have to do that manually?

    Yes. How else could it be done?

    Your message reader/viewer could assume CP437 if there's no CHRS: kludge and do the conversion automatically.

    It would have been better if the original
    had already done the conversion if that is what they are after. Also mixing regular 8-bit encodings with utf8 should be avoided despite the fact that it works ... sort of ... not really.

    I wasn't trying to mix "regular 8-bit encodings" with UTF-8. I posted what was supposed to be pure UTF-8, but my signature included a CP437 "degree" symbol (248). Other than that, it should have been pure UTF-8. And on my last test, I had the signature successfully auto-converted to UTF-8 when the body text is UTF-8.

    I also posted a test where I replied to and quoted the original UTF-8 message into a pure CP437 message. There was no mixture in that message: it was all CP437.

    digital man

    Synchronet "Real Fact" #74:
    Vertrauen went online (as a WWIV BBS running on a 10MHz PC-XT clone) in 1988. Norco, CA WX: 60.5°F, 89.0% humidity, 0 mph ESE wind, 0.00 inches rain/24hrs
  • From Maurice Kinal@1:153/7001 to Rob Swindell on Monday, July 08, 2019 09:02:43
    Hey Rob!

    Your message reader/viewer could assume CP437 if there's no
    CHRS: kludge and do the conversion automatically.

    It could but it doesn't. It makes no assumptions and if they are indeed cp437 they still are.

    but my signature included a CP437 "degree" symbol (248)

    Which I see still is. If I farm it out as a text stream and pipe that to "iconv -f cp437" it returns;

    Norco, CA WX: 60.5°F

    which is now utf8 -> U+00B0 or c2 b0 if you prefer. The original is still cp437.

    Life is good,
    Maurice

    ... Don't cry for me I have vi.
    --- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)
  • From waldo kitty@1:3634/12 to Maurice Kinal on Monday, July 08, 2019 14:16:36
    Re: boxes.utf8
    By: Maurice Kinal to Rob Swindell on Mon Jul 08 2019 05:07:25

    Yes, looks correct. You have to do that manually?

    Yes. How else could it be done? It would have been better if the
    original
    had already done the conversion if that is what they are after. Also
    mixing
    regular 8-bit encodings with utf8 should be avoided despite the fact that
    it
    works ... sort of ... not really.

    they all looked OK from here other than a word wrapping problem... i'm logged in from my console terminal via ssh and using sbbs' new unicode stuff... all i can say is "This Is Noice!"... ANSI works, CP437 above 127 are translated to unicode points properly, 99% of everything just seems to work as desired... whoohoo!


    )/\(aldo
    --- SBBSecho 3.07-Linux
    * Origin: SouthEast Star Mail HUB - SESTAR (1:3634/12)
  • From Maurice Kinal@1:153/7001 to waldo kitty on Monday, July 08, 2019 20:46:28
    Hey waldo!

    they all looked OK from here other than a word wrapping problem

    Same situation here although I haven't gone over everything with a fine tooth comb ... yet. Haven't checked for any missing 0x8d's. So far so good. :-)

    Life is good,
    Maurice

    ... Don't cry for me I have vi.
    --- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)