If you think your shell can only display text, you are wrong! Well... half wrong.
In fact, your shell can only print text, but it has features to display stylized text, and modern shells support Unicode. Today we will see how to make use of that to create prettier applications.
Short disclaimer: I assume you are in a Unix environment with a Unicode-compatible terminal.
If this is not the case (Windows users, I'm talking to you), it is still possible to do what is shown here, but it requires tools and/or libraries that are outside the scope of this article.
Feel free to search for how to do it if you are interested.
Paint me like one of your French girls
You have probably noticed that your terminal can display colors. This feature makes use of escape sequences (or ANSI escape codes). Those sequences are a series of characters preceded by the Escape character, which is character 27 (octal 033) in the ASCII table. They are interpreted by the terminal and result in various actions applied to its output. You can use ANSI codes to set the style of your text (italic, bold, ...), move your cursor and a bit more.
The sequence starts with the character ESC (hex 0x1B / octal 033) and the Control Sequence Introducer [ to let the terminal know you are starting to give control sequence codes:
\033[ # most commonly used
\x1B[
Then you append your control sequence (complete list).
Here is a description of the most commonly used sequences:
Finally you end the sequence with an m.
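As a minimal sketch of the three steps (ESC + [, the codes, then m), here is how such a sequence could be assembled in Python. The `style` helper and the constant names are hypothetical, chosen for illustration; the numeric codes shown are a small, widely supported subset and not a complete list.

```python
ESC = "\033"          # the Escape character (octal 033, hex 0x1B, decimal 27)
CSI = ESC + "["       # Control Sequence Introducer
RESET = CSI + "0m"    # code 0 resets every attribute

# A few widely supported codes (illustrative subset, names are our own).
BOLD, UNDERLINE = 1, 4
FG_RED, FG_GREEN, BG_BLUE = 31, 32, 44


def style(text, *codes):
    """Wrap text in an escape sequence built from the given numeric codes."""
    sequence = CSI + ";".join(str(code) for code in codes) + "m"
    return sequence + text + RESET


if __name__ == "__main__":
    print(style("bold red", BOLD, FG_RED))
    print(style("underlined green on blue", UNDERLINE, FG_GREEN, BG_BLUE))
```

Several codes can share one sequence when separated by semicolons, and the trailing reset keeps the style from leaking into whatever is printed next.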
Extended colors work by adding, after 38 (foreground) or 48 (background), the sequence 5;x where x is a color index (0..255), or 2;r;g;b where r, g, b are the red, green and blue color channels (each from 0 to 255). You can print out every color of your terminal using the following script:
#!/bin/bash
# This program is free software. It comes without any warranty, to
# the extent permitted by applicable law. You can redistribute it
# and/or modify it under the terms of the Do What The Fuck You Want
# To Public License, Version 2, as published by Sam Hocevar. See
# http://sam.zoy.org/wtfpl/COPYING for more details.
for fgbg in 38 48 ; do # Foreground/Background
    for color in {0..255} ; do # Color indexes (256 colors, 0 to 255)
        # Display the color
        echo -en "\e[${fgbg};5;${color}m ${color}\t\e[0m"
        # Display 10 colors per line
        if [ $((($color + 1) % 10)) -eq 0 ] ; then
            echo # New line
        fi
    done
    echo # New line
done
exit 0
You should now be able to style your output with colors and text attributes.
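The same extended color sequences can be built from Python. The sketch below is only an illustration: `color256` and `truecolor` are hypothetical helper names, and the 2;r;g;b form will only render on terminals that support 24-bit color.

```python
RESET = "\033[0m"


def color256(index, background=False):
    """Return the sequence selecting one of the 256 indexed colors."""
    if not 0 <= index <= 255:
        raise ValueError("color index must be between 0 and 255")
    target = 48 if background else 38  # 48 = background, 38 = foreground
    return f"\033[{target};5;{index}m"


def truecolor(red, green, blue, background=False):
    """Return the sequence selecting an r;g;b color (channels 0 to 255)."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    target = 48 if background else 38
    return f"\033[{target};2;{red};{green};{blue}m"


if __name__ == "__main__":
    print(color256(208) + "indexed color 208" + RESET)
    # A small red-to-blue gradient drawn with background colors.
    steps = [truecolor(255 - i, 0, i, background=True) + " " for i in range(0, 256, 8)]
    print("".join(steps) + RESET)
```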
What we've got here is failure to communicate.
Let's talk about Unicode to customize our shell a bit more.
"Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems."
So says Wikipedia. In short, it is a standard set of more than 120,000 characters.
It allows you to insert special characters like symbols, emoticons, Arabic letters, mathematical symbols and much more in your shell. But as Unicode is just the character set, it needs an encoding to transmit the position of each character in the table, and for that you can choose among several encodings (UTF-7/8/16/32)...
Small flashback: ASCII is a 7-bit encoding that defines 128 characters. Since that was not much, 8-bit character sets like ISO-8859-15 started to appear. But that was still not enough, so (big shortcut here) Unicode appeared.
Unicode is most commonly transmitted as UTF-8, and as UTF-7 when used in mail. UTF-16 and UTF-32 are not efficient, since they require at least 2 and exactly 4 bytes respectively for every character (even for just a 0x01...).
Still, Windows uses UTF-16, so when you save your document as "unicode" it means "Unicode with UTF-16 encoding".
Only UTF-32 can store more than 21 bits per character; all other UTF encodings spend some bits on signalling that more bytes follow, and lose space for information in the process.
UTF-8 is the best fit because it is a variable-length encoding (so it is lighter when sending single-byte characters) and uses a single bit to switch length (UTF-7 uses the + character, UTF-16 reserves the whole range of 16-bit units starting with the bits 11011, that is 0xD800 to 0xDFFF).
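To make the size comparison concrete, the following sketch measures how many bytes a few characters take in each encoding, using Python's built-in codecs. The `encoded_sizes` helper is a hypothetical name; the little-endian codec variants are used only so that no byte order mark is added to the count.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(char):
    """Return the number of bytes needed for one character in each encoding."""
    return {name: len(char.encode(name)) for name in ENCODINGS}


if __name__ == "__main__":
    # A (ASCII), e with acute accent, yin yang, and an emoji outside the first 65,536 characters.
    for char in ("A", "\u00e9", "\u262f", "\U0001F600"):
        print(f"U+{ord(char):04X}", encoded_sizes(char))
```

An ASCII character costs one byte in UTF-8 but two and four bytes in UTF-16 and UTF-32, which is the overhead described above.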
The first bit, which is not used in ASCII, is used in UTF-8 to specify that the length is more than one byte.
So for a character that needs more than 7 bits, the leading byte starts with as many 1s as there are bytes in the sequence, followed by a 0, which gives 1110xxxx for three bytes. The next bytes all start with 10, like this: 10xxxxxx.
The result 1110xxxx 10xxxxxx 10xxxxxx can store up to 16 bits.
Following that pattern, the longest series of bytes, 11111110 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, could store 36 bits, even though the standard limits sequences to 4 bytes (21 bits).
For instance, to encode the yin yang symbol ☯, which is character number 9775 (U+262F) in the Unicode table, we convert this number to binary, which gives:
00100110 00101111
It does not fit in a single byte, so we split off the last 6 bits:
00100110 00---101111
Our last byte will be 10*101111*.
We still have 100110 00, which does not fit in a single byte, so we apply the same process:
10---0110 00
Our second byte will be 10*011000*.
Finally we have 10, which fits, so we count 3 bytes in total and our first byte will start with 1110: 1110*0010*.
The full encoded number will be:
11100010 10011000 10101111
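The manual steps above can be sketched in a few lines of Python and checked against the built-in encoder. `utf8_encode` is a hypothetical helper written for illustration; it follows the splitting process described here and, for brevity, does not reject the surrogate range the way a strict encoder would.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point to UTF-8 bytes by hand."""
    if code_point < 0x80:
        return bytes([code_point])           # plain ASCII, single byte
    if code_point < 0x800:
        lead, extra = 0b11000000, 1          # 110xxxxx + 1 continuation byte
    elif code_point < 0x10000:
        lead, extra = 0b11100000, 2          # 1110xxxx + 2 continuation bytes
    elif code_point <= 0x10FFFF:
        lead, extra = 0b11110000, 3          # 11110xxx + 3 continuation bytes
    else:
        raise ValueError("code point outside the Unicode range")

    tail = []
    for _ in range(extra):
        # Take the lowest 6 bits and prefix them with 10.
        tail.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6
    # The remaining bits fit in the leading byte; the tail was built backwards.
    return bytes([lead | code_point] + tail[::-1])


if __name__ == "__main__":
    encoded = utf8_encode(9775)
    print(" ".join(f"{byte:08b}" for byte in encoded))
    print(encoded == "\u262f".encode("utf-8"))
```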
In order to display it correctly you need 2 things: a terminal that interprets UTF-8 and a font that contains the character.
If your terminal cannot interpret UTF-8, it will split one character into several or show the wrong character.
If you see boxes or question marks, your font is incompatible.
Displaying Unicode is pretty straightforward. You can use the \u2620 syntax or the raw bytes with \xE2\x98\xA0 (since UTF-8 is ASCII compatible...).
$ echo -e "\xE2\x98\xA0"
☠
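The same two notations exist in Python string and bytes literals. The short sketch below, which assumes a UTF-8 terminal as in the rest of this article, shows that they all describe the same character; the variable names are only illustrative.

```python
# Four ways to obtain the same character in Python.
from_escape = "\u2620"                           # \u followed by the code point in hex
from_number = chr(0x2620)                        # code point as an integer
from_name = "\N{SKULL AND CROSSBONES}"           # official Unicode character name
from_bytes = b"\xE2\x98\xA0".decode("utf-8")     # raw UTF-8 bytes, decoded

if __name__ == "__main__":
    print(from_escape, from_number, from_name, from_bytes)
    # The character encodes back to the three bytes used with echo.
    print(from_escape.encode("utf-8"))
```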
You can then customize your boring scripts with cool arrows ▲ ▶ ► ▼, emoticons ☃☺☻✌☹♡♥, and so on...
This gives the user richer output that better fits the context of the application.
If you followed along, you know that escape sequences can move the cursor wherever you want and that Unicode can print symbols.
Because it is not easy to write \033[A to move the cursor up one line, and because not all terminals use these sequences (some use ctrl+K, like the old Televideo 920C terminal), it was important to have a way to translate an action into the correct sequence for the current terminal.
That's why curses was created, now replaced by ncurses. It is available in C and has many bindings. You can use it to change colors, draw borders, move the cursor around, get input, build menus...
Here is a simple Python example:
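import curses
from curses import wrapper

def main(stdscr):
    # Draw a border around the whole screen
    stdscr.border(0)
    # Print highlighted text at row 2, column 2
    stdscr.addstr(2, 2, "Hello World", curses.A_STANDOUT)
    stdscr.refresh()
    # Wait for a key press before exiting
    stdscr.getkey()

wrapper(main)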
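For contrast, here is roughly what the raw approach looks like without curses, using the \033[A sequence mentioned earlier to rewrite the previous line in place. This is only a sketch that assumes an ANSI-compatible terminal, which is exactly the portability problem curses solves; `redraw_line` is a hypothetical helper name.

```python
import sys
import time

CURSOR_UP = "\033[A"     # move the cursor up one line
ERASE_LINE = "\033[2K"   # erase the whole current line


def redraw_line(text):
    """Return a string that replaces the previously printed line with text."""
    return f"{CURSOR_UP}{ERASE_LINE}\r{text}\n"


if __name__ == "__main__":
    print("Progress:   0%")
    for percent in range(10, 101, 10):
        time.sleep(0.05)
        sys.stdout.write(redraw_line(f"Progress: {percent:3d}%"))
        sys.stdout.flush()
```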
Finally, combining everything we have seen so far, the very talented Justine Tunney created hiptext which, given an image or a video, converts it to a series of escape sequences and Unicode characters to display it.
You can compile it using the following very simple procedure:
# Installing dependencies:
apt-get install build-essential libpng12-dev libjpeg-dev \
libfreetype6-dev libgif-dev ragel libavformat-dev libavcodec-dev \
libswscale-dev libgflags-dev libgoogle-glog-dev
# compile and install
make && make install
Then you can use it like this:
hiptext image.png
That prints the output directly to stdout, on screen. You can save it for later use by redirecting it to a file:
hiptext -xterm256unicode video.mp4 1> output.sh
If you want to see a great example output, this article provides a short demonstration you can run with this command (make sure your terminal is larger than 105x80 to enjoy it fully):
curl -L https://cdn.rawgit.com/cyrbil/Script-Bazar/master/TermVideo/video256unicode.bz2 | bzcat
This concludes our short introduction on how to create great terminal applications with colors, Unicode, borders and even images or videos.
Enjoy & code safe