I happened to come across a puzzle on LinkedIn. A recruiter published a series of words made of '0' and '1' on my LinkedIn timeline, which brought me back to the fun times of struggling with huge logs from a large number of embedded cards. Can't wait to decode it with shell :_)
Problem Recitation
Here is the original post. Its contents are as follows:
01001101 01111001 00100000 01100101 01101101 01100001 01101001 01101100 00100000 01101001 01110011 00100000 01101100 01110011 01110100 01100101 01110110 01100101 01101110 01000000 01110110 01101101 01110111 01100001 01110010 01100101 00101110 01100011 01101111 01101101
Solution
The problem can certainly be solved quickly with a programming language. Perl would be my preference, and a Perl version is attached at the end of this article as well.
However, as a fan of the shell CLI, I will share my thoughts on solving it there. Let's save the text to a file, e.g. "original.txt". The traditional style for quick one-off parsing on the command line is to build a pipeline: each stage focuses on one thing only, and the stages communicate through pipes, as mentioned on
With the "which" command, the man pages and some experience, we find that macOS ships traditional Unix executables such as "bc" (actually an interpreter) and "xxd", which cover common hex and ASCII work.
The high-level design in mind is:
(1) Regularize the data and split it into items
(2) For each item:
(2.1) Translate the (0|1) binary string to hex
(2.2) Translate the hex back to ASCII
(3) Stream the resulting items back into a sentence
Note 1: This post describes my thinking as best I remember it. It does not suggest formally walking through these tedious steps in real life; in practice the whole thing takes just a couple of minutes, and the model lives in one's head for most tasks at hand. Using the shell, we can keep our focus on the original logical problem and spend only a very limited bandwidth of our mind on the side path. The most valuable thing in daily life is concentration. If Python or IPython is faster and costs less bandwidth, go for it, without losing focus on the original problem to solve.
Note 2: A simple example of Map-Reduce is counting the books in a library. We split the work by shelf and count the books on each shelf -- Map. When all shelves are done, we add the numbers into a sum -- Reduce.
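The library example in Note 2 can be sketched in shell. This is only an illustration: the `library` variable and the `count_books` function are hypothetical names, with one line standing for a shelf and one word for a book.

```shell
# Hypothetical library: one line per shelf, one word per book.
library='dune emma ulysses
hamlet macbeth
walden'

count_books() {
  # Map: print the number of books (fields) on each shelf (line).
  # Reduce: add the per-shelf counts into a single total.
  awk '{ print NF }' | awk '{ total += $1 } END { print total }'
}

printf '%s\n' "$library" | count_books   # prints 6
```

The two awk stages could be merged into one, but keeping them apart mirrors the Map and Reduce roles.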
Implementation
Step 1: Data regularization
Using the "tr" command to replace each space with a newline ("\n" is a line feed, LF, not a carriage return) is a common way to split a task so that each line can be handled by a short-lived process. This is typically what we consider the "Map" step.
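As a small sketch of this step, the function below splits a string of 8-bit groups into one group per line. The name `split_bits` and the two sample groups are made up for illustration; with the real data the input would come from original.txt.

```shell
split_bits() {
  # Replace every space with a newline so each 8-bit group sits on its own line.
  printf '%s\n' "$1" | tr ' ' '\n'
}

split_bits '01001000 01101001'
# prints:
# 01001000
# 01101001
```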
Step 2: Process each datum
Translate 0|1 string to Hex
Each string representing a binary value, (0|1)+, can be translated to hex with the bc command.
echo "obase=16; ibase=2; 01001101"| bc
Here obase and ibase are bc keywords that set the number base of the output and of the input. obase is set first on purpose: once ibase=2 takes effect, bc reads every later number, including the 16, in base 2.
Note 3: bc (Basic Calculator) is much more powerful than a base converter. It is an interpreter for calculation tasks. Other choices exist here as well; the philosophy is to pick whatever we are familiar with and save our focus for the real problem.
Translate Hex to ASCII
This step is also traditional. The xxd command is usually picked to dump text as hex or to reverse hex back to text.
echo "4D"| xxd -r -p
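For the dump direction (text to hex), od and hexdump are alternatives. They have no reverse mode, so for the hex-to-text direction used here the choice is xxd -r or a printf escape sequence.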
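If xxd is not at hand, the hex-to-ASCII step can also be sketched with printf alone. This is an alternative technique, not the one used in this post; `hex_to_char` is a hypothetical name, and the sketch assumes the shell's printf accepts a `0x` prefix for numeric arguments, as bash and dash do.

```shell
hex_to_char() {
  # Inner printf turns the hex value into a three-digit octal number (4D -> 115).
  # Outer printf interprets the resulting \115 escape and emits that byte.
  printf "\\$(printf '%03o' "0x$1")"
}

hex_to_char 4D   # prints M, without a trailing newline
```

Octal escapes are used because they are the portable form for printf; the `\x` escape is not available in every shell.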
Step 3: The Final Command
Since xxd -r -p does not add newlines to its output, there is no specific "Reduce" step to take care of. The bytes are processed as a stream: each 8-character '0|1' string is split off on a newline -> the 8 characters are translated to a two-character hex value -> the two hex characters are translated to one ASCII byte -> and so on until stdin is closed.
for x in $(cat original.txt|tr ' ' "\n"); do echo "obase=16; ibase=2; $x"| bc| xxd -r -p; done
Alternatives
The Perl one-liner version. Cheers to its powerful text processing engine.
cat original.txt |perl -lape '$_=pack"(B8)*",@F'
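Another alternative, for a machine with neither bc, xxd nor Perl, is a plain shell sketch. It swaps the tools used in this post for shell arithmetic and printf, and the names `bits_to_dec` and `decode_bits` are hypothetical.

```shell
bits_to_dec() {
  # Convert a string of 0/1 characters to decimal, one digit at a time.
  bits=$1
  value=0
  while [ -n "$bits" ]; do
    digit=${bits%"${bits#?}"}        # first character of the remaining string
    bits=${bits#?}                   # drop that character
    value=$((value * 2 + digit))
  done
  echo "$value"
}

decode_bits() {
  # Read whitespace-separated 8-bit groups from stdin and emit one byte per group.
  for token in $(cat); do
    printf "\\$(printf '%03o' "$(bits_to_dec "$token")")"
  done
}

printf '01001000 01101001' | decode_bits   # prints Hi
```

It is slower than the pipeline versions on large inputs, but it needs nothing beyond the shell itself and `cat`.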
Have fun!
Change Logs
2017-04-15 | Init the post to share a shell toy.