Despite all the wonderful text-processing tools in UNIX, one is surprisingly absent: a tool that generates some kind of text-based graph from an input stream. This would be useful for all sorts of things, most notably for “eyeballing” the relative frequencies of values in a data set. Such data sets could be logs of every sort, file types in a directory, version control statistics, and so on. The graph could be simple, such as rows of dashes scaled to fit the width of the current TTY. But as far as I could tell, no such tool exists.
Until now.
So here I present step-by-step instructions on how to write such a tool in Perl. If you are in a hurry, you can simply download the tool, rename it to just “histogram”, make it executable, and put it in your bin directory.
In the posts that follow, I’ll detail its usage and construction.
Here’s a quick example usage and output:
$ histogram '/ sshd\[[0-9]*\]: Connection closed by UNKNOWN/ { print substr($3,1,2) }' /var/log/secure*
00:———————————————————————-
01:————————————————————————–
02:—————————————————————————
03:————————————————————————–
04:————————————————————————–
05:—————————————————————————
06:—————————————————————————
07:—————————————————————————–
08:—————————————————————————-
09:————————————————————————-
10:—————————————————————————-
11:—————————————————————————
12:———————————————————————–
13:————————————————————————-
14:——————————————-
15:—————————————
16:———————————–
17:————————————-
18:————————————–
19:————————————
20:————————————-
21:————————————
22:—————————————-
23:————————————–
What we’re trying to do here is get an idea of when attackers are trying to break into the system over SSH. We use awk to search the /var/log/secure logs for a string like “sshd … Connection closed by UNKNOWN” and print the hour of the day each time the message occurred. Histogram then does the rest and prints a “graph” showing the distribution of attack times: were they in the morning, the afternoon, all day, or what? In this case, most of them seem to fall between midnight and about 2pm.
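To make the counting and scaling step concrete, here is a minimal sketch in Python. It is not the Perl tool itself, only an illustration built on some assumptions: the keys (the hour strings, in the example above) have already been extracted, the longest bar is stretched to fill the available width, and every key that occurs gets at least one dash. The function name `render_histogram`, its parameters, and the sample data are all made up for this sketch.

```python
from collections import Counter


def render_histogram(keys, width=80, mark="-"):
    """Return one 'key:bar' line per distinct key, sorted by key.

    Assumption for this sketch: the most frequent key gets a bar that
    fills the remaining width, and other bars are scaled proportionally.
    """
    counts = Counter(keys)
    if not counts:
        return []
    label_width = max(len(k) for k in counts)
    # Columns left for the bar once "label:" has been printed.
    room = max(1, width - label_width - 1)
    peak = max(counts.values())
    lines = []
    for key in sorted(counts):
        # Integer scaling; any key that occurs at all gets at least one mark.
        bar_len = max(1, counts[key] * room // peak)
        lines.append(f"{key:>{label_width}}:{mark * bar_len}")
    return lines


# Made-up sample data: four events in hour 00, two in 01, one in 02.
sample_hours = ["00", "00", "00", "00", "01", "01", "02"]
for line in render_histogram(sample_hours, width=11):
    print(line)
# Prints:
# 00:--------
# 01:----
# 02:--
```

To approximate the pipeline above, you could pass the stripped lines of `sys.stdin` as `keys` and `shutil.get_terminal_size().columns` as `width`.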