February 24, 2007

Geo Lookup on the Command Line

Category: Programming,UNIX Scripting — Raffael Marty @ 8:56 pm

By now you should know that I really like command line tools which operate well when applied to data through a pipe. I have already posted quite a few tips on doing data manipulation on the command line. Today I wanted a quick way to look up IP address locations and add them to a log file. After investigating a few free databases, I came across Geo::IPfree, a Perl library which does the trick. So here is how you add the country to each entry. First, this is the format of my log entries:

10/13/2005 20:25:54.032145,195.141.211.178,195.131.61.44,2071,135

I want to get the country of the source address (first IP in the log). Here we go:

cat pflog.csv | perl -M'Geo::IPfree' -na -F/,/ -e '($country,$country_name)=Geo::IPfree::LookUp($F[1]);chomp; print "$_,$country_name\n"'

And here is the output:

10/13/2005 20:24:33.494358,62.245.243.139,212.254.111.99,,echo request,Europe

Very simple!
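If the Perl module is not at hand, the same pipe pattern can be sketched with plain awk. This is only an illustration under stated assumptions, not what Geo::IPfree does internally: it swaps the real geo database for an exact-match map file of "ip,country" lines that you supply yourself. The function name append_country and the map file are hypothetical, and the "ZZ" value below is placeholder data, not a real lookup result.

```shell
# Hypothetical helper: append a country column to CSV log lines read from
# stdin, keyed on the source IP (second field). The first argument is a
# user-supplied map file with lines of the form "ip,country".
append_country() {
    awk -F, -v OFS=, '
        NR == FNR { country[$1] = $2; next }   # first input (map file): load it
        { print $0, (($2 in country) ? country[$2] : "unknown") }
    ' "$1" -
}

# Demo with placeholder data ("ZZ" is made up for illustration).
map=$(mktemp)
printf '195.141.211.178,ZZ\n' > "$map"
printf '10/13/2005 20:25:54.032145,195.141.211.178,195.131.61.44,2071,135\n' \
    | append_country "$map"
rm -f "$map"
```

Note that awk counts fields from 1, so the source address is $2 here, while Perl's autosplit array starts at 0, which is why the one-liner above uses $F[1].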

February 20, 2007

RSA 2007

Category: Uncategorized — Raffael Marty @ 1:30 am

Everybody blogged about RSA; I guess I am the only one who has not gotten around to doing so until now. I did a pretty interesting survey of companies and how they use visualization in their products. I will try to publish some of my findings at a later point. The other thing which is always incredible about RSA is the people and the networking. I don’t know how many parties there were in total, but there were a lot. Just the parties I knew about were in the high teens. I even went to the Gala for a little bit, but unfortunately I left too early. Otherwise I would have seen this live:

http://www.youtube.com/results?search_query=geeks+dance+rsa

I wish I could dance like this guy. Dude, he got moves!

February 17, 2007

R – Statistics and Plots

Category: Visualization — Raffael Marty @ 11:31 pm

I can’t believe it. I have been fighting with R (the statistical package) for a while now. All the restrictions about data types and such are driving me crazy. It constantly complains that something is not a numerical type when I try to generate a histogram, etc. Well, I just found THE solution: R Commander:

apt-get install r-cran-rcmdr
R --gui=tk
library(Rcmdr)

You are in business! I love it! It just does things for you! I am back in business and can continue writing my book!

February 9, 2007

Web Server Log 3D Plot

Category: Log Analysis,Visualization — Raffael Marty @ 11:53 pm

I came across this very well done Web Log Analysis. The author uses a 3D scatter plot to plot certain aspects of his Web server log. He uses gnuplot to do so. What I like in particular is his discussion of the output and the way he positions scatter plots to find correlated event fields.

February 4, 2007

Anonymizing Log Entries

Category: Log Analysis,Visualization — Raffael Marty @ 2:12 pm

I am finally biting the bullet. I will start to really anonymize my graphs. In order to do so, I was trying to find a tool on the Web which does that. Well, as you can probably imagine, there is none which does exactly what I wanted. So I wrote my own anonymization script. To save you some hassle, also download the Anonymous.pm file.

This is how you use the script on a CSV file:

cat /tmp/log | ./anonymize.pl -c 1 -p user

This will replace all the values in column one with usernames of the form: "userX". If you are anonymizing IP addresses, run the tool without the prefix (-p) and it will do that automatically for you.
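The idea behind the prefix mode can be sketched in a few lines of awk. This is not the anonymize.pl script itself, just a minimal illustration under some assumptions: the function name anonymize_column is hypothetical, columns are counted from 1 as awk does, and fields are split on plain commas without any handling of quoted values.

```shell
# Hypothetical helper, not the author's anonymize.pl: replace the values in
# one CSV column with stable pseudonyms (PREFIX1, PREFIX2, ...).
# Usage: anonymize_column COLUMN PREFIX < input.csv
anonymize_column() {
    awk -F, -v OFS=, -v col="$1" -v prefix="$2" '
        {
            # Hand out the next number the first time a value is seen, so
            # repeated values always map to the same pseudonym.
            if (!($col in seen)) seen[$col] = ++n
            $col = prefix seen[$col]
            print
        }'
}

# Demo with made-up input: both "alice" lines get the same pseudonym.
printf 'alice,login\nbob,logout\nalice,logout\n' | anonymize_column 1 user
```

Keeping the mapping consistent within a run is what keeps an anonymized graph useful: the same original value still shows up as the same node.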

Credits to John Kristoff, who wrote the Anonymous.pm module for Perl.