February 24, 2007
By now you should know that I really like command line tools which work well when applied to data through a pipe. I have already posted quite a few tips on doing data manipulation on the command line. Today I wanted a quick way to look up IP address locations and add them to a log file. After investigating a few free databases, I came across Geo::IPfree, a Perl library which does the trick. So here is how you add the country. First, this is the format of my log entries:
I want to get the country of the source address (first IP in the log). Here we go:
cat pflog.csv | perl -M'Geo::IPfree' -na -F/,/ -e '($country,$country_name)=Geo::IPfree::LookUp($F[1]);chomp; print "$_,$country_name\n"'
And here the output:
10/13/2005 20:24:33.494358,18.104.22.168,22.214.171.124,,echo request,Europe
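For readers who prefer to see the same pipe step spelled out, here is a minimal Python sketch of the idea: split each line on commas, look up the second field, and append the result. It does not use Geo::IPfree. The lookup table `EXAMPLE_NETWORKS`, its placeholder labels, and the function names `lookup_country` and `append_country` are all made up for illustration, and the networks are documentation-only ranges, so treat this as a sketch of the mechanics rather than a working geolocation tool.

```python
import ipaddress

# Hypothetical lookup table: documentation-only networks (RFC 5737) mapped to
# placeholder labels. A real run would query a GeoIP database instead.
EXAMPLE_NETWORKS = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example Country A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example Country B"),
]


def lookup_country(ip_text):
    """Return the label of the first matching network, or "Unknown"."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        # Empty or malformed field: do not fail, just mark it.
        return "Unknown"
    for network, label in EXAMPLE_NETWORKS:
        if addr in network:
            return label
    return "Unknown"


def append_country(lines, column=1):
    """Append the looked-up label to each comma-separated line.

    `column` is the zero-based index of the field holding the IP address,
    matching the second field of the log format used in this post.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(",")
        ip_text = fields[column] if len(fields) > column else ""
        out.append(f"{line},{lookup_country(ip_text)}")
    return out
```

To use it in a pipe, you could pass `sys.stdin` to `append_country` and print each returned line.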
February 20, 2007
Everybody blogged about RSA. I guess I am the only one who has not gotten around to doing so until now. I did a pretty interesting survey of companies and how they use visualization in their products. I will try to publish some of my findings at a later point. The other thing which is always incredible about RSA is the people and the networking. I don’t know how many parties there were in total, but there were a lot. Just the parties I knew about were in the high teens. I even went to the Gala for a little bit, but unfortunately I left too early. Otherwise I would have seen this live:
I wish I could dance like this guy. Dude, he got moves!
February 17, 2007
I can’t believe it. I have been fighting with R (the statistical package) for a while now. All the restrictions about data types and such are driving me crazy. It constantly complains that something is not a numerical type if I try to generate a histogram, etc. Well, I just found THE solution: R Commander:
apt-get install r-cran-rcmdr
You are in business! I love it! It just does things for you! I am back in business and can continue writing my book!
February 9, 2007
I came across this very well done Web Log Analysis. The author uses a 3D scatter plot to plot certain aspects of his Web server log. He uses gnuplot to do so. What I like in particular is his discussion of the output and the way he positions scatter plots to find correlated event fields.
February 4, 2007
I am finally biting the bullet. I will start to really anonymize my graphs. In order to do so, I was trying to find a tool on the Web which does that. Well, as you can probably imagine, there is none which does exactly what I wanted. So I wrote my own anonymization script. To save you some hassle, also download the Anonymous.pm file.
This is how you use the script on a CSV file:
cat /tmp/log | ./anonymize.pl -c 1 -p user
This will replace all the values in column one with usernames of the form: "userX". If you are anonymizing IP addresses, run the tool without the prefix (-p) and it will do that automatically for you.
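To make the column replacement concrete, here is a minimal Python sketch of the prefix case. It is not the anonymize.pl script, and it does not cover the IP address mode. The function name `anonymize_column` is made up, and two behaviors are my assumptions: that columns are counted from one (as the `-c 1` example suggests) and that the same input value always maps to the same "userX" label.

```python
def anonymize_column(lines, column=1, prefix="user"):
    """Replace the values in one CSV column with prefix + running number.

    Assumptions (not taken from anonymize.pl): `column` is one-based, and
    each distinct original value keeps the same replacement throughout.
    """
    mapping = {}  # original value -> anonymized label
    out = []
    idx = column - 1
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if 0 <= idx < len(fields):
            original = fields[idx]
            if original not in mapping:
                # Number labels in order of first appearance: user1, user2, ...
                mapping[original] = f"{prefix}{len(mapping) + 1}"
            fields[idx] = mapping[original]
        out.append(",".join(fields))
    return out
```

Keeping a consistent mapping means a graph built from the anonymized log still shows which events belong to the same user, just without the real name.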
Credits to John Kristoff who wrote the Anonymous.pm module for Perl.