March 11, 2007
I am still playing with R to generate graphs. I have to say, after some initial frustrations, I think I am starting to get it. Here are the steps to generate parallel coordinate plots in R:
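require(lattice)  # parallel() is part of the lattice package
Dataset <- read.table("/home/ram/foo2_200.csv", header=FALSE, sep=",")  # columns get the default names V1, V2, ...
parallel(~Dataset, data=Dataset)  # one line per row, one axis per column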
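To see what parallel() draws, here is a minimal sketch of the underlying idea in Python: each column becomes one axis, each column is rescaled to [0, 1], and each row turns into a polyline connecting its scaled values across the axes. Per-column min-max scaling is an assumption about the default behaviour, and only numeric columns are handled.

```python
from typing import List, Sequence, Tuple


def normalize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each column to [0, 1] so all axes share one range."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; place its values mid-axis.
        scaled_cols.append([0.5 if span == 0 else (v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def to_polylines(rows: Sequence[Sequence[float]]) -> List[List[Tuple[int, float]]]:
    """Each row becomes a polyline: vertex i sits on axis i at the row's scaled value."""
    return [[(axis, value) for axis, value in enumerate(row)]
            for row in normalize_columns(rows)]
```

For rows [[10, 80, 3], [20, 443, 3], [30, 80, 3]] the first polyline is [(0, 0.0), (1, 0.0), (2, 0.5)]; the constant third column ends up in the middle of its axis.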
Those are the very basics of generating a parallel coordinate plot. An interesting addition is to condition the plot on one of the columns:
parallel(~Dataset|Dataset$V4,data=Dataset)
This will generate n different parallel coordinate plots, one for each distinct value in Dataset$V4 (Dataset$V4 is the fourth column of our data).
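The |Dataset$V4 part splits the rows by the value of the fourth column and gives each group its own panel. A minimal sketch of that split, with column index 3 standing in for V4 (Python counts from 0); how lattice then lays out and scales the panels is not modeled here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def condition_on(rows: Sequence[Sequence], column: int) -> Dict[Hashable, List[Sequence]]:
    """Split rows into one group per distinct value of `column` (0-based).

    Each group corresponds to one panel of the conditioned plot.
    """
    panels: Dict[Hashable, List[Sequence]] = defaultdict(list)
    for row in rows:
        panels[row[column]].append(row)
    return dict(panels)
```

Calling condition_on(rows, 3) on rows whose fourth field is an action such as "pass" or "block" yields one group per action.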
Following is my last attempt. I wanted to change the axis labels. I was successful in changing the y-axis labels, but I was not able to change the labeling of the x-axis. I guess this is precisely the problem with R: simple things are fairly simple to do, but if you want to change specific details, it gets messy quite quickly.
parallel(~Dataset,data=Dataset,varnames=c("Source","Port","Destination","Action"))

March 9, 2007
I just returned from a hearty breakfast on the 22nd floor of my hotel, overlooking Frankfurt. Great hotel, great views! I was flipping through the pages of the ISSA Journal. I haven’t posted any article reviews in a long time; I got too frustrated, I guess. But there is one article on which I just can’t resist making two quick comments. It appeared in the January 2007 issue and is about managing passwords. The first thing that struck me is that the author gives us two email addresses in the “About the Author” box. Why would I need two addresses? Isn’t one enough? Anyway. Sorry. What really confused me is what the author says in the very first paragraph:
“I cannot wait for the day when my PC offers two-factor authentication. -snip- I can’t begin to quantify the convenience that will come from having to convince just my PC that I am who I say I am, and then letting it handle the task of convincing the myriad financial institutions, -snip- that I am who I say I am.”
Wow. Maybe the author should read up on two-factor authentication and single sign-on. They are not the same. And believe me, two-factor authentication is not going to ease your life! It adds one more form of authentication. How can two factors be easier than one? But again: single sign-on is not two-factor authentication. There is a fairly big step from two-factor authentication to single sign-on! And I am not sure whether I really want that. Think about the attack surface!
March 6, 2007
I love travelling, not because I have to cram myself into a small seat for 9 hours, but because I usually get a lot of reading done. I was reading this paper about Preparing for Security Event Management by the 360is group. I like the article; there are a lot of good points about what to look out for in a SIM/SEM/ESM deployment. However, I disagree with some fundamental concepts:
According to the paper, the first step in deploying a SEM (Security Event Management solution) should be to take an inventory, to do an assessment. Well, I disagree. The very first step has to be to define the use-cases you are after. What’s the objective? What are you hoping to get out of your ESM (Enterprise Security Manager; I use these terms interchangeably here)? Answer this question and it will drive the entire deployment! From the use-cases you will learn what data sources you need. Then you will see how many staff you need, procedures will follow from that, etc.
The second step, after the use-case development, should be the assessment of your environment. What do you have? Get an inventory of logging devices (make sure you also capture the non-logging security devices!) and all your assets. I know, you are going to tell me right away that there is no way you will get a list of all assets, but get at least a list of your critical ones!
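The ordering argued for above (use-cases first, inventory second) can be sketched as a small set computation: the selected use-cases determine which data sources are needed, and comparing that against the inventory shows what is missing. The use-case names and source labels below are hypothetical placeholders, not taken from the paper.

```python
from typing import Dict, Iterable, Set

# Hypothetical use-cases and the data sources each one needs (illustrative only).
USE_CASES: Dict[str, Set[str]] = {
    "detect worm outbreaks": {"firewall", "ids"},
    "monitor privileged logins": {"os_auth", "vpn"},
}


def required_sources(use_cases: Dict[str, Set[str]], selected: Iterable[str]) -> Set[str]:
    """Union of the data sources needed by the selected use-cases."""
    needed: Set[str] = set()
    for name in selected:
        needed |= use_cases[name]
    return needed


def coverage_gap(needed: Set[str], inventory: Set[str]) -> Set[str]:
    """Sources the use-cases need that the inventory does not provide yet."""
    return needed - inventory
```

Running the inventory only after the use-cases are fixed means the gap is measured against what you actually need, not against everything you happen to own.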
Another point I disagree with is the “Simplify” step. It talks about cleaning up the security landscape: throwing out old security devices, getting logging configured correctly, etc. While I agree that the logging configuration of all the devices needs to be revisited and corrected, re-architecting the security environment is not part of an ESM deployment. You will fail miserably if you try. The ESM project is big enough as it is; don’t lump this housekeeping step into it as well. This is really a separate project that falls under: “Do your IT security right”.
March 4, 2007
I am reading a lot of papers again and I keep finding research that just doesn’t get it, or at least research that fails to communicate its results cleanly. If you are doing research on visualization, do that. Don’t get into topics you don’t know anything about. Don’t get into common event formats, for example! Make assumptions. Let experts in that field make the call and figure things out. You can say what you did to overcome your challenge, but don’t – please don’t – present those things as a contribution of your work. Doing so hurts your research in two ways: first, it distracts people from what you really did, and second, it misleads readers who are not experts in that field into believing that you have a viable solution. So, visualization is visualization and not log normalization!
February 24, 2007
By now you should know that I really like command line tools which operate well when applied to data through a pipe. I have already posted quite a few tips on data manipulation on the command line. Today I wanted a quick way to look up IP address locations and add them to a log file. After investigating a few free databases, I came across Geo::IPfree, a Perl library which does the trick. So here is how you add the country. First, this is the format of my log entries:
10/13/2005 20:25:54.032145,195.141.211.178,195.131.61.44,2071,135
I want to get the country of the source address (first IP in the log). Here we go:
cat pflog.csv | perl -M'Geo::IPfree' -na -F/,/ -e '($country,$country_name)=Geo::IPfree::LookUp($F[1]);chomp; print "$_,$country_name\n"'
And here is the output:
10/13/2005 20:24:33.494358,62.245.243.139,212.254.111.99,,echo request,Europe
Very simple!
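For readers without Perl, the same pipeline step can be sketched in Python: chomp each line, split it naively on commas like -F/,/ does, look up field 1 (the source IP), and append the result. Geo::IPfree is not used here; PREFIX_TABLE is a hypothetical stand-in for a real GeoIP database, and its two entries are the IANA documentation ranges, so no real country is implied.

```python
import ipaddress
from typing import Iterable, Iterator

# Hypothetical lookup table standing in for a real GeoIP database.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "TEST-NET-1"),
    (ipaddress.ip_network("198.51.100.0/24"), "TEST-NET-2"),
]


def lookup_country(ip: str) -> str:
    """Return the label of the first prefix containing `ip`, else "Unknown"."""
    try:
        addr = ipaddress.ip_address(ip.strip())
    except ValueError:  # not an IP address, e.g. an empty or header field
        return "Unknown"
    for network, label in PREFIX_TABLE:
        if addr in network:
            return label
    return "Unknown"


def add_country(lines: Iterable[str], ip_field: int = 1) -> Iterator[str]:
    """Append the looked-up label of field `ip_field` (0-based) to each CSV line."""
    for line in lines:
        line = line.rstrip("\n")  # same role as Perl's chomp
        fields = line.split(",")  # naive split, like -F/,/ in the one-liner
        yield f"{line},{lookup_country(fields[ip_field])}"
```

Wired to standard input with `for out in add_country(sys.stdin): print(out)`, it can sit in a pipe just like the Perl one-liner.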
February 20, 2007
Everybody has blogged about RSA; I guess I am the only one who has not gotten around to it until now. I did a pretty interesting survey of companies and how they use visualization in their products. I will try to publish some of my findings at a later point. The other thing that is always incredible about RSA is the people and the networking. I don’t know how many parties there were in total, but there were a lot. Just the parties I knew about were in the high teens. I even went to the Gala for a little bit, but unfortunately I left too early. Otherwise I would have seen this live:
http://www.youtube.com/results?search_query=geeks+dance+rsa
I wish I could dance like this guy. Dude, he got moves!
February 17, 2007
I can’t believe it. I have been fighting with R (the statistical package) for a while now. All the restrictions on data types and such are driving me crazy. It constantly complains that something is not numeric when I try to generate a histogram, etc. Well, I just found THE solution: R Commander:
apt-get install r-cran-rcmdr
R --gui=tk
library(Rcmdr)
You are in business! I love it! It just does things for you! I am back in business and can continue writing my book!
February 9, 2007
I came across this very well done Web log analysis. The author uses a 3D scatter plot to plot certain aspects of his Web server log. He uses gnuplot to do so. What I like in particular is his discussion of the output and the way he positions scatter plots to find correlated event fields.
February 4, 2007
I am finally biting the bullet. I will start to really anonymize my graphs. To do so, I was trying to find a tool on the Web which does that. Well, as you can probably imagine, there is none which does exactly what I wanted. So I wrote my own anonymization script. To save you some hassle, also download the Anonymous.pm file.
This is how you use the script on a CSV file:
cat /tmp/log | ./anonymize.pl -c 1 -p user
This will replace all the values in column one with usernames of the form: "userX". If you are anonymizing IP addresses, run the tool without the prefix (-p) and it will do that automatically for you.
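The core of what the script is described as doing, replacing every value in a column with a consistent pseudonym of the form prefix plus number, can be sketched in a few lines. This is not the script itself: the 1-based column number mirrors -c 1 in the example, numbering pseudonyms from 1 is an assumption, and the IP address mode (which relies on Anonymous.pm) is not reproduced.

```python
from typing import Dict, Iterable, Iterator


class Pseudonymizer:
    """Maps each distinct value to prefix + counter (user1, user2, ...), consistently."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.mapping: Dict[str, str] = {}

    def __call__(self, value: str) -> str:
        if value not in self.mapping:
            self.mapping[value] = f"{self.prefix}{len(self.mapping) + 1}"
        return self.mapping[value]


def anonymize_column(lines: Iterable[str], column: int, prefix: str) -> Iterator[str]:
    """Replace field `column` (1-based) of each CSV line with its pseudonym."""
    pseudo = Pseudonymizer(prefix)
    for line in lines:
        fields = line.rstrip("\n").split(",")
        fields[column - 1] = pseudo(fields[column - 1])
        yield ",".join(fields)
```

Keeping the mapping consistent means the same user always gets the same pseudonym, so the structure of the graph survives the anonymization.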
Credit goes to John Kristoff, who wrote the Anonymous.pm module for Perl.
January 30, 2007
I am still waiting for that one company which is going to develop the universal agent!
What am I talking about? Well, there is all this agent-based technology out there. You have to deploy some sort of code on all of your machines to monitor/enforce/… something. The problem is that nobody likes to run these pieces of code on their machines. There are complicated approval processes, risk analysis issues, security concerns, etc. which have to be overcome. Then there is the problem of incompatible code, various agents running on the same machine, performance problems, and so on.
Why does nobody build a well-designed agent framework with all the bells and whistles of remotely managed software: deployment, upgrades, monitoring, logging, etc.? Then make it a plug-in architecture. You offer the most important functionality in the agent itself and let other vendors build plug-ins which do the actual work. You would have to deploy and manage exactly one agent instead of dozens of them.
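The split of responsibilities described above can be sketched as a tiny plug-in host: the agent owns installation, removal and logging, while plug-ins only implement run(). Everything here is hypothetical and illustrative (Plugin, Agent, HeartbeatPlugin); a real framework would add remote management, signing and resource limits.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Plugin(ABC):
    """Base class a hypothetical third-party plug-in would implement."""
    name = "plugin"

    @abstractmethod
    def run(self) -> str:
        """Do the plug-in's actual work and return a status string."""


class Agent:
    """The one agent per machine: it owns lifecycle and logging, plug-ins own the work."""

    def __init__(self) -> None:
        self.plugins: Dict[str, Plugin] = {}
        self.log: List[str] = []

    def install(self, plugin: Plugin) -> None:
        # Installing under an existing name replaces the old version (a simple upgrade path).
        self.plugins[plugin.name] = plugin
        self.log.append(f"installed {plugin.name}")

    def remove(self, name: str) -> None:
        self.plugins.pop(name, None)
        self.log.append(f"removed {name}")

    def run_all(self) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for name, plugin in self.plugins.items():
            try:
                results[name] = plugin.run()
            except Exception as exc:  # a faulty plug-in must not take the agent down
                self.log.append(f"{name} failed: {exc}")
        return results


class HeartbeatPlugin(Plugin):
    """Hypothetical example plug-in."""
    name = "heartbeat"

    def run(self) -> str:
        return "alive"
```

Because every vendor's code runs behind the same install/remove/run_all interface, the approval, deployment and monitoring effort is paid once for the agent rather than once per product.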
Well, maybe this will remain wishful thinking.