April 26, 2006
Something I ran into a couple of times this week is how to do a quick DNS lookup on the command line:
cat data | perl -M'Socket' -na -F/,/ -e '$dns=gethostbyaddr(inet_aton($F[0]),AF_INET)||$F[0]; print "$dns,$F[1],$F[2]\n"'
The code assumes that you have an IP address in the first column. It uses -F/,/ to split each input line on commas into the @F array, does a reverse DNS lookup on the first column, and prints either the DNS name or, if no name was found, the original IP address.
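For larger files, the same idea can be written as a small script. This is only a sketch under a few assumptions: the input is comma-separated with the IP address in the first column, all remaining columns are passed through unchanged, and the names resolve, rewrite_line and %cache are made up for illustration. The cache is an addition that is not in the one-liner; it avoids looking up the same address twice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

my %cache;  # IP address -> name already resolved for it

# Return the DNS name for an IP address, or the address itself if the
# reverse lookup fails. Results are cached so each address is looked up once.
sub resolve {
    my ($ip) = @_;
    return $cache{$ip} if exists $cache{$ip};
    my $packed = inet_aton($ip);
    my $name   = defined $packed ? gethostbyaddr($packed, AF_INET) : undef;
    return $cache{$ip} = $name || $ip;
}

# Replace the first comma-separated column of a line with its DNS name.
sub rewrite_line {
    my ($line) = @_;
    chomp $line;
    my ($ip, @rest) = split /,/, $line, -1;
    return $line unless defined $ip && length $ip;
    return join ',', resolve($ip), @rest;
}

# Process every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print rewrite_line($_), "\n" while <$fh>;
    close $fh;
}
```

Saved under a name of your choice (say lookup.pl), it would be run as: perl lookup.pl data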
Quite a while ago, the goal of loganalysis.org was to collect log files from all kinds of devices to build up a repository for the community. Unfortunately, that effort has not been too successful. I just stumbled across a new effort driven by Splunk: www.splunk.com/base. There are quite a few syslog messages on there already. What I don’t like is that most of the messages are exceptions from Java applications. I don’t really care about those things. Well. Hopefully there are going to be more people adding logs…
April 13, 2006
I was just emailing someone who suggested a thesis on the topic of filtering event streams to get rid of false positives. This is what I replied:
Filtering seems to be the obvious approach to take in order to get to the important events in an event stream. However, filtering is not really what you want to do. You can filter all day and you still end up with a lot of stuff that you have not filtered (e.g., new things will show up and you will have to filter again). Do the math: 1 million events a day. Assume you come up with a lot of filters that filter out 500K events. You still have 500K events left. What you need to do is prioritization. You need to have those things that are important trickle up! You can still apply filtering after that, but prioritize first!
Here is a very important concept in SIM: Don’t spend processing time on unimportant things!
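To make the difference concrete, here is a minimal sketch of prioritization in Perl. Everything in it is an assumption for illustration: the event fields (severity, asset_value), the sample events, and the scoring rule are made up and not taken from any particular SIM product. The point is only that events get a score and the highest ones come out on top, instead of being dropped or kept by a yes/no filter.

```perl
use strict;
use warnings;

# Hypothetical events: the field names and values are illustrative only.
my @events = (
    { name => 'port scan',         severity => 3, asset_value => 1 },
    { name => 'failed root login', severity => 7, asset_value => 9 },
    { name => 'ping sweep',        severity => 1, asset_value => 2 },
    { name => 'worm signature',    severity => 8, asset_value => 5 },
);

# Assumed scoring rule: severity weighted by the value of the targeted asset.
sub priority {
    my ($event) = @_;
    return $event->{severity} * $event->{asset_value};
}

# Sort highest priority first, so the important events trickle up.
sub prioritize {
    my @sorted = sort { priority($b) <=> priority($a) } @_;
    return @sorted;
}

# Show only the two highest-priority events.
my @ranked = prioritize(@events);
printf "%3d  %s\n", priority($_), $_->{name} for @ranked[0 .. 1];
```

With the sample data, the failed root login (score 63) and the worm signature (score 40) end up on top, and nothing had to be filtered to find them.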
April 4, 2006
I was working on AfterGlow the other night and I realized that adding feature after feature starts to slow the thing down quite a bit (you need to be a genius to figure that one out!). So that prompted me to look for Perl performance analyzers, and indeed I found something that’s pretty useful.
Run your Perl script with perl -d:DProf and then run dprofpp.
This will show you how much time was spent in each of the subroutines. It helped me pinpoint that most of the time was spent in the getColor() call. The logical solution was to introduce a cache for the colors, and guess what: AfterGlow 1.5.1 will be faster.
This is a sample output of dprofpp:
Total Elapsed Time = 11.69959 Seconds
  User+System Time = 8.969595 Seconds
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 81.5   7.310  9.900 120000   0.0001 0.0001  main::getColor
 29.1   2.615  2.615 116993   0.0000 0.0000  main::subnet
 0.89   0.080  0.080  20000   0.0000 0.0000  main::getEventName
 0.22   0.020  0.020  20000   0.0000 0.0000  main::getSourceName
 0.22   0.020  0.020  20000   0.0000 0.0000  main::getTargetName
 0.11   0.010  0.010      1   0.0100 0.0100  main::BEGIN
 0.00       - -0.000      1        -      -  Exporter::import
 0.00       - -0.000      1        -      -  Getopt::Std::getopts
 0.00       - -0.000      1        -      -  main::propertyfile
 0.00       - -0.000      1        -      -  main::init
    -       - -0.025 116993        -      -  main::field
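The color cache mentioned above can be sketched roughly like this. It is not AfterGlow's actual code: compute_color is a hypothetical stand-in for the expensive work that getColor() does, and the names get_color_cached, %color_cache and $slow_calls are made up for illustration. The idea is simply to remember the result for each input in a hash and skip the expensive path on repeat calls.

```perl
use strict;
use warnings;

my %color_cache;     # "type\0value" -> color already computed for that pair
my $slow_calls = 0;  # counts how often the expensive path actually runs

# Hypothetical stand-in for the expensive color lookup; AfterGlow's real
# getColor() is not shown in the post and works differently.
sub compute_color {
    my ($type, $value) = @_;
    $slow_calls++;
    return $value =~ /^10\./ ? 'green' : 'red';
}

# Cached wrapper: compute each (type, value) pair once, then reuse the result.
sub get_color_cached {
    my ($type, $value) = @_;
    my $key = "$type\0$value";
    $color_cache{$key} = compute_color($type, $value)
        unless exists $color_cache{$key};
    return $color_cache{$key};
}

# A thousand lookups for the same node only hit the expensive path once.
get_color_cached('source', '10.0.0.1') for 1 .. 1000;
print "lookups: 1000, expensive calls: $slow_calls\n";
```

A hand-rolled hash keeps the caching visible; Perl's core Memoize module offers a similar effect for functions whose result depends only on their arguments.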