March 24, 2012

Advanced Network Graph Visualization with AfterGlow

Filed under: Log Analysis,Programming,Visualization — Raffael Marty @ 12:49 pm

There are cases where you need fairly sophisticated logic to visualize data. Network graphs are a great way to help a viewer understand relationships in data. In my last blog post, I explained how to visualize network traffic. Today I am showing you how to extend your visualization with some more complicated configurations.

This blog post was inspired by an AfterGlow user who emailed me last week asking how he could keep a list of port numbers to drive the color in his graph. Here is the code snippet that I suggested he use:

variable=@ports=qw(22 80 53 110);
color="green" if (grep(/^\Q$fields[0]\E$/,@ports))

Put this in a configuration file and invoke AfterGlow with it:

perl afterglow.pl -c file.config | ...

What this does is color all nodes green if they are part of the list of ports (22, 80, 53, 110). I am using $fields[0] to reference the first column of data. You could also use the function fields() to reference any column in the data.
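
To see what the grep expression does outside of AfterGlow, here is a minimal standalone Perl sketch. The sub name color_for and the sample rows are made up for illustration; only the @ports list and the regular expression come from the configuration above.

```perl
use strict;
use warnings;

# Same port list as in the configuration above.
my @ports = qw(22 80 53 110);

# Hypothetical helper: returns "green" when the first column is a listed port.
sub color_for {
    my @fields = @_;
    # \Q...\E quotes the field value so it is matched literally; ^ and $
    # force a whole-string match, so 8080 is not treated as 80.
    return "green" if grep { /^\Q$fields[0]\E$/ } @ports;
    return;    # this rule assigns no color
}

# Made-up sample rows: port in the first column, an address in the second.
for my $row ([80, "10.0.0.1"], [8080, "10.0.0.2"]) {
    my $color = color_for(@$row) // "(default)";
    print "$row->[0] => $color\n";
}
```

For long lists, a hash lookup (exists $ports{$fields[0]}) would avoid scanning the array for every node, at the cost of a slightly longer variable= line.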

Another way to define the variable is by looking it up in a file. Here is an example:

variable=open(TOR,"tor.csv"); @tor=<TOR>; close(TOR);
color="red" if (grep(/^\Q$fields[1]\E$/,@tor))

This time you put the list of items in a file and read it into an array. Remember, it’s just Perl code that you execute after the variable= statement. Anything goes!
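
Outside of a configuration file, the same idea can be sketched as a small Perl script. The helper name load_list, the error check, the chomp, and the temporary stand-in for tor.csv are my additions for illustration; the addresses are documentation placeholders, not real Tor nodes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one entry per line from a file into a list.
sub load_list {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    chomp(my @items = <$fh>);    # strip the trailing newline from every entry
    close($fh);
    return @items;
}

# Stand-in for tor.csv so that the sketch runs on its own.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "192.0.2.10\n198.51.100.7\n";
close($tmp);

my @tor = load_list($path);
for my $ip ("192.0.2.10", "203.0.113.5") {
    my $color = (grep { $_ eq $ip } @tor) ? "red" : "(default)";
    print "$ip => $color\n";
}
```

The one-line version in the configuration file skips the error check, so a missing file silently yields an empty list and no node turns red.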

I am curious what you will come up with. Post your experiments and questions on secviz.org!

Read more about how to use AfterGlow in security visualization.

September 8, 2011

Logging Guidelines Enable Actions

Filed under: Log Analysis,Programming — Raffael Marty @ 10:05 am

Analyzing log files can be a very time-consuming process, and it doesn’t seem to get any easier. In the past 12 years I have been on both sides of the table. I have analyzed terabytes of logs and I have written a lot of code that generates logs. When I started writing Loggly’s middleware, I thought it was going to be really easy and fun to finally write the perfect application logs. Guess what, I was wrong. Although I have seen pretty much every log format out there, I had the hardest time coming up with a decent log format for ourselves. What’s a good log format anyway? The short answer is: “One that enables analytics or actions.”

I was sufficiently motivated to come up with a good log format that I decided to write a paper about application logging guidelines. The paper has two main parts: logging guidelines and a reference architecture for a cloud service. In the first part I cover the questions of when to log, what to log, and how to log. It’s not as easy as you might think. The most important thing to keep in mind at all times is how the logs will be used. Especially for the question of what to log, you need to keep the log consumer in mind. Are the logs consumed by a human? Are they consumed by a log management tool? What are the people looking at the logs trying to do? Debugging the application? Monitoring performance? Detecting security violations? Depending on the answers to these questions, you might change the places in your code where you emit log records. (Or, even better, you log in all places and add a use-case indicator as a field to your logs.)
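
As a rough illustration of the use-case indicator idea, here is a minimal Perl sketch that formats records as key=value pairs. The helper name format_record and all field names (time, use_case, user, and so on) are assumptions made for this example; they are not taken from the paper.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format one log record as key=value pairs.
# Every record carries a use_case field so that consumers can filter on it.
sub format_record {
    my ($use_case, %fields) = @_;
    my $time  = strftime("%Y-%m-%dT%H:%M:%SZ", gmtime());
    my @pairs = ("time=$time", "use_case=$use_case");
    for my $key (sort keys %fields) {
        my $value = $fields{$key};
        $value =~ s/(["\\])/\\$1/g;    # escape embedded quotes and backslashes
        push @pairs, qq{$key="$value"};
    }
    return join(" ", @pairs);
}

print format_record("security", user => "alice", action => "login", status => "failure"), "\n";
print format_record("performance", action => "query", duration_ms => 42), "\n";
```

A log management tool can then select, say, only the use_case=security records, while a developer debugging the application reads everything.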

The paper is a starting point and not a definitive guide. I would expect readers to challenge it and come up with improvements and refinements of use-cases and also the exact contents of the log records. I’d love to hear from practitioners and get a dialog going.

As a side note: CEE, the Common Event Expression standard, covers parts of what I am talking about in the paper. However, the paper’s focus is mainly on defining guidelines for application developers; establishing a baseline of when log entries should be recorded and what information should be included.

Resources: Cloud Application Logging for Forensics (Paper, Presentation)

May 25, 2010

Recent Blog Posts on Django, Security, Cloud, and Visualization

Filed under: Links,Log Analysis,Programming,Visualization — Raffael Marty @ 5:17 pm

I thought you might be interested in some blog posts that I have written lately. I have been doing quite a bit of work on Django and Web applications. That might explain the topics of my recent blog posts. Check them out.

Would love to hear from you if you have any comments. Either leave a comment on the blogs, or contact me via Twitter at @zrlram.

February 24, 2007

Geo Lookup on the Command Line

Filed under: Programming,UNIX Scripting — Raffael Marty @ 8:56 pm

By now you should know that I really like command line tools which operate well when applied to data through a pipe. I have posted quite a few tips already to do data manipulation on the command line. Today I wanted a quick way to look up IP address locations and add them to a log file. After investigating a few free databases, I came across Geo::IPfree, a Perl library which does the trick. So here is how you add the country. First, this is the format of my log entries:

10/13/2005 20:25:54.032145,195.141.211.178,195.131.61.44,2071,135

I want to get the country of the source address (first IP in the log). Here we go:

cat pflog.csv | perl -M'Geo::IPfree' -na -F/,/ -e '($country,$country_name)=Geo::IPfree::LookUp($F[1]);chomp; print "$_,$country_name\n"'

And here is the output:

10/13/2005 20:24:33.494358,62.245.243.139,212.254.111.99,,echo request,Europe

Very simple!
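
For longer pipelines, the one-liner can be unrolled into a small script. The sketch below shows only the split-and-append logic; so that it runs without the module installed, the lookup is replaced by a hypothetical stub table with a made-up country for a documentation address. The names lookup_country and annotate are mine.

```perl
use strict;
use warnings;

# Stand-in lookup table (made-up data). With Geo::IPfree installed, the LookUp
# call from the one-liner above would take the place of this hash.
my %stub = ("192.0.2.1" => "Exampleland");

sub lookup_country {
    my ($ip) = @_;
    return $stub{$ip} // "unknown";
}

# Append the country of the source address (second CSV column) to a log line.
sub annotate {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line;
    return "$line," . lookup_country($f[1]);
}

# A fixed sample line in the log format shown above; in a pipe you would loop
# over STDIN instead.
print annotate("10/13/2005 20:25:54.032145,192.0.2.1,198.51.100.2,2071,135\n"), "\n";
```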

May 19, 2006

Perl Performance Improvement

Filed under: Programming — Raffael Marty @ 4:18 pm

I was fiddling with optimizing AfterGlow the other day and to do so, I introduced caches for some of the functions. Later a coworker (thanks Senthil) sent me a note that I could have done without implementing the cache myself by using Memoize. This is how to use it:

use Memoize;
memoize('function'); # pass the function name as a string
function(arguments); # this is now much faster

This caches the function’s output for each distinct set of inputs, so repeated calls with the same arguments return the stored result instead of recomputing it. For recursive functions in particular, the speedup can be dramatic.
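
To make the effect on recursion concrete, here is a minimal sketch using a naive Fibonacci function. The function fib and the $calls counter are illustrative and have nothing to do with AfterGlow.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the function body actually runs

sub fib {
    my ($n) = @_;
    $calls++;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

memoize('fib');    # replaces fib with a caching wrapper of the same name

print "fib(30) = ", fib(30), "\n";
print "function body executed $calls times\n";
```

Because the recursive calls go through the name fib, they hit the cache as well, so the body runs once per distinct argument rather than once per call.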

April 4, 2006

Perl Performance Optimization

Filed under: Programming — Raffael Marty @ 4:13 pm

I was working on AfterGlow the other night and I realized that adding feature after feature starts to slow down the thing quite a bit (you need to be a genius to figure that one out!). So that prompted me to look for Perl performance analyzers and indeed I found something that’s pretty useful.

Run your perl script with: perl -d:DProf and then run dprofpp. This will show you how much time was spent in each of the subroutines. It helped me pinpoint that most of the time was spent in the getColor() call. The logical solution was to introduce a cache for the colors and guess what – AfterGlow 1.5.1 will be faster ;)

This is a sample output of dprofpp:

Total Elapsed Time = 11.69959 Seconds
  User+System Time = 8.969595 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 81.5   7.310  9.900 120000   0.0001 0.0001  main::getColor
 29.1   2.615  2.615 116993   0.0000 0.0000  main::subnet
 0.89   0.080  0.080  20000   0.0000 0.0000  main::getEventName
 0.22   0.020  0.020  20000   0.0000 0.0000  main::getSourceName
 0.22   0.020  0.020  20000   0.0000 0.0000  main::getTargetName
 0.11   0.010  0.010      1   0.0100 0.0100  main::BEGIN
 0.00       - -0.000      1        -      -  Exporter::import
 0.00       - -0.000      1        -      -  Getopt::Std::getopts
 0.00       - -0.000      1        -      -  main::propertyfile
 0.00       - -0.000      1        -      -  main::init
    -       - -0.025 116993        -      -  main::field
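
A hand-rolled cache of the kind described above could look roughly like this. It is a sketch under assumptions: compute_color is a stand-in for an expensive color decision and is not AfterGlow’s actual getColor() code, and the names %color_cache and get_color_cached are hypothetical.

```perl
use strict;
use warnings;

my %color_cache;       # maps an input value to the color computed for it
my $slow_calls = 0;    # counts how often the expensive path is taken

# Stand-in for an expensive color decision; not AfterGlow's real getColor().
sub compute_color {
    my ($value) = @_;
    $slow_calls++;
    return $value =~ /^10\./ ? "green" : "red";
}

# Cached front end: compute each distinct value once, then reuse the result.
sub get_color_cached {
    my ($value) = @_;
    $color_cache{$value} = compute_color($value)
        unless exists $color_cache{$value};
    return $color_cache{$value};
}

# Two distinct values, each looked up 1000 times.
my @values = ("10.0.0.1", "192.0.2.7") x 1000;
get_color_cached($_) for @values;
print "lookups: ", scalar(@values), ", expensive calls: $slow_calls\n";
```

The cache pays off when the same values recur many times, which is typical for node names in a log file; it only works if the color depends on nothing but the cached key.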

December 4, 2005

Python For Beginners and for me

Filed under: Programming — Raffael Marty @ 9:00 pm

The RedHat Magazine had a nice Introduction to Python. Cool example that uses pyGTK!