March 24, 2012
There are cases where you need fairly sophisticated logic to visualize data. Network graphs are a great way to help a viewer understand relationships in data. In my last blog post, I explained how to visualize network traffic. Today I am showing you how to extend your visualization with some more complicated configurations.
This blog post was inspired by an AfterGlow user who emailed me last week asking how he could keep a list of port numbers to drive the color in his graph. Here is the code snippet that I suggested he use:
variable=@ports=qw(22 80 53 110);
color="green" if (grep(/^\Q$fields\E$/,@ports))
Put this in a configuration file and invoke AfterGlow with it:
perl afterglow.pl -c file.config | ...
What this does is color all nodes green if they are part of the list of ports (22, 80, 53, 110). I am using $fields to reference the first column of data. You could also use the function fields() to reference any column in the data.
Another way to define the variable is by looking it up in a file. Here is an example:
variable=open(TOR,"tor.csv"); @tor=; close(TOR);
color="red" if (grep(/^\Q$fields\E$/,@tor))
This time you put the list of items in a file and read it into an array. Remember, it’s just Perl code that you execute after the variable= statement. Anything goes!
I am curious what you will come up with. Post your experiments and questions on secviz.org!
Read more about how to use AfterGlow in security visualization.
March 21, 2012
Have you ever collected a packet capture and you needed to know what the collected traffic is about? Here is a quick tutorial on how to use AfterGlow to generate link graphs from your packet captures (PCAP).
I am sitting at the 2012 Honeynet Project Security Workshop. One of the trainers of a workshop tomorrow just approached me and asked me to help him visualize some PCAP files. I thought it might be useful for other people as well. So here is a quick tutorial.
To start with, make sure you have AfterGlow installed. This means you also need to install GraphViz on your machine!
First Visualization Attempt
The first attempt of visualizing tcpdump traffic is the following:
tcpdump -vttttnnelr file.pcap | parsers/tcpdump2csv.pl "sip dip" | perl graph/afterglow.pl -t | neato -Tgif -o test.gif
I am using the tcpdump2csv parser to deal with the source/destination confusion. The problem with this approach is that if your output format is slightly different to the regular expression used in the tcpdump2csv.pl script, the parsing will fail [In fact, this happened to us when we tried it here on someone else’s computer].
It is more elegant to use something like Argus to do this. They do a much better job at protocol parsing:
argus -r file.pcap -w - | ra -r - -nn -s saddr daddr -c, | perl graph/afterglow.pl -t | neato -Tgif -o test.gif
When you do this, make sure that you are using Argus 3.0 or newer. If you do not, ragator does not have the -c option!
From here you can go in all kinds of directions.
Using other data fields
argus -r file.pcap -w - | ra -r - -nn -s saddr daddr dport -c, | perl graph/afterglow.pl | neato -Tgif -o test.gif
Here I added the dport to the parameters. Also note that I had to remove the -t parameter from the afterglow command. This tells AfterGlow that there are not two, but three columns in the CSV file.
Or use this:
argus -r file.pcap -w - | ra -r - -nn -s daddr dport ttl -c, | perl graph/afterglow.pl | neato -Tgif -o test.gif
This uses the destination address, the destination port and the TTL to plot your graph. Pretty neat …
You can define your own property file to define the colors for the nodes, configure clustering, change the size of the nodes, etc.
argus -r file.pcap -w - | ra -r - -nn -s daddr dport ttl -c, | perl graph/afterglow.pl -c graph/color.properties | neato -Tgif -o test.gif
Here is an example config file that is not as straight forward as the default one that is included in the AfterGlow distribution:
color="white" if ($fields =~ /foo/)
The config uses the number of times the target shows up as the size of the target node.
Comments / Examples / Questions?
Obviously comments and questions are more than welcome. Also make sure that you post your example graphs on secviz.org!
March 16, 2012
Big data doesn’t help us to create security intelligence! Big data is like your relational database. It’s a technology that helps us manage data. We still need the analytical intelligence on top of the storage and processing tier to make sense of everything. Visual analytics anyone?
A couple of weeks ago I hung out around the RSA conference and walked the show floor. Hundreds of companies exhibited their products. The big topics this year? Big data and security intelligence. Seems like this was MY conference. Well, not so fast. Marketing does unfortunately not equal actual solutions. Here is an example out of the press. Unfortunately, these kinds of things shine the light on very specific things; in this case, the use of hadoop for security intelligence. What does that even mean? How does it work? People seem to not really care, but only hear the big words.
Here is a quick side-note or anecdote. After the big data panel, a friend of mine comes up to me and tells me that the audience asked the panel a question about how analytics played into the big data environment. The panel huddled, discussed, and said: “Ask Raffy about that“.
Back to the problem. I have been reading a bunch lately about SIEM being replaced or superseded by big data infrastructure. That’s completely and utterly stupid. These are not competing technologies. They are complementary. If anything, SIEM will be replaced by some other analytical capabilities that are leveraging big data infrastructures. Big data is like RDBMS. New analytical capabilities are like the SIEMs (correlation rules, parsed data, etc.) For example, using big data, who is going to write your parsers for you. SIEMs have spent a lot of time and resources on things like parsers, big data solutions will need to do the same! Yes, there are a couple of things that you can do with big data approaches and unparsed data. However, most discussions out there do not discuss those uses.
In the context of big data, people also talk about leveraging multiple data sources and new data sources. What’s the big deal? We have been talking about that for 6 years (or longer). Yes, we want video feeds, but how do you correlate a video with a firewall log? Well, you process the video and generate events from it. We have been doing that all along. Nothing new there.
What HAS changed is that we now have the means to store and process the data; any data. However, nobody really knows how to process it.
Let’s start focusing on analytics!