August 12, 2013

Cyber Security Monitoring Still Full of Challenges

Category: Security Intelligence,Visualization — Raffael Marty @ 8:33 am

I was greatly honored when I got an invitation from the Conference on Knowledge Discovery and Data Mining (KDD) to give a talk about data mining and cyber security.

Knowing me, you might be able to guess the topic I chose to present: Visual Analytics. I am focusing not on the visualization layer or the data layer, but on the analytics layer. In the presentation I show what we have been doing with data analytics and data mining in cyber security, with examples from three topics:

  • Situational Awareness
  • Exploration and Discovery
  • Forensics

At the end, I am presenting a number of challenges to the community; hard problems that we need help with to advance insights into cyber security of infrastructures and applications. The following slide summarizes the challenges I see in data mining for security:

If you have any suggestions on each of the challenges, please contact me or comment on this post!

The complete presentation is here: Cyber Security – How Visual Analytics Unlock Insight.


March 24, 2012

Advanced Network Graph Visualization with AfterGlow

Category: Log Analysis,Programming,Visualization — Raffael Marty @ 12:49 pm

There are cases where you need fairly sophisticated logic to visualize data. Network graphs are a great way to help a viewer understand relationships in data. In my last blog post, I explained how to visualize network traffic. Today I am showing you how to extend your visualization with some more complicated configurations.

This blog post was inspired by an AfterGlow user who emailed me last week asking how he could keep a list of port numbers to drive the color in his graph. Here is the code snippet that I suggested he use:

variable=@ports=qw(22 80 53 110);
color="green" if (grep(/^\Q$fields[0]\E$/,@ports))

Put this in a configuration file and invoke AfterGlow with it:

perl afterglow.pl -c file.config | ...

What this does is color all nodes green if they are part of the list of ports (22, 80, 53, 110). I am using $fields[0] to reference the first column of data. You could also use the function fields() to reference any column in the data.
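
For example, if the ports show up in the second column of your data rather than the first, and you only want to color the target nodes, a sketch along these lines should work (the port list and the column position are assumptions; adjust them to your data):

variable=@ports=qw(22 80 53 110);
color.target="green" if (grep(/^\Q$fields[1]\E$/,@ports))
color.target="red"

AfterGlow applies the first color statement that matches, so the last line acts as the default color for all remaining target nodes.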

Another way to define the variable is by looking it up in a file. Here is an example:

variable=open(TOR,"tor.csv"); @tor=<TOR>; close(TOR);
color="red" if (grep(/^\Q$fields[1]\E$/,@tor))

This time you put the list of items in a file and read it into an array. Remember, it’s just Perl code that you execute after the variable= statement. Anything goes!
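
Here is a slight variation of the same idea, a sketch that strips the trailing newlines from the file entries and uses an exact string comparison instead of a regular expression (tor.csv is again assumed to hold one entry per line):

variable=open(TOR,"tor.csv"); @tor=<TOR>; close(TOR); chomp(@tor);
color="red" if (grep { $_ eq $fields[1] } @tor)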

I am curious what you will come up with. Post your experiments and questions on secviz.org!

Read more about how to use AfterGlow in security visualization.

March 21, 2012

Visualizing Packet Captures For Fun and Profit

Category: Log Analysis,Visualization — Raffael Marty @ 1:26 pm


Have you ever collected a packet capture and needed to know what the captured traffic is about? Here is a quick tutorial on how to use AfterGlow to generate link graphs from your packet captures (PCAP).

I am sitting at the 2012 Honeynet Project Security Workshop. One of the trainers of a workshop tomorrow just approached me and asked me to help him visualize some PCAP files. I thought it might be useful for other people as well. So here is a quick tutorial.

Installation

To start with, make sure you have AfterGlow installed. This means you also need to install GraphViz on your machine!

First Visualization Attempt

The first attempt at visualizing tcpdump traffic is the following:

tcpdump -vttttnnelr file.pcap | parsers/tcpdump2csv.pl "sip dip" | perl graph/afterglow.pl -t | neato -Tgif -o test.gif

I am using the tcpdump2csv parser to deal with the source/destination confusion. The problem with this approach is that if your tcpdump output format differs even slightly from what the regular expression in the tcpdump2csv.pl script expects, the parsing will fail [in fact, this happened to us when we tried it here on someone else's computer].
It is more elegant to use something like Argus for this; it does a much better job at protocol parsing:

argus -r file.pcap -w - | ra -r - -nn -s saddr daddr -c, | perl graph/afterglow.pl -t | neato -Tgif -o test.gif

When you do this, make sure that you are using Argus 3.0 or newer; older versions do not have the -c option!

From here you can go in all kinds of directions.

Using other data fields

argus -r file.pcap -w - | ra -r - -nn -s saddr daddr dport -c, | perl graph/afterglow.pl | neato -Tgif -o test.gif

Here I added the dport to the parameters. Also note that I had to remove the -t parameter from the afterglow command; without it, AfterGlow expects three columns in the CSV file instead of two.

Or use this:

argus -r file.pcap -w - | ra -r - -nn -s daddr dport ttl -c, | perl graph/afterglow.pl | neato -Tgif -o test.gif

This uses the destination address, the destination port and the TTL to plot your graph. Pretty neat …

AfterGlow Properties

You can define your own property file to define the colors for the nodes, configure clustering, change the size of the nodes, etc.

argus -r file.pcap -w - | ra -r - -nn -s daddr dport ttl -c, | perl graph/afterglow.pl -c graph/color.properties | neato -Tgif -o test.gif

Here is an example config file that is not as straightforward as the default one included in the AfterGlow distribution:

color="white" if ($fields[2] =~ /foo/)
color="gray50"
size.target=$targetCount{$targetName};
size=0.5
maxnodesize=1

The config uses the number of times the target shows up as the size of the target node.
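
Property files can also drive colors off the actual field values and cluster nodes together. Here is a small sketch for the saddr daddr dport example further up (the colors and the port threshold are my own assumptions); the destination port is the third column there, so $fields[2] refers to it and it ends up as the target node:

color.target="orange" if ($fields[2]<1024)
color.target="yellow"
cluster.target=">1024" if ($fields[2]>1024)

Well-known ports keep their own orange nodes, while all the high ports collapse into a single yellow node labeled ">1024", which keeps busy graphs readable.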

Comments / Examples / Questions?

Obviously comments and questions are more than welcome. Also make sure that you post your example graphs on secviz.org!

March 16, 2012

Big Data Security Intelligence – nothing to see here – move along

Category: Log Analysis,Security Intelligence,Visualization — Raffael Marty @ 7:22 am

Big data doesn’t help us to create security intelligence! Big data is like your relational database. It’s a technology that helps us manage data. We still need the analytical intelligence on top of the storage and processing tier to make sense of everything. Visual analytics anyone?

A couple of weeks ago I hung out around the RSA conference and walked the show floor. Hundreds of companies exhibited their products. The big topics this year? Big data and security intelligence. Seems like this was MY conference. Well, not so fast. Unfortunately, marketing does not equal actual solutions. Here is an example from the press. These kinds of articles shine the light on very narrow pieces; in this case, the use of Hadoop for security intelligence. What does that even mean? How does it work? People do not seem to really care; they only hear the big words.

Here is a quick side note, or anecdote: after the big data panel, a friend of mine came up to me and told me that the audience had asked the panel how analytics plays into the big data environment. The panel huddled, discussed, and said: “Ask Raffy about that.”

Back to the problem. I have been reading a lot lately about SIEM being replaced or superseded by big data infrastructure. That's completely and utterly stupid. These are not competing technologies; they are complementary. If anything, SIEM will be replaced by other analytical capabilities that leverage big data infrastructures. Big data is like the RDBMS; the new analytical capabilities are like the SIEMs (correlation rules, parsed data, etc.). For example, with a bare big data stack, who is going to write your parsers for you? SIEMs have spent a lot of time and resources on things like parsers, and big data solutions will need to do the same! Yes, there are a couple of things you can do with big data approaches and unparsed data. However, most discussions out there do not cover those uses.

In the context of big data, people also talk about leveraging multiple data sources and new data sources. What’s the big deal? We have been talking about that for 6 years (or longer). Yes, we want video feeds, but how do you correlate a video with a firewall log? Well, you process the video and generate events from it. We have been doing that all along. Nothing new there.

What HAS changed is that we now have the means to store and process the data; any data. However, nobody really knows how to process it.

Let’s start focusing on analytics!

January 8, 2012

The Steps To a Mature Visual Analytics Practice

Category: Visualization — Raffael Marty @ 1:50 pm

The visualization maturity scale can be used to explain a number of issues in the visual analytics space. For example, why aren’t companies leveraging visualization to analyze their data? What are the requirements to implement visual analytics services? Or why don’t we have more visual analytics products?

About three years ago I posted the log management maturity scale. The maturity scale helped explain why companies and products are not as advanced as they should be in the log management, log analysis, and security information management space.

While preparing my presentation for the cyber security grand challenge meeting in early December, I developed the maturity scale for information visualization that you can see above.

Companies that are implementing visualization processes move through each of the steps from left to right. So do product companies that build visualization applications. In order to build products on the right-hand side, they need to support the pieces to the left. Let's have a look at the different stages in more detail:

  • Data Collection: No data, no visuals (see also Where Data Analytics and Security Collide). This is the foundation. Data needs to be available and accessible. Generally it is centralized in a big data store (it used to be relational databases, and that's a viable solution as well). This step generally involves parsing data: turning unstructured or semi-structured data into structured data. Although a fairly old problem, this is still a huge issue. I wonder if anyone is going to come up with a novel solution in this space anytime soon! The traditional regular-expression-based approach just doesn't scale (a short sketch of that approach follows after this list).
  • Data Analysis: Once data is centralized or accessible via a federated data store, you have to do something with it. A lot of companies are using Excel to do the first iteration of data analysis. Some are using R, SAS, or other statistics and data analytics software. One of the core problems here is data cleansing. Another huge problem is understanding the data itself. Not every data set is as self-explanatory as sales data.
  • Context Integration: Often we collect data, analyze it, and then realize that the data doesn't really contain enough information to understand it. Take network security, for example: what does the machine behind a specific IP address do? Is it a Web server? This is where we start adding more context: roles of machines, roles of users, etc. This can significantly increase the value of data analytics.
  • Visualization: Let's be clear about what I refer to as visualization. I am using visualization to mean reporting and dashboards. Reports are static summaries of historical data; they help communicate information. Dashboards are used to communicate information in real time (or near real time) to create situational awareness.
  • Visual Analytics: This is where things are getting interesting. Interactive interfaces are used as a means to understand and reason about the data. Often linked views, brushing, and dynamic queries are key technologies used to give the user the most freedom to look at and analyze the data.
  • Collaboration: It is one thing to have one analyst look at data and apply his/her own knowledge to understand the data. It’s another thing to have people collaborate on data and use their joint ‘wisdom’.
  • Dissemination: Once an analysis is done, the job of the analyst is not. The newly found insights have to be shared and communicated to other groups or people in order for them to take action based on the findings.
  • Put in Action: This could be regarded as part of the dissemination step. This step is about operationalizing the information. In the case of security information management, this is where the knowledge is encoded in correlation rules to catch future instances of the same or similar incidents.
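
To make the parsing point in the data collection step concrete, here is a minimal sketch of the traditional regular-expression-based approach: one hand-written pattern per log format, extracting a handful of fields into CSV. The log format and the fields are assumptions for illustration; a real deployment needs a pattern like this for every device and every message type, which is exactly why this approach doesn't scale.

#!/usr/bin/perl
# Minimal sketch: regex-based parsing of a (hypothetical) SSH syslog line into CSV.
use strict;
use warnings;

while (my $line = <STDIN>) {
    # Assumed format: "Oct  4 12:13:14 host sshd[1234]: Accepted password for bob from 10.0.0.5 port 51122 ssh2"
    if ($line =~ /^(\w{3}\s+\d+\s[\d:]+)\s+(\S+)\s+sshd\[\d+\]:\s+Accepted\s+\S+\s+for\s+(\S+)\s+from\s+(\S+)/) {
        print join(",", $1, $2, $3, $4), "\n";   # timestamp, reporting host, user, source address
    }
}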

For an end user, the visualization maturity scale outlines the individual steps he/she has to go through in order to achieve analytical maturity. In order to implement the ‘put in action’ step, users need to implement all of the steps on the left of the scale.

For visualization product companies, the scale means that in order to have a product that lets a user put findings into action, they have to support all the left-hand stages: there needs to be a data collection piece and a data store. The data needs to be pre-analyzed; operations like data cleansing, aggregation, filtering, or even the calculation of certain statistical properties fall into this step. Context is not always necessary, but often adds to the usefulness of the data. Etc. etc.

There are a number of products, both open source and commercial, that solve a lot of the left-hand-side problems. Technologies like document stores (e.g., MongoDB), MapReduce frameworks (e.g., Hadoop), or search engines like ElasticSearch are great open source examples. In the commercial space you will find companies like Karmasphere or Datameer tackling these problems.

Comments? Chime in!

December 8, 2011

Cyber Security Visualization – Grand Challenge

Category: Security Market,Visualization — Raffael Marty @ 5:54 pm

At the beginning of this week, I spent some time with a number of interesting folks talking about cyber security visualization. It was a diverse set of people from the DoD, the X Prize Foundation, game designers, and even an astronaut. We all discussed what it would mean if we launched a grand challenge to improve cyber situational awareness, something like the Lunar XPrize, a challenge where teams have to build a robot and successfully land it on the moon.

A number of interesting proposals came to the table, and every now and then I had to bring things back down to reality. These people are not domain experts in cyber security, so you can imagine what kinds of ideas they suggested. But it was fun to be challenged and to hear all these crazy ideas. It definitely expanded my horizon and stretched my imagination.

What I found interesting is that pretty much everybody gravitated towards a game-like challenge. All the way to having a game simulator for cyber security situational awareness.

Anyway, we'll see whether the DoD is actually going to follow through with this. I sure hope so; it would help the secviz field enormously, spur interesting development, and extend and revitalize the secviz community!

Here is the presentation about situational awareness that I gave on the first day. I talked very briefly about what situational awareness is, where we are today, what the challenges are, and where we should be moving to.

View more of my presentations

September 13, 2011

Learning About Log Analysis and Visualization in Taipei

Category: Log Analysis,Visualization — Raffael Marty @ 10:29 am

I just returned from Taipei where I was teaching log analysis and visualization classes for Trend Micro. Three classes of 20 students each. I am surprised that my voice is still okay after all that talking. It's probably all the tea I was drinking.

The class schedule looked as follows:

Day 1: Log Analysis

  • data sources
  • data analysis and visualization linux (davix)
  • log management and siem overview
  • application logging guidelines
  • log data processing
  • loggly introduction
  • splunk introduction
  • data analysis with splunk

Day 2: Visualization

  • visualization theory
  • data visualization tools and libraries
  • perimeter threat use-cases
  • host-based data analysis in splunk
  • packet capture analysis in splunk
  • loggly api overview
  • visualization resources

The class was accompanied by a number of exercises that helped the students apply the theory we talked about. The exercises are partly pen-and-paper and partly hands-on data analysis of sample logs with the DAVIX live CD.

I love Taipei, especially the food. I hope I’ll have a chance to visit again soon.

PS: If you are looking for a list of visualization resources, they got moved over to secviz.

January 17, 2011

Mid January Roundup

Category: Links,Log Analysis,Visualization — Raffael Marty @ 9:00 am

The last couple of months have been pretty busy. I have been really bad about updating my personal blog here, but I have not been lazy. Among other things, I have been traveling a lot to attend a number of conferences. Here is a little summary of what’s been going on:

  • I posted a blog entry on secviz about my security visualization predictions for 2011. It’s a bit of a gloomy forecast, but check it out.
  • The Security visualization predictions post was motivated by a panel I was on at the SANS Incident Detection Summit in D.C. early December. Here are the slides for my panel discussion.
  • One of the topics I have been talking about lately is Cloud Security. The slides linked here are from a presentation I gave in Mexico.
  • The topic of cloud security and also cloud risk management is one that I have been discussing on my new blog over at Infoboom.
  • I have also recorded a couple of podcasts in the last few months. One was the CloudChaser podcast, where we talked about Logging Challenges and Logging in the Cloud.
  • The other podcast I recorded was together with Kord and Gary for The Cloud Computing Show. We talked about all kinds of things, mainly about Loggly and logging in the cloud. Here is the mp3.
  • I also dug out the log maturity scale again. After mentioning it at the SANS logging summit, I got a lot of great responses on it.
  • The other day, one of my Google alerts surfaced this DefCon video of me talking about security visualization. It’s probably one of my first conference appearances. Is it?
  • And finally, 2011 started with a trip to Kauai where I presented a paper on insider threat visualization. Unfortunately, the paper is not publicly available. Email me if you want a copy.

As you are probably aware, you find my speaking schedule and slides on my personal page. That’s a good way of tracking me down. And in case you haven’t found it yet, I have a slideshare account where I try to share my presentations as well.

June 28, 2010

All the Data That’s Fit to Visualize

Last week I posted the introductory video for a talk that I gave at SOURCE Boston in 2008. I just found the entire video of that talk. Enjoy:

Talk by Raffael Marty:

With the ever-growing amount of data collected in IT environments, we need new methods and tools to deal with it. Event and log analysis is becoming one of the main tools for analysts to investigate and comprehend the state of their networks, hosts, applications, and business processes. Recent developments, such as regulatory compliance and an increased focus on insider threat, have increased the demand for analytical tools to help in the process. Visualization offers a new, more effective, and simpler approach to data analysis. To date, security visualization has mostly failed to deliver effective tools and methods. This presentation will show what the New York Times has to teach us about effective visualizations: visualization for the masses and not visualization for the experts. Insider Threat, Governance, Risk, and Compliance (GRC), and Perimeter Threat all require effective visualization methods and they are right in front of us – in the newspaper.

June 22, 2010

All the Data That’s Fit to Visualize – SOURCE Boston 2008

Category: Visualization — Raffael Marty @ 1:00 pm

I gave a talk at SOURCE Boston 2008. The topic this time was general visualization and what has gone wrong in security visualization in the past. I showed how we can learn and steal from other disciplines, in this case the New York Times. The NYT has done some pretty fantastic work in the area of data visualization. Their interactive market map, for example, is a great way of exploring stock data. During the talk, I outlined some of the design principles that the NYT graphics department uses when designing their graphs: Show – Don't Tell.

To start my presentation, I showed a little video about security visualization (see below).

Security Visualization – Gone Wrong