December 12, 2005
I recently posted to the Focus-IDS, loganalysis, and idug mailing lists to ask what people are using to visualize their log files. Here is a summary of the answers I got:
- One of the answers was Excel. Well, why not.
- Perl and MySQL, but a completely manual approach.
- A paper on some techniques. I have not read it yet, but will post my comments once I have: http://www.cs.unm.edu/~chris2d/papers/HPC_Analytics_Final.pdf
I got a couple more responses, but I want to verify them and see what they are about before I post them. Stay tuned.
December 11, 2005
I am still sitting in the airplane, and the next article from the November 2005 ISSA Journal that catches my attention is “Log Data Management: A Smarter Approach to Managing Risk”. I have only a few comments about this article:
- The author demands that all log data be archived, and archived unfiltered. Well, here is a question: What is the difference between not logging something and logging it but later filtering it out? What does that mean for litigation-quality logs?
- On the same topic of litigation-quality data, the author suggests that a copy of the logs be saved in the original, raw format while analysis is done on the other copy. I don’t agree with this. I know my opinion does not really count in this matter and nobody is really interested in it, but I will soon have some proof that this is not required. I am not a lawyer, so I will not even try to explain the rationale behind allowing the processing of the original logs while still maintaining litigation-quality data.
- “Any log management solution should be completely automated.” While I agree with this, I would emphasize the word should. What does that mean anyway? Completely automated in the realm of log management? Does it mean the logs are archived automatically? Does it mean that the log management solution takes action and blocks systems (like an IPS)? There will always need to be human interaction. You can automate a lot of things, including the generation of trouble tickets, but at least then an operator will be involved.
- Why does the author demand that “companies should look for an appliance-based solution”? Why is that important? The author does not give any rationale for it. I can see some benefits, but there are tons of drawbacks to that approach too. I have yet to see a compelling reason why an appliance is better than a custom install on company-approved hardware.
- In the section about alerting and reporting capabilities, the author mentions “text-based alerts”, meaning that rules can be set up to trigger on text strings in log messages. That’s certainly nice, but sorry, it does not scale. Assume I want to set up a trigger on firewall block events. I can define a text string of “block” to trigger upon. But all the firewalls that call this not a “block” but a “Deny” will not be caught. Have you heard of categorization or an event taxonomy? That’s what is really needed! (See the sketch after this list.)
- “… fast text-based searches can accelerate problem resolution …” Okay. Interesting. I disagree. I would argue that visualization is the key here. But I am completely biased on that one 😉
- Another interesting point: the author suggests that “… a copy [of the data] can be used for analysis”. Sure, why not, but why? If the argument is litigation-quality data again, why would compression, which is mentioned in the next sentence, be considered a “non-altering” way of processing the data? If that is the argument, I would counter that I can work with the log data by normalizing it and even enriching it without altering it.
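To illustrate what I mean by categorization, here is a minimal sketch in Python; the device names, action strings, and category labels are all made up for illustration:

# Map device-specific action strings onto one common taxonomy node.
# All device names and categories below are hypothetical examples.
TAXONOMY = {
    ('firewall_a', 'block'):  '/Firewall/Block',
    ('firewall_b', 'deny'):   '/Firewall/Block',
    ('firewall_c', 'denied'): '/Firewall/Block',
}

def categorize(device, action):
    # Fall back to a catch-all category for unknown events.
    return TAXONOMY.get((device, action.lower()), '/Unknown')

# One rule on '/Firewall/Block' now catches every vendor's wording:
assert categorize('firewall_b', 'Deny') == '/Firewall/Block'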
Sitting in an airplane from San Francisco to New York City, I am reading through some old magazines that piled up on my desk over the past weeks. One article I cannot resist commenting on is in the November 2005 ISSA Journal. It’s titled “Holistic IPS: The Convergence of Intrusion Prevention Technologies”. The article talks about an apparently new way of doing intrusion detection (or intrusion prevention). Well, to give my comment up front: this approach is far from new. SIMs have been doing some of the things mentioned, and quite a few more, for years! It’s one more of these cases where people could learn from other technologies, but nobody pays attention.
Let’s have a look at some flaws in the article:
- First, a claim is made that an IPS can come up with a threat level for each attack. This is very interesting. I like it. But here are a few questions: 1. How do you define an attack? What is that? The author does not touch on it. 2. A threat level needs to take into account how important the targeted asset is to the organization. It is, first of all, totally impractical for an IPS to know about the assets it protects, and second, the author of the article does not mention this at all. We all know that risk = asset x vulnerability x threat. Why is this not mentioned here? (See the sketch after this list.)
- The author claims that an attack always starts with probing activity from the attacker. What? Have you heard of networks of scanners that are just there to scan millions of hosts for specific services and vulnerabilities? The attackers will never be the same ones that conducted the reconnaissance. So this logic is somewhat flawed. And even if there were no such scanner networks, why does an attack always have to start with pre-attack reconnaissance? That’s just not true.
- Per the article, the pre-attack reconnaissance does not impart a threat. Oh really? Have you ever run a Nessus scan against your network with all the plugins enabled? Did all of your machines survive? Mine did not. But Nessus is just a scanner, so that is just recon activity…
- The entire idea of these new correlation engines in IPSs is that the pre-attack recon is correlated against the real exploit traffic. The article fails to outline how the recon activity can be detected. Is it just anomalous behavior? Well… I would think there are attackers who can scan your network without being too anomalous. Ever heard of application-level attacks? And you claim that by just analyzing the traffic, without deep inspection, you will find that scanning activity? So the claim that “… behavior-based intrusion prevention technologies will probably be most effective in detecting them [the recon activity]” is not really true. I argue that there are much better technologies to make that call.
- What does it mean that an attack has “… a unique (and dynamic) threat level”? I would have thought unique rules out dynamic. I don’t quite understand.
- “The correlation engine proposed in this article…” Well, maybe this technique is new in the IPS world, but other technologies have used this kind of correlation (and a few more) for years.
- Differentiating probes from intrusions is not really described in the article either. I think I touched on this point already, but a probe is not necessarily detectable with behavior-based models alone. There are many one-packet probes that a signature-based approach can detect much more efficiently!
- The author gives an example of how this “new” type of IPS can detect an attack. Nowhere in the entire process is there a mention of the target’s vulnerability and a correlation against it. This is just flawed. You need to know whether a target is vulnerable to determine whether an attack was successful, unless you get evidence from the target itself, be that from audit logs on the system or from network traffic that proves a vulnerable system is present or the attack succeeded.
- Continuing with the example, I just don’t understand why there has to be a probe, then an intrusion that gets correlated with the probe, and only then an action (e.g., blocking the source). Why not look at the intrusion, determine that it has the potential to be successful or already was, and then block?
- Here is something I really like in the article: if there were probes going to multiple machines, the offender is blocked not just from the one target it was already actively trying to exploit, but also from all the other machines it probed.
- You decide on this one: “The preventive countermeasures accurately reflect the tangible threat level…”.
- The article almost completely fails to discuss false positives, in either the detection of probes or the detection of the intrusions (I don’t like this word at all, let’s call them exploits). Maybe there are none?
- The argument that this approach “precludes the need for deep-packet inspection” and that it “… improve[s] IPS performance” by basically not having to look at all the packets, thanks to the classification into two groups, is not new. Have you ever tried to deploy an IDS behind a firewall? Same thing? Maybe not quite, but the idea is exactly the same.
- What I am also missing in this whole discussion is passive network discovery. If so much effort is put into behavioral discovery, why is it not used to model the services and vulnerabilities that the targets expose? There are technologies out there that do this very well.
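To make the threat-level point concrete, here is a minimal sketch of the risk = asset x vulnerability x threat idea; all scores, addresses, and vulnerability IDs are made up for illustration:

# Hypothetical asset model: business value (0-10) and known vulnerabilities.
assets = {
    '10.0.0.5':  {'value': 9, 'vulnerable_to': set(['CVE-2005-1234'])},  # payroll server
    '10.0.0.42': {'value': 2, 'vulnerable_to': set()},                   # lab machine
}

def risk(target_ip, exploit_cve, threat=5):
    asset = assets.get(target_ip)
    if asset is None:
        return 0  # unknown target, nothing to score against
    # An exploit against a non-vulnerable target carries far less risk.
    if exploit_cve in asset['vulnerable_to']:
        vulnerability = 10
    else:
        vulnerability = 1
    return asset['value'] * vulnerability * threat

# The same exploit yields very different risk depending on the target:
print(risk('10.0.0.5', 'CVE-2005-1234'))   # 450
print(risk('10.0.0.42', 'CVE-2005-1234'))  # 10

Note that this is also exactly where the target’s vulnerability, my point from a few bullets up, would enter the calculation.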
Am I being overly critical? Did I totally misunderstand what the article is trying to say?
December 8, 2005
Something that comes in handy all the time (as proven today when someone asked me how to do it) is setting up a reverse SSH tunnel, especially when you need to access your work computer from home. My SSH page explains the procedure.
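For the archive, the basic idea (host name and port are placeholders) is to run something like this from the work machine:

ssh -R 2222:localhost:22 user@home.example.com

Then, from the home machine, ssh -p 2222 localhost connects back to the work computer through the tunnel.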
December 7, 2005
There is an interesting thread on the log-analysis mailing list about regex-less parsing of messages. The problem is a very old one: every device out there logs in its own strange way, making it incredibly time-consuming for event consumers (such as ArcSight) to parse and normalize the messages.
There have been attempts to standardize events, such as IDMEF, which tried to tackle IDS messages. It’s kind of sad, but there is not a single IDS I know of that really uses this event exchange format. A lot of IDSs support it, but it’s not their main transport. Then there are tons of other attempts, from BEEP to RDEP to SDEE and the like. They are all nice, but guys, we need something that is
easy to implement,
scales to high event rates,
is extensible to support not just security devices (for sure not just IDSs),
and is MACHINE READABLE (not human readable) [when are you people going to realize that logs are not read by humans anymore, but by machines?].
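Just to illustrate the machine-readable point: even something as trivial as a fixed set of key-value pairs (this example format is entirely made up) would be parseable without writing a regex per device:

device=fw1 category=/Firewall/Block src=10.1.1.1 dst=10.2.2.2 dpt=445 proto=tcp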
As if all the past attempts at standardizing event formats were not enough, now Microsoft comes out with yet another event logging format. I have to admit I only glanced over it quickly, but it’s XML again. That’s just SLOW! Huge overhead!
Also, why do people always define the transport when they are trying to standardize log messages? Leave the transport to the devices; they will figure that one out. In the worst case, people can just use syslog, which is widely deployed and has its problems. But you know what? At least the burden of complying with the standard is incredibly low. Just send a syslog message. Even I can do that. If you asked me to implement BEEP, I don’t think I would even start thinking about complying with the standard…
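To underline how low that burden is, here is a minimal Python sketch (the collector host and message content are placeholders) that emits a syslog message over UDP:

# Send one syslog message over UDP to a collector.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(address=('loghost.example.com', 514))
log = logging.getLogger('demo')
log.addHandler(handler)
log.warning('device=fw1 category=/Firewall/Block src=10.1.1.1 dst=10.2.2.2')

That is the whole implementation burden.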
Sorry for the long post and rant, but this is just a bit frustrating …
December 6, 2005
I guess by now everyone knows scapy. At this point, this is more of a note to myself so I remember this tool.
Scapy is an interactive packet manipulation program written in Python.
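As a reminder to myself, a quick example from the scapy prompt (the target address is a placeholder): craft a TCP SYN to port 80 and show the first reply:

# Build and send a SYN packet, wait up to 2 seconds for an answer.
ans = sr1(IP(dst="192.168.1.1")/TCP(dport=80, flags="S"), timeout=2)
if ans:
    ans.show()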
I keep finding myself having to add data to a random subset of lines in a file. Well, just call awk from within vi and use the rand() function. (Note the escaped \% below: on the vi command line, an unescaped % would expand to the current file name.)
:%!awk 'BEGIN {srand()}; {if (int(rand()*4)==2) {printf("\%s,S\n",$0)} else print $0;}'
Maybe even a bit more comfortable: this version only appends to lines that do not already end in SA, S, or F. (The ! is escaped for the same reason: unescaped, vi would expand it to the previous filter command.)
:%!awk 'BEGIN {srand()} \!/(SA|S|F)$/ {if (int(rand()*4)==2) {sub(/$/,"FA");}};{print}'
I need to remember this one:
:g/some text/s/$/,more_data
This will add another column of data to all the lines with “some text”. Simple but useful.
December 4, 2005
The RedHat Magazine had a nice Introduction to Python. Cool example that uses pyGTK!
I just tried a multi-dimensional data visualization tool. It took me a while to get the Java OpenGL part running, only to find that the tool is not _that_ cool. Oh well, here it is: xmdv.