I will keep posting the answers to my Focus-IDS post where I asked people what they use to visualize their log files. Here is some other home grown solution to visualize pf logs: Fireplot. It’s basically a scatter plot over time where the x-axis shows the port.
I was reading this article in the ISSA Journal from December 2005 that talks about Bayes’ Theorem and its application in security posture analysis. I love math and was very interested in what the article has to say. However, when I reached the end, I was not quite sure what I learned and why Bayes would help me to do any analysis. The examples given in the article are very poor. If department A has a 90% of all the faults, it probably needs a reorganisation. I don’t need any mathematician to tell me that, not even Bayes.
Sometimes people are blinded by math. Just because it’s a nice theorem, don’t abuse it. In the entire article, the probabilities are never discussed. They are completely random. There is no guidance in choosing them. Making the entire analysis completely useless! The outcome from all the math and complexity is extremely logical. Again, there is no added value in using Bayes for this.
Maybe I am missing the point of the article, but I did not learn anything. Maybe I should retake statistics…
Yes, I am reading not just old issues of magazines … So here is a jewel I found in the December issue of the Information Security Magazine:
if you work with large, high-performance networks, make sure yo uare using system such as Windows 2000 or Linux kernels 2.1.9 or later.
2.1.9? I am not even sure whether kernel.org still has those around Does he mean 2.4.9? Maybe. I think he’s another author that has not used Linux. At least not in a while. The quote stems from an article called “The Weakest Link” by Michael Cobb. Another author who is coming up with new terminology. This time it’s Application Level Firewall. While I have definitely heard this term, the author manages very well to confuse me:
Where IDS informs of an actual attack, IPS tries to stop it. IPS solutions tend to be deployed as added security devices at the network perimeter because they don’t provide network segmentation.
ALFs provide the application-layer protection of an IPS by merging IDS signatures and application protocol anomaly detection rules into the traffic-processing engine, while also allowing security zone segmentation.
That’s a long one and reading it, I don’t really see the difference between ALFs and IPSs. Network segmentation? Hmm… Interesting. Is that really the difference? I have to admit, I don’t know, but this seems like a “lame” difference. I bet the IPSs out there can do network segmentation.
The report manages to omit something that I think is quite important. When the author talks about decision factors for buying ALFs (by the way, this reminds me of the brown creature ALF on the TV series…), he does not mention that logs need to be monitored! And that requires from the application that the logs they produce need to be useful. What a concept.
I am not sure in what century this article was written. I know, August is almost half a year back, but still, have you not heard of anomaly detection?
The article triggers another question that constantly bugs me: Why do people have to – over and over – invent new terms for old things. I know, most of the times it’s marketing that is to blame, but please! Have you heard of NBAD (network based anomaly detection)? Well, I have and thought that was the term used in the industry. Well, apparently there is another school of thoughts calling it NAD (network anomaly detection). That’s just wrong. How long have we had anomaly detection? I remember some work written around 5 years ago that outlined the different types of IDSs [btw, I learned from Richard at UCSB that it’s not IDSes, but IDSs]: behavioral based and knowledge-based ones. Or as other call them, anomaly based and signature based. I will try to find the link again for the paper, which originated at IBM Research in Zurich. [Here it is: A Revised Taxonomy. for Intrusion-Detection. Systems by H. Debar, M. Dacier, and A. Wepsi.] So when the author says:
… it helps to understand the differences between it [NAD] and a traditional IDS/IPS.
What does he mean by traditional? Anomaly based systems are IDSs and are among the first ones that were built. So where is the difference?
Let’s continue to see what other things are kind of confusing in the article that is trying to explain what NAD is. The main problem with the article is that it’s imprecise and confusing – in my point of view. Let’s have a look:
NAD is the last line of defense.
Last line? I always thought that host-protection mechanisms would be the last line or something even beyond that, but network based anomaly detection? That’s just wrong!
NADS use network flow data…
Interesting. This is still my definition of an NBAD. Maybe the author is confusing NBADs with anomaly detection. Because anomaly-based IDSs are using a whole range of input sources, and not necessarily network flow data.
NAD is primarily an investigative technology.
Why is that? If that’s really the case, why would I use them at all? I could just use network flow data by itself, not needing to buy an expensive solution. NAD (I am sticking with the author’s term), combined with other sources of information is actually very very useful in early detection, etc. Correlate these streams with other stremas and you will be surprised what you can do. Well, just get a SIM (security information management) to do it for you!
Another thing I love is that _the_ usecase called out in the article is detecting worms. Why is everyone using this example? It’s one of the simplest things to do. Worms have such huge footprints that I could write you a very simple tool that detects one. Just run MRTG and look at the graphs. Huge spike? Very likely a worm! (I know I am simplifying a bit here). My point is that here are harder problems to solve with NAD and there are much nicer examples, but the author fails to mention them. The other point is that I don’t need a NAD for this, even SIMs can do that for you (and they have only done that for about three years now, although the author claims that SEMs (how he calls SIMs) are just starting to do this. Well, he has to know.)
I love this one:
Security incident deteciton typically falls into two categories: signature- and anomaly-based. These terms are so overused …
So NAD is not overused? I actually don’t think they are very overused. There are very clear definitions for them. It’s just that a lot of people have not read the definitions and if they have, don’t really understand them. (I admit, I might not understand them either.) [For those interested, google for: A revised taxonomy for intrusion detection systems]
NAD’s implied advantages include reduced tuning complexity, …
This is again not true. Have you ever tried to train an anomaly detection system? I think that’s much harder than tuning signatures. The author actually contradicts his own statement in the next sentence:
NADS suffer from high false-positive rates…
Well, if it’s so easy to tune them, why are there many false-positives?
What does this mean:
Network anomaly detection systems offer a different view of network activity, which focuses on abnormal behaviors without necessarily designating them good or bad.
Why do I have such a system then? That’s exactly what an anomaly detection system does. It lears normal behavior and flags anomalous behavior. Maybe not necessarily bad behavior, but certainly anomalous!
The case study presented is pretty weak too, I think. Detecting unusual protocols on the network can very nicely be done with MRTG or just netflow itself. I don’t need an NAD (and again, I think this should really be NBAD) for that. By the way, a signature-based NIDS can do some of that stuff too. You have to basically feed it the network usage policy and it can alert if something strange shows up, such as the use of FTP to your financial servers. So is that anomaly detection? No! This goes along with the article claiming that NADs check for new services appearing to be used on machines. I always thought that was passive network discovery. I know things are melting together, but still! Oh, and protocol verification is anomaly detection? No! It’s not. Where is the baseline that you have to train against?
Finally, why would an NAD, or NBAD for that matter, only be useful in an environment that is, quote: “stable”? I know of many ISPs that are using those systems and they for sure don’t have a stable environment!
Well, that’s all I have…
It is probably a sign that I travel too much if I have already seen all the movies – a total of three – they show on the airplane. But at least it is a good opportunity to read some of the many computer security magazines that have piled up on my desk over the past months.
I have an old issue of the information security magazine in front of me, the August 2005 issue. There is an article by Joel Snyder entitled “ProvinGrounds” where he writes about setting up a test lab for security devies. I like the article, but one quote caught my attention:
LINUX is a useful OS for any lab equipment, but it’s best kept on the server side. Its weak GUI and lack of laptop support makes it difficult to use as a client.
I don’t know how much experience the author actually has with Linux, but I am typing this blog entry on a linux system running on my laptop. I don’t know what the problem is. In fact, my GUI is probably even nicer than some of the Windows installations (I know, this is personal taste ;). Why would someone write something like that?
In the same issue of the magazein, there is another article from the same author. This article is talking about VLAN security. While the article does not reveal anything new and exciting – if you actually follow Nicolas Fischbach’s work, you might even be disappointed with the article – there is one thing in the article which makes me think that the author never had to configure a firewall. He recommends doing the following on a switch:
Limit and control traffic. Many switches have the ability to block broad types of traffic. If your goal, for example, is to enable IP connectivity, then you want to use an ACL to allow IP and ARP Ethernet protocols only, blocking all other types.
Firstly, why are IP and ARP Ethernet protocols? But that’s not what is wrong here. Have you ever configured your switch like this? Do you want to know what happens if you do? If you ever had to setup a firewall (and I am not talking about one that has a nice GUI with a wizzard and stuff), then you know that this is going to break quite a lot of things. Dude, you need some of the ICMP messages! Path MTU discovery for example. What about things like ICMP unreachables? Without them you introduce a lot of latency in your network because your clients have to wait for timeouts instead of getting negative ACKs.
While I am not at all a fan of the “security through obscurity” paradigm, I think in some cases it has its benefits. For example in preventing automated scripts (i.e., worms) to compromise your box. I found this page about “Port Knocking” which only opens port 22 if you connect to a series of other ports beforehand. What I like about this solution is the simplicity by using iptables.
The solution uses the
-m recent –rcheck
feature of iptables to open port 22 if a certain other port is being connected to.
The RAID (Recent Advances in Intrusion Detection) conference next year will be held in Hamburg. I will be on the program committee for the conference.
Make sure you submit a paper and attend the con!
I recently posted on Focus-IDS, loganalysis and the idug mailinglists to ask about what people are using to visualize their log files. Here is a summary of answers I got:
- One of the answers was Excel. Well, why not.
- Perl and MySQL, but completely manual approach.
- A paper on some techniques. I have not read it yet, but will post my comments when I did so: http://www.cs.unm.edu/~chris2d/papers/HPC_Analytics_Final.pdf
I got a couple more responses, but I want to verify and see what they are about first before I post them. Stay put.
I am still sitting in the airplane and the next article from the ISSA Journal from November 2005 that catches my attention is the “Log Data Management: A Smarter Approach to Managing Risk”. I have only a few comments about this article:
- The author demands that all the log data is archived, and archived unfiltered. Well, here is a question: What is the difference between not logging something and logging it, but later filtering it out? What does that mean for litigation quality logs?
- On the same topic of litigation quality data, the author suggest that a copy of the logs are save in the original, raw format while analysis is done on the other copy. I don’t agree with this. I know, in this matter my opinion does not really count and nobody is really interested in it, but I will have some proof soon that this is not required. I am not a lawyer, so I will not even try to explain the rational behind allowing the processing of the original logs and still maintaining litigation quality data.
- “Any log management solution should be completely automated.” While I agree with this, I would emphasize the word should. What does that mean anyways? Completely automated in the real of log management? Does that mean the log is archived automatically? Does it mean that the log management solution takes action and block systems (like an IPS)? There will always need to be human interaction. You can automate a lot of things, including the generation of trouble tickets, but at least then, an operator will be involved.
- Why does the author demand that “companies should look for an applicance-based solution”. Why is that important? The author does not give any rational for that. I can see some benefits, but there are tons of draw-backs to that approach too. I yet have to see a compelling reason why an appliance is better than a custom install on company approved hardware.
- In the section about alerting and report capabilities, the author mentiones “text-based alerts”, meaning that rules can be setup to trigger on text-srings in log messages. That’s certainly nice, but sorry, it does not scale. Assume I want to setup a trigger on firewall block events. I can define a text-string of “block” to trigger upon. But all the firewalls which call this not a block, but a “Deny” will not be caught. Have you heard of categorization or an event taxonomy? That’s what is really needed!
- “… fast text-based searches can accelerate problem resolution …” Okay. Interesting. I disagree. I would argue that vsualization is the key here. But I am completely biased on that one 😉
- Another interesting point is that the author suggest that “… a copy [of the data] can be used for analysis”. Sure. Why not, but why? If the argument is litigation quality data again, why would compression, which is mentioned in the next sentence be considered a “non-altering” way of processing the data. If that is the argument. I would argue that I can work with the log data by normalizing it and even enriching the data without altering it.
Sitting in an airplane from San Francisco to New York City, I am reading through some old magazines that piled up on my desk over the past weeks. One article I cannot resist commenting on I found in the ISSA Journal from November 2005. It’s titled: “Holistic IPS: The Convergence of Intrusion Prevention Technologies”. The article talkes about an apparently new way of doing intrusion detection (or intrusion prevention). Well, to give my comment up front already, this approach is far from new. SIMs are doing some of the things mentioned and quite a few more for years! It’s one more of these cases where people could learn from other technologies, but nobody pays attention.
Let’s have a look at some flaws in the article:
- First, a claim is made that an IPS can come up with a threat level for each attack. This is very interesting. I like it. But here are a few questions: 1. How do you define an attack? What is that? The author does not touch on that. 2. A threat-level needs to take into account how important the targeted asset is to the organization. It is first of all totally impractical for an IPS to know about the assets it protects and second, the author of the article does not mention this at all. We all know that risk = asset X vulnerability X threat. Why is this not mentioned here?
- The author claims that an attack always starts with probing activity from the attacker. What? Have you heard of networks of scanners that are just there to scan millions of hosts for specific services and vulnerabilities? The attackers will never be the same ones that conducted the reconnaissance. So this logic is somewhat flawed. And even if there were no such scanner networks, why does an attack always have to start with pre-attack reconnaissance? That’s just not true.
- The pre-attack reconnaissance does per the article not impart a threat. Oh really? Have you ever run a nessus scan against your network and used all the plugins? Did all of you machines survive? Mine did not. But Nessus is just a scanner. So just recon acctivity…
- The entire idea of this new correlation engines in the IPSs is that the pre-attack recon is correlated against the real exploit traffic. The article fails to outline how the recon activity can be detected. Is it just anormal behavior? Well… I would think there are attackers that can scan your network without being too anomalous. Ever heard of application level attacks? And you claim by just analyzing the traffic, without deep inspection, you will find that scanning activity? So the claim that “… behavior-based intrusion prevention technologies will probably be most effective in detecting them [the recon activity]” is not really true. I argue that there are much better technologies to make that call.
- What does it mean that an attack has “… a unique (and dynamic) threat level”. I thought unique would rule dynamic out? I don’t quite understand.
- “The correlation engine proposed in this article…” Well, maybe this technique is new in the IPS world, but there are other technologies that have used this kind (and a few more) of correlation for years.
- Differentiating probes and intrusions is not really described in the article either. I think I hit on this point already, but a probe is not necessarily detectable with just behavior based models. There are many one-packet probes that a signature-based approach can detect much more efficiently!
- The author gives an example of how this “new” type of IPS can detect an attack. Nowhere in the entire process is there a mention of the target’s vulenrability and a correlation against it. This is just flawed. You need to know whether a target is vulnerable to determine whether the attack was successful, unless you get evidence from the target itself, be that either from audit logs on the system or network traffic that proofs a vulnerable system is present or the attack was successful.
- Continuing on the example, I just don’t seem to understand that there have to be a probe and then an intrusion that gets correlated with the probe and then an action (e.g., blocking the source) is executed. Why not looking at the intrusion, determinig that it has a potential to be successul or was successful and then block?
- Here is something I really like in the article: If there were probes going to multiple machines, the offender is being blocked not just from going to the one target that it was already actively trying to exploit, but also all the other machines that it probed.
- You decide on this one: “The preventive countermeasures accurately reflect the tangible threat level…”.
- The article fails almost completely to discuss the topic of false positives in either the detection of probes or the detection of the intrusions (I don’t like this word at all, let’s call it an exploit). Maybe there are none?
- The argument that this approach “preludes the need for deep-packet inspection)” and that it “… improve IPS performance” by basically not having to look at all the packets due to the classification into two groups is not new. Have you ever tried to deploy an IDS behind a firewall? Same thing? Maybe not quite, but the idea is exactly the same.
- What I am missing in this whole discussion is also passive network discovery. If so much effort is put into behavioral discovery, why is it not used to model the services and vulnerabilities that the targets are exposing? There are technologies out there that use this very well.
Am I over critical? Did I totally misunderstand what the article tries to say?