May 26, 2007
Log analysis has shifted significantly in the last couple of years. It is no longer about reporting on log records (e.g., Web statistics or user logins). It is all about pinpointing who is responsible for certain actions or activities. The problem is that log files often do not communicate that. Some logs (mainly from network-centric devices) contain IP addresses that are used to identify the subject. In other cases, no subject can be identified in the log files at all (database transactions, for example).
What I really want to identify is a person. I want to know who is to blame for deleting a file. The log files have not evolved to a point where they would contain the user information. It generally does not help much to know what machine the user came from when he deleted the file.
This is all old news, and you are probably living with these limitations. But here is what I was wondering: why has nobody built a tool or started an open source project that looks at network traffic to extract user-to-machine mappings? It’s not _that_ hard. For example, SMB traffic contains plain-text usernames, shares, originating machines, etc. You should be able to compile session tables from this. I need this information. Anyone? There is so much information you could extract from network traffic (even from Kerberos!). Most protocols would give you a fair understanding of who is using what machine, at what time, and how.
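To make the session-table idea a bit more concrete, here is a minimal sketch in Python. It assumes some capture tool has already pulled (timestamp, source IP, username, protocol) observations out of the traffic; the `Observation` record, the 15-minute idle timeout, and the function names are all made up for illustration and do not correspond to any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One username sighting extracted from traffic (hypothetical format)."""
    timestamp: float  # seconds since epoch
    src_ip: str       # machine the user was seen on
    username: str
    protocol: str     # e.g. "smb" or "kerberos"


@dataclass
class Session:
    src_ip: str
    username: str
    first_seen: float
    last_seen: float


def build_session_table(observations, idle_timeout=900.0):
    """Merge sightings of the same user on the same machine into sessions.

    A new session starts when the gap since the last sighting exceeds
    idle_timeout (an assumed value, tune to taste).
    """
    sessions = []
    latest = {}  # (src_ip, username) -> most recent Session for that pair
    for obs in sorted(observations, key=lambda o: o.timestamp):
        key = (obs.src_ip, obs.username)
        current = latest.get(key)
        if current is not None and obs.timestamp - current.last_seen <= idle_timeout:
            current.last_seen = obs.timestamp
        else:
            current = Session(obs.src_ip, obs.username, obs.timestamp, obs.timestamp)
            latest[key] = current
            sessions.append(current)
    return sessions


def who_was_on(sessions, src_ip, timestamp):
    """Return the usernames with a session on src_ip covering timestamp."""
    return sorted({
        s.username
        for s in sessions
        if s.src_ip == src_ip and s.first_seen <= timestamp <= s.last_seen
    })
```

With a table like this, an IP address and timestamp from a firewall or file-server log could be looked up with `who_was_on` to get candidate users. The hard part, decoding the protocols to produce the observations, is deliberately left out.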
[tags]identify correlation, user, log analysis, user mapping[/tags]
May 24, 2007
I am teaching a workshop at FIRST in Seville in June about Visualizing Insider Threat Data. I recorded a podcast introducing my workshop and talking about visualization in general.
[tags]visualization, visualization podcast, security[/tags]
May 15, 2007
I was just listening to this podcast about security information management (SIM) systems. Tom Bowers from Information Security magazine is talking about various topics in SIM. Unfortunately I have to disagree with Tom on a couple of points, if not more. But let me pick the couple I find most important:
- Visualization is a great tool to see attacks in real-time. However, you can only see where the attacks are coming from and not how many. What? Why would I not be able to visualize that? You can map that to edge size, node size, map it as a color to your nodes, etc. I don’t know what system he looked at to make this statement, but that’s wrong!
- Active Response is something that SIMs cannot do. Well. Wrong again. I could tell you how ArcSight is doing this with the Threat Response Manager (TRM), but that would be a vendor pitch. That’s why I am going to mention SEC, the Simple Event Correlator. It can execute an arbitrary action. Well, it’s not a quantum leap from there to imagine how you could issue a command to add an ACL to a router, for example. To sum up: Active response is something SIMs can do! If you want to know exactly how you do this with SEC, read my chapter on event analysis in the new Snort book.
These were the main points where I disagree with Tom. He could have done a bit of a better job describing the benefits of visualization, but that’s another story.
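To illustrate the active response point, here is a rough Python sketch of the general pattern: a correlation rule counts events per source and fires an arbitrary action once a threshold is crossed. This is not SEC syntax and not how any particular SIM implements it. The `ThresholdRule` class and its parameters are hypothetical, and the action is just a callback that you would replace with whatever pushes an ACL to your router.

```python
from collections import defaultdict, deque


class ThresholdRule:
    """Fire an action when a source produces too many events in a time window."""

    def __init__(self, threshold, window, action):
        self.threshold = threshold   # number of events that triggers the action
        self.window = window         # sliding window length in seconds
        self.action = action         # callback taking the offending source IP
        self.recent = defaultdict(deque)  # src_ip -> timestamps inside the window
        self.triggered = set()       # sources we have already responded to

    def feed(self, timestamp, src_ip):
        """Process one event; events are assumed to arrive in time order."""
        times = self.recent[src_ip]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold and src_ip not in self.triggered:
            self.triggered.add(src_ip)
            self.action(src_ip)


# Usage: collect the sources that would be blocked instead of touching a router.
blocked = []
rule = ThresholdRule(threshold=3, window=60, action=blocked.append)
for ts in (0, 10, 20):
    rule.feed(ts, "192.0.2.10")
```

The interesting design question is what the callback does. Recording the decision (as here) is safe; actually changing a router configuration automatically deserves a lot more care.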
[tags]arcsight,visualization[/tags]
May 11, 2007
I was trying to get my Ubuntu desktop to use Beryl, just like my laptop does. Unfortunately, my NVidia drivers didn’t quite want to do what I wanted them to do. Long story short, at some point I remembered to check the log files to see whether I could determine what exactly the problem was. Where should I look first? /var/log/messages. And right there it was:
May 11 11:15:12 zurich kernel: [ 2503.193111] NVRM: API mismatch: the client has the version 1.0-9631, but
May 11 11:15:12 zurich kernel: [ 2503.193114] NVRM: this kernel module has the version 1.0-9755. Please
May 11 11:15:12 zurich kernel: [ 2503.193115] NVRM: make sure that this kernel module and all NVIDIA driver
May 11 11:15:12 zurich kernel: [ 2503.193117] NVRM: components have the same version.
Beautiful. That’s exactly what I needed to know. But hang on a second. Isn’t this a syslog entry? Wow. It just hit me. While I really liked the verbose output, I was trying to think about how I would parse this thing. How would I normalize this message so that machine logic could process it later? Awful!
I guess my conclusion would be that we need two types of Syslogs! One that logs machine readable log entries and one for humans. Is that really what we want? Maybe the even better solution would be to only have a machine readable log and then provide an application that can read the log and blow the contents up to make it readable for humans!
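A minimal Python sketch of that second option, assuming a made-up key=value log format (this is not CEE, CEF, or anything the NVIDIA driver actually emits): the log carries only a terse machine-readable record, and a small viewer expands it into a sentence for humans. The event name, field names, and template are hypothetical.

```python
# Human-readable templates, keyed by a hypothetical event name.
TEMPLATES = {
    "driver_version_mismatch": (
        "API mismatch: the client has the version {client_version}, but this "
        "kernel module has the version {module_version}. Please make sure that "
        "this kernel module and all driver components have the same version."
    ),
}


def parse_event(line):
    """Split a 'key=value key=value' record into a dict.

    Values containing spaces are not supported in this sketch.
    """
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def render_for_humans(fields):
    """Blow a parsed record up into a sentence, or echo it if unknown."""
    template = TEMPLATES.get(fields.get("event"))
    if template is None:
        return " ".join(f"{k}={v}" for k, v in fields.items())
    return template.format(**fields)


record = ("event=driver_version_mismatch component=NVRM "
          "client_version=1.0-9631 module_version=1.0-9755")
message = render_for_humans(parse_event(record))
```

The machine side only ever sees `parse_event`, which is trivial; all of the verbosity lives in the templates, which could be translated or reworded without touching the log format.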
Where is CEE when you need it?
May 10, 2007
Although I work in the log/event management space and therefore help organizations to gather more information about people, I am a big opponent of personal information collection.
I flew back from Switzerland to San Francisco after my Christmas break and was in for a surprise. Not only did they want my passport (which I can sort of understand ;), but they also wanted me to fill out an additional form with my address in San Francisco, a contact person, etc. Why do they need all that? And then there is still the controversy about the airlines giving passenger information to the TSA and possibly other US agencies. I just don’t know what they use all this information for. To flag potentially dangerous passengers? What was the rate of false positives for that? I wish everyone had laws for personal data as stringent as the EU’s. At least I would have a chance to find out what data they have about me and possibly correct it!
Are you a non-US citizen, and if so, did you enter the US lately? Yes? Picture taken, fingerprints (soon to be 10, not just 2). Even more data they collect. I’ve got to tell you, it’s not just the wait in the immigration hall that annoys me. It’s all the data they collect. And that’s what triggered my post. I wouldn’t have that much of a problem if they actually told me what they were going to do with the data and kept it safe.
Maybe they are starting to rethink the “data collection” after more and more of the US agencies are suffering data leaks. Now the TSA itself. Hopefully they realize that they should either start to be serious about data security or stop collecting information!
May 2, 2007
A group of info sec people is meeting up in San Francisco for an informal get together. We’ll have a drink and probably chat about security.
You work in computer security? Join us:
Wednesday, May 16th, 7pm at Zeitgeist in San Francisco.
No RSVP needed. So far you’ll have to buy your own beer unless we find a sponsor 😉
April 23, 2007
I have some more details on the CEE effort, which is captured in this CEE Brochure. The most interesting part is probably page two, where the benefits are outlined. This effort will continue by tackling the four standard areas one after the other. I have a feeling that we will tackle the taxonomy part first. I can already see it, this is going to be HARD!
April 19, 2007
As Anton mentioned, there is a new event logging standard in the works. What Anton did not mention are the four areas that you need to talk about when you talk about a logging standard. Well, here they are:
- Common Event Syntax, like CEF
- Common Event Taxonomy. This is where you attach “meaning” or “semantics” to an event. There are a few proprietary ones, nothing standardized though.
- Common Event Transport
- Common Event Representation, defining what a device should log. An operating system should log user logins for example.
And don’t mix these things. The transport has nothing to do with the syntax! I don’t want to implement a SOAP environment to transport some events. Unfortunately a few companies and even standards have made that mistake! I don’t want to mention anyone here…
Stay tuned for http://cee.mitre.org to go live and learn more about all of this.
March 20, 2007
In cryptography or science in general, you often need perfect random numbers. Well, up to today, that was my need as well. However, today I was trying to generate numbers that are not too random, but have a certain bias. I think it’s kind of ironic. Googling for a solution is almost impossible. Every link shows a perfect random number generator 😉
I don’t care what the bias is in the numbers that are generated. Actually, the bias can be pretty high. Anyone have a method to do this in Perl?
Can you do something like int(rand($upperLimit*1000)) % 1000 ??? Basically changing the interval from where the random number is taken and then shrinking it again?
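One observation on the modulo idea: as far as I can tell, it stays close to uniform when `$upperLimit` is a whole number, because every residue 0..999 is hit equally often, so it would not introduce much bias. A simple alternative is to bend a uniform draw with a power before scaling it. The sketch below is in Python rather than Perl, and the function name and `skew` parameter are made up for illustration; the same arithmetic should port to Perl's `rand` directly.

```python
import random


def biased_int(upper_limit, skew=3.0, rng=random):
    """Return an integer in [0, upper_limit) that is biased toward 0.

    rng.random() is uniform in [0, 1); raising it to a power greater than 1
    pushes most of the mass toward 0. A skew of 1.0 gives back the usual
    uniform behaviour, and a skew below 1.0 biases toward the upper end.
    """
    return int(upper_limit * rng.random() ** skew)


# Usage: draw a few thousand values with a fixed seed and look at the spread.
rng = random.Random(42)
samples = [biased_int(1000, skew=3.0, rng=rng) for _ in range(10000)]
low_share = sum(1 for s in samples if s < 500) / len(samples)
```

With a skew of 3, roughly four out of five draws should land in the lower half of the range, which is about as far from a perfect random number generator as you could want.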
March 13, 2007
I came across this really nice library of R graphs and scripts. One that I really liked is a scatter plot with histograms for each of the axes. The code to generate such a graph is the following:
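# Read the CSV (no header row); columns V2 and V3 hold the x and y values
Dataset <- read.table("/home/ram/foo2_200.csv", header=FALSE, sep=",")
x <- as.numeric(Dataset$V2)
y <- as.numeric(Dataset$V3)
# 2x2 layout: scatter plot bottom-left (1), x histogram above it (2), y histogram to its right (3)
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
par(mar=c(3,3,1,1))
plot(x,y,xlab="",ylab="")
# Histogram of x in 24 equal-width bins, drawn above the scatter plot
par(mar=c(0,3,1,1))
xhist <- hist(x, breaks=seq(min(x),max(x),(max(x)-min(x))/24), plot=FALSE)
barplot(xhist$counts,axes=FALSE,space=0,col=heat.colors(24))
# Histogram of y in 24 equal-width bins, drawn horizontally to the right
par(mar=c(3,0,1,1))
yhist <- hist(y, breaks=seq(min(y),max(y),(max(y)-min(y))/24), plot=FALSE)
barplot(yhist$counts,axes=FALSE,space=0,horiz=TRUE,col=heat.colors(24))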
And the result looks like this:
