September 14, 2007

Open Log Format – What a Great Standard – Not

Category: Log Analysis,Security Information Management — Raffael Marty @ 4:01 pm

When eIQnetworks announced their OpenLogFormat, I think they did it just for me. I love it. I really enjoy taking these things apart to show why they are really, really bad attempts. I am sure these guys are not readers of my blog. Otherwise they would have known that I would question their standard, line by line. It just doesn’t add up for me. Why are companies and people not learning, not listening?
So, there is yet another “standard” for event interoperability being suggested by yet another vendor. While some vendors (for example the one I used to work for) actually thought about the problem and made sure to come up with something useful, I am not sure this standard lives up to that promise. Let me go through the standard piece by piece, right after some general comments:

  • Why another interoperability standard? There is not a single word of motivation printed in the standards document. Don’t we have existing standards already?
  • You have to register to download the standard? Well, I know, ArcSight makes that same mistake. That wasn’t my doing! I promise.
  • How does this standard compare to others? What’s the motivation for defining it? Is it better than everything else?
  • When exactly would you apply this standard? All the time? OLF (the open log format) states:
    OLF is designed for logging network events such as those often logged by firewalls, but it can also be used for events not related to the network.
    What the heck does that mean? For everything? Do you want me to prove you wrong? There are tons of log sources where this standard simply won’t apply.
  • You did not do your homework, my friends! In a lot of areas. Some friends of mine already commented on the fact that this is advertised as an “open” log format. The press release even calls it an open source log format. What does that mean? Was there a period for public comment? Believe me, there wasn’t. I would have known FOR SURE!
  • With regards to the homework: have you heard of CEE? Yes, that’s a group that actually knows quite a bit about logging. Why bother asking them? They would only critique the proposal and possibly shoot it down. You bet. That’s what I am doing right now anyway.
  • Let’s see, did you guys learn from past mistakes? Don’t get me started. I claim NO. Read on and you will see a lot of cases that prove why.
  • Have you read my old blog entries and at least tried to understand what logging is about? I can guarantee that you guys have not. Or maybe you didn’t understand what I was saying. Hmm…. Here again, for your reference.
  • Have you looked at the other standards out there? For example CEF (Common Event Format) from ArcSight. I am definitely biased towards that one, as I have written it, but even now that I don’t work there anymore, I still think that CEF is actually a really good logging standard. Again: you have not done your homework!
  • Last general question: Why would I use this standard as opposed to anything else, for example CEF? Is eIQnetworks big enough that I would care? Last time I checked, the answer was: no. If this was something done by Microsoft, I might care, just because of their size. Maybe you have a lot of vendors already supporting this standard? Yes? How many? Who? I had never heard of OLF before, and I deal with log management every day! So I doubt there is any significant adoption. Actually, I just checked the Web page and there are six companies supporting it. Okay. All that 😉

Let’s go through the standard in more detail:

  • I already made this point: What is the area where this standard applies? Networking and non-networking events (That’s what OLF claims)? Nice. And why would you require an IP address field (to be exact: internalIP and externalIP) for every record? In your world, are there only events that contain IPs? In mine, there are many others too!
  • You are proposing a log-file approach. So you are defining a file-based standard, limiting it to one transport. Okay. But why? Again, read my blog about transport-independence. Who is logging to files only? A minority of products in the networking realm.
  • Have you guys written parsers before? (Yes, I have!). Do you know how bad it is to read headers first? Makes a whole lot of use-cases impossible. And to be frank, it requires too much coding (I am lazy).
  • Minor detail: You guys are already on version 1.1? Hmm… I wonder how version 1.0 looked πŸ˜‰
  • I don’t think the author of this paper has written a standard before: “The #Version line gives the version of OLF, which should always be 1.1.” How do you do updates? You deprecate this document? Confusing, confusing.
  • Why do you need a #Date line in the header? That does not make any sense AT ALL!
  • Okay, so you are using a header line that defines the fields. All right. Let’s assume that’s a good idea in order to reduce the size of an event (exercise to the reader why this is true). Why do you say then:
    NOTE: The fields may not vary; they must alwas be the ones specified in this document.

    What? This does not make any sense at all! Whatsoever! Delete that line. Done. It’s irrelevant.
  • Let’s go back to the header line. Why all these required fields? spam-info? This is very inefficient. Why have all these fields for every event? It unnecessarily bloats your events and defeats the purpose of a header line!
  • Tab-separated fields. Okay. Your choice. Square brackets to deal with escaping? Are you guys coders? That’s not a standard way of doing things at all. Anyone who has written code before, have you seen this approach anywhere? If you stuck to commas and quotes, you could read your logs in Excel without any configuration (see the quick sketch after this list) 😉
  • tab-separated subfields. Shiver.
  • Guys, your example on page one is horrible. Priority in the preamble and in the suffix? Then the virtualdevice is root? Maybe I can’t count. You know what, I think the fields don’t even align. What are all the IPs in the message? Part of the message (the one with the seemingly interesting IPs) seems to be lumped together into one field (uses the square brackets). I don’t get it.
  • Error lines? Come again? So there are really two different types of log entries? Or no, hang on, there aren’t. Those lines are only generated if the OLF consumer realizes that the format is not correct? What does that have to do with a logging standard? If I wasn’t confused yet, now I definitely am.
  • Open source: “a device-type assigned by eIQnetworks”. No further comment.
  • Wow. Is it right that every log entry carries the “original” log message also (called the Nativelog)? So, if a product supports OLF by default, that’s just empty? Come on guys. Are you really suggesting to double the size of messages?
  • Talking about the field dictionary… What does it mean to have “unused” fields? Unused by what? The standard? Oh, maybe this is not a standard?
  • I will spare you the analysis of all the fields in the dictionary. There are tons of problems. Just one: if you have a count bigger than one but only one timestamp, what does that mean? That all the events happened at the same time?
  • Note that the Nativelog field is defined as: Original syslog line. Okay, so this is a file-based standard, but it consumes syslog messages?
  • event types: There is indeed, and I kid you not, a -1 value. Is that for real?
  • priority codes: Nice. Read this (again, this is a standard, in case you forgot):
    The descriptions [of the priorities] given are the official interpretation, but usage varies; some vendors report routine events with higher priority
  • Note the copyright at the bottom of the pages 😉 [Okay, I admit, I might have made the same mistake with the first version of CEF, you are forgiven].
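
To back up the escaping point above: if the fields were comma-separated and quoted, any off-the-shelf CSV parser (and Excel) would handle embedded separators and quotes without any custom square-bracket logic. Here is a quick sketch in Perl with Text::CSV; the log line itself is made up:

#!/usr/bin/perl
# Sketch for the escaping point above: commas plus quotes parse with a standard module.
# The log line below is made up.
use strict;
use warnings;
use Text::CSV;

my $line = 'Sep 14 2007,firewall01,block,"GET /index.html, user ""bob""",10.0.0.1';
my $csv  = Text::CSV->new({ binary => 1 });
$csv->parse($line) or die "could not parse line\n";
print "$_\n" for $csv->fields();   # the quoted field comes back as a single value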

Have I convinced you yet why not to use this “standard”?

Random observation: Why does this log remind me of IIS logs gone wrong?

[tags]log standard, logging, event interoperability, cee, olf, open log format[/tags]

August 25, 2007

Event Processing – Normalization

Category: Log Analysis,Security Information Management — Raffael Marty @ 6:15 pm

A lot has happened the last couple of weeks and I am really behind with a lot of things that I want to blog about. If you are familiar with the field that I am working in (SIEM, SIM, ESM, log management, etc.), you will fairly quickly realize where I am going with this blog entry. This is the first of a series of posts where I want to dig into the topic of event processing.

Let me start with one of the basic concepts of event processing: normalization. When dealing with time-series data, you will very likely come across this topic. What is time-series data? I used to blog and talk about log files all the time. Log files are a type of time-series data: data that is collected over time, where each entry is associated with a time stamp. This covers anything from your traditional log files to snapshots of configuration files or output of tools that are run on a periodic basis (e.g., capturing your netstat output every 30 seconds).

Let’s talk about normalization. Assume you have some data which reports logins to one of your servers, and you would like to generate a report showing the top ten users accessing that server. How would you do that? First, you’d have to identify the user name in each log entry. Then you’d extract it, for example with a regular expression. Finally, you’d collect all the user names and compile the top-ten list.
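
As a minimal sketch, assuming OpenSSH-style “Accepted password for <user> from <ip>” entries arrive on stdin (adjust the regular expression for your own log source):

#!/usr/bin/perl
# Top-ten users without a database: extract the user name with a regex and count.
use strict;
use warnings;

my %count;
while (my $line = <STDIN>) {
    if ($line =~ /Accepted \w+ for (\S+) from (\S+)/) {
        $count{$1}++;   # $1 = user name, $2 = source address
    }
}
my @top = (sort { $count{$b} <=> $count{$a} } keys %count)[0 .. 9];
printf "%-20s %d\n", $_, $count{$_} for grep { defined } @top;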

Another way would be to build a tool which picks the entire log entry apart and puts as much information from the event into a database as possible, as opposed to just capturing the user name. We’d have to create a database with a specific schema. It would probably have these fields: timestamp, source, destination, username. Once we have all this information in a database, it is really easy to do all kinds of analysis on the data that was not possible before we normalized it.
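
For illustration, here is a minimal sketch of that second approach, assuming the same OpenSSH-style entries as above and using DBI with SQLite; the schema and field names are just an example, not a recommendation:

#!/usr/bin/perl
# Normalize log entries into a database, then report with plain SQL.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=events.db", "", "", { RaiseError => 1 });
$dbh->do("CREATE TABLE IF NOT EXISTS events (ts TEXT, source TEXT, destination TEXT, username TEXT)");

my $ins = $dbh->prepare("INSERT INTO events (ts, source, destination, username) VALUES (?, ?, ?, ?)");
while (my $line = <STDIN>) {
    # one parser (regular expression) per data source in real life
    if ($line =~ /^(\w+\s+\d+ \S+) (\S+) sshd\[\d+\]: Accepted \w+ for (\S+) from (\S+)/) {
        $ins->execute($1, $4, $2, $3);   # timestamp, source IP, destination host, user
    }
}

# once the data is normalized, the top-ten report is a single query
my $top = $dbh->selectall_arrayref(
    "SELECT username, COUNT(*) AS hits FROM events GROUP BY username ORDER BY hits DESC LIMIT 10");
printf "%-20s %d\n", @$_ for @$top;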

The process of taking raw input events and extracting individual fields is called normalization. Sometimes there are other processes which are classified as normalization. I am not going to discuss them right here, but, for example, scaling numerical values to fall into a predefined range (e.g., mapping a value x from the range [min, max] to (x - min) / (max - min), which lands in [0, 1]) is generally referred to as normalization as well.

The advantages of normalization should be fairly obvious. You can operate on the structured and parsed data. You know which field represents the source address versus the destination address. If you don’t parse the entries, you don’t really know that. You can only guess. However, there are many disadvantages to the process of normalization that you should be aware of:

  • If you are dealing with a disparate set of event sources, you have to find the union of all fields to make up your generic schema. Assume you have a telephone call log and a firewall log, and you want to store both types of logs in the same database. What you have to do is take all the fields from both logs and build the database schema from them. This results in a fairly large set of fields, and if you keep adding new types of data sources, your database schema gets fairly big. I know of a SIM which uses more than 200 fields, and even that doesn’t come close to covering all the fields needed for a good set of data sources.
  • Extending the schema is incredibly hard: When building a system with a fixed schema, you need to decide up front what your schema will look like. If, at a later point in time, you need to add another type of data source, you will have to go back and modify the schema. This can have all kinds of implications for the data already captured in the data store.
  • Once you decided to use a specific schema, you have to build your parsers to normalize the inputs into this schema. If you don’t have a parser, you are out of luck and you cannot use that data source.
  • Before you can do any type of analysis, you need to invest the time to parse (or normalize) the data. This can become a scalability issue. Parsing is fairly slow: it generally applies regular expressions to each of the data entries, which is a fairly expensive operation.
  • Humans are not perfect and programmers are not either. The parsers will have bugs and they will screw up normalization. This means that the data that is stored in the database could be wrong in a number of ways:
    • A specific field doesn’t get parsed. This part of the data entry is not available for any further processing.
    • A field gets parsed but assigned to the wrong field. Part of your prior analysis could be wrong.
    • Breaking up the data entry into tokens (fields) is not granular enough. The parser should have broken the original entry into more specific fields.
  • The data entries can change. Oftentimes, when a new version of a product is released, it either adds new data types or it changes some of the log entries. This has to be reflected in the parsers. They need to be updated to support the new data entries, before the data source can be used again.
  • The original data entry is not available anymore, unless you spend the time and space to store the original entry along with the parsed and extracted fields. This can cause quite some scalability issues as well.

I have seen all of these cases happen, and they happen all the time. Sometimes the issues are not that bad, but other times, when you are dealing with mission-critical systems, it is absolutely crucial that the normalization happens correctly and on time.

I will expand on the challenges of normalization in a future blog entry and put it into the context of security information management (SIM).

[tags]SIM, SIEM, ESM, log management, event normalization, event processing, log analysis[/tags]

July 29, 2007

Chief Security Strategist @ Splunk

Category: Log Analysis — Raffael Marty @ 8:12 pm

Effective immediately, I have a new employer! I am leaving ArcSight to start working for Splunk, an IT search company in San Francisco. As their Chief Security Strategist, I will be working in product management, with responsibility for all of the UI and solutions.

The work I have been doing in the past with log management and especially visualization is going to apply directly to my new job. I will be spending quite some time helping to further the visual interfaces and define use-cases for log management. Exactly what I’ve been doing for the last four years already 😉

Please don’t send me any emails to my arcsight email anymore. My new address:
raffy at splunk . c o m

I found out that a lot of the Splunk developers hang out on IRC (#splunk). I’ve been hanging out in there for the last couple of days. Maybe you can catch me there too 😉

These Splunk guys are funny. One of the first things they did was give me a MacBook. Darn. I have never used a Mac before. This is crazy. All the little things I had developed and installed on my Linux boxen I now have to translate to OS X. I am slowly getting used to this beast, but there are still things I haven’t been able to figure out. Maybe some of you want to help me out?

  • The first thing I did was look for something to cover the built-in camera. I don’t trust this thing. Who knows who’s watching 😉 I finally found the iPatch. Unfortunately they are out of stock. Well, I just built my own …
  • Then I discovered that the plugs I have for the microphone and headphone jacks are not working either. They are slightly too big. Well, I will have to talk to Josh about that during DefCon 😉
  • Then the other thing that I am struggling with is logging and auditing. I used tcpspy before to log all the connections that are opened to and from my machine. I downloaded the source and started compiling. No luck. Here is the error during compilation. Anyone know how to fix this?
    tcpspy.c: In function 'ct_read':
    tcpspy.c:236: error: 'TCP_ESTABLISHED' undeclared (first use in this function)
  • Maybe there is another tool that I can use to record all the connections? The nice thing about tcpspy is that it also logs the application that opened or accepted the connection and the user associated with it.
  • What do I do about auditing? Are there instructions somewhere on how to enable BSM auditing on Mac OS X, or is there something else? I would mainly like to audit access to critical files on my box.
  • There are all kinds of other little odd things, but these are the items bugging me right now 😉

See ya all at BlackHat! Hit me up so we can meet up!

July 10, 2007

Cubing Log Files

Category: Log Analysis — Raffael Marty @ 8:44 pm

<disclaimer>This post is not 100% serious</disclaimer>

The mere fact that I have to put a disclaimer here is sort of funny. I guess I don’t want to discuss a topic and then have people come back calling me names 😉

At the FIRST conference last month in Spain, I talked to Ben Chai for a while; he recorded the talk and summarized some of the discussion in a blog entry. We talked about log analysis of huge amounts of data, and I guess I came up with the idea of cubing logs to approach the problem: using log visualization of subsets to help the analyst.

[tags]visualization, log cubing, log analysis[/tags]

June 18, 2007

AfterGlow Example – Visualizing IP Tables Logs

Category: Log Analysis,Visualization — Raffael Marty @ 11:13 am

I am sitting in Seville at the FIRST conference, where I will be teaching a workshop on Wednesday. The topic is going to be insider threat visualization. While sitting in some of the sessions here, I was playing with my iptables logs. Here is a command I am running to generate a graph from them:
cat /var/log/messages | grep "BLOCK" |
  perl -pe 's/.*\] (.*) (IN=[^ ]*).*(OUT=[^ ]*).*SRC=([^ ]* ).*DST=([^ ]* ).*PROTO=([^ ]* ).*?(SPT=[^ ]+ )?.*?(DPT=\d* )?.*?(UID=\d+)?.*?/\1,\2,\3,\4,\5,\6,\7,\8,\9/' |
  awk -F, '{printf "%s,%s,%s\n",$4,$5,$8}' |
  sed -e 's/DPT=//' -e 's/ //g' |
  afterglow.pl -e 1.3 -c iptables.properties -p 1 |
  neato -Tgif -o/tmp/test.gif; xloadimage /tmp/test.gif
This is fairly ugly. I am not going to clean it up; you get the idea. Now, what does iptables.properties look like? Well, if you thought the command above was ugly... The properties file is completely insane, but it shows you the power of the “variable” assignment:
variable=$ip=`ifconfig eth1 | grep inet`
variable=$ip=~s/.*?:(\d+\.\d+\.\d+\.\d+).*\n?/\1/;
variable=$subnet=$ip; $subnet=~s/(.*)\.\d+/\1/;
variable=$bcast=$subnet.".255";
color="invisible" if (field() eq "$bcast")
color="invisible" if ($fields[2] eq "67")
color="yellow" if (field() eq "$ip")
color.source="greenyellow" if ($fields[0]=~/^192\.168\..*/);
color.source="greenyellow" if ($fields[0]=~/^10\..*/);
color.source="greenyellow" if ($fields[0]=~/^172\.16\..*/);
color.source="red"
color.event="greenyellow" if ($fields[1]=~/^192\.168\..*/)
color.event="greenyellow" if ($fields[1]=~/^10\..*/)
color.event="greenyellow" if ($fields[1]=~/^172\.16\..*/)
color.event="red"
cluster.target=">1024" if (($fields[2]>1024) && ($fields[1] ne "$ip"))
cluster.source="External" if (field()!~/^$subnet/)
cluster.event="External" if (field()!~/^$subnet/)
color.target="blue" if ($fields[1] eq "$ip")
color.target="lightblue"

Mind you, this was run on Ubuntu Linux. You might have to change some of the commands and the parsing of their output. Pretty neat, eh? Here is a sample output that this generated. My box is yellow. The question I was trying to answer: why is someone trying to connect to me on port 80?

IPTables Visualization

[tags]visualization, iptables, afterglow, security[/tags]

June 17, 2007

AfterGlow 1.5.8 – Security Data Visualization

Category: Log Analysis,Visualization — Raffael Marty @ 10:29 am

Finally! I have worked on this AfterGlow release forever. I submitted a few checkpoints to CVS before I felt ready to release AfterGlow 1.5.8. I highly recommend upgrading to 1.5.8. It has a few bugfixes, but what you will find most rewarding is the new color assignment heuristic and the capability to change node sizes. Here is the complete changelog:

06/10/07 Version 1.5.8
- Nodes can have a size now:
(size.[source|target|event]=<expression returning size>)
Size is cumulative! So if a node shows up multiple times, the
values are summed up!! This is unlike any other property in
AfterGlow where values are replaced.
- The maximum node size can be defined as well, either with a property:
(maxnodesize=<value>)
or via the command line:
-m=<value>
The size is scaled to a max of 'maxsize'. Note that if you
are only setting the maxsize and no special sizes for nodes
Afterglow will blow the nodes up to optimal size so the labels
will fit.
There is a limit also, if you want the source nodes to be a max of say
1, you cannot have the target nodes be scaled to fit the labels. They
will have a max size of 1 and if you don't use any expression, they will
be of size 1. This can be a bit annoying ;)
Be cautious with sizes. The number you provide in the assignment is not the actual size
that the node will get, but this number will get scaled!
- One of the problems with assignments is that they might get overwritten with later nodes
For example, you have these entries:
A,B
A,C
and your properties are:
color="blue" if ($fileds[1] eq "B")
color="red"
you would really expect the color for A to be blue as you specified that explicitly.
However, as the other entry comes later, the color will end up being red. AfterGlow takes
care of this. It will determine that the second color assignment is a catch-all, identified
by the fact that there is no "if" statement. If this happens, it will re-use the more specific
condition specified earlier. I hope I am making sense and the code really does what you would
expect ;)
- Define whether AfterGlow should sum node sizes or not.
(sum.[source|target|event]=[0|1];)
by default summarization is enabled.
- Added capability to define thresholds per node type in properties file
(threshold.[source|event|target]=<value>;)
- Added capability to change the node shape:
shape.[source|event|target]=
(box|polygon|circle|ellipse|invtriangle|octagon|pentagon|diamond|point|triangle|plaintext)
- Fixed an issue where, if you use -t to only process two columns,
you could not use the third column in the property file for size or color.
The third column was not carried through. This is fixed!
- The color assignment heuristic changed a bit, along the same lines as the size assignment.
Catch-alls do not take precedence anymore. You might want to take this into account when defining
colors. The catch-all will only be used if there was never a more specific color assignment that
evaluated for this node. For example:
color="gray50" if ($fields[2] !~ /(CON|FIN|CLO)/)
color="white"
This is used with a three-column dataset, but only two are displayed (-t). If the first condition
ever evaluated to true for a node, the last one will not hit, even if the data also contains a record
for that node where the first assignment evaluates to false and the latter one would apply. Being a
catch-all, it does not get superior treatment. This is really what you would intuitively assume.
- Just another note on color. Watch out: if you are defining colors based not on the fields in the
data but on some other conditions that might change per record, you will get the wrong results, as
AfterGlow uses a cache for colors which keys off the concatenation of all the field values. Just
a note! Anyone having problems with this? I might have to change the caching heuristic then. Let
me know.
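
To make the new options a bit more concrete, here is a minimal sketch of a properties snippet combining them, based purely on the syntax above (the column choice and values are arbitrary; adapt them to your own data):

# scale all nodes to a maximum size of 1
maxnodesize=1
# use the third column as the target node size and sum repeated occurrences
size.target=$fields[2]
sum.target=1
# count threshold for target nodes
threshold.target=2
# draw source nodes as boxes; catch-all color for everything else
shape.source=box
color="red"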

[tags]afterglow, visualization, security log analysis, security visualization[/tags]

June 7, 2007

Common Event Exchange Formats – XDAS

Category: Log Analysis — Raffael Marty @ 1:01 am

Not too long ago, I posted an entry about CEE, the Common Event Expression standard, which is a work in progress led by Mitre. I was one of the founding members of the working group and have been in discussions with Mitre and other entities about common event formats for a long time. Anyways, one of the comments on my blog entry pointed to an effort called Distributed Audit Service (XDAS). I had not heard of this effort before and was a bit worried that we had started something new (CEE) where there was already a legitimate solution. That’s definitely not what I want to do. Well, I finally had time to read through the 100(!)-page document. It’s not at all what CEE is after. Let me tell you why XDAS is not what we (you) want:

  1. How many people are actually using this standard for their audit logs? Anyone? I have been working in the log management space for quite a while, and not just on the vendor side, but also in academia. I have NEVER heard of it before. So why should I use this if nobody else does? In contrast, CEF from ArcSight is in use not just by ArcSight itself, but by many of its partners.
  2. I just mentioned it before: 100 pages! When was the last time you read through 100 pages? I just did. It took me about an hour to read the document, and I skipped a lot of the API definitions. My point being: a standard should be at most 10 pages! It’s not just the length of the document, it’s the complexity that comes with it. Nobody is going to read and adhere to this. The more you demand, the more mistakes vendors will make when implementing it. Oh, and please don’t tell me to only read pages 1-10! Make it 10 pages if you want me to read only those.
  3. How much time does it take to actually implement this? Has anyone done it? How long did it take you? I bet a couple of weeks, plus QA, etc. Much too long. I am NEVER going to make that investment.
  4. Let’s get into details. Gosh. Why does this define APIs? Don’t dictate how I should do things. A standard needs to define the common interface, not how I have to open a stream, save files, and so on. It’s overkill. The implementations will differ, and they should! And why lock yourself into this API transport? Can you support other transports?
  5. It seems that there is an XDAS service that I need to integrate with. What is that? That’s not clear to me. Can I exchange logs (audit records) between just two parties, or do I need an intermediary XDAS service? I am confused.
  6. Keep the scope of the standard to what it wants to accomplish: event interchange! This thing talks about access control, RBAC, filtering, etc. Why? Please! That’s absolutely unnecessary and should not be part of an interchange standard!
  7. In general, I am quite confused about the exact setting of this. Are we only talking about audit records? Security related only? What about other events? I want this to be very generic! Don’t give me a security specific solution! The world is opening up! We need generic solutions!
  8. What kind of people wrote this? Using percent signs to escape entries and colons to separate them? Must be from the old AS/400 world … Sorry… I just had to say this, in a world of CSV and key-value pairs it is sort of funny to see these things.
  9. The glossary could really benefit from a definition of event and log.
  10. The standard requires a measure of uncertainty for timestamps. I have never heard of this. Could you please elaborate? How can I measure time uncertainty???
  11. In section 2.5, access IDs and principal IDs are mentioned. What are those?
  12. Although the standard does not position itself with log management, it talks about alarms and actions. Why would you need to mention actions in this context at all?
  13. A pointer to the original log entry? How do you do that? Log rotation, archiving, leave alone the mere problem of how to reference the original entry to start with.
  14. Why does the standard require the length of a record to be communicated? Just drop that.
  15. The caller has to provide an event_number. I like it. But sorry folks, syslog does not have it. How do you get that in there?
  16. Originator identity: It specifies that this should be the UNIX id. ID of what? The process that logs? The user that initiated the action? The remote user that sent the packet to this machine to trigger some action? How do you know that?
  17. I like the list of XDAS events. It’s a nice list to start with, but it’s definitely not all we need. We need much more!
  18. Why is there so much information encoded in the outcome, instead of defining individual entries? There might be a valid reason, but please motivate these decisions.

That’s what I have for a quick review. Again, no need for us to stop working on CEE. There is still a need for a decent interoperability standard.

[tags]interoperability, log, event exchange, common exchange format, common event expression, log management[/tags]

May 28, 2007

Charts

Category: Log Analysis,Visualization — Raffael Marty @ 4:32 pm

My R journey seems to be over… It’s just too complicated to get charts and plots right with R. I found a new library that I started playing with: ChartDirector. I am using Perl to drive the graph generation, and it’s actually fairly easy to use. There are some cumbersome pieces, such as seg-faults when you have NULL values in a matrix that you are drawing, but those are things I can deal with ;)
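
For those curious, here is a minimal sketch of the kind of Perl involved, assuming the standard perlchartdir module is installed; the data values are made up:

#!/usr/bin/perl
# Sketch: a simple bar chart of event counts per protocol with ChartDirector.
# Assumes the perlchartdir module; the counts are made up.
use strict;
use warnings;
use perlchartdir;

my @labels = ("tcp", "udp", "icmp");
my @counts = (120, 45, 8);

my $c = new XYChart(500, 300);       # 500x300 pixel chart
$c->setPlotArea(60, 30, 400, 230);   # x, y, width, height of the plot area
$c->addTitle("Events per Protocol");
$c->addBarLayer(\@counts);
$c->xAxis()->setLabels(\@labels);
$c->makeChart("proto.png");          # write the image to disk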

Here is a sample graph I generated: (I know, the color selection is fairly horrible!)


[tags]R, charts, visualization, perl[/tags]

May 26, 2007

Machine – User Attribution

Category: Log Analysis,Security Information Management — Raffael Marty @ 9:54 pm

Log analysis has shifted fairly significantly in the last couple of years. It is not about reporting on log records (e.g., Web statistics or user logins) anymore. It is all about pinpointing who is responsible for certain actions and activities. The problem is that the log files oftentimes do not communicate that. There are some logs (mainly from network-centric devices) which contain IP addresses that are used to identify the subject. In other cases, there is no subject that can be identified in the log files at all (database transactions, for example).

What I really want to identify is a person. I want to know who is to blame for deleting a file. The log files have not evolved to a point where they would contain the user information. It generally does not help much to know which machine the user came from when he deleted the file.
This all is old news, and you are probably living with these limitations. But here is what I was wondering about: Why has nobody built a tool or started an open source project that looks at network traffic to extract user-to-machine mappings? It’s not _that_ hard. For example, SMB traffic contains plain-text usernames, shares, originating machines, etc. You should be able to compile session tables from this. I need this information. Anyone? There is so much information you could extract from network traffic (even from Kerberos!). Most of the protocols would give you a fair understanding of who is using what machine at what time and how.
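
To make the idea a bit more concrete, here is a minimal sketch of the session table I have in mind, assuming some upstream decoder (hypothetical here) already emits "epoch,ip,username" lines, for example extracted from SMB session setups:

#!/usr/bin/perl
# Sketch of a machine-to-user session table. Assumes an upstream decoder
# (hypothetical) feeding "epoch,ip,username" lines on stdin.
use strict;
use warnings;

my %sessions;   # ip => { user => ..., first => epoch, last => epoch }
while (<STDIN>) {
    chomp;
    my ($time, $ip, $user) = split /,/;
    next unless defined $user;
    my $s = $sessions{$ip} ||= { user => $user, first => $time };
    $s->{user} = $user;      # the latest user seen on that machine wins
    $s->{last} = $time;
}
for my $ip (sort keys %sessions) {
    my $s = $sessions{$ip};
    printf "%-15s %-12s %s - %s\n", $ip, $s->{user},
        scalar localtime($s->{first}), scalar localtime($s->{last});
}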

[tags]identify correlation, user, log analysis, user mapping[/tags]

May 11, 2007

Human Readable Log Entries

Category: Log Analysis,Security Information Management — Raffael Marty @ 10:47 pm

I was trying to get my Ubuntu desktop to use Beryl, just like my laptop does. Unfortunately, my NVidia drivers didn’t quite want to do what I wanted them to do. Long story short, at some point I remembered to check the log files to see whether I could determine what exactly the problem was. Where should I look first? /var/log/messages. And right there it was:

May 11 11:15:12 zurich kernel: [ 2503.193111] NVRM: API mismatch: the client has the version 1.0-9631, but
May 11 11:15:12 zurich kernel: [ 2503.193114] NVRM: this kernel module has the version 1.0-9755. Please
May 11 11:15:12 zurich kernel: [ 2503.193115] NVRM: make sure that this kernel module and all NVIDIA driver
May 11 11:15:12 zurich kernel: [ 2503.193117] NVRM: components have the same version.

Beautiful. That’s exactly what I needed to know. But hang on a second. Isn’t this a syslog entry? Wow. It just hit me. While I really liked the verbose output, I started to think about how I would parse this thing. How would I normalize this message to later apply machine logic to further process it? Awful!
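
Just to illustrate the pain, here is a minimal sketch of what a parser would have to do: stitch the four NVRM lines back together before it can even extract the two version strings (purely illustrative):

#!/usr/bin/perl
# Reassemble the multi-line NVRM message from /var/log/messages on stdin and
# pull out the client and kernel module driver versions.
use strict;
use warnings;

my $buffer = "";
while (my $line = <STDIN>) {
    if ($line =~ /kernel: \[\s*[\d.]+\] NVRM: (.*)/) {
        $buffer .= " $1";    # collect the continuation lines of the NVRM message
    }
}
if ($buffer =~ /client has the version (\S+), but.*kernel module has the version (\S+)\./) {
    print "client driver: $1, kernel module: $2\n";
}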

I guess my conclusion would be that we need two types of syslogs! One that logs machine-readable log entries and one for humans. Is that really what we want? Maybe the even better solution would be to have only a machine-readable log and then provide an application that can read the log and blow the contents up to make them readable for humans!
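
As a sketch of that idea, the same event could be logged as key-value pairs (the field names and values here are made up), and a trivial tool could then produce the human-readable sentence:

#!/usr/bin/perl
# Sketch of the "machine log plus human renderer" idea: log key=value pairs and
# let a tool generate the prose. Field names and values are made up.
use strict;
use warnings;

my $entry = 'ts=1178907312 host=zurich facility=kernel module=NVRM event=api_mismatch client_ver=1.0-9631 kernel_ver=1.0-9755';
my %e = map { split /=/, $_, 2 } split ' ', $entry;
print "NVIDIA driver mismatch on $e{host}: client $e{client_ver} vs. kernel module $e{kernel_ver}\n";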

Where is CEE when you need it?