I am sitting in Seville, at the FIRST conference, where I will be teaching a workshop on Wednesday. The topic is going to be insider threat visualization. While sitting in some of the sessions here, I was playing with my iptables logs. Here is a command I am running to generate a graph from my iptables logs: cat /var/log/messages | grep "BLOCK" | perl -pe 's/.*\] (.*) (IN=[^ ]*).*(OUT=[^ ]*).*SRC=([^ ]* ).*DST=([^ ]* ).*PROTO=([^ ]* ).*?(SPT=[^ ]+ )?.*?(DPT=\d* )?.*?(UID=\d+)?.*?/\1,\2,\3,\4,\5,\6,\7,\8,\9/' | awk -F, '{printf "%s,%s,%s\n",$4,$5,$8}' | sed -e 's/DPT=//' -e 's/ //g' | afterglow.pl -e 1.3 -c iptables.properties -p 1 | neato -Tgif -o/tmp/test.gif; xloadimage /tmp/test.gif
This is fairly ugly. I am not going to clean it up; you get the idea. Now, what does the iptables.properties file look like? Well, if you thought the command above was ugly, the properties file is completely insane. But it shows you the power of using the “variable” assignment: variable=$ip=`ifconfig eth1 | grep inet`
variable=$ip=~s/.*?:(\d+\.\d+\.\d+\.\d+).*\n?/\1/;
variable=$subnet=$ip; $subnet=~s/(.*)\.\d+/\1/;
variable=$bcast=$subnet.".255"; color="invisible" if (field() eq "$bcast")
color="invisible" if ($fields[2] eq "67")
color="yellow" if (field() eq "$ip")
color.source="greenyellow" if ($fields[0]=~/^192\.168\..*/);
color.source="greenyellow" if ($fields[0]=~/^10\..*/);
color.source="greenyellow" if ($fields[0]=~/^172\.16\..*/);
color.source="red"
color.event="greenyellow" if ($fields[1]=~/^192\.168\..*/)
color.event="greenyellow" if ($fields[1]=~/^10\..*/)
color.event="greenyellow" if ($fields[1]=~/^172\.16\..*/)
color.event="red"
cluster.target=">1024" if (($fields[2]>1024) && ($fields[1] ne "$ip"))
cluster.source="External" if (field()!~/^$subnet/)
cluster.event="External" if (field()!~/^$subnet/)
color.target="blue" if ($fields[1] eq "$ip")
color.target="lightblue"
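In plain shell, the first three variable= lines compute roughly the following. The inet_line below is a canned sample of older-Linux ifconfig output so the snippet is self-contained; on a live box you would feed the real output of ifconfig eth1 | grep inet through the same sed:

```shell
# Canned sample of 'ifconfig eth1 | grep inet' output (older Linux format);
# substitute the real command output on a live system.
inet_line='          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0'

# Pull out the IPv4 address after "addr:"
ip=$(printf '%s\n' "$inet_line" | sed -n 's/.*addr:\([0-9.]*\).*/\1/p')
subnet=${ip%.*}        # strip the last octet: 192.168.1
bcast="$subnet.255"    # broadcast address: 192.168.1.255
echo "$ip $subnet $bcast"
```

These are the same $ip, $subnet, and $bcast values the properties file then compares fields against.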
Mind you, this was run on an Ubuntu Linux box. You might have to change some of the commands and the parsing of the output. Pretty neat, eh? Here is a sample output that this generated. My box is yellow. The question I was trying to answer: why is someone trying to connect to me on port 80?
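As a more readable sketch of just the parsing stage, here is the same SRC/DST/DPT extraction on a single sample line (the log line is invented; adjust the field extraction to your own iptables LOG prefix and format):

```shell
# A sample iptables LOG line as it might appear in /var/log/messages (invented):
log='Jun 10 12:00:00 host kernel: [123.456] BLOCK IN=eth1 OUT= SRC=10.0.0.5 DST=192.168.1.10 PROTO=TCP SPT=33333 DPT=80'

# Extract SRC, DST, and DPT into the three CSV columns AfterGlow expects
# (source, event, target) -- the same thing the perl/awk/sed chain does:
printf '%s\n' "$log" \
  | sed -n 's/.*SRC=\([^ ]*\) .*DST=\([^ ]*\) .*DPT=\([0-9]*\).*/\1,\2,\3/p'
```

The resulting CSV is what gets piped into afterglow.pl and neato in the original command.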
Finally! I worked on this AfterGlow release forever. I submitted a few checkpoints to CVS before I felt ready to release AfterGlow 1.5.8. I highly recommend upgrading to 1.5.8. It has a few bugfixes, but what you will find most rewarding is the new color assignment heuristic and the capability to change the node sizes. Here is the complete changelog:
06/10/07 Version 1.5.8
- Nodes can have a size now:
(size.[source|target|event]=<expression returning size>)
Size is cumulative! So if a node shows up multiple times, the
values are summed up! This is unlike any other property in
AfterGlow, where values are replaced.
- The maximum node size can be defined as well, either with a property:
(maxnodesize=<value>)
or via the command line:
-m=<value>
The size is scaled to a maximum of 'maxnodesize'. Note that if you
are only setting the maximum size and no specific sizes for nodes,
AfterGlow will blow the nodes up to the optimal size so the labels
will fit.
There is a limitation, too: if you want the source nodes to have a maximum
size of, say, 1, you cannot have the target nodes scaled to fit their labels.
They will have a maximum size of 1, and if you don't use any expression, they
will all be of size 1. This can be a bit annoying ;)
Be cautious with sizes: the number you provide in the assignment is not the actual size
that the node will get; it will be scaled!
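As a minimal sketch, a properties file using the new size support might look like this (the three-column CSV and the choice of the third field as a byte count are hypothetical):

```
# node size driven by the (hypothetical) byte-count column;
# repeated nodes sum their values
size.source=$fields[2]
size.target=$fields[2]
# scale the largest resulting node to this size
maxnodesize=1.5
```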
- One of the problems with assignments is that they might get overwritten by later entries.
For example, you have these entries:
A,B
A,C
and your properties are:
color="blue" if ($fields[1] eq "B")
color="red"
you would really expect the color for A to be blue, as you specified that explicitly.
However, because the other entry comes later, the color would end up being red. AfterGlow takes
care of this: it determines that the second color assignment is a catch-all, identified
by the fact that there is no "if" statement. In that case, it re-uses the more specific
condition specified earlier. I hope I am making sense and the code really does what you would
expect ;)
- Define whether AfterGlow should sum node sizes or not:
(sum.[source|target|event]=[0|1];)
By default, summation is enabled.
- Added capability to define thresholds per node type in properties file
(threshold.[source|event|target]=<value>;)
- Added capability to change the node shape:
shape.[source|event|target]=
(box|polygon|circle|ellipse|invtriangle|octagon|pentagon|diamond|point|triangle|plaintext)
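Combining the new threshold and shape properties, a properties file might contain lines like these (the specific values are illustrative):

```
# ignore target nodes that show up fewer than 5 times
threshold.target=5;
# draw sources as boxes and targets as ellipses
shape.source="box"
shape.target="ellipse"
```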
- Fixed an issue where, if you used -t to only process two columns,
you could not use the third one in the properties file for size or color.
The third column was not carried through. This is fixed!
- The color assignment heuristic changed a bit, along the same lines as the size assignment.
Catch-alls do not take precedence anymore. You might want to take this into account when defining
colors. The catch-all will only be used if a more specific color assignment never
evaluated to true for this node. For example:
color="gray50" if ($fields[2] !~ /(CON|FIN|CLO)/)
color="white"
This is used with a three-column dataset of which only two columns are displayed (-t). If the first
condition ever evaluated to true for a node, the catch-all will not hit, even though the data might
contain a record for that node where the first assignment evaluates to false and the catch-all would
otherwise apply. As a catch-all, it does not get preferential treatment. This is really what you
would intuitively assume.
- Just another note on color. Watch out: if you are defining colors based not on the fields in the
data but on other conditions that might change per record, you will get wrong results, as
AfterGlow uses a cache for colors which keys off the concatenation of all the field values. Just
a note! Is anyone having problems with this? I might have to change the caching heuristic then. Let
me know.
This is a pretty random blog entry, but oh well… I am sitting in the London airport. In the lounge here, they have a computer that is connected to the Internet. I sat down, opened a browser, typed in my webmail domain, and paused for a second. Then I opened a command shell and checked for open ports, running processes, and all that. Well, I still felt like I couldn’t enter my password. What if a keylogger was running?
Then I had an idea: I opened a notepad and typed some random characters. Then, using the mouse, I started rearranging the letters into my username and password. A keylogger is not able to capture my password like this. I _think_ I successfully circumvented these beasts.
I know, there are other trojans, such as transaction generators that could get in my way, but …
Not too long ago, I posted an entry about
CEE, the Common Event Expression standard, a work in progress led by MITRE. I was one of the founding members of the working group, and I have been in discussions with MITRE and other entities about common event formats for a long time. Anyway, one of the comments on my blog entry pointed to an effort called
Distributed Audit Service (XDAS). I had not heard of this effort before and was a bit worried that we had started something new (CEE) where there was already a legitimate solution. That’s definitely not what I want to do. Well, I finally had time to read through the 100-page document. It’s not at all what CEE is after. Let me tell you why XDAS is not what we (you) want:
How many people are actually using this standard for their audit logs? Anyone? I have been working in the log management space for quite a while and not just on the vendor side, but also in academia. I have NEVER heard of it before. So why should I use this if nobody else is? In contrast, CEF from ArcSight is in use not just by ArcSight itself, but many of its partners.
I just mentioned it before. 100 pages! When’s the last time you read through 100 pages? I just did. It took me about an hour to read the document, and I skipped a lot of the API definitions. My point being: a standard should be at most 10 pages! It’s not just the length of the document, it’s the complexity that comes with it. Nobody is going to read and adhere to this. The more you demand, the more mistakes will be made by the vendors that implement it. Oh, and please don’t tell me to only read pages 1-10! Make it 10 pages if you want me to read only those.
How much time does it take to actually implement this? Has anyone done it? How long did it take you? I bet a couple of weeks, plus QA, etc. Much too long. I am NEVER going to make that investment.
Let’s get into details. Gosh. Why does this define APIs? Don’t dictate how I should do things. A standard needs to define the common interface, not how I have to open a stream, save files, and so on. It’s overkill. The implementations will differ, and they should! And why lock yourself into this API transport? Can you support other transports?
It seems that there is an XDAS service that I need to integrate with. What is that? That’s not clear to me. Can I exchange logs (audit records) between just two parties, or do I need an intermediary XDAS service? I am confused.
Keep the scope of the standard to what it wants to accomplish: event interchange! This thing talks about access control, RBAC, filtering, etc. Why? Please! That’s absolutely unnecessary and should not be part of an interchange standard!
In general, I am quite confused about the exact setting of this. Are we only talking about audit records? Security related only? What about other events? I want this to be very generic! Don’t give me a security specific solution! The world is opening up! We need generic solutions!
What kind of people wrote this? Using percent signs to escape entries and colons to separate them? Must be from the old AS/400 world… Sorry, I just had to say it; in a world of CSV and key-value pairs, it is sort of funny to see these things.
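To illustrate the contrast, here is the same invented event written once in a colon-delimited, percent-escaped style and once as key-value pairs (both records are made up for illustration and are not taken from the XDAS document):

```
login%20failed:jdoe:10.0.0.5
action="login failed" user=jdoe src=10.0.0.5
```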
The glossary could really benefit from a definition of event and log.
The standard requires a measure of uncertainty for timestamps. I have never heard of this. Could you please elaborate? How can I measure time uncertainty???
In section 2.5, access IDs and principal IDs are mentioned. What’s that?
Although the standard does not position itself with log management, it talks about alarms and actions. Why would you need to mention actions in this context at all?
A pointer to the original log entry? How do you do that? Log rotation, archiving, let alone the mere problem of how to reference the original entry in the first place.
Why does the standard require the length of a record to be communicated? Just drop that.
The caller has to provide an event_number. I like it. But sorry folks, syslog does not have it. How do you get that in there?
Originator identity: It specifies that this should be the UNIX id. ID of what? The process that logs? The user that initiated the action? The remote user that sent the packet to this machine to trigger some action? How do you know that?
I like the list of XDAS events. It’s a good start, but it’s definitely not all we need. We need much more! Again a nice list to start with.
Why is there so much information encoded in the outcome, instead of defining individual entries? There might be a valid reason, but please motivate these decisions.
That’s what I have for a quick review. Again, no need for us to stop working on CEE. There is still a need for a decent interoperability standard.
[tags]interoperability, log, event exchange, common exchange format, common event expression, log management[/tags]