I haven’t written about UNIX scripting in a while. Yesterday afternoon, our QA guy came over and asked me some questions about VI. Among his problems was parsing an XML file: he wanted to extract elements from specific branches of an XML structure. I told him that VI is not XML-aware. It treats XML files like any other text file: line by line. He was not happy with my answer and kept bugging me. Then he said: “You should write a tool called XMLgrep”. And that was it. I was pretty sure that someone had already written a tool that would do exactly that.
After 30 seconds on Google, I found it: XMLStarlet. It took me about 30 minutes to get the hang of the tool, but it is really cool. It takes XPath queries as input. My knowledge of XPath goes back to my thesis and is a bit rusty, but I finally got it right. Here is an example of how to apply an XPath query to an XML file:
xmlstarlet sel -t -c "/archive/ActiveList[@name='Public Webmail']/description" JSOX_ActiveLists.xml
Another one:
xmlstarlet sel -t -m "/archive/ActiveList" -v "concat (@name,'
')" JSOX_ActiveLists.xml
Yes, there is a newline in this command. However, it didn’t really work for me. What I wanted was to separate the individual outputs with newlines, but for some reason this didn’t work. I tried all kinds of things, but no luck. Oh well.
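For comparison, and for anyone without xmlstarlet at hand, here is a minimal sketch of the same two queries using Python's standard xml.etree.ElementTree, which supports a limited XPath subset (including [@attr='value'] predicates). The sample document is hypothetical, because the layout of JSOX_ActiveLists.xml is not shown here. Also, unlike -c, which copies the whole element, description_of returns only its text. Joining the names explicitly with "\n" sidesteps the newline problem. xmlstarlet's sel command also has an -n option that emits a newline, which may be worth trying.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for JSOX_ActiveLists.xml; the real file's layout is an assumption.
SAMPLE = """<archive>
  <ActiveList name="Public Webmail">
    <description>Known public webmail domains</description>
  </ActiveList>
  <ActiveList name="Terminated Employees">
    <description>Accounts to watch</description>
  </ActiveList>
</archive>"""

def description_of(root, name):
    """Like /archive/ActiveList[@name='...']/description, but returns only the text."""
    # Paths are relative to the root element, which is <archive> itself.
    # A name containing a single quote would break this simple predicate.
    node = root.find(f"ActiveList[@name='{name}']/description")
    return None if node is None else node.text

def list_names(root):
    """Every ActiveList/@name, one per line."""
    return "\n".join(al.get("name") for al in root.findall("ActiveList"))

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)  # for a real file: ET.parse("JSOX_ActiveLists.xml").getroot()
    print(description_of(root, "Public Webmail"))
    print(list_names(root))
```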
Here is another link that might be useful. It’s a nice tutorial on XMLStarlet.
Bob Blakley from the Burton Group wrote a blog entry about event interoperability standards. This clearly shows that interoperability is a hot topic. However, it also shows that we (CEE) still have a lot of work to do educating the community ;)
I want to correct some of Bob’s statements about CEF and provide some more information and thoughts:
“CEF defines only a record format”. Well, that’s absolutely right and very much intended. You do not HAVE to define anything else. The transport, for example, should not depend on the syntax and vice versa. I keep having to make that point. The ArcSight CEF standard is not bound to any transport. Use anything. If you don’t have anything better, use syslog. It is very, very easy to implement: you just marshal a packet, send it to UDP port 514, and you are done. Yes, it’s not reliable and all, but it’s a very simple and quick start. If you want, use something more complicated with more capabilities. CEE will be doing exactly the same thing. We’ll break the standard up into four parts, separating syntax from transport, recommendations, and taxonomy.
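To show how little code "send it to port 514" really takes, here is a minimal Python sketch that builds a CEF record and ships it as one UDP syslog datagram. The vendor and product values, the PRI value, and the helper names are hypothetical. The escaping follows the CEF format as commonly documented (backslash and pipe escaped in header fields, backslash and equals sign escaped in extension values), and a full syslog header with timestamp and hostname is deliberately left out, so your collector may expect more.

```python
import socket

def _escape_header(value) -> str:
    # In CEF header fields, backslash and pipe must be escaped.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")

def _escape_extension(value) -> str:
    # In extension values, backslash and equals sign must be escaped.
    return str(value).replace("\\", "\\\\").replace("=", "\\=")

def cef_record(vendor, product, version, signature_id, name, severity, **extension):
    """Build a CEF:0 record: version, six pipe-delimited header fields, then key=value pairs."""
    header = "|".join(_escape_header(v)
                      for v in (vendor, product, version, signature_id, name, severity))
    ext = " ".join(f"{key}={_escape_extension(val)}" for key, val in extension.items())
    return f"CEF:0|{header}|{ext}"

def send_syslog_udp(message: str, host: str = "127.0.0.1", port: int = 514, pri: int = 134) -> None:
    """Fire-and-forget: a single UDP datagram with no delivery guarantee."""
    # PRI 134 = facility local0 (16) * 8 + severity informational (6); an assumed choice.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"<{pri}>{message}".encode("utf-8"), (host, port))

if __name__ == "__main__":
    record = cef_record("ExampleVendor", "ExampleFirewall", "1.0", "100",
                        "Blocked connection", 5, src="10.0.0.1", dst="10.0.0.2", dpt=80)
    print(record)
    send_syslog_udp(record)  # assumes a syslog listener on localhost:514
```

The transport is a separate function on purpose: swapping UDP syslog for something reliable does not touch the record format.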
“it doesn’t define service interfaces to allow event producers to notify event consumers that an event has been created and is ready to be processed”. Wow, this is interesting. Why would you not just send the event? Why go asynchronous? People, get away from the notion of pulling events!
“it does not contain any mechanism for dealing with clock synchronization issues in distributed environments”. Since when should an interoperability standard take care of synchronizing clocks? Use NTP. I simply assume synchronized clocks! The standard should not have to talk about that!
“[…] CEF leaves the definition of event types (which are called “Signature IDs” […] ) up to the individual event producers, thus inviting both ID conflict issues and proliferation of different names for events of the same type in different systems.” Half of this is definitely wrong. The other half is again a separation issue. CEF is a syntax standard, not a taxonomy! Furthermore, you use the combination of deviceVendor, deviceProduct, deviceVersion, and SignatureID as the unique ID. Hence, no overlapping IDs. I know where this is going. Have a look at CPE. Darn, that thing is complex, and completely unnecessary in this case. Let people define their own IDs. They have them already anyway (except for most syslog entries, but there you just make an ID up). I know what I am talking about. I have been doing all of this for the last 4 years! What is really missing in the critique (and yes, I will admit that there are holes in CEF) is that the granularity of signature IDs is not defined. For example, do you use the same ID for all logins, failed and successful? The answer here is no. I need different ones, but that’s something CEF does not define. Rest assured, CEE will!
I also disagree with Bob that multiple standards should be pursued and supported. I will definitely push CEE harder than CEF. It’s open, it’s a community effort, it’s Mitre-led, and it’s going to be a more comprehensive approach. We are keeping NIST and all the other interested parties involved. No need for NIST to go out and create yet another standard. There are many other standards out there, and just because they exist does not mean they are any good. For example, XDAS is not what I want to see standardized! Why? See my review of XDAS.
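A minimal sketch of why vendor-defined signature IDs do not collide: key event types on the tuple (deviceVendor, deviceProduct, deviceVersion, SignatureID) rather than on the signature ID alone. The parser below is a hypothetical helper written for illustration. It honors backslash escapes in the header and returns the extension untouched. The sample records are made up.

```python
def split_cef(record: str) -> list[str]:
    """Split a CEF record into version, six header fields, and the raw extension.

    Backslash escapes in the header (e.g. \\| and \\\\) are resolved; the extension is kept as-is.
    """
    if not record.startswith("CEF:"):
        raise ValueError("not a CEF record")
    parts, current = [], []
    chars = iter(record[4:])
    for ch in chars:
        if len(parts) == 7:                      # past the header: keep the extension untouched
            current.append(ch)
        elif ch == "\\":
            current.append(next(chars, ""))      # take the escaped character literally
        elif ch == "|":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    if len(parts) != 8:
        raise ValueError("expected version, six header fields, and an extension")
    return parts

def event_type_key(record: str) -> tuple[str, str, str, str]:
    """deviceVendor, deviceProduct, deviceVersion, and SignatureID together name an event type."""
    _version, vendor, product, dev_version, signature_id, _name, _sev, _ext = split_cef(record)
    return (vendor, product, dev_version, signature_id)

if __name__ == "__main__":
    a = "CEF:0|VendorA|Firewall|1.0|100|Login failed|5|suser=bob"
    b = "CEF:0|VendorB|IDS|2.3|100|Port scan|7|src=10.0.0.1"
    # Same signature ID "100", but different event types.
    print(event_type_key(a), event_type_key(b))
```

Note what the key does not solve: whether failed and successful logins get different signature IDs is still up to the producer, which is the granularity gap mentioned above.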
Today I was booking my airline ticket to Kuala Lumpur, Malaysia, for my trip to Hack in the Box in September. I called the airline’s sales lady and talked to her about my flight dates and all that. In the end, she asked me for my credit card information: number, expiration date, and then the CVV number on the back of my card (sometimes also called the security code). I hesitated for a second, trying to remember what I had just learned from the PCI auditors we had in house. I couldn’t really remember when a merchant needs that number, but after a second I decided that it would be okay to give it to her. It’s about the same as on a Web page where you enter that information: they can use the CVV to run an authorization with the credit card company. Well, I thought that would be it. Wrong!
A couple of hours later I got a pretty ugly Excel spreadsheet back. I was asked to print it out, sign it, and fax it back to them. I had a look at the form and wondered what was going on. There was all my information in this spreadsheet, including the CVV number! They even “encrypted” my credit card number in the spreadsheet. I am just kidding. It was all in plain text. The only funny thing was that the credit card number field was formatted as a number rather than as text, so Excel displayed it in scientific notation and it looked encrypted. *grins* But back to being serious. I was quite upset. All my information in this document. I have to assume that this Excel document is sitting on the sales person’s desktop, probably along with dozens of others. Hmmm… Maybe I should send an email with a link that points to a site that contains a … Let’s not even go there.
The next thing I did was dig up the PCI standard. And there it was, section 3.2.2: “Do not store the card-validation code (Three-digit or four-digit value printed on the front or back of a payment card (e.g., CVV2 and CVC2 data))”.
A clear violation! And you know, this is pretty much the first thing you should get right: how you authorize credit card transactions. Just plain wrong! Darn!
I wrote them an email asking for a contact in their security department. So far, no luck, just the sales person telling me that she needs all that information to complete the transaction. Whatever. Either she needs my signature, but then no CVV, or the CVV and no signature. But not both! I wonder how this is going to continue.
<disclaimer>This post is not 100% serious</disclaimer>
The mere fact that I have to put a disclaimer here is sort of funny. I guess I just don’t want to discuss a topic and then have people come back calling me names 😉
At the FIRST conference in Spain last month, I talked to Ben Chai for a while. He recorded the conversation and summarized some of the discussion in a blog entry. We talked about analyzing huge amounts of log data. I guess I came up with the idea of “cubing” logs to approach the problem: visualizing subsets of the logs to help the analyst.
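A minimal sketch of one possible reading of “cubing”, under stated assumptions: slice the records into cells by the values of a few chosen dimensions (as in an OLAP cube), then graph each cell separately, so the analyst looks at small subsets instead of everything at once. The record fields and the functions cube and edges are hypothetical illustrations, not the approach discussed in the talk.

```python
from collections import defaultdict

# Hypothetical log records; the field names are illustrative only.
RECORDS = [
    {"hour": 9, "proto": "TCP", "src": "10.0.0.5", "dst": "192.0.2.10", "dport": 80},
    {"hour": 9, "proto": "UDP", "src": "10.0.0.7", "dst": "192.0.2.53", "dport": 53},
    {"hour": 10, "proto": "TCP", "src": "10.0.0.5", "dst": "192.0.2.10", "dport": 443},
]

def cube(records, dimensions):
    """Group records into cells keyed by the values of the chosen dimensions."""
    cells = defaultdict(list)
    for rec in records:
        cells[tuple(rec[d] for d in dimensions)].append(rec)
    return dict(cells)

def edges(cell, source="src", target="dst"):
    """Turn one cell into (source, target) pairs, the input a link graph needs."""
    return [(rec[source], rec[target]) for rec in cell]

if __name__ == "__main__":
    for key, cell in sorted(cube(RECORDS, ["hour", "proto"]).items()):
        print(key, edges(cell))  # one small graph per cell instead of one huge graph
```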
I am sitting in Seville, at the FIRST conference, where I will be teaching a workshop on Wednesday. The topic is going to be insider threat visualization. While sitting in some of the sessions here, I was playing with my iptables logs. Here is the command I am running to generate a graph from my iptables logs:
cat /var/log/messages | grep "BLOCK" | perl -pe 's/.*\] (.*) (IN=[^ ]*).*(OUT=[^ ]*).*SRC=([^ ]* ).*DST=([^ ]* ).*PROTO=([^ ]* ).*?(SPT=[^ ]+ )?.*?(DPT=\d* )?.*?(UID=\d+)?.*/$1,$2,$3,$4,$5,$6,$7,$8,$9/' | awk -F, '{printf "%s,%s,%s\n",$4,$5,$8}' | sed -e 's/DPT=//' -e 's/ //g' | afterglow.pl -e 1.3 -c iptables.properties -p 1 | neato -Tgif -o/tmp/test.gif; xloadimage /tmp/test.gif
This is fairly ugly. I am not going to clean it up. You get the idea. Now, what does iptables.properties look like? Well, if you thought the command above was ugly, the properties file is completely insane, but it shows you the power of the “variable” assignment:
variable=$ip=`ifconfig eth1 | grep inet`
variable=$ip=~s/.*?:(\d+\.\d+\.\d+\.\d+).*\n?/$1/;
variable=$subnet=$ip; $subnet=~s/(.*)\.\d+/$1/;
variable=$bcast=$subnet.".255";
color="invisible" if (field() eq "$bcast")
color="invisible" if ($fields[2] eq "67")
color="yellow" if (field() eq "$ip")
color.source="greenyellow" if ($fields[0]=~/^192\.168\..*/);
color.source="greenyellow" if ($fields[0]=~/^10\..*/);
color.source="greenyellow" if ($fields[0]=~/^172\.(1[6-9]|2\d|3[01])\..*/);
color.source="red"
color.event="greenyellow" if ($fields[1]=~/^192\.168\..*/)
color.event="greenyellow" if ($fields[1]=~/^10\..*/)
color.event="greenyellow" if ($fields[1]=~/^172\.(1[6-9]|2\d|3[01])\..*/)
color.event="red"
cluster.target=">1024" if (($fields[2]>1024) && ($fields[1] ne "$ip"))
cluster.source="External" if (field()!~/^$subnet/)
cluster.event="External" if (field()!~/^$subnet/)
color.target="blue" if ($fields[1] eq "$ip")
color.target="lightblue"
Mind you, this was run on Ubuntu Linux. You might have to change some of the commands and the parsing of the output. Pretty neat, eh? Here is a sample output that this generated. My box is yellow. The question I was trying to answer: why is someone trying to connect to me on port 80?
Finally! I worked on this AfterGlow release forever. I submitted a few checkpoints to CVS before I felt ready to release AfterGlow 1.5.8. I highly recommend upgrading to 1.5.8. It has a few bug fixes, but what you will find most rewarding is the new color assignment heuristic and the capability to change the node sizes. Here is the complete changelog:
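If the perl/awk/sed chain above is too brittle for your log format, here is a minimal Python sketch of the same extraction step under stated assumptions: standard iptables LOG lines with a "BLOCK" prefix, and three-column src,dst,dport CSV output, which is what the awk/sed stages feed to afterglow.pl. It pulls KEY=value tokens individually, so field order does not matter. It also includes an RFC 1918 check equivalent to what the properties file does with regexes; note that 172.16.0.0/12 spans 172.16.x.x through 172.31.x.x. The function names and the script name blocked.py are hypothetical.

```python
import ipaddress
import re
import sys

# RFC 1918 private ranges.
PRIVATE_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Match individual KEY=value tokens instead of one positional regex.
FIELD_RE = re.compile(r"\b(SRC|DST|DPT)=(\S+)")

def parse_blocked(lines, marker="BLOCK"):
    """Yield (src, dst, dport) for lines containing marker; dport is '' when absent (e.g. ICMP)."""
    for line in lines:
        if marker not in line:
            continue
        fields = {}
        for key, value in FIELD_RE.findall(line):
            fields.setdefault(key, value)  # keep the first occurrence (the outer packet header)
        if "SRC" in fields and "DST" in fields:
            yield fields["SRC"], fields["DST"], fields.get("DPT", "")

def is_private(addr):
    """True if addr falls into one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)

if __name__ == "__main__":
    for src, dst, dport in parse_blocked(sys.stdin):
        print(f"{src},{dst},{dport}")
```

Used as a drop-in for the first stages of the pipeline, something like `python3 blocked.py < /var/log/messages | afterglow.pl -e 1.3 -c iptables.properties -p 1 | neato -Tgif -o/tmp/test.gif` should produce the same kind of graph, assuming the same properties file.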
06/10/07 Version 1.5.8
- Nodes can have a size now:
(size.[source|target|event]=<expression returning size>)
Size is accumulative! So if a node shows up multiple times, the
values are summed up!! This is unlike any other property in
AfterGlow where values are replaced.
- The maximum node size can be defined as well, either with a property:
(maxnodesize=<value>)
or via the command line:
-m=<value>
The size is scaled to a max of 'maxnodesize'. Note that if you
are only setting maxnodesize and no special sizes for nodes,
AfterGlow will blow the nodes up to an optimal size so the labels
will fit.
There is also a limitation: if you want the source nodes to have a max size of, say,
1, you cannot have the target nodes scaled to fit the labels. They
will have a max size of 1, and if you don't use any expression, they will
be of size 1. This can be a bit annoying ;)
Be cautious with sizes. The number you provide in the assignment is not the actual size
that the node will get; the number will get scaled!
- One of the problems with assignments is that they might get overwritten by later records.
For example, you have these entries:
A,B
A,C
and your properties are:
color="blue" if ($fields[1] eq "B")
color="red"
you would really expect the color for A to be blue as you specified that explicitly.
However, as the other entry comes later, the color will end up being red. AfterGlow takes
care of this. It will determine that the second color assignment is a catch-all, identified
by the fact that there is no "if" statement. If this happens, it will re-use the more specific
condition specified earlier. I hope I am making sense and the code really does what you would
expect ;)
- Define whether AfterGlow should sum node sizes or not.
(sum.[source|target|event]=[0|1];)
by default summarization is enabled.
- Added capability to define thresholds per node type in properties file
(threshold.[source|event|target]=<value>;)
- Added capability to change the node shape:
shape.[source|event|target]=
(box|polygon|circle|ellipse|invtriangle|octagon|pentagon|diamond|point|triangle|plaintext)
- Fixed an issue with -t: when you use -t to only process two columns,
you can still use the third in the property file for size or color.
However, the third column was not carried through. This is fixed!
- The color assignment heuristic changed a bit, along the same lines as the size assignment.
Catch-alls are not taking precedence anymore. You might want to take this into account when defining
colors. The catch-all will only be used if there was never a more specific color assignment that
matched this node. For example:
color="gray50" if ($fields[2] !~ /(CON|FIN|CLO)/)
color="white"
This is used with a three-column dataset, but only two are displayed (-t). If the first condition
ever evaluated to true for a node, the last one will not hit, even though the data might contain a record
for that node that evaluates to false in the first assignment, where the latter one would apply. As a catch-all,
it does not get superior treatment. This is really what you would intuitively assume.
- Just another note on color. Watch out if you are defining colors not based on the fields in the
data, but on some other conditions that might change per record: you will get the wrong results, as
AfterGlow uses a cache for colors which keys off the concatenation of all the field values. Just
a note! Anyone having problems with this? I might have to change the heuristic for caching then. Let
me know.
This is a pretty random blog entry, but oh well… I am sitting in the London airport. In the lounge here, they have a computer that is connected to the Internet. I sat down, opened a browser, typed in my webmail domain and paused for a second. Then I opened a command shell and checked for open ports, processes running, and all that. Well, I still felt like I couldn’t enter my password. What if a keylogger was running?
Then I had an idea. I opened Notepad and just entered some random characters. Then, using the mouse, I started to rearrange the letters into my username and password. A keylogger that only records keystrokes is not able to capture my password like this. I _think_ I successfully circumvented these beasts.
I know, there are other trojans, such as transaction generators that could get in my way, but …
Not too long ago, I posted an entry about CEE, the Common Event Expression standard, which is a work in progress led by Mitre. I was one of the founding members of the working group, and I have been in discussions with Mitre and other entities about common event formats for a long time. Anyway, one of the comments on my blog entries pointed to an effort called Distributed Audit Service (XDAS). I had not heard of this effort before and was a bit worried that we had started something new (CEE) where there already was a legitimate solution. That’s definitely not what I want to do. Well, I finally had time to read through the 100-page(!) document. It’s not at all what CEE is after. Let me tell you why XDAS is not what we (you) want:
How many people are actually using this standard for their audit logs? Anyone? I have been working in the log management space for quite a while, and not just on the vendor side, but also in academia. I had NEVER heard of it before. So why should I use it if nobody else does? In contrast, CEF from ArcSight is in use not just by ArcSight itself, but by many of its partners.
I just mentioned it: 100 pages! When was the last time you read through 100 pages? I just did. It took me about an hour to read the document, and I skipped a lot of the API definitions. My point being: a standard should be at most 10 pages! It’s not just the length of the document, it’s the complexity that comes with it. Nobody is going to read and adhere to this. The more you demand, the more mistakes vendors will make when implementing it. Oh, and please don’t tell me to only read pages 1-10! Make it 10 pages if you want me to read only those.
How much time does it take to actually implement this? Has anyone done it? How long did it take you? I bet a couple of weeks, plus QA, etc. Much too long. I am NEVER going to make that investment.
Let’s get into details. Gosh. Why does this define APIs? Don’t dictate how I should do things. A standard needs to define the common interface, not how I have to open a stream and save files and so on. It’s overkill. The implementations will differ, and they should! And why lock yourself into this API transport? Can you support other transports?
It seems that there is an XDAS service that I need to integrate with. What is that? That’s not clear to me. Can I exchange logs (audit records) between just two parties, or do I need an intermediary XDAS service? I am confused.
Keep the scope of the standard to what it wants to accomplish: event interchange! This thing talks about access control, RBAC, filtering, etc. Why? Please! That’s absolutely unnecessary and should not be part of an interchange standard!
In general, I am quite confused about the exact setting of this. Are we only talking about audit records? Security related only? What about other events? I want this to be very generic! Don’t give me a security specific solution! The world is opening up! We need generic solutions!
What kind of people wrote this? Using percent signs to escape entries and colons to separate them? Must be from the old AS/400 world … Sorry… I just had to say this, in a world of CSV and key-value pairs it is sort of funny to see these things.
The glossary could really benefit from a definition of event and log.
The standard requires a measure of uncertainty for timestamps. I have never heard of this. Could you please elaborate? How can I measure time uncertainty???
In section 2.5, access IDs and principal IDs are mentioned. What are those?
Although the standard does not position itself as part of log management, it talks about alarms and actions. Why would you need to mention actions in this context at all?
A pointer to the original log entry? How do you do that? Think of log rotation and archiving, let alone the basic problem of how to reference the original entry in the first place.
Why does the standard require the length of a record to be communicated? Just drop that.
The caller has to provide an event_number. I like it. But sorry folks, syslog does not have it. How do you get that in there?
Originator identity: It specifies that this should be the UNIX id. ID of what? The process that logs? The user that initiated the action? The remote user that sent the packet to this machine to trigger some action? How do you know that?
I like the list of XDAS events. It’s a good start, but it’s definitely not all we need. We need much more! Again a nice list to start with.
Why is there so much information encoded in the outcome, instead of defining individual entries? There might be a valid reason, but please motivate these decisions.
That’s what I have for a quick review. Again, no need for us to stop working on CEE. There is still a need for a decent interoperability standard.
My R journey seems to be over… It’s just too complicated to get charts and plots right with R. I found a new library that I started playing with: ChartDirector. I am using Perl to drive the graph generation, and it’s actually fairly easy to use. There are some cumbersome pieces, such as seg-faults when you have NULL values in a matrix that you are drawing, but those are things I can deal with ;)
Here is a sample graph I generated: (I know, the color selection is fairly horrible!)
I just finished reading the Information Dashboard Design book by Stephen Few. I love it. I am definitely going to use some of the concepts that I learned about for my own book. I especially like how Stephen illustrates Tufte’s principle of reducing the non-data ink in charts. Definitely a book people who are building dashboards should read!