If you have been following event interchange formats or logging standards, you know of CEF and CEE. The problem is that we lost funding for CEE, which doesn’t mean that CEE is dead! In fact, I updated the field dictionary to accommodate some more use-cases and data sources. The one currently published by CEE is horrible. Don’t use it. Use my new version!
Whether you are using CEE or any other logging standard for your message formatting, you will need a naming schema: a set of field names. In CEE we call that a field dictionary.
The problem with the currently published field dictionary of CEE is that it’s inconsistent, has duplicate field names, and is missing a bunch of field names that you commonly need. I updated and cleaned up the dictionary (see below or download it here). Please email me with any feedback, updates, or additions! This is by no means complete, but it’s a good next iteration to keep improving on! If you know and use CEF, you can use this new dictionary with it. The problem with CEF is that it has to use ArcSight’s very limited field schema, so you end up overloading a bunch of fields. So, try using this schema instead!
I was emailing with my friend Jose Nazario the other day and realized that we never really published anything decent on the event taxonomy either. My next task is to gather whatever I can find in notes and such and put together an updated version of the taxonomy with my latest thinking, which has evolved quite a bit in the last 12 years that I have been building event taxonomies (starting with the ArcSight categorization schema, Splunk’s Common Information Model, and then designing the CEE taxonomy). Stay tuned for that.
For reference purposes, here are some spin-offs from CEE that have field dictionaries as well:
- Project Lumberjack which has some field names.
- syslog-ng PatternDB, which has a bunch of patterns and also a schema.
Here is the new field dictionary:
Object | Field | Type | Description |
(none) | action | STRING | Action taken |
(none) | bytes_received | NUMBER | Bytes received |
(none) | bytes_sent | NUMBER | Bytes sent |
(none) | category | STRING | Log source assigned category of message |
(none) | cmd | STRING | Command |
(none) | duration | NUMBER | Duration in seconds |
(none) | host | STRING | Hostname of the event source |
(none) | in_interface | STRING | Inbound interface |
(none) | ip_proto | NUMBER | IP protocol field value (17=UDP, …) |
(none) | msg | STRING | The event message |
(none) | msgid | STRING | The event message identifier |
(none) | out_interface | STRING | Outbound interface |
(none) | packets_received | NUMBER | Number of packets received |
(none) | packets_sent | NUMBER | Number of packets sent |
(none) | reason | STRING | Reason for action taken or activity observed |
(none) | rule_number | STRING | Number of rule – firewalls, for example |
(none) | subsys | STRING | Application subsystem responsible for generating the event |
(none) | tcp_flags | STRING | TCP flags |
(none) | tid | NUMBER | Numeric thread ID associated with the process generating the event |
(none) | time | DATETIME | Event start time |
(none) | time_logged | DATETIME | Time log record was logged |
(none) | time_received | DATETIME | Time log record was received |
(none) | vend | STRING | Vendor of the event source application |
app | name | STRING | Name of the application that generated the event |
app | session_id | STRING | Session identifier from application |
app | vend | STRING | Application vendor |
app | ver | STRING | Application version |
dst | country | STRING | Country name of the destination |
dst | host | STRING | Network destination hostname |
dst | ipv4 | IPv4 | Network destination IPv4 address |
dst | ipv6 | IPv6 | Network destination IPv6 address |
dst | nat_ipv4 | IPv4 | NAT IPv4 address of destination |
dst | nat_ipv6 | IPv6 | NAT IPv6 destination address |
dst | nat_port | NUMBER | NAT port number for destination |
dst | port | NUMBER | Network destination port |
dst | zone | STRING | Zone name for destination – examples: Bldg1, Europe |
file | line | NUMBER | File line number |
file | md5 | STRING | File MD5 Hash |
file | mode | STRING | File mode flags |
file | name | STRING | File name |
file | path | STRING | File system path |
file | perm | STRING | File permissions |
file | size | NUMBER | File size in bytes |
http | content_type | STRING | MIME content type within HTTP |
http | method | STRING | HTTP method – GET, POST, HEAD, … |
http | query_string | STRING | HTTP query string |
http | request | STRING | HTTP request URL |
http | request_protocol | STRING | HTTP protocol used |
http | status | NUMBER | Return code in HTTP response |
palo_alto | actionflags | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | config_version | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | cpadding | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | domain | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | log_type | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | padding | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | seqno | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | serial_number | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | threat_content_type | STRING | Palo Alto Networks Firewall Specific Field |
palo_alto | virtual_system | STRING | Palo Alto Networks Firewall Specific Field |
proc | id | STRING | Process ID (pid) |
proc | name | STRING | Process name |
proc | tid | NUMBER | Thread identifier of the process |
src | country | STRING | Country name of the source |
src | host | STRING | Network source hostname |
src | ipv4 | IPv4 | Network source IPv4 address |
src | ipv6 | IPv6 | Network source IPv6 address |
src | nat_ipv4 | IPv4 | NAT IPv4 address of source |
src | nat_ipv6 | IPv6 | NAT IPv6 address of source |
src | nat_port | NUMBER | NAT port number for source |
src | port | NUMBER | Network source port |
src | zone | STRING | Zone name for source – examples: Bldg1, Europe |
syslog | fac | NUMBER | Syslog facility value |
syslog | pri | NUMBER | Syslog priority value |
syslog | pri | STRING | Event priority (ERROR|WARN|DEBUG|CRIT) |
syslog | sev | NUMBER | Event severity |
syslog | tag | STRING | Syslog Tag value |
syslog | ver | NUMBER | Syslog Protocol version (0=legacy/RFC3164; 1=RFC5424) |
user | auid | STRING | Source User login authentication ID (login id) |
user | domain | STRING | User account domain (NT Domain) |
user | eid | STRING | Source user effective ID (euid) |
user | gid | STRING | Group ID (gid) |
user | group | STRING | Group name |
user | id | STRING | User account ID (uid) |
user | name | STRING | User account name |
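To make the idea of a typed dictionary a bit more concrete, here is a minimal Python sketch that checks an event against a small excerpt of the table above. Several things in it are my assumptions and not part of the dictionary: that events are nested dicts with the object as the outer key, that fields without an object sit at the top level, that DATETIME values are ISO 8601 strings, and the names `FIELD_TYPES`, `CHECKS`, and `validate` themselves.

```python
import ipaddress
from datetime import datetime

# Excerpt of the dictionary: (object, field) -> type.
# Fields without an object use None (an assumption of this sketch).
FIELD_TYPES = {
    (None, "action"): "STRING",
    (None, "bytes_sent"): "NUMBER",
    (None, "time"): "DATETIME",
    ("src", "ipv4"): "IPv4",
    ("src", "port"): "NUMBER",
    ("dst", "ipv6"): "IPv6",
    ("dst", "port"): "NUMBER",
}


def _is_ip(value, version):
    # Only strings are accepted; ip_address() would also accept integers.
    if not isinstance(value, str):
        return False
    try:
        return ipaddress.ip_address(value).version == version
    except ValueError:
        return False


def _is_datetime(value):
    # Assumption: ISO 8601 strings. The dictionary does not fix a format.
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "NUMBER": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "DATETIME": _is_datetime,
    "IPv4": lambda v: _is_ip(v, 4),
    "IPv6": lambda v: _is_ip(v, 6),
}


def validate(event):
    """Return a list of problems; an empty list means the event fits."""
    problems = []
    for key, value in event.items():
        if isinstance(value, dict):
            # Nested object such as {"src": {"ipv4": ...}}
            items = [((key, field), v) for field, v in value.items()]
        else:
            items = [((None, key), value)]
        for (obj, field), v in items:
            name = field if obj is None else f"{obj}.{field}"
            expected = FIELD_TYPES.get((obj, field))
            if expected is None:
                problems.append(f"{name}: not in dictionary")
            elif not CHECKS[expected](v):
                problems.append(f"{name}: expected {expected}")
    return problems
```

Calling `validate({"src": {"ipv4": "10.0.0.1", "port": 4242}})` returns an empty list, while an unknown name such as `source_ip` under `src` is reported as not being in the dictionary.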
Instead of trying to fit many use cases into a complex dictionary: Why not define a really small set of required fields like a source, a message/description and a timestamp format and then let people add anything they want as totally free fields?
What is the big benefit of a big dictionary?
Comment by Lennart Koopmann — January 20, 2014 @ 1:03 pm
The purpose of the dictionary is parsing and field semantics. If you let them choose things, you don’t know what the meaning of a field is. And interoperability. If I choose “src_ip” and you choose “source_ip” and someone else chooses “ip_source_address”, then we have to first normalize and make sure the semantics of all the fields are the same, etc.
In short: Interoperability!
Comment by Raffael Marty — January 20, 2014 @ 1:09 pm
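As a rough illustration of the normalization step mentioned in the reply above, here is a short Python sketch. The alias table is hypothetical: it maps the three spellings from the comment to the dictionary's `src` / `ipv4` field, and `ALIASES` and `normalize` are names made up for this example.

```python
# Hypothetical alias table: three free-form spellings, one dictionary field.
ALIASES = {
    "src_ip": ("src", "ipv4"),
    "source_ip": ("src", "ipv4"),
    "ip_source_address": ("src", "ipv4"),
}


def normalize(record):
    """Rewrite known aliases to (object, field) names; keep the rest as-is."""
    out = {}
    for key, value in record.items():
        if key in ALIASES:
            obj, field = ALIASES[key]
            # Group the value under its object, e.g. {"src": {"ipv4": ...}}
            out.setdefault(obj, {})[field] = value
        else:
            out[key] = value
    return out
```

Every producer that picks its own names needs a table like this on the consuming side, and someone has to confirm that the aliases really mean the same thing. The sketch also ignores conflicts, for example a record that already carries a `src` key.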
+1 for interoperability, but is that actually required out there? (I really don’t know) Are there tools that are so bound to static field names?
Requiring field names from a dictionary sounds like either missing configuration options or a too static user interface.
Of course there is nothing wrong with having something like CEE in place – I am just afraid of making static UIs even more static by making them [protocol/dictionary]-compliant.
Comment by Lennart Koopmann — January 20, 2014 @ 1:15 pm