November 27, 2022

*NIX Command Line Foo

Category: Uncategorized,UNIX Scripting — Raffael Marty @ 11:28 am

Well, not one of my normal blog posts, but I hope some of you geeks out there will find this useful anyways. I will definitely use this post as a reference frequently.

I have been using various flavors of UNIX and their command lines from ksh to bash and zsh for over 25 years and there is always something new to learn to make me faster at the jobs I am doing. One tool that I keep using (despite my growing command of Excel), is VIM coupled with UNIX command line tools. It saves me hours and hours of work all the time.

Well, here are some new things I learned and want to remember from the Well, here are some new things I learned and want to remember from the art of command line github repo:

  • CTRL-W on the command line deletes the last word
  • pgrep to search for processes rather than doing the longer version with awk
  • lsof -iTCP -sTCP:LISTEN -P -n processes listening on TCP ports
  • Diff two json files: diff <(jq --sort-keys . < file1.json) <(jq --sort-keys . < file2.json) | colordiff | less -R
  • I totally forgot about csvkitbrew install csvkit
    • in2csv file1.xls > file1.csv
    • csvstat data.csv
    • csvsql --query "select name from data where age > 30" data.csv > old.csv

I just found some additional command son OSX that I wish I had known earlier:

  • ditto copies one or more source files or directories to a destination directory. If the destination directory does not exist it will be created before the first source is copied. If the destination directory already exists then the source directories are merged with the previous contents of the destination.
  • pbcopy past data from command line into the clipboard
  • qlmanage quick view from the command line

This is a great repo as well for great OSX commands.

April 14, 2015

Rockstars Use a Good Text Editor – I Use VIM

Category: Uncategorized — Raffael Marty @ 9:13 am

Those of you who know me most likely know that I am quite the VIM fan. At any time, there is at least one VIM window open on my computer. I just like the speed of editing and the flexibility it offers. I even use VI bindings in my UNIX shells (set -o vi). And yes, I did write my book in VIM.

Anyways, here is a command from my .vimrc file that I use a lot:

command F set guifont=Monaco:h13

Basically, if I type “:F”, it makes my font larger. I know, not earth shattering, but really useful.

Here are a couple esthetic things I like to make my VIM look nice:

set background=dark
colorscheme solarized
set guioptions=-m

This is my complete .vimrc file.

February 16, 2015

Big Data Lake – Leveraging Big Data Technologies To Build a Common Data Repository For Security

Category: Uncategorized — Raffael Marty @ 11:50 am

Information security has been dealing with terabytes of data for over a decade; almost two. Companies of all sizes are realizing the benefit of having more data available to not only conduct forensic investigations, but also pro-actively find anomalies and stop adversaries before they cause any harm.

UPDATE: Download the paper here

I am finalizing a paper on the topic of the security big data lake. I should have the full paper available soon. As a teaser, here are the first two sections:

What Is a Data Lake?

The term data lake comes from the big data community and starts appearing in the security field more often. A data lake (or a data hub) is a central location where all security data is collected and stored. Sounds like log management or security information and event management (SIEM)? Sure. Very similar. In line with the Hadoop big data movement, one of the objectives is to run the data lake on commodity hardware and storage that is cheaper than special purpose storage arrays, SANs, etc. Furthermore, the lake should be accessible by third-party tools, processes, workflows, and teams across the organization that need the data. Log management tools do not make it easy to access the data through standard interfaces (APIs). They also do not provide a way to run arbitrary analytics code against the data.

Just because we mentioned SIEM and data lakes in the same sentence above does not mean that a data lake is a replacement for a SIEM. The concept of a data lake merely covers the storage and maybe some of the processing of data. SIEMs are so much more.

Why Implementing a Data Lake?

Security data is often found stored in multiple copies across a company. Every security product collects and stores its own copy of the data. For example, tools working with network traffic (e.g., IDS/IPS, DLP, forensic tools) monitor, process, and store their own copies of the traffic. Behavioral monitoring, network anomaly detection, user scoring, correlation engines, etc. all need a copy of the data to function. Every security solution is more or less collecting and storing the same data over and over again, resulting in multiple data copies.

The data lake tries to rid of this duplication by collecting the data once and making it available to all the tools and products that need it. This is much simpler said than done. The core of this document is to discuss the issues and approaches around the lake.

To summarize, the four goals of the data lake are:

  • One way (process) to collect all data
  • Process, clean, enrich the data in one location
  • Store data once
  • Have a standard interface to access the data

One of the main challenges with this approach is how to make all the security products leverage the data lake instead of collecting and processing their own data. Mostly this means that products have to be rebuilt by the vendors to do so.

Have you implemented something like this? Email me or put a comment on the blog. I’d love to hear your experience. And stay tuned for the full paper!

January 19, 2014

A New and Updated Field Dictionary for Logging Standards

Category: Uncategorized — Raffael Marty @ 2:51 pm

If you have been interested and been following event interchange formats or logging standards, you know of CEF and CEE. Problem is that we lost funding for CEE, which doesn’t mean that CEE is dead! In fact, I updated the field dictionary to accommodate some more use-cases and data sources. The one currently published by CEE is horrible. Don’t use it. Use my new version!

Whether you are using CEE or any other logging standard for your message formatting, you will need a naming schema; a set of field names. In CEE we call that a field dictionary.

The problem with the currently published field dictionary of CEE is that it’s inconsistent, has duplicate field names, and is missing a bunch of field names that you commonly need. I updated and cleaned up the dictionary (see below or download it here.) Please email me with any feedback / updates / additions! This is by no means complete, but it’s a good next iteration to keep improving on! If you know and use CEF, you can use this new dictionary with it. The problem with CEF is that it has to use ArcSight’s very limited field schema. And you have to overload a bunch of fields. So, try using this schema instead!

I was emailing with my friend Jose Nazario the other day and realized that we never really published anything decent on the event taxonomy either. That’s going to be my next task to gather whatever I can find in notes and such to put together an updated version of the taxonomy with my latest thinking; which has emerged quite a bit in the last 12 years that I have been building event taxonomies (starting with the ArcSight categorization schema, Splunk’s Common Information Model, and then designing the CEE taxonomy). Stay tuned for that.

For reference purposes. Here are some spin-offs from CEE which have field dictionaries as well:

Here is the new field dictionary:

Object Field Type Description
action STRING Action taken
bytes_received NUMBER Bytes received
bytes_sent NUMBER Bytes sent
category STRING Log source assigned category of message
cmd STRING Command
duration NUMBER Duration in seconds
host STRING Hostname of the event source
in_interface STRING Inbound interface
ip_proto NUMBER IP protocol field value (8=UDP, …)
msg STRING The event message
msgid STRING The event message identifier
out_interface STRING Outbound interface
packets_received NUMBER Number of packets received
packets_sent NUMBER Number of packets sent
reason STRING Reason for action taken or activity observed
rule_number STRING Number of rule – firewalls, for example
subsys STRING Application subsystem responsible for generating the event
tcp_flags STRING TCP flags
tid NUMBER Numeric thread ID associated with the process generating the event
time DATETIME Event Start Time
time_logged DATETIME Time log record was logged
time_received DATETIME Time log record was received
vend STRING Vendor of the event source application
app name STRING Name of the application that generated the event
app session_id STRING Session identifier from application
app vend STRING Application vendor
app ver STRING Application version
dst country STRING Country name of the destination
dst host STRING Network destination hostname
dst ipv4 IPv4 Network destination IPv4 address
dst ipv6 IPv6 Network destination IPv6 address
dst nat_ipv4 IPv4 NAT IPv4 address of destination
dst nat_ipv6 IPv6 NAT IPv6 destination address
dst nat_port NUMBER NAT port number for destination
dst port NUMBER Network destination port
dst zone STRING Zone name for destination – examples: Bldg1, Europe
file line NUMBER File line number
file md5 STRING File MD5 Hash
file mode STRING File mode flags
file name STRING File name
file path STRING File system path
file perm STRING File permissions
file size NUMBER File size in bytes
http content_type STRING MIME content type within HTTP
http method STRING HTTP method – GET | POST | HEAD | …
http query_string STRING HTTP query string
http request STRING HTTP request URL
http request_protocol STRING HTTP protocol used
http status NUMBER Return code in HTTP response
palo_alto actionflags STRING Palo Alto Networks Firewall Specific Field
palo_alto config_version STRING Palo Alto Networks Firewall Specific Field
palo_alto cpadding STRING Palo Alto Networks Firewall Specific Field
palo_alto domain STRING Palo Alto Networks Firewall Specific Field
palo_alto log_type STRING Palo Alto Networks Firewall Specific Field
palo_alto padding STRING Palo Alto Networks Firewall Specific Field
palo_alto seqno STRING Palo Alto Networks Firewall Specific Field
palo_alto serial_number STRING Palo Alto Networks Firewall Specific Field
palo_alto threat_content_type STRING Palo Alto Networks Firewall Specific Field
palo_alto virtual_system STRING Palo Alto Networks Firewall Specific Field
proc id STRING Process ID (pid)
proc name STRING Process name
proc tid NUMBER Thread identifier of the process
src country STRING Country name of the source
src host STRING Network source hostname
src ipv4 IPv4 Network source IPv4 address
src ipv6 IPv6 Network source IPv6 address
src nat_ipv4 IPv4 NAT IPv4 address of source
src nat_ipv6 IPv6 NAT IPVv6 address
src nat_port NUMBER NAT port number for source
src port NUMBER Network source port
src zone STRING Zone name for source – examples: Bldg1, Europe
syslog fac NUMBER Syslog facility value
syslog pri NUMBER Syslog priority value
syslog pri STRING Event priority (ERROR|WARN|DEBUG|CRIT)
syslog sev NUMBER Event severity
syslog tag STRING Syslog Tag value
syslog ver NUMBER Syslog Protocol version (0=legacy/RFC3164; 1=RFC5424)
user auid STRING Source User login authentication ID (login id)
user domain STRING User account domain (NT Domain)
user eid STRING Source user effective ID (euid)
user gid STRING Group ID (gid)
user group STRING Group name
user id STRING User account ID (uid)
user name STRING User account name
November 11, 2010

Applied Security Visualization – Book Video

Category: Uncategorized — Raffael Marty @ 3:57 pm

It’s been a while since I wrote “Applied Security Visualization“. Here is an older video that I just came about. A good overview of the book. Enjoy!

September 4, 2010

Logging Formats and Standards

Category: Uncategorized — Raffael Marty @ 11:24 am

cee working group I have discussed the topic of logging standards multiple times on this blog. Some recent developments in the logging space urged me to give an update and provide my opinion:

Yet another vendor just released a “standard” log format (note the quotes around standard). It’s called UCF, the Universal Collection Framework™ (UCF). This is how the vendor describes it:

UCF is the first WAN-aware, store-and-forward, encrypted, compressed IT data transport. It allows customers to gather IT data, increase resilience, reduce network chatter and encrypt from almost any device, anywhere, quickly and easily. UCF leverages a new transport and store protocol that LogLogic intends to open source in the near future.

Sounds a whole lot like syslog. (syslog-ng and rsyslog seem to support exactly this!) Okay, let’s just look at this description: WAN aware? What the heck is that supposed to mean? You mean it won’t work well on a LAN? Does that mean it knows the Internets? That’s just a strange description to start with. Oh, and it’s the first property mentioned! The rest of the description sounds like a transport protocol. Interesting. Why not stick with syslog that is well known, has proven to work, and has integration libraries built already. I never understood why vendors implemented their own transport protocols. They are hard (very hard) to implement and even harder for producers and consumers to adopt to. Oh well.

When people talk about UCF, they keep bringing up ArcSight’s CEF. Well, I am greatly responsible for that specification. But guess what? It’s not a transport protocol! It’s a syntax definition. It tells a log producer how to format their log file. Not how to transport it. Because, there is always syslog that a lot of machines have installed already and it’s easy to use. (And in newer versions you get encryption, caching, etc.).

Now, my last point about standards. Why do vendors keep trying to come up with standards by themselves? It just doesn’t make any sense. How is going to adapt it? At ArcSight, about 4 years ago, we came up with CEF because CEE didn’t move fast enough and we wanted something that our partners could easily use. An analyst wrote that ArcSight is planning to take CEF to the IETF. I hope they are not going to do that. I don’t have any control over that anymore, but that would be stupid. We rather push CEE through IETF. If you have a chance, compare the CEE syntax proposal with CEF. Notice something? Yes. It’s very similar. Again, I might have had something to do with that. Anyways. Vendors should not define logging standards!

On a good note: CEE is moving forward and just released the architecture overview for public commentary. Check them out!

June 28, 2010

All the Data That’s Fit to Visualize

Last week I posted the introductionary video for a talk that I gave at Source Boston in 2008. I just found the entire video of that talk. Enjoy:

Talk by Raffael Marty:

With the ever-growing amount of data collected in IT environments, we need new methods and tools to deal with them. Event and Log Analysis is becoming one of the main tools for analysts to investigate and comprehend the state of their networks, hosts, applications, and business processes. Recent developments, such as regulatory compliance and an increased focus on insider threat have increased the demand for analytical tools to help in the process. Visualization is offering a new, more effective, and simpler approach to data analysis. To date, security visualization, has mostly failed to deliver effective tools and methods. This presentation will show what the New York Times has to teach us about effective visualizations. Visualization for the masses and not visualization for the experts. Insider Threat, Governance, Risk, and Compliance (GRC), and Perimeter Threat all require effective visualization methods and they are right in front of us – in the newspaper.

December 1, 2009

Applied Security Visualization Book seen in Singapore

Category: Uncategorized — Raffael Marty @ 5:50 pm

A friend just sent me couple of pictures he took in a bookstore in Singapore.



Have you seen the book Applied Security Visualization on the shelf at your local book store? If so, send me a picture and I will post it…

February 17, 2009

Security Visualization and Log Analysis Workshop – Sign up now!

Category: Uncategorized — Raffael Marty @ 10:32 pm

Log Analysis and Security Visualization” is a two-day training class held on March 9th and 10th 2009 in Boston during the SOURCE Boston conference that addresses the data management and analysis challenges of today’s IT environments.
Applied Security VisualizationStudents will leave this class with the knowledge to visualize and manage their own IT data. They will learn the basics of log analysis, learn about common data sources, get an overview of visualization techniques, and learn how to generate visual representations of IT data for a number of different use-cases from DoS and worm detection to compliance reporting. The training is filled with hands-on exercises utilizing DAVIX, the open-source data analysis and visualization platform.

Register today to secure your spot.

January 2, 2009

links for 2009-01-02

Category: Uncategorized — Raffael Marty @ 6:03 pm