July 25, 2007

Parsing XML on the Command Line

Category: UNIX Scripting — Raffael Marty @ 11:24 am

I haven’t written about UNIX scripting in a while. Yesterday afternoon our QA guy came over and asked me some questions about vi. Among his problems was “parsing an XML file”: he wanted to extract elements from specific branches of an XML structure. I told him that vi is not XML-aware; it treats XML like any other text file, line by line. He was not happy with my answer and kept bugging me. Then he said: “You should write a tool called XMLgrep.” And that was it. I was pretty sure that someone had already written a tool that would do exactly that.

After 30 seconds on Google, I found it: XMLStarlet. It took me about 30 minutes to get the hang of the tool, but it is really cool. It takes XPath queries as input. My knowledge of XPath goes back to my thesis and is a bit rusty, but I finally got it right. Here is an example of how to apply an XPath query to an XML file:

xmlstarlet sel -t -c "/archive/ActiveList[@name='Public Webmail']/description" JSOX_ActiveLists.xml
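To try the query above without the original file, here is a minimal sketch. The file name sample_activelists.xml and its contents are made up for illustration; the structure is only guessed from the XPath expression, since the real JSOX_ActiveLists.xml is not shown here.

```shell
#!/bin/sh
# Hypothetical sample data: the element layout is an assumption
# inferred from the XPath query, not the real JSOX_ActiveLists.xml.
cat > sample_activelists.xml <<'EOF'
<archive>
  <ActiveList name="Public Webmail">
    <description>Known public webmail servers</description>
  </ActiveList>
  <ActiveList name="Internal Hosts">
    <description>Hosts on the internal network</description>
  </ActiveList>
</archive>
EOF

# -t starts an output template; -c copies the matching node(s),
# tags included, for the ActiveList whose name is 'Public Webmail'.
xmlstarlet sel -t -c "/archive/ActiveList[@name='Public Webmail']/description" sample_activelists.xml
```

With -c you get the whole description element back, markup and all. Swapping -c for -v should print only the text inside it.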

Another one:

xmlstarlet sel -t -m "/archive/ActiveList" -v "concat (@name,'
')" JSOX_ActiveLists.xml

Yes, there is a literal newline inside this command. I wanted to separate the individual outputs with a newline, but for some reason this didn’t work. I tried all kinds of things, but no luck. Oh well.
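One way around the newline problem described above is the -n option of xmlstarlet sel, which prints a newline as part of the template, so no concat() trick is needed. The sketch below uses a made-up file, sample_names.xml, whose structure is only an assumption based on the XPath expression.

```shell
#!/bin/sh
# Hypothetical sample data; only the element and attribute names
# are taken from the XPath query in the post.
cat > sample_names.xml <<'EOF'
<archive>
  <ActiveList name="Public Webmail"/>
  <ActiveList name="Internal Hosts"/>
</archive>
EOF

# -m iterates over every ActiveList element, -v prints the value of
# its name attribute, and -n emits a newline after each match.
xmlstarlet sel -t -m "/archive/ActiveList" -v "@name" -n sample_names.xml
```

Because -n sits inside the -m loop, each name ends up on its own line, which makes the output easy to pipe into grep, sort or wc.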

Here is another link that might be useful. It’s a nice tutorial on XMLStarlet.


1 Comment »

  1. Hi! I just ‘stumbled’ upon your blog through Technorati.

    Like you, I’ve just discovered XMLStarlet and have found it very convenient and useful for writing scripts dealing with XML.

    Anyway, I just wanted to point out that XMLStarlet has a “-n” option that’ll print a newline character to the output. Not sure if it’s applicable for all instances where you need a newline, though.

    Comment by alistair — October 8, 2007 @ 7:21 pm
