Extracting XML data with curl and xmllint
(I start off talking about why I was doing this. You can jump to the tip if you want.)
I use HomeAssistant heavily at home to bring together smart home data from sensors around my house, as well as from a range of open data sources.
HomeAssistant has a REST sensor which is good for a known data source with a friendly format, but, because of the architecture of HomeAssistant, this sensor is not particularly flexible when the data comes from a third party in a format that isn’t readily usable. For more complex data consumption, a command line sensor may be appropriate. A command line sensor captures the output of running a command as the sensor value. The trick is to use curl and some kind of data-parsing routine to extract the data you actually want as your sensor value. That’s what I’m going over.
I used two utilities for this: curl and xmllint. curl is preinstalled a lot these days, but if you don’t have it, it’s a simple brew install curl, apt-get install curl, etc. away. xmllint is a little harder. It’s the command line XML tool that ships as part of libxml2. I have it on Mac OS by having libxml2 installed using Homebrew, but on Ubuntu Server I had to apt-get install libxml2-utils. YMMV: Debian-based distributions split it out into that -utils package, while other distributions may bundle it with libxml2 itself. Once you do have it though, you can take advantage of Unix pipes to pluck out the XML you need using XPath.
Here’s an example that I’ll break down:
curl "http://api.opendata.tld/example.xml" |\
xmllint --xpath "//Measurements/Data/Reading[last()]/Value/text()" -
First of all, curl "http://api.opendata.tld/example.xml". This requests the given URI, streaming the response to the pipe. For the sake of the further examples, let’s assume it returns XML data in the following format:
<Measurements>
  <Author>Data Publisher</Author>
  <Data DateFormat="Calendar" NumItems="1">
    <Reading>
      <Time>2018-06-19T08:45:00</Time>
      <Value>11.8</Value>
    </Reading>
    <Reading>
      <Time>2018-06-19T09:00:00</Time>
      <Value>11.8</Value>
    </Reading>
    <Reading>
      <Time>2018-06-19T09:15:00</Time>
      <Value>11.8</Value>
    </Reading>
  </Data>
</Measurements>
The second part of this command is the pipe to xmllint. xmllint is passed two key arguments. The first, --xpath, contains an XPath expression for what to select, and the trailing dash (-) prompts xmllint to accept XML from STDIN (the pipe from curl).
Given a valid XPath expression, the result of this command will be exactly what is required - a value to assign to the sensor. In this case, the XPath expression will be interpreted as:
- Start at the root Measurements node (strictly, the leading // matches Measurements anywhere in the document; here that is the root)
- Traverse down the tree into Data
- Select the last() Reading node
- Traverse into the Value node
- Select the text() (inner text) of the node
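Since the URL in this post is a placeholder, one way to try the expression is against a local copy of the sample document. This is a sketch under that assumption: the temporary file and the variable names are mine, and xmllint is given a file name instead of the trailing dash. Because every Value in the sample is 11.8, the second query swaps Value for Time to show which Reading last() actually picked.

```shell
#!/bin/sh
# Save the sample document to a temporary file (stand-in for the curl response).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Measurements>
  <Author>Data Publisher</Author>
  <Data DateFormat="Calendar" NumItems="1">
    <Reading><Time>2018-06-19T08:45:00</Time><Value>11.8</Value></Reading>
    <Reading><Time>2018-06-19T09:00:00</Time><Value>11.8</Value></Reading>
    <Reading><Time>2018-06-19T09:15:00</Time><Value>11.8</Value></Reading>
  </Data>
</Measurements>
EOF

# Same XPath expression as in the post, reading from the file rather than STDIN.
reading_value=$(xmllint --xpath "//Measurements/Data/Reading[last()]/Value/text()" "$sample")

# Same path, but selecting Time, to confirm that last() chose the final Reading.
reading_time=$(xmllint --xpath "//Measurements/Data/Reading[last()]/Time/text()" "$sample")

echo "value=$reading_value time=$reading_time"
rm -f "$sample"
```

With the sample data, the timestamp printed should be the 09:15 reading, the last one in the document.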
In other words, the complete command:
curl "http://api.opendata.tld/example.xml" |\
xmllint --xpath "//Measurements/Data/Reading[last()]/Value/text()" -
will return 11.8. Exactly what we need!
We can now define our HomeAssistant sensor:
sensor:
  - platform: command_line
    command: "curl \"http://api.opendata.tld/example.xml\" | xmllint --xpath \"//Measurements/Data/Reading[last()]/Value/text()\" -"
    unit_of_measurement: "C"
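If the data source is flaky, the pipeline used in the sensor could be made a little more defensive with a few standard curl options. This is a sketch rather than something I ran in the setup described above: latest_value is a hypothetical wrapper name, and the ten second timeout is an arbitrary choice.

```shell
#!/bin/sh
# latest_value (hypothetical wrapper): fetch a URL and print the last Reading's Value.
#   -s             hide curl's progress meter
#   -f             exit non-zero on HTTP errors instead of passing the error page along
#   --max-time 10  give up after ten seconds (arbitrary choice)
latest_value() {
  curl -sf --max-time 10 "$1" |
    xmllint --xpath "//Measurements/Data/Reading[last()]/Value/text()" -
}

# Usage: latest_value "http://api.opendata.tld/example.xml"
```

The same options can be dropped straight into the command string of the sensor; the function is only there to keep the example readable.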