Indeni Training Module 3b: Advanced JSON XML Parsing

Full Video Transcript:

MALE 1:

Okay, so let’s continue with the XML/JSON parser. If you remember, the goal is to parse hierarchical data, whether in XML format or in JSON format. The parser syntax is YAML-based. The actual operators and everything are our own invention. It’s a bit of a learning curve, but once mastered, it is very, very powerful, as Patrick, for example, would probably be able to tell you, or Arya.

So, last time we went through some of the basics, like we said “Okay, this is the structure.” You have a metrics section, and it has a dash under it. We have tags. The tags need to include the ‘im.name’. We have operators like constant, which means a string, or value, which means use the JSON path or the XML path and get the value. ‘_value’ for JSON and ‘_text’ for XML.

What I wanted to go through today are more advanced things like the complex metrics, the transform section, and more advanced operators. Okay? Let’s start with the complex metric. So, if I want to describe a number, I’m going to use the complex metrics. So, for example, let’s say I want to describe CPU. Okay? So, the metric name is going to be ‘cpu-usage’. It’s going to have a ‘cpu-id’ tag like ‘dp0’, which is data plane zero on a Palo Alto. And then, the value itself would be something like 71 to represent 71% CPU utilization. Right?

So, a lot of data in the system is numeric data like this – what we call double. And, that data can be graphed over time. We can do averages. Minimum, maximum – things like that. But, what happens when we want to describe something more complicated? For example, we want to describe a static routing table. Okay, so this would be ‘static-routing-table’, and then the value itself would actually look like this: It’s an array, and in the array, we have multiple lines where we have ‘network’ like ‘10.1.2.0’. ‘mask’ like 24. And then, ‘next-hop’ would be something like ‘1.1.1.1’. Okay? This is how you would describe a static routing table. And, you have multiple of these.

So, this cannot be described in a normal double metric, so instead we have the complex metric. That is what this is. The complex metric’s value is JSON. Okay? It’s always JSON. It doesn’t matter if you’re generating the metric from the AWK parser, from the XML parser, or from the JSON parser. Always JSON. Okay?

So, to generate these guys in the JSON/… What?

MALE 2:

I have a question.

MALE 1:

Oh, yes.

MALE 2:

Is there a reason why there’s a double quotes on ‘static-routing-table’ ‘im.name’, and not ‘cpu-usage’?

MALE 1:

Me being lazy.

MALE 2:

Okay. Thanks.

MALE 1:

No problem. So, we need to generate complex metrics. Now, if we kind of zoom in on the concept of complex metrics, there are two different types. This one is called an array. Okay? It’s an array because it has multiple items to it. The array is this. You see this – these brackets. And then inside it, we have multiple. Okay? If you write rules, these are called ‘MULTI-SNAPSHOTS’ in the rule base. Okay? The other one we can have is just a single value, what we also call ‘SINGLE-SNAPSHOT’ in the rule base. And, in this case, it would be something like ‘hostname’, value equals ‘myserver’. Okay? So, let’s say we had a metric for the hostname, which we actually do, and the name of the hostname is ‘myserver’. Okay? It’s a string. It’s not a number. Notice that this is just a string like this, whereas this is an array. Okay? The way it’s actually represented in the backend looks like this. Okay? Make sense? Okay, great.
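
To make the three shapes discussed so far concrete, here is a minimal Python sketch of a double metric, a complex array (multi-snapshot), and a single complex value (single-snapshot). The dictionary layout, the "tags" and "value" field names, and the second route are assumptions made for illustration only; this is not the backend representation shown on screen.

```python
import json

# Hypothetical in-memory shapes; the "tags"/"value" layout is an assumption
# for illustration, not the actual Indeni backend format.
cpu_usage = {
    "tags": {"im.name": "cpu-usage", "cpu-id": "dp0"},
    "value": 71.0,  # double metric: a plain number that can be graphed
}

static_routes = {
    "tags": {"im.name": "static-routing-table"},
    # complex array ("multi-snapshot"): a list of JSON objects
    "value": [
        {"network": "10.1.2.0", "mask": "24", "next-hop": "1.1.1.1"},
        {"network": "10.1.3.0", "mask": "24", "next-hop": "1.1.1.1"},  # made-up second route
    ],
}

hostname = {
    "tags": {"im.name": "hostname"},
    # single complex value ("single-snapshot"): one JSON object holding a string
    "value": {"value": "myserver"},
}


def is_array_metric(metric):
    """Return True when the metric value is a list of entries."""
    return isinstance(metric["value"], list)


print(json.dumps(static_routes["value"], indent=2))
```

The point of the sketch is only the contrast: a number, one JSON object, or a list of JSON objects.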

Now, the question is, “How do we generate this in the XML parser”, for example. So, let’s go to the XML parser, and let’s look at… So, ignoring the ‘_transform’ section for a second – we’ll remove this, just for simplicity’s sake. Okay? This is going through the list of admins that we have connected to the device, and it generates a complex metric. Okay? So, just to show you the input and the output, the input of this thing is an XML that lists all the admins that are logged in. Let me quickly prettify this so it’s a bit easier to see. Okay, so you can see there’s two admins. There’s ‘indeni’. There’s ‘admin’. They came from two different IPs. They’re connected through two different types. The sessions start at two different date timestamps. It’s been idle for two different amounts of time. Okay, this is the XML structure we get from the Palo box.

The complex metric we want to return is an array listing all the users. Okay? So, this is what we want the metrics value to be. It’s an array. You see the square brackets? And, it has two entries. This is one. You can see it here. And, this is the other. It has three fields, okay? Or, if we use the JSON terminology, it’s three key-value sets here. This is a key. This is a value. So, ‘idle’, ‘0’; ‘from’, ‘10.10.1.1’; ‘username’, ‘indeni’ – similarly for the admin. That’s what we want. This is what we want to send back to the Indeni server, because then a rule can be written on this metric, looking for the idle time. Okay? So, the way to generate this… Actually, I’ll leave it open for a second.

Okay, so remember, this is the XML, and we’re looking to generate the JSON that I showed you. So, what we do here, is it’s still ‘_metrics’, like we always have. Still the dash like we always have for each metric. The ‘_groups’ function is still there just like we have with the double metrics, because we’re iterating over multiple XML entries. This is one and this is the other, right? So, the XPath goes all the way down into the entry like you can see here. Okay? Ignore the ‘_temp’ for a moment. We’ll get to that later on.

We’re generating a metric. The tag for ‘im.name’ is ‘logged-in-users’, because that’s the name of the metric we want. And then, we say it’s a complex metric. Remember that normally, previously we had something that looked like ‘_value.double’. Remember? We had this. Okay? So, instead, we have ‘_value.complex’, and then we have key-value pairs. ‘From’, and then the value of that, and ‘username’ and the value of that. Okay? Remember, ‘_text’ is an operator, so what it essentially means is under the entry, I’m going to look for an element called ‘from’, evaluate that, grab the text, and that will be the value of the key, ‘from’. Okay?

So, ‘username’, ‘admin’. Okay, remember, ‘admin’ is the name of the element, right? Admin is the name of the element, whereas ‘username’ is the name of the key that we want to actually have in the metric, because that’s what the rule’s going to be looking for. Okay? This last thing here says, “Oh, by the way, it’s a complex array.” Okay, it tells the parser, “I want to generate an array with multiple lines, each of them for each of the admins that are in the XML.” That’s how I generate a complex array.
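
As a rough model of what the parser does here, the following Python sketch iterates over every entry in an XML document and builds one key-value object per admin. It is not IKL syntax. The sample XML, the path to the entries, the second IP address, and the 'idle-for' element are assumptions; only the 'admin' and 'from' element names and the 'username' and 'from' keys come from the walkthrough.

```python
import json
import xml.etree.ElementTree as ET

# Assumed sample input. Only the 'admin' and 'from' element names are taken
# from the talk; the surrounding structure and values are illustrative.
SAMPLE_XML = """
<response>
  <result>
    <admins>
      <entry>
        <admin>indeni</admin>
        <from>10.10.1.1</from>
        <idle-for>0</idle-for>
      </entry>
      <entry>
        <admin>admin</admin>
        <from>10.10.1.2</from>
        <idle-for>120</idle-for>
      </entry>
    </admins>
  </result>
</response>
"""


def logged_in_users(xml_text):
    """Build one dict per <entry>, however many entries there are."""
    root = ET.fromstring(xml_text)
    users = []
    for entry in root.findall("./result/admins/entry"):
        users.append({
            "from": entry.findtext("from"),       # key 'from' <- element 'from'
            "username": entry.findtext("admin"),  # key 'username' <- element 'admin'
        })
    return users


metric = {"im.name": "logged-in-users", "value": logged_in_users(SAMPLE_XML)}
print(json.dumps(metric, indent=2))
```

Note how the key name ('username') is chosen for the rule's benefit and is independent of the element name ('admin') in the device output.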

Okay, so what’s different between here and the value and the double ones is that we use complex here instead of double, and then you have sets of keys and values, versus in double where you just have the value for the double. Okay? Any questions?

MALE 2:

So, this code here knows to iterate through each and every entry for admin?

MALE 1:

Mhmm.

MALE 2:

No matter how long that list is?

MALE 1:

Right.

MALE 2:

And outputs it into a structured format that the Indeni server can then read?

MALE 1:

Exactly. What we’re doing here… We’re iterating over the XML content – the entries we have there; converting it into a JSON content that the server can understand and then generate rules from. The JSON content we convert it to is consistent for all devices, right? Because it’s the target content that the rule can work with. The difference from device to device is how do we generate that JSON structure. If it’s from XML output, we’re going to use the XML parser. If it’s in JSON, we use the JSON parser. If it’s from just textual output, we’re going to use AWK.

Just to give you an example for comparison’s sake… If we jump into the cross-vendor one, you can see we have it there somewhere. Let me look this up for a second. Yeah, this one. So, here’s a command from Linux where we’re trying to grab who’s logged in and for how long. This is the AWK parser there. I won’t go into the parser itself, but notice that it’s trying to collect the same data and then eventually create the same structure. ‘Username’, ‘from’, the terminal thing we ignore, but then it also has the ‘idle’ at the bottom here. So, just the same. It’s trying to create the exact same structure as the other one is. The way to do it is different, because the output from the device is different, so the parsing is different. But, the end result has to be the same for the server to understand. Essentially, our job as ‘.ind’ script writers is to translate whatever output we’re getting from the device to something the server understands, which is the standard structure. Okay?

So, this is the complex array… Let me undo the stuff I’m breaking here so I don’t actually commit anything. Okay, so we’ll look at an example of doing this for a complex metric that is not an array. Let’s see if we have something here that we can use. In the case of the config… So, this happens a lot. With a lot of devices we work with, we’re going to have one big ‘.ind’ script that parses their main config file and generates a whole bunch of stuff. So, you’re going to see a lot of examples for complex metrics here. And, here is a complex metric that is the simpler version. It’s not an array. It’s just one value. And, what we’re doing here is we’re grabbing the time zone for the device.

So, we’re saying this is under the ‘_metrics’ section. It’s a metric. Its value is a complex, and in that complex, the value key – its value is what’s inside the time zone. Okay? And then, we attach the regular ‘im.name’ tags to it. Again, ignore these. We talked about this last time that we need to ignore these for now, just for simplicity’s sake, but this is basically how we generate a complex metric in the parser.

Now, notice that in this case, it’s very, very similar between XML and JSON. So, this is XML – what we have here. The difference if we were in JSON is that instead of ‘_text’, it would be ‘_value’, and the structure of the path would be a bit different. So, it would look something like this, more or less. Something like this. More or less. I might have done something wrong here, but basically this. Okay? Make sense? And then the end result is ‘timezone’. What does the ‘timezone’ look like? If we jump to show running config output, we’re going to see the timezone here somewhere. Here: This is the timezone, and this is what the value looks like. Standard. Okay?
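
The XML-versus-JSON point can be sketched in Python: the same single-value result is produced from either input, and only the way the path is walked differs. The document structure, the paths, and the 'US/Pacific' value below are invented for illustration; the real device output and IKL paths are not reproduced here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical inputs carrying the same information in two formats.
XML_CONFIG = (
    "<config><deviceconfig><system>"
    "<timezone>US/Pacific</timezone>"
    "</system></deviceconfig></config>"
)
JSON_CONFIG = '{"config": {"deviceconfig": {"system": {"timezone": "US/Pacific"}}}}'


def timezone_from_xml(text):
    """XML side: follow an element path and take the element text."""
    value = ET.fromstring(text).findtext("./deviceconfig/system/timezone")
    return {"value": value}


def timezone_from_json(text):
    """JSON side: follow nested keys and take the value."""
    value = json.loads(text)["config"]["deviceconfig"]["system"]["timezone"]
    return {"value": value}


print(timezone_from_xml(XML_CONFIG))
print(timezone_from_json(JSON_CONFIG))
```

Both functions return the same single-value object, which is the whole idea: the target structure is fixed, and only the extraction differs per input format.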

So, those are the complex metrics. Complex metrics are becoming more and more common. Initially, when we started, complex metrics were the less common metric we were generating. They’re starting to become more and more common because the rules are starting to get more and more intelligent. And, more and more rules are configuration-oriented, and not just threshold-oriented, so that’s why we’re seeing more and more. Okay?

But, generally, if you want examples, it’s very easy. You can just search through the code for ‘_value.complex’, and you can easily see examples of parsing these. Okay. Let me undo that. So, that’s one part.

Another part… I wanted to go through the operators before I go into the transform section. So, let’s quickly run through the operators. There are not a lot of them, but just so we’re familiar with all of them. There’s a top-level operator called ‘_vars’. I mentioned this in the last session. It looks like this. It basically lets you define a variable, and that variable is going to be some piece of string. Like, in this case, part of an XPath. It just makes it a bit easier, because then you can put ‘${root}’, instead of this. It just saves you a bit of time.

‘_tags’ – this will be at the top-level if you’re writing an interrogation parser. So, to give you an example of that, let’s find show system info interrogation – here. See these guys? These are tags. Remember, this came up at an earlier point in the training, but devices have tags. So, to generate tags for a device in an interrogation script, we use this kind of format. It’s still very similar to everything else. Key, value; key, value, etc. It’s just in a ‘_tags’ instead of a ‘_metrics’. Okay?

‘_metrics’ – we’ve talked about. Each metric must be on its own dash to generate the metrics. One thing that would be important to be aware of: If you try to generate a metric and the path doesn’t match anything, the metric will not be generated. Okay? So, to give you an example, let’s say we’re trying to generate this time zone metric. Okay? The parser is going to look for this path and then see that there’s a text in there. If this path does not exist – for example, the timezone element does not exist – the system will not generate a metric. It’s going to skip it. Okay? That’s the behavior. Same thing in the JSONPath. If you’re pointing in your path to something that does not exist, it will skip the metric. It catches people off guard initially, but then makes a lot of sense once you get used to it. Okay?
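
The skip behavior can be modeled in a few lines of Python. This is a sketch of the described semantics, not the parser's implementation; the function name, the spec format, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_metrics(xml_text, specs):
    """specs maps a metric name to an element path (both hypothetical)."""
    root = ET.fromstring(xml_text)
    metrics = []
    for name, path in specs.items():
        text = root.findtext(path)
        if text is None:
            # The path matched nothing: no metric is produced and
            # nothing is reported, mirroring the behavior described above.
            continue
        metrics.append({"im.name": name, "value": {"value": text}})
    return metrics


# The sample document has a hostname but no timezone element.
doc = "<config><system><hostname>myserver</hostname></system></config>"
specs = {"hostname": "./system/hostname", "timezone": "./system/timezone"}
result = build_metrics(doc, specs)
print(result)
```

Only the hostname metric comes out; the timezone metric is silently absent, which is why a typo in a path can go unnoticed.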

FEMALE 1:

Is there any indication that it has skipped over anything?

MALE 1:

No. It doesn’t. Yeah, it won’t produce anything. That’s what it’ll do. Okay.

So, that’s the ‘_metrics’ here. At the value-level, we talked about ‘_value.double’. We talked about ‘_tags’ and the fact that we need to assign ‘_tags’. We talked about ‘_value.complex’.

Now, within these, we’ve already seen ‘_value’. We talked about the ‘_value’ operator that grabs a value from the JSONPath. Right? It’s kind of like ‘_text’ in XML. ‘_constant’ is also something that we’ve already seen, like this, where we’re saying, “Just use this string as-is.” A nice little one is ‘_count’, where you can basically say, “Hey, I want to count how many entries there are. I don’t want to know the actual content. I just want to know how many.”

Now, over time, we keep adding more and more aggregator functions like this with the ability to do sum and things like that. But, it’s pretty powerful, because it lets you just kind of run certain calculations on the content and use that as the value, instead of taking the content all the way back to the server.

In the case of XML parser, operators are mostly the same. You know ‘_tags’, ‘_vars’ – all that stuff. The difference is at this level. So, it has ‘_count’, just like the other one. It has ‘_constant’, just like the other one. It has ‘_text’, which is like ‘_value’. But, then it also has these three: ‘_attribute’, ‘_sum’, and ‘_name’.

‘_attribute’ is a bit of a confusing thing with XML. Let me show you what that means. Imagine that this is my XML content. Class, student, name. John Doe. Okay? Imagine that this is the XML content. In the world of XML, you have attributes like an ID. Okay? This does not exist in JSON, only in XML. It’s a feature of XML. People abuse it sometimes. Most XML output we’ve seen coming from the devices does not use attributes, but some does, and sometimes in a very weird way. We have an example with Juniper Junos where they use attributes in a very, very specific case, and it actually contains the main value that we need.

You can’t… In this case, you could do… If you’re on the student level… Remember… Let’s say we have ‘_groups’, which is ‘/class/student:’, right? Because we want to go over all of them. We can do ‘_text’ to grab the name, right? But, we can’t do ‘_text’ to grab the ID. This will not work, because in XML, they differentiate the two. So, in order to grab the value of the ID, we need to do ‘_attribute: id’. Okay? That’s the way to grab it. So, that is the ‘_attribute’.

Then, we have ‘_sum’, which is like ‘_count’, but basically treats the values as numbers and adds them up. And then, ‘_name’ gives you the name of the element. Again, in some cases we’ve seen things like this. So, if we go back to our example, it’s going to be ‘johndoe’. Let’s say id: ‘938102’. Okay? So, in this case, they actually put the name of the guy as part of the element name. This is not best practice, but some devices will do this. Okay? So, the way to get this is ‘_groups’, ‘/class/*:’. Asterisk means everything in the class. And then, we’re going to do ‘_name’… Let’s just say ‘_name’ with this one, which basically says, “I want the name of the element.” So, in this case, it would give us ‘johndoe’. Okay?
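
For readers who think in code, here is a Python sketch of what each of these XML operators extracts, using the class/student example. The Python calls are analogies and do not show how the parser is implemented; the second student and the 'credits' element are invented so that counting and summing have something to work on.

```python
import xml.etree.ElementTree as ET

# Class/student example from the talk, extended with made-up data.
CLASS_XML = """
<class>
  <student id="938102"><name>John Doe</name><credits>12</credits></student>
  <student id="938103"><name>Jane Roe</name><credits>9</credits></student>
</class>
"""
# Variant where the person's name is the element name itself.
NAMED_XML = '<class><johndoe id="938102"/></class>'

root = ET.fromstring(CLASS_XML)
students = root.findall("./student")

# Like '_count': how many entries matched, ignoring their content.
student_count = len(students)

# Like '_sum': treat the matched values as numbers and add them up.
total_credits = sum(float(e.text) for e in root.findall("./student/credits"))

# Like '_text': the text inside a child element.
first_name = students[0].findtext("name")

# Like '_attribute: id': attributes are read separately from element text.
first_id = students[0].get("id")

# Like '_name' over '/class/*': the element names of all children.
element_names = [child.tag for child in ET.fromstring(NAMED_XML)]

print(student_count, total_credits, first_name, first_id, element_names)
```

The split between `findtext` and `get` mirrors the distinction in the talk: element text and attributes are different things in XML, so they need different operators.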

Now, the language keeps… We keep adding stuff based on what people need, so if we run into certain things that we can’t do for whichever reason, there are things that we can add.

FEMALE 1:

Where do you add them?

MALE 1:

Well, the engineering team is going to add it to the platform, so we need a request. We need enough requests, and then we decide whether to prioritize it or not.

FEMALE 1:

Okay.

MALE 2:

I imagine these operators are mainly based on like high usage, right? So, you see yourself having to count many things or sum many things. So, having an operator dedicated for that makes it easier.

MALE 1:

Exactly. When we see things happening again and again, we say, “Okay we'll go back.”

Okay. Now, we get to the slightly more complicated feature in the XML and JSON parsers. So, a lot of times we have this challenge. Let’s say I define a metric for memory. Okay? So, ‘im.name = memory-usage’. And, the value is going to look something like this, right? 71%. Okay? But, the data I’m getting from the device is something like this. ‘Data’, ‘memory-free-kbytes’, ‘1024’. Not a lot of memory. And then ‘memory-used-kbytes’, ‘712’. Okay? But, what I need…what the server needs is the value that is the memory utilization, which is not in this. You can calculate it, but it’s not in this. Okay? That’s where the ‘_transform’ section comes into play. ‘_transform’ allows you to take multiple values and do all kinds of nice, nifty things with them and then generate a value that actually goes to the server. Okay?

So, the way we would do this is ‘_metrics’, the metric itself. ‘_tags’ is going to be ‘im.name’, ‘_constant’, ‘memory-usage’. Okay? And, now we’re going to define this. ‘_temp’, which is going to be ‘free’. I’m just going to name the variable ‘freekbytes’, which is ‘_text: /data/memory-free-kbytes’. And then, another one is going to be ‘usedkbytes’. Okay? So, now we defined two temporary variables that hold that. Now, what we’re going to do is ‘_transform’, ‘_value.double: |’, and then we open an AWK section. And, here, what I can do is this: ‘print ${usedkbytes} / (${freekbytes} + ${usedkbytes})’.

What did I do here? I grabbed data from the XML structure into temporary variables, and then I ran a ‘_transform’ on them, and I said, “What I want to do is generate a double value that is calculated through AWK code”, and the AWK code knows to replace these with the values that have come from the ‘_temp’. Notice this pipe. This pipe is important, because this pipe tells it, “Okay, now I’m going to have a big section here.” The pipe is part of the syntax saying, “This field is now going to be multi-line.” Okay? And, these curly brackets show it when it starts and when it ends. Notice that they start in an indentation from the ‘_value’. If they don’t, it won’t be able to see them, because YAML is going to think, “I’m now on the next section.” Okay? So, this is a YAML feature, but the AWK thing inside is a feature within our parsers.
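
The same two-step idea (collect temporary values, then compute the final value from them) can be sketched in Python. This stands in for the AWK expression only as an illustration; the function name is hypothetical, and the element names and numbers are the ones used in the walkthrough.

```python
import xml.etree.ElementTree as ET

DATA_XML = """
<data>
  <memory-free-kbytes>1024</memory-free-kbytes>
  <memory-used-kbytes>712</memory-used-kbytes>
</data>
"""


def memory_usage(xml_text):
    """Compute used / (free + used) from two values read out of the XML."""
    root = ET.fromstring(xml_text)
    # Step 1: the equivalent of the two temporary variables.
    free_kbytes = float(root.findtext("./memory-free-kbytes"))
    used_kbytes = float(root.findtext("./memory-used-kbytes"))
    # Step 2: the equivalent of the transform expression.
    return used_kbytes / (free_kbytes + used_kbytes)


usage = memory_usage(DATA_XML)
print(f"{usage * 100:.2f}%")
```

With the sample numbers this expression yields a fraction of roughly 0.41, so it would need to be multiplied by 100 if the metric is expected as a percentage.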

MALE 2:

Is spacing important for pipe?

MALE 1:

Yes.

MALE 2:

Okay.

MALE 1:

Well, I haven’t tried it without, but YAML tends to be sensitive to stuff, so maybe. I don’t know. We can test. One space. It’s one space. Yeah. We did one space. It will work with more. I don’t remember if it works with no space or not. We can try.

So, you’re going to see a lot of ‘_transform’, actually. As you start digging into XML and JSON parsers, you’re going to see a massive amount of ‘_transform’, because every time the knowledge experts like Patrick, Arya, etc. – every time they needed to hack something like this, they used ‘_transform’. You get creative very quickly with these things. And, there are all kinds of interesting things you learn along the way as you’re writing this, but it’s very, very powerful. Yup.

MALE 2:

Does the transform default to a single decimal place, or can you get…

MALE 1:

As much as you want. Yeah, as much as you want. I think AWK has a limit, but I don’t think we’ve gotten there. Okay?

FEMALE 1:

I think Jonathan just wrote something. I don’t know if it was a ‘_transform’, but for checkpoint and bandwidth. So, he added two metrics, essentially, together. I don’t know who…saw that one. Yeah, yeah, yeah.

MALE 1:

Jonathan? What, in an AWK parser?

FEMALE 1:

It would have been an AWK parser. Yeah. But, John, who’s doing it for PANOS.

MALE 1:

Yeah. He would probably need a ‘_transform’ on XML, as well, because if he needs to divide one value by the other, he would probably need to transform.

FEMALE 1:

I think they’re adding.

MALE 1:

Or adding.

FEMALE 1:

Okay.

MALE 1:

Yeah. If you’re trying to do any calculation on any of these values [inaudible, 00:28:39]. Now, you can do this for non-numeric stuff, too. Like, I could also build… I could do a ‘_transform’, ‘_value.complex’, and then do something like this. ‘name: |’, and then ‘print “yoni”’. Okay? So, if we think about the levels of the parsers, the first thing you want to master is the path language – JSONPath or XPath, and using those online calculators and getting them kind of nailed down. Then, you want to work on the metrics. You want to start with the double metrics, and then complex, and then complex arrays. And then, tackle the ‘_transform’. Those are the stages I would go through to kind of keep moving the challenge level up, and up, and up and not directly jumping into something too complicated.

FEMALE 1:

Does that make sense, Pricilla, James?

MALE 2:

Yes, it’s clear. It needs practice a lot, but it’s clear.

FEMALE 1:

I know. It’s a good thing we’re recording, right? Yeah. It’ll come.

MALE 1:

It needs a lot of practice. Again, I mean we have had experience. First of all, it took me a while to pick this up. I can tell you that. I would go to the engineering guys every day asking them questions, and they’d be like, “What didn’t you get? It’s this.” I’m like, “Okay.” And then, I ran through it a few days. I got comfortable with it, and then we trained a few guys on it. And, for them also, it took a little while. Like, some things were not 100% intuitive. Once you get used to it, all of a sudden, you’re very powerful and you can write these scripts really fast, and you see that you can really achieve a lot with them, because trying to do anything similar in AWK would be a nightmare. AWK is not built for this. So, it’s great, but yes, you need to have stamina and go through a few rounds of these before you get used to it.

Any questions? Anything you guys want to add? Okay. Great. Thank you very much everybody. Have a good night or day, wherever you are.

FEMALE 1:

It’s pretty late there.

MALE 1:

Yeah.

FEMALE 1:

Athens, Greece. Is it super hot there, Phil?

MALE 2:

Almost midnight here. Thank you. Bye, bye. Have a nice day. Bye.
