Module 1: Introduction
Full Video Transcript:
The purpose of this module is really to baseline everyone with at least a fundamental understanding of what IKL is, what Indeni is, and why we are here. We really want to foster this community of what we call IKEs, or Indeni Knowledge Experts, as we try to leverage a lot of the domain expertise that you guys have with Indeni.
So, today we’re going to go over Module 1, which is going to be very fundamental. It’s going to cover what Indeni is and what we’re looking for across all our Indeni Knowledge Experts. You’ll hear me say “IKE” very frequently, just for reference. What does the Indeni architecture look like? How do the components work together? What does the flow look like? What will you need to get started as an expert, and how do we want to enable you so you don’t get stuck? Towards the end, there’s going to be an extra credit exercise. It may help you conceptualize what we’re doing so far, but it’s not mandatory, so just keep that in mind.
So, what is Indeni? It’s really a tool used to help IT teams manage their infrastructure. Indeni constantly learns from the systems in the infrastructure and maintains a knowledge database of known best practices, so the infrastructure can adhere to the best practices available within the domains of knowledge we want to leverage.
Since 2009, Indeni’s mission statement has really had one main goal: to identify what could go wrong in the infrastructure and its different components, to reduce the amount of time spent backpedaling, and to increase the time spent innovating and changing the infrastructure to meet current needs. So, that was a lot of words, but let’s dive into the architecture.
What is Indeni and how does it work? As you can see here, there are two components in the Indeni architecture: a server and a collector. What you’ll probably want to focus on more is the collector and how it works, mainly because that is the component that actually looks at the data on the devices and parses it into something that our servers and our rules can understand.
Our collectors do two things. One of them is to interrogate the device. When it interrogates the device, it only knows two things: the IP address and the logical name you’ve tagged it with. Once it identifies a way to communicate with the device, it goes through what we like to call “The 20 Question Game.” It runs through a bunch of commands to identify the model, the version, the operating system, whether it has any modules in place and which ones are running; anything we can leverage to identify the device further, and in that way determine what kinds of command scripts we should run on the device. Sometimes, if devices are in a clustered environment, it’s important to run commands specific to the cluster group, so I just wanted to bring that up for reference.
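To make the interrogation idea concrete, here is a minimal Python sketch of the “20 Question Game”: run a set of identification commands and turn their output into tags. The command strings and tag names here are hypothetical illustrations, not Indeni’s actual implementation (real interrogation scripts are vendor-specific and far more thorough).

```python
def interrogate(run_command):
    """Run identification probes against a device and collect tags.

    run_command is a callable that sends one CLI command (e.g. over
    SSH) and returns its output as a string.
    """
    tags = {}
    # Each probe pairs a tag name with the command that reveals it.
    probes = [
        ("vendor", "show vendor"),
        ("model", "show model"),
        ("os.version", "show version"),
    ]
    for tag, command in probes:
        output = run_command(command)
        if output:
            tags[tag] = output.strip()
    return tags
```

In a real deployment the `run_command` callable would wrap an SSH session; for testing, a simple lookup over canned outputs works the same way.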
So, we communicate with the devices either through SSH or through an API query. Sometimes we’ll need additional information to do that. I’m a little more familiar with Palo Alto, so I know we’ll need an API key to access those devices if we’re running curl commands against them. Those are the two main mechanisms for communicating with a device. Generally, we’ve seen that sometimes the output is more readily available through HTTP and sometimes through SSH; it really comes down to what the device makes available.
So, that gives you a little overview of what the collector does. We will provide examples of Indeni scripts later on, but they usually run at predefined intervals that you can play around with. For example, you may want to check CPU utilization every minute, but you may only want to check the NTP server state every thirty minutes to an hour. We want to minimize the impact on the device as we run these scripts, so that’s something we want to keep in mind as a community when it comes to monitoring these devices.
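The interval idea above can be sketched as a simple schedule check. The script names and the interval values are just the examples from this talk; a real Indeni script declares its own interval in its script metadata.

```python
# Illustrative per-script intervals, in seconds.
INTERVALS = {
    "cpu-usage": 60,           # CPU utilization: every minute
    "ntp-server-state": 1800,  # NTP server state: every 30 minutes
}

def is_due(script, last_run, now):
    """Return True if the script's interval has elapsed since last_run."""
    return now - last_run >= INTERVALS[script]
```

The point of the longer intervals is exactly what the talk describes: running fewer commands against the device to minimize monitoring impact.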
Once we’ve interrogated the device and identified what it is, we assign it certain attributes, which we call tags. Those tags get pushed to what we call the time series database. We also push to the time series database the values pulled by the monitoring component, which checks for those values on a regular interval. We assign each of those values a metric name. Those metric names carry a value and a logical name that is uniformly recognized by our alerts; we’ll go over that later, but that’s just so you can understand how it works. That information, including the device ID, the tags, the metric names, and the metric values we’ve pulled through the collector, gets pushed over to our time series database, where it is stored as a stateful process.
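You can picture the record that gets pushed to the time series database roughly like this. The field names below are illustrative, not Indeni’s actual schema; the point is that each data point bundles a metric name, a value, a timestamp, the device ID, and the tags assigned during interrogation.

```python
import time

def make_metric(device_id, name, value, tags):
    """Bundle one reading into a record for the time series database."""
    return {
        "device_id": device_id,
        "metric": name,       # the logical name the rules recognize
        "value": value,       # the reading pulled by the collector
        "timestamp": time.time(),
        "tags": tags,         # e.g. vendor/product/OS from interrogation
    }
```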
Another thing to note about the collector is that it acts as a stateless process: all our scripts run without the ability to store information in the collector itself. So, when you’re thinking about how to approach building these Indeni scripts, keep in mind that we want to, for lack of a better phrase, hot-potato the values quickly over to the time series database, where they get stored over time.
The other component, which won’t be as much of an emphasis during the training process (and I’ll explain why), is the server component, which holds the rules that enable the Indeni instance to populate the alerts in the GUI. So, if you really think about the components in the Indeni instance, we have the collector, the time series database, the server, and then the Web GUI. The server is responsible for pulling information from the time series database, usually at a set interval of every sixty seconds or whatever is recommended. That information is then cross-referenced against the rules that are baked into the server. Those rules, if applicable, will generate an alert in the GUI of the Indeni instance. If you guys haven’t had a look at the GUI, let me know and we’ll go over it towards the end.
But yeah, the server maintains both the time series database process and the rules process, and the rules are more often than not in a template form. The reason we want to focus more on the collector scripts is that oftentimes we already have a rule built out of a template, so the biggest challenge in generating these best-practice alerts is really figuring out: What data are we trying to pull from the device? What is the proper communication mechanism to use? And how should we tag those values so the rules can read them properly?
So, I actually have something right here, if I can pull it up. I put together an architecture reference for a lot of new people who are trying to understand how Indeni works; I think it’s a great way to visualize this. We often use this device (it’s not really a device) called the ACME Humidifier, which I think Yoni put together. We use this example a lot to help demonstrate how each component works.
So, initially, when Indeni tries to connect to the device, we’ll run a series of commands. In this example, we’re going to connect through SSH and run CLI commands. As you can see, these are the outputs pulled from the commands that were run. Once we get that information, the output gets parsed in the collector component. If it’s an interrogation script, we’ll need to tag certain values so we understand what the vendor is, what the product is, what the operating system is, and whatever else is necessary in an interrogation script. If it’s for monitoring, we parse the output so we can tie it to a certain metric. That metric will carry a value our rules will read; I’ll show you in a bit. Here, we’re tying this metric to the value called humidity, which our rules read on a regular basis from our time series database. We’re also identifying several things, such as the interval at which we’re monitoring and the requirements to run this script: whether it’s vendor-specific, whether it’s product-specific, whether we’re running it through SSH or API, and what commands we’re running.
Our collector component, or rather the monitoring component of the collector, has three ways of parsing the data: AWK, JSON, or XML, depending on the data and the output we need. We’ve noticed that sometimes the output is JSON, which is much more structured, and because of that we have a different mechanism for parsing the data, as you can see right here. And here’s another example for XML.
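To illustrate why the parsing mechanism depends on the output format, here is a hedged Python sketch pulling the same humidity reading out of three kinds of output. The field names and sample formats mirror the ACME Humidifier example but are made up for illustration; in Indeni itself, the free-form CLI case is where AWK comes in.

```python
import json
import xml.etree.ElementTree as ET

def humidity_from_json(text):
    """Parse structured JSON output, e.g. '{"humidity": 42.5}'."""
    return float(json.loads(text)["humidity"])

def humidity_from_xml(text):
    """Parse XML output, e.g. '<status><humidity>42.5</humidity></status>'."""
    return float(ET.fromstring(text).findtext("humidity"))

def humidity_from_cli(text):
    """Parse free-form CLI output line by line (the role AWK plays)."""
    for line in text.splitlines():
        if line.startswith("Humidity:"):
            return float(line.split(":")[1].strip().rstrip("%"))
```

Structured formats let you address a field by name, while free-form CLI output forces you to match on text, which is why the choice of parser follows the device’s output rather than your preference.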
So, one thing I brought up earlier is that since this is a stateless process, we often write our code so that it immediately provides some sort of value for the rules to read. For example, instead of storing every single humidity value over the course of five minutes in the collector, each value gets pushed off to the time series database, which actually resides on the server component. From there, the rules parse the information from the time series database and determine if there’s an applicable rule. So, this is an example I put together for our ACME Humidifier. As you saw earlier, we’re pushing a metric value for humidity and identifying it as such for our rule to understand. Here, we’ve set up a threshold: if it drops below 5% humidity, we go ahead and alert in the Indeni GUI, using this description and this format. We’ll go over this briefly later on, but I just wanted to give you a quick visualization of how it works.
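The threshold rule just described can be rendered as a rough Python sketch. The 5% threshold matches the ACME Humidifier example from the talk; the alert fields and severity label are hypothetical stand-ins for whatever the templated rule actually emits.

```python
def evaluate_humidity_rule(latest_value, threshold=5.0):
    """Return an alert dict if humidity is below the threshold, else None."""
    if latest_value < threshold:
        return {
            "severity": "WARN",
            "headline": "Humidity level low",
            "description": (
                f"Humidity is {latest_value}%, below the "
                f"{threshold}% threshold."
            ),
        }
    return None
```

This is the shape of the server-side loop: read the latest metric value from the time series database, apply the templated rule, and raise an alert in the GUI only when the condition is met.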
So, yeah, a big point about our server is that our rules are going to be templated, so it’s going to be a lot easier for us to build those rules. The biggest challenge for us is to identify the communication method, the output we’re looking for, and how to parse that output so we only get the relevant information. All right, let me go back to our slides.
So, here’s what you’re going to need for the rest of the modules. You’re going to need a virtual environment with our 5.9 instance deployed; if you haven’t gotten the OVA link for deploying it as a VM, the product team can get that over to you. Another component that’s going to be really important for running these scripts is Command Runner. It’s a tool we use to quickly simulate whether or not your Indeni scripts are working properly. Oftentimes you can use it with live configurations, but all we really need is an input text file and an Indeni script to run against it; Command Runner will then produce an output file that shows whether it works. You may also need to download the Java Runtime Environment; Command Runner will prompt you if you don’t have it.
Another component, obviously: we’re going to need Sublime or the text editor of your choice. It’s really up to you; we’ve just found Sublime to be an easier one to use. And, speaking of easy integrations, we’re going to be leveraging SourceTree as our Git application. If you aren’t familiar with Git, we actually have a lot of information on how to use SourceTree to push, or commit, your code to our Git server, which is Bitbucket. The reason we use SourceTree is that it integrates easily with a lot of our Atlassian products: there’s a button that literally lets you look at a ticket requesting a new monitoring script and check it out in SourceTree, which makes it very easy to pick up a Git application if you haven’t used one in the past.
Speaking of Atlassian products, we use quite a variety of them. We’re leveraging Confluence, which is basically a wiki, and you’ll find a lot of the information we’re going over in these modules on the Confluence page. So, towards the end, if you’re stuck on anything we’ve gone over, everything is immediately accessible in Confluence if you already have access to it.
Another component is Bitbucket, our Git server, where we push our code and our scripts. If you’re familiar with Git applications, there are what they call branches. The primary branch is the master branch, which our GA instance runs off of, but there’s a separate branch we’ll be working very closely in, called staging; we’ll go over that in later modules. Even if you don’t have an account yet, Bitbucket is publicly accessible, so you can take a look whenever you have a chance.
Another component is JIRA. That’s going to be a very important one for you to leverage, because it’s our ticketing system, and oftentimes we’ll be living in tickets as we build these Indeni scripts. Whether it’s a customer or the R&D team looking for a new monitoring script, the request will come through JIRA. So, make sure you have visibility into JIRA, so you get updates when a new ticket is pushed and when one is assigned to you.
Another thing you’ll need, primarily for testing, is access to our lab. You’ll get VPN credentials to access it shortly after this call; I think some of you already have access. If not, let us know. And we’re leveraging Slack as our communication point.
So, tools to succeed, and what will help you be successful: Obviously, Confluence is going to be your main wiki source for a lot of the information we’ve gone over throughout these modules; if you need to backtrack and look at anything, please take a look at the Confluence page. The community page is going to be full of resources already available thanks to a lot of our current knowledge experts, our IKEs. Our forums will have a lot of the Q&A you may or may not have already considered, and you can quickly review it. I think the biggest component of all of this is simply getting exposure to the KDLab. We have a lot of devices already connected there, so if you’re testing against specific devices, we probably already have one for simulation. Getting access to that as soon as possible will be really useful.
JIRA ticketing, obviously, as we talked about before, and Bitbucket. Last, but most important, is leveraging your new IKE friends. So, James, if you have any questions or anything else, or if there’s anything our consultants can help with, let us know; this applies to everyone else out there who’s onboarding. We want to apply a crowdsourcing methodology here, so when we’re developing the knowledge, some of that domain might be covered by one of your fellow IKE friends, if not by us. So, please leverage them as you can.
So, last but not least, we want to go over the extra credit. I think this will help you build the proper mental framework for writing Indeni scripts. We’re going to keep this very abstract. Think of one Ubuntu-based issue. Then, in plain English, answer these questions: What are you looking for? How and why are you looking for those certain values? And why is that pertinent to the issue? Submit that on the Indeni community page if you can, and it will be reviewed either by the product team or by one of your new IKE friends. At this point, we’re not really looking to build the scripts yet; we just want to make sure we have the right mental framework in place.
So, I think that pretty much concludes my presentation. I did want to keep this short; I think a lot of this is easily referenceable through the wiki page. I want to spend the rest of this time answering any questions you have.