[MUSIC PLAYING] Hello, and welcome to this presentation on extending syslog-ng with Python. My name is Craig Finnan. I'm a systems engineer with One Identity. And here's a quick look at the agenda. We'll go over a quick product line introduction, and then get into how you can extend syslog-ng using Python. We'll talk about creating source drivers, parsers, template functions, and also destination drivers.
So what is syslog-ng? Just a brief overview, syslog-ng gives you the ability to manage logs as it says here at scale. It's a single, very high-performance tool that allows you to centrally collect log messages from any source at any network protocol, any syslog protocol, and then do things like efficiently filter them, parse them, transform their format, and securely store them.
At the same time, you can take all of these messages or selected filtered subsets, and send them to other applications like SIEM tools. And those could be either on prem or in the cloud. And one of the things that syslog-ng helps you do is to optimize whatever SIEM tool you may be using. It'll make them work more reliably, much more efficiently, and it can in many cases decrease your total cost of ownership for collecting and analyzing log event messages.
Just a brief historical view of where syslog-ng came from, and where it stands today, and where it's going. It started out in the late 1990s as an open source tool. It still is available as an open source tool that gave you the basic capabilities to collect the logs, process them in various ways in a central location, and store them securely. But of course, as an open source tool, it provides only community support. But around 2007, it was turned into a commercial product by a company originally called Balabit in Budapest, Hungary.
And this created a version that did provide commercial professional support. But it also added many additional features over and above what was available in open source. I've listed a few here. As you can see the typeface shrunk because there are many more features than could really fit in here. But some of the significant things were the ability to talk to new destinations like splunk-hec, Google Pub/Sub, also the ability to collect and process messages from Windows, either in an agentless fashion or using an agent. So lots of additional features came with the product as well.
And of course, this is called PE, which stands for Premium Edition. And one thing about these versions of syslog-ng, both the OSE and Premium Edition, they're basically Linux applications. They're command line oriented. In other words, you create a configuration file, which is an ASCII text file. You build all of the features you need to have syslog-ng perform for you.
You define your sources. You define your destinations, and also things like filters, parsers, transformation rules, whatever you need to process those messages in the way you want. And then you restart syslog-ng. It reads that file, and then runs in the background as a Linux service or a daemon, and does so at again very high performance and very, very high reliability.
Then about a year later than that, Balabit created an appliance version, which we call the syslog-ng Store Box. One Identity has since acquired Balabit, and is now the sole author and developer of syslog-ng. And the Store Box, as the image here implies, has a graphical user interface. So you interact with this application not using ASCII text files and an ASCII configuration syntax, but through a web-based graphical user interface.
But it's more than just kind of a pretty face on top of the functionality underneath. What it will do for you is provide real-time indexing of all the incoming messages. So it's indexing all the information in each and every message that it ingests as the messages come in off the wire. So it'll index things like the priority, the host name that sent the message, the application name. But it will also index all of the individual tokens in the message payload.
And what that does for you is to create an ability to search through your logs, to find messages over the interval of your choice that meet certain search criteria. So it is a very efficient search interface, a search page in the graphical user interface that allows you to very quickly isolate messages that meet your search criteria.
And there's other features as well. It makes it very easy to create real time alerts. It also has the ability to build reports, and has standard built-in reports about the statistics of the events that it's collected. And also you can create custom reports. So you can tailor them to the needs and requirements of various different audiences in your IT organization. And again, this is also provided with full 24/7 commercial professional support.
So just to explain this maybe a little bit further, show the differences between the Premium Edition and the Store Box, I created this Venn diagram. Now it's important I draw this distinction because what I'm going to talk about in the bulk of this presentation is, of course, using Python to extend some of the features in a custom way using Python and the built-in Python binding or API.
Now, it turns out that that Python binding that we're going to discuss is only available in Premium Edition, here on the left-hand side. And the reason is that the Store Box is a sealed hardened image. In other words, you cannot add software to it or modify it in any way. It's done that way for security purposes and to maintain the integrity of the image.
However, Premium Edition, being just an ordinary-- it's a Linux application like many other applications. You'd be running on, say, a Linux VM. You can add software and you can run programs that can interact with syslog-ng. And again, it'll do so through the very cleverly designed Python binding that's built into syslog-ng Premium Edition.
Otherwise, they share the same basic syslog-ng core, the same engine. But the right and left-hand sides show you where they don't intersect, the exclusive features of either one. But here again, we'll be talking about Premium Edition because that's the one where you can, in fact, go and provide custom extensions using Python to your syslog-ng functionality.
Now, the fact that the Python binding will only work with Premium Edition does not mean you can't use it in association with the Store Box. So you can have configurations where, say, you have syslog-ng Premium Edition relays collecting messages from the various devices and applications in your network. And they, of course, can now run that Python binding. So you could add these additional features and these custom features using Python to your relays.
Then, of course, the relays can in turn forward the messages that have been processed in whatever way you need to, to the Store Box, where you can take advantage of the searching and the other features that the Store Box can provide. So it's not a binary situation, not either/or. You can very easily combine that. And many, if not most customers, will in fact, do this very thing.
So how can you extend syslog-ng with Python? Well again, you can create these different types of block drivers that already exist in syslog-ng. Of course, there's many out of the box in syslog-ng. But there may be cases where you might want to create a custom source driver. Maybe you need to collect logs from a specific application and there isn't a built-in source driver in syslog-ng that can do specifically what you want.
OK, here's a case where you can build your own very easily using Python to do that. Same thing goes with parsers. You may have-- syslog-ng has a very rich set of parsing capabilities, and it's almost hard to really think of why you'd need to extend it. But you may have a very special case where the built-in out of the box parsers cannot quite do what you want. So here again, you can create your own using Python.
Same thing with template functions, to be repetitious, there are many, many template functions already available in syslog-ng. But you may have some situation that just can't be covered. But you know how to do it. You can very easily cover that requirement using a Python module. You can easily slip that into syslog-ng with the existing Python binding and then make that happen.
And again, closing the loop here, we talked about source drivers. You can do the same thing on the output side. You can create your own custom destination drivers. Where again, you may have to go to a specific location. Maybe you have to reach out and contact an API. And here, again, maybe that specific connection to that API isn't quite available maybe in the built-in drivers or the built-in HTTP driver. But you can, again, make whatever you need to happen, happen, using Python.
We'll talk about all of these. We'll go into one in detail, but I'll give you an overview of how each of these particular Python custom objects work. So with Python, you can create two different types of source drivers or two different Python drivers can be available to you to use for your custom sources. And they're known in syslog-ng Premium Edition as either fetcher or server mode sources.
And basically I think in most cases you use a fetcher-style. A server-style is something you would use if you wanted to create your own event loop and you wanted to have a non-blocking server framework, or for some reason you need a custom loop, very special circumstances. But I think in most cases, this is reflected in the description in the administration guide. In most cases you're going to want to write what we call fetcher-style.
So these are sources where you don't need to build your own event loop. There's a built-in event loop. And what you'll do is essentially reach out, contact say an API or use some other method of extracting messages from a source, and then bring them into syslog-ng. So in other words, syslog-ng is handling more of the work. It's building that event loop for you. So really all you need to do is put the logic in to actually get the logs from that source and then bring them into syslog-ng into your processing pipeline or pipelines, and do what you need to do once the messages have been fetched in.
And here's just some examples of how you define these. So you define these very simply in the way that you would define any other, say, source object in your syslog-ng.conf file. So you'd have a statement that defines your source, say, source and then s_cloud or whatever you want. That's entirely arbitrary. So that could be any string name you want. And it doesn't have to start with s. That's a convention in syslog-ng to start a source with s and then underscore and then the name, a filter with f as in foxtrot, underscore, and then the filter name.
But it's not at all necessary or enforced. But it's good practice. But here again, what we're doing is we're saying we're defining a new source. And we're using a driver now instead of network, or syslog, or file. Our driver name is going to be python. And then the options to our driver that's named python are going to be a few things. One necessary thing is a class name. We have to define a class name.
And again, this is arbitrary. You make this up yourself. You name this the way you need to. But you can also add a series of options. And these options that you define here will eventually-- they'll be passed to your Python object. And they'll be available in your Python code as entries in a Python dictionary. So in other words, we'll see this exact case used a little later on when I show you a real Python module that's using this particular syntax.
But you could say, look, I want to define my host IP address, a port number, some username or whatever. You can have as many of these options defined for your driver as you need. And then, of course, having defined your basic Python source driver, you're going to go and build a log statement. You build the remainder of your log pipeline that uses that source, in this case s_cloud, and then of course, this is a very simple case where I'm only specifying a destination.
But of course, between the source and the destination you can have any number of other syslog-ng objects like filter rules, or parsers, or rewrite rules, whatever you need. So again, a very standard syslog-ng syntax to actually define your Python object.
And if you're going to be doing a fetcher, so we're going to look at an example of how we're going to define the fetcher object. Here is where we have actual separate Python code that we would include. Now, we can include this directly in our syslog-ng.conf file. So this will be inline, in our file. What we need to do now is build out that Python code and define the source that we identified before in our source block.
So we're going to say class CustomSource. Now, one thing we need to do is we need to inherit from a built-in superclass provided by syslog-ng called LogFetcher. So that's how we specify. We include that in our class definition. And then what we have to do is have a certain number of methods defined that syslog-ng can call when it's initializing your object.
And many of these are going to be optional. In fact, there's one called init, not to be confused with the __init__ of the actual object itself. This is a method that syslog-ng will call if you have to do some initial setup for whatever reason. Again, I'll show you an example of this, a more detailed example, a little bit later on. But this is entirely optional. But if you want to do some setup that syslog-ng can do when the object is created, you can have that done here. It'll return True.
Another one is open. This is also an optional method. You don't necessarily have to do anything with this method. But here's the one that you have to have, syslog-ng will enforce this, you'll have to have a method that is actually doing the work. This is where you're going to have the actual code that's going to go, reach out, and actually grab the log messages from your source. And it'll return a couple of values here.
One is a status, LogFetcher success. The other is called LogMessage. In other words, what this will return is, in a sense, not a dictionary. It'll be a tuple or tuple, depending on your preference: the status, and then the message object, which carries the fields that syslog-ng works with in the message.
So in other words, that message object will hold things like the hostname, the program name, the priority, the message itself. In other words, any of the standard name value pairs that syslog-ng uses, so again, in your message you can get at essentially those macros directly from your Python code.
And then there will be a mandatory exit that you'll provide to tell syslog-ng that, OK, I've done my work. Now we could get out of this particular algorithm and logic. And of course, another one, a deinit, so just like the init, if you had to do any kind of cleanup when your methods and this object is going to be decommissioned, you can have that done here. Again, it's an optional thing. In many cases you just have a pass as the only part of the method itself.
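To make the shape of a fetcher-style source a little more concrete, here is a minimal sketch of the skeleton just described. It assumes the method names from the syslog-ng administration guide (init, open, fetch, close, request_exit, deinit) and that fetch returns a (status, message) pair. The stand-in classes in the except branch, the "host" option, and the placeholder events are all hypothetical and exist only so the sketch can also run outside syslog-ng.

```python
try:
    # These names are only importable inside syslog-ng's embedded interpreter.
    from syslogng import LogFetcher, LogMessage
except ImportError:
    # Hypothetical stand-ins so the sketch can be exercised on its own.
    class LogMessage(dict):
        def __init__(self, text=""):
            super().__init__(MESSAGE=text)

    class LogFetcher(object):
        FETCH_SUCCESS = "success"
        FETCH_NO_DATA = "no_data"


class CustomSource(LogFetcher):
    def init(self, options):
        # Optional setup; options from the source block arrive as a dictionary.
        self.host = options.get("host", "localhost")  # hypothetical option name
        self.pending = ["event one", "event two"]     # placeholder data
        return True

    def open(self):
        # Optional; a real driver would connect to its API or service here.
        return True

    def fetch(self):
        # Mandatory; hands back a (status, message) tuple.
        if not self.pending:
            return LogFetcher.FETCH_NO_DATA, None
        return LogFetcher.FETCH_SUCCESS, LogMessage(self.pending.pop(0))

    def close(self):
        pass

    def request_exit(self):
        pass

    def deinit(self):
        # Optional cleanup; often just a pass.
        pass
```

In a real configuration the class name would match the class option in the source block, and the placeholder list would be replaced by whatever call actually retrieves events.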
And, of course, there's also a server option. I won't go through this. But it's very similar to what we've seen with the fetcher. It's going to have some of these optional methods, like the ability to run an init method, to do some initial setup if you need to do that. You don't necessarily have to do it. But again, there will be one called run which is mandatory. Again, this is where the work's going to be done. But this is going to have to run the main loop, again that you're going to build.
It's going to create, again, those log message objects, make them available to you. And it's going to then post the message into the pipeline that you're going to have as part of your log pipeline, accepting messages from that source. And again, you're going to have to tell syslog-ng that you're going to exit. This says, I'm all done. This will be the tear-down of your source, when syslog-ng has to end or reload. And that'll tell syslog-ng to shut down that main loop.
And then another optional method called deinit, which again, is the end point analog of your init method. Completely optional, but if you have any other type of cleanup that you want to do, you could do it with that deinit. So those are the basics of building sources. We'll look now at parsers, and same kind of basic logic here. Where again, you're building into your syslog-ng.conf file a basic parser definition.
So just like you would with any of the built-in parsers-- like the kv-parser or comma separated value parser. What you're going to do is make that definition of the object. And here again, following convention, we'll call our parser p underscore something. In this case, we're going to call it resolver because we're going to be talking about parsers that could do some IP name resolution.
And here what we do, again, is we name a class. So we're going to say in our Python driver definition, we're going to define the class that we need to reference. And again, this is our name. We decide what this needs to be. In this case, it's called SngResolver. And as we saw before, OK, we've defined our parser block, our definition. Now, we need to put it into a pipeline with a source and a destination, and then reference that parser in that pipeline.
So in this case, anything coming through a standard TCP definition called s_tcp, we're going to run those messages through our new Python custom resolver parser. And then we're going to take the results of that parser, and send the results to our destination, which in this case is something called d_splunk. It could be anything again. And again, of course, we could have filters and rewrite rules as well built into this processing pipeline if we needed to.
So everything here is pretty much standard syslog-ng.conf stuff. The only thing is now, of course, we're using this new Python driver. So what we're going to have to do here, just like before, is we defined this class. We've got to have our Python code someplace. Again, the easiest place to put that Python code that's going to define our SngResolver class is going to be directly in our syslog-ng.conf file.
Now, there's also an option which we'll talk about a little bit later, where you can separate your Python code into separate files that you include in your syslog-ng.conf file. But again, it's perfectly valid to have the Python code directly inline in your syslog-ng.conf file as well. And what will that code look like? OK, well, we'll just have to say, all right, we referenced this class called SngResolver in our Python parser definition. Now we have to fill that in.
And in this case, we're showing one that's, again, filled in with some actual Python code. This is a very simple case where we're saying, OK, what we need to do is the one thing we need to do is since this is a parser, we have to provide a method called parse, naturally enough. Makes sense? And parse is going to have two parameters passed to it. One is the standard self, the instance identifier, and the other one is going to be log_message which again, is going to be the message itself coming through the pipeline from the source.
And log_message, what it's going to do, it's going to be a set of values that syslog-ng has parsed out of the standard fields in a syslog-ng message, which you can then extract by using the syntax log_message. And then you find the actual part of that dictionary that's going to give you that value. In this particular case, it's looking at something called suricata, suricata logs. It's looking for the destination IP.
And then what it's further doing is saying, OK, what I'm going to do now is I'm going to decode. I'm going to take that IP address as I got it from my standard parser. I'm going to use my decode ability in Python. And I'm going to take that decoded utf-8 piece of the message, that destination IP address, and I'm going to try to resolve that using my standard DNS infrastructure.
And so what I'm doing here is I'm going to say, OK, I'm going to connect to a socket, and I'm going to use a Berkeley socket method that's built into Python. And that's just called, as in the Berkeley sockets API, gethostbyaddr. And what do I pass to gethostbyaddr? Well the ipaddr that I took from this particular dictionary entry. And you know then I'm going to look at the host name. I'm going to find the first field in what gets returned from gethostbyaddr.
And then I'm going to take that, and I'm going to rewrite hostname.dest to what I got from my gethostbyaddr call. And I'm doing that in a standard try and except structure in Python. But again, what I'm doing is I'm just doing a very basic thing, which in fact I mean if you think about this, if you're familiar with syslog-ng, this is really doing what syslog-ng could do anyway with its own ability to do DNS resolution.
But here again, it's just a way to show that you may have other requirements that are more complex than this, or again, things that syslog-ng will not do directly for you. And you can go ahead and whatever you can do in Python, you can use that to get the specific and exact results you need.
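The parser walked through above can be sketched roughly as follows. The class name SngResolver, the suricata.dest_ip field, the hostname.dest field, and the gethostbyaddr call all come from the talk. Keeping the resolver as a swappable class attribute, and accepting either bytes or str values, are additions assumed here so the sketch can be tried without a live DNS server.

```python
import socket


class SngResolver(object):
    # Kept as an attribute (an assumption of this sketch) so a fake
    # resolver can be substituted when trying the class out.
    resolver = staticmethod(socket.gethostbyaddr)

    def parse(self, log_message):
        try:
            value = log_message["suricata.dest_ip"]
            # Values may arrive as bytes, so decode before resolving.
            ipaddr = value.decode("utf-8") if isinstance(value, bytes) else value
            resolved = self.resolver(ipaddr)
            # First element of the result is the primary host name.
            log_message["hostname.dest"] = resolved[0]
        except (KeyError, OSError):
            # Missing field or failed lookup: report the parse as failed.
            return False
        return True
```

Outside syslog-ng, a plain dictionary can stand in for the log message, for example `SngResolver().parse({"suricata.dest_ip": b"127.0.0.1"})`.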
The template functions. It's a little bit different but somewhat easier. So there are many template functions in syslog-ng, again, that you can utilize in your code. But here you once again may have requirements that just can't quite be satisfied with the out-of-the-box template functions. So we can resort to Python. And it's actually quite simple now. Because what you can do is you don't need to necessarily create a class definition. All you need to do is define essentially the method.
So here, what we're going to do is if you see in our template definition, what is our template function name? It's python, right? So we invoke this here in this definition, just like we would invoke any other built-in template function. Now, what we're calling it is resolve_host. And we're passing what will be now our Python template function, a source IP address. This is a standard macro that syslog-ng, in its standard syslog parsing, will extract from the message.
So this will be available to us. We know there will be a source IP macro that we can reference from syslog-ng. But what we're going to do is we're going to do a resolution on that source IP, just like what we showed in the previous example. And obviously, what we're going to do here is we've defined here in this case a file destination that includes a Python template function. We're going to now just do a standard log pipeline statement with the source and the destination. Destination, of course, includes this additional processing we want to do with Python.
And now we need to do one more thing. We don't need to create now or reference a class definition. We just need to define the actual function or the method that we are using in our template function definition. So we're going to do resolve_host, which is going to take the incoming log message and the IP address again that we passed it, right? Again, what we're passing is the results of evaluating the source_ip standard macro from syslog-ng.
We're going to put here the ability to put an entry into our internal source. So we're using a built-in method called logger. And logger has a few ways. There are different levels you can use. So you could have logger.trace, .info, .debug, depending on how you want to qualify that. But we can have these messages. And incidentally, this is very useful for debugging.
To actually prepare, in this case, we're saying I want to send a message into my internal syslog-ng message source or output that describes what I'm doing, right? So I can follow this in my log messages to make sure it's doing what I thought. In this case, again, I'm going to have a try and except construct here in Python. And I'm going to do the same thing I did before.
I'm going to use the socket interface of Python to do a gethostbyaddr, using the ipaddr that I've specified, and which has been resolved from my source IP. And if everything goes right, what will eventually happen is I'll return hostname and that can be used as another macro or be filled in wherever I need that particular hostname, again, resolved from the IP address.
So again, a little bit easier to use because you're just defining a method. It's just a function definition essentially that you've named in a standard template function using what you now call the Python template function driver.
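A template function along the lines described might look something like this sketch. The function name resolve_host and its two parameters follow the talk. The logger setup assumes a syslogng.Logger class is available inside syslog-ng and falls back to Python's standard logging module elsewhere, and returning the original address on a failed lookup is a choice made for this sketch rather than something shown in the talk.

```python
import logging
import socket

try:
    # Assumed to be available only inside syslog-ng's embedded interpreter.
    import syslogng
    logger = syslogng.Logger()
except ImportError:
    # Fallback so the function can be tried outside syslog-ng.
    logger = logging.getLogger("resolve_host")


def resolve_host(log_message, ipaddr):
    # The argument may arrive as bytes, so normalize it to str first.
    if isinstance(ipaddr, bytes):
        ipaddr = ipaddr.decode("utf-8")
    logger.debug("resolve_host: looking up " + ipaddr)
    try:
        hostname = socket.gethostbyaddr(ipaddr)[0]
    except OSError:
        # Lookup failed: hand back the address unchanged (sketch's choice).
        return ipaddr
    return hostname
```

Because it is a plain function, it can be called directly for a quick check, for example `resolve_host(None, "127.0.0.1")`.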
Python destination driver, OK, it's the output analog of what we looked at before for source drivers. So here again, what you're going to do is your standard syslog-ng destination definition. You name it arbitrarily. In this case, we're saying it's going to be called d_cloud. And now, we use the Python driver which we can now pass individual options, one of which will be our Python class name.
We're going to reference a new class. We're going to provide the code for it called CustomDestination in this case. And as we did before, in the description of the source definition we're going to be able to also provide some other options right up front here that we can then access. We can access these in our Python code. These will essentially be passed as a Python dictionary.
So we can, in our code, refer to that using Python dictionary syntax to get at these values. And as usual, we've just defined a new destination. We need to close that off with a log statement using it. So pretty much standard syslog-ng processing here.
And then, again, we reference that custom class. We called it custom destination. We name it what you like. But then what we need to do is go ahead and do the same kind of-- we build our Python code, keeping in mind that we have to provide certain methods. And we need to also keep in mind that there are other methods that are optional. We don't necessarily need to have.
Again, the optional ones are things like the init. Maybe we don't need to do any particular initial setup. Maybe everything we're going to have is already defined in our options when we defined our driver. But the one thing we will obviously have to do is have a method called send, right? That's where the rubber is going to meet the road. That's where we're going to actually do the work to send the message that reaches out and maybe talks to an API. And we need to do things like build authentication headers, provide lots of other detail, and then make that connection via HTTP or however else we're going to do it.
But again, that'll be the meat of our Python destination driver. And then, of course, we'll have other optional ones like flushing the pipeline when things are done, and maybe something to define if it is opened or not open, our destination that is. So again, pretty much follows the pattern that we saw before for the source definitions, but again, looking at it from the opposite end of the pipeline. Then again deinit, right? deinit is another optional method that we may want to have included once syslog-ng is going to close down and stop.
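As a rough illustration of the destination methods just listed, here is a minimal skeleton. The class name CustomDestination and the method names (init, open, is_opened, send, flush, close, deinit) follow the talk and the administration guide. The "url" option, the in-memory batch, and the delivered list are hypothetical placeholders standing in for a real HTTP call.

```python
class CustomDestination(object):
    def init(self, options):
        # Optional setup; options from the destination block arrive as a dict.
        self.url = options.get("url", "https://api.example.test/logs")  # hypothetical
        self.batch = []
        self.delivered = []
        self.opened = False
        return True

    def open(self):
        # Optional; a real driver would open its connection here.
        self.opened = True
        return True

    def is_opened(self):
        return self.opened

    def send(self, msg):
        # Mandatory; msg is treated here as a dictionary of name-value pairs.
        self.batch.append(msg["MESSAGE"])
        return True

    def flush(self):
        # A real driver would build headers and POST the batch to self.url.
        self.delivered.extend(self.batch)
        self.batch = []

    def close(self):
        self.opened = False

    def deinit(self):
        # Optional cleanup when syslog-ng stops or reloads.
        pass
```

The authentication headers and the actual connection mentioned above would live in send or flush, depending on whether messages go out one at a time or in batches.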
And OK, so let's now look at one in a little bit more depth right. A lot of the stuff I showed here has been high level, hasn't shown a working example in a lot of detail. So we'll take this opportunity now to look at one. So what I'm going to do is actually build a parser here. I have a parser that's going to look a lot like the two examples actually we had before where we were doing name resolution through a Python parser module.
It's going to be a little bit different though. And I'll show how it differs, as well as showing you the more detailed code. And again, I'll preface this by saying that what I'm going to do here is really something that could realistically be done by built-in features of syslog-ng Premium Edition. syslog-ng can already do DNS reverse name resolution. But I want to show you again how to build out an actual way to do this in Python.
And there will be one wrinkle that's a little bit different from what syslog-ng does. We're going to be doing our own lookup, but we're also going to be doing caching. We'll build our own cache, so that we don't have to do a resolution that hits our DNS name server every single time. It's going to have its own cache. It's going to build that cache using an in-memory Redis database. So it's going to use that as a lookup cache.
It'll look there first. If it can't find it in that Redis cache, then it'll have to say, OK, I've got to go out and hit my DNS name server to get it. Once it does get it, it'll of course put that pair, that IP address that it resolved into the cache. And it also provides for a configurable expiration of your entries. So if you have a situation where your DNS database does not change very often, you can have a much longer expiration built into Redis.
Or contrarily, you might want to say, hey, I do have a very dynamic DNS name to IP address mapping. It changes more frequently. You could reduce that expiration. So what'll happen is as the entries are made by our Python parser into Redis, they'll be also entered with again a configurable expiration that you can make as long, as short, or whatever you need to do.
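The cache-first lookup with a configurable expiration can be sketched as below. To keep the example self-contained, a plain Python dictionary stands in for Redis, so this shows only the lookup logic, not the Redis connection used in the talk. The class name ResolverCache and its parameters are hypothetical.

```python
import socket
import time


class ResolverCache(object):
    def __init__(self, expiry_seconds, resolve=None, clock=time.monotonic):
        self.expiry = expiry_seconds
        # Default resolver does a reverse DNS lookup and keeps the host name.
        self.resolve = resolve or (lambda ip: socket.gethostbyaddr(ip)[0])
        self.clock = clock
        self.entries = {}  # ip -> (hostname, expires_at); stand-in for Redis

    def lookup(self, ip):
        hit = self.entries.get(ip)
        if hit is not None and hit[1] > self.clock():
            # Fresh cache entry: no DNS query needed.
            return hit[0]
        # Miss or expired entry: ask DNS, then store with a new expiration.
        hostname = self.resolve(ip)
        self.entries[ip] = (hostname, self.clock() + self.expiry)
        return hostname
```

A long expiry_seconds suits a DNS database that rarely changes, and a short one suits a more dynamic mapping, which is the trade-off described above.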
And following the logic we already talked about, the first thing we're going to do is we're going to name it. We're going to have a parser block definition, a standard block definition as far as syslog-ng is concerned. We're going to name it p_addr_resolver. And here again, we need to use the driver name Python with a set of options.
The first one is mandatory in this case, right? We've got to define a class that we're going to refer to for this Python parser. And then we're going to set up some options. And what we're going to do here is we're going to set a port number that we're going to need to refer to in our code, a host name, and also an expiration. We're going to have to tell Redis what that expiration needs to be. But here again, instead of having it hard-coded in our Python code, we can add it as an option in our configuration file to make that easier to change.
And the reason we're talking about a specific port number here, is the Redis server on this particular instance of syslog-ng, Redis listens on TCP port 6379. So we're setting it up here. If that were to change for whatever reason, here again, we could change it here in our options block definition, instead of having to do it in the code.
Our host is localhost because we're hosting Redis on this particular machine, and the Redis expiry, that's completely arbitrary. But I extended that out to what's about a day. But you can make that longer, shorter, whatever your needs are. So here we've done the basics to get the definition created. This will go in our syslog-ng configuration file.
The other thing we need to do is in this case, what I'm doing in my particular configuration is I'm applying this parser to events that I am ingesting from the Windows Event Collector module in syslog-ng PE. So this is our agentless way to collect Windows events. So I'm defining my source for my WEC inputs. They're called windowsevent. That's standard. I'm prefixing it in my case with the string .SDATA. That again, is arbitrary. You can make any prefix you want.
I do that primarily because I'll be sending these Windows events to syslog-ng Store Box which I'll show you in a minute. And when you send it with the prefix .SDATA, that makes it very convenient for ingestion into the Store Box, because they will come in with all that metadata that's extracted from the XML file from the Windows event will be turned into structured data in the RFC 5424 message coming to the Store Box.
And then all of those extracted name value pairs from the Windows XML will then be available as what we call dynamic columns in the Store Box. So it makes it very easy to add any of those name value pairs as custom columns in our search display, which we'll see in a little demo toward the end. And then, of course, we have our log statement, log where we define our source, which is the source above. But we need to insert our parser which is again, the p_addr_resolver.
And in this case, our destination is going to be an SSB, syslog-ng Store Box. So that's all the preparatory stuff. This is all standard syslog-ng.conf for the most part. OK, let's take a closer look at the actual Python code. It's going to be installed directly in our syslog-ng.conf file. It'll have the python keyword and the Python code itself. The source code will be delimited by curly brackets in a typical syslog-ng fashion. It'll be ended with a closed curly bracket and a semicolon. So it'll follow the same syntax that we use for any other type of syslog-ng object.
So in this case, we're going to import from syslogng. It's a module that has some syslog-ng specific functionality. And then we're also going to import the socket module from standard Python. And the reason we're going to do that is we need to create a socket to talk directly to the Redis server. We're going to use a low-level Berkeley socket to interface to it.
And you can see from the code, we're going to have an init method. This is optional. But we're going to use this. And in fact, we're going to use it to set up that initial connection to the Redis database. So we're going to create the socket. We're going to connect to that socket. It's a server socket hosted by Redis.
And we're going to use the options that we provided in our definition of the parser earlier to get the host name, which in this case is just localhost. It's running on the same host that we're running syslog-ng on. We're also going to specify the port. We need to know the port we need to connect to. And we're also including the option that we provided earlier, the expiration time for all the keys that are going to be sent to our database.
And the first thing we're going to do is create a command to flush all the keys in our database. So in other words, we're going to start from scratch. We're going to flush out any hostname-to-IP-address mappings that we already have in our database whenever syslog-ng is reloaded or restarted. And we're going to define a command.
And we're going to put that in a syntax that Redis will understand. We're going to be using the Redis serialization protocol. So we'll take a string. We'll convert it to a binary string. And then we're just going to send that command to Redis. That's going to clear out anything that's already in that database. Redis is going to give us a response, but we don't need it. It's not necessary. It's basically going to be an OK. The command won't fail. But we need to remove that from our input queue. We don't want that to interfere later on when we actually make other requests to the Redis database.
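As a rough illustration of "a syntax that Redis will understand", here is a small Python sketch that builds a command as bytes in two ways. The first is the simple inline form, a line of space-separated words, which matches the talk's description of a string converted to binary. The second is the array-of-bulk-strings form from the Redis serialization protocol. The helper names are made up for this example and are not taken from the talk's code.

```python
def inline_command(*parts):
    """Build a Redis inline command: space-separated words ending in CRLF.

    Caveat: this simple form breaks if any part contains spaces or newlines.
    """
    return (" ".join(str(p) for p in parts) + "\r\n").encode("utf-8")


def resp_command(*parts):
    """Build the same command as a RESP array of bulk strings.

    Each bulk string carries its own byte length, so arbitrary content is safe.
    """
    encoded = [str(p).encode("utf-8") for p in parts]
    out = b"*%d\r\n" % len(encoded)
    for item in encoded:
        out += b"$%d\r\n%s\r\n" % (len(item), item)
    return out
```

With these helpers, inline_command("FLUSHALL") gives the bytes for the flush described above, and resp_command("GET", "pc1") gives the length-prefixed equivalent of a get for a hypothetical key pc1.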
So we're essentially just eating that and ignoring what we get. We're also creating a logger object here. Again, this is something that's provided by the syslogng module that we were importing. This gives us the capability to create debug messages to help debug our code or monitor what's going on. And these will be sent essentially to syslog-ng's internal log. That will generally go to a file you specify. Ordinarily it'll be /var/log/messages. But it could be some other place where you send your internal syslog-ng source.
And then we're just going to return true. We're going to define a deinit method here, but we're not going to use it. So we're just going to say pass. And that's that.
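Putting the init and deinit steps together, a minimal sketch of the class might look like the following. Several things here are assumptions rather than the talk's actual code: the class name, the option names redis_host, redis_port and redis_expiry, and the fallback to Python's standard logging module so the sketch can run outside syslog-ng. The sketch also closes the socket in deinit, whereas the talk simply uses pass.

```python
import logging
import socket

try:
    # Only importable inside syslog-ng's embedded Python interpreter.
    from syslogng import Logger
    logger = Logger()
except ImportError:
    # Fallback (assumption) so the sketch also runs as plain Python.
    logger = logging.getLogger("my_addr_resolver")


class MyAddrResolver(object):
    def init(self, options):
        # Option names are hypothetical; values may arrive as strings.
        self.host = options.get("redis_host", "localhost")
        self.port = int(options.get("redis_port", 6379))
        self.expiry = int(options.get("redis_expiry", 86400))

        # Plain TCP socket to the Redis server.
        self.sock = socket.create_connection((self.host, self.port), timeout=5)

        # Start from an empty cache, then read and discard the "+OK" reply
        # so it does not get mixed up with later responses.
        self.sock.sendall(b"FLUSHALL\r\n")
        self.sock.recv(1024)

        logger.debug("connected to Redis at %s:%d" % (self.host, self.port))
        return True  # tells syslog-ng that initialization succeeded

    def deinit(self):
        # The talk leaves this as `pass`; closing the socket is this sketch's choice.
        self.sock.close()
```

Note that FLUSHALL clears every key on the Redis server, so a setup like this is only reasonable when that Redis instance is dedicated to the parser.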
And incidentally, what we're going to be doing here, let me just give you a diagram of our demo environment. It's just a very simple setup where I've got two Windows machines. One is a server, Windows Server 2019. The other is an ordinary Windows 10 desktop. They're sending all of their events to syslog-ng and they're using the agentless method to do it.
So we're using the Windows Event Collector, which again can operate on syslog-ng Premium Edition. So I have a Premium Edition relay that's doing two things for me. One, it's hosting WEC. Since I'm sending it to the Store Box, I need to have this relay, because the Store Box itself, with all its great features, one thing it does not do is have a WEC module. So we need to depend on syslog-ng Premium Edition to do that. We'll do that in a relay.
And as I also mentioned earlier, in order to run our Python custom code, we can only do that again on a Premium Edition instance. And we can do it on a relay. So this is where our Python code is actually running. All our results will eventually get sent to a Store Box, and we'll see that actually in a live instance of the Store Box. But again, I just wanted to make sure you saw that so you understood exactly where we're going with this particular code.
So that's the init method for the code itself. Now, let's look at the method that does all the work, parse. So parse is going to have two input parameters. One is self. The other one is a log message. So this is going to be passed to our parse method by syslog-ng. And log_message is going to contain our entire message, all the components of the message, the standard syslog components. And the macros that syslog-ng parses from that message will be available to us.
So in this case, since we are getting messages from our Windows Event Collector, the Windows Event Collector is going to automatically parse out all of the fields from the native XML events coming from the Windows machines. So we're telling it, OK, get those events, parse them from the XML, things like event.system.computer.
In our case, we added if you recall, a .SDATA prefix. And we're doing that for a reason. Again, the reason is primarily because that makes it very convenient when the message gets to my Store Box to get that information converted into what we call dynamic columns. We can make a specific column just for that piece of data, that macro.
So what we're going to do here is we're going to say, OK, syslog-ng, I want you to look in the message. I want you to find what event.system.computer is, and that's going to be the fully qualified domain name from again, the Windows event. I want you to take that, decode it into a string. And that's going to be fqdn.
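The lookup-and-decode step can be sketched as a small helper. The exact macro name is an assumption here: the talk only says the field is event.system.computer under the .SDATA prefix, so FQDN_FIELD below is a placeholder to adjust to whatever name your WEC source actually produces. The helper accepts either bytes or str, since the value type can differ between versions.

```python
# Hypothetical macro name; check the name-value pairs your WEC source emits.
FQDN_FIELD = ".SDATA.windowsevent.Event.System.Computer"


def get_fqdn(log_message, field=FQDN_FIELD):
    """Read the computer name from the message and return it as a str."""
    value = log_message[field]
    if isinstance(value, bytes):
        # Values may come back as bytes, hence the decode step.
        value = value.decode("utf-8")
    return value
```

For experimenting outside syslog-ng, a plain dict such as {FQDN_FIELD: b"pc1.example.com"} can stand in for the log message object.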
And now what we're going to do is we're going to say, OK, let's create a query to Redis, to our database. And again, we need to format that query in RESP, the Redis serialization protocol. But it's just get, space, plus that fqdn value. And we also have to add a new line. We byte encode that. And then we send it to Redis.
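A minimal sketch of that query step, assuming an already connected socket. The function name is made up for the example, and a single recv call is a simplification that is only adequate for short replies such as an IP address.

```python
import socket


def redis_get_raw(sock, key):
    """Send an inline GET for `key` and return Redis's raw reply bytes."""
    query = ("GET " + key + "\r\n").encode("utf-8")
    sock.sendall(query)
    # Simplification: assumes the whole (short) reply arrives in one read.
    return sock.recv(1024)
```

One way to try this without a Redis server is socket.socketpair(): write a canned reply into one end, pass the other end to redis_get_raw, and then read the query back from the first end to see exactly what was sent.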
Redis is going to send us a result. It's actually going to be two lines: one line that has the length of the response, and a second line with the result. So what we're going to do is say, OK, we're getting this into Python. We're going to use the Python splitlines method. That'll convert it into a Python list, and we're going to call it the raw result. The first thing we want to check in that raw result is the length of our response.
And we want to make sure it's not $-1, because a $-1 response in that first element of that list is going to mean that there's no data. We have no data at all for it. So as long as there is data, as long as it's not $-1, we're good. And what we can do is say, OK. We got a response from the Redis database. And the second element of the list, in other words raw result 1, is actually going to be the IP address we need.
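The reply check described above can be sketched as a small function. The function name and the example address are illustrative; the reply shapes ($<length> followed by the value, or $-1 for a missing key) are the standard Redis bulk string replies.

```python
def parse_get_reply(raw):
    """Return the value from a Redis GET reply, or None if the key is absent.

    A hit looks like b"$9\\r\\n10.0.0.12\\r\\n"; a miss is b"$-1\\r\\n".
    """
    lines = raw.splitlines()
    if not lines or lines[0] == b"$-1":
        return None  # null bulk string: nothing cached for this key
    if not lines[0].startswith(b"$") or len(lines) < 2:
        return None  # error reply or truncated data; treat as a miss
    return lines[1].decode("utf-8")
```

Treating an unexpected reply as a miss is a design choice of this sketch: it means the caller falls back to DNS instead of writing a bogus value into the message.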
So we're going to take that. We'll decode it back into an ordinary string. And now what we're going to do is we're going to use log_message, and we're going to create a new macro. It's going to be a brand new macro. We're going to call it .SDATA.meta.SourceIP. We want to call it source IP, because we're eventually going to use source IP as a column in our Store Box display, on the search display.
So we're creating that. And what we're going to use to create that is what we got from our result. In other words, this will be the IP address matching that FQDN, and we're going to create this new macro that will eventually get sent to the Store Box with that value.
And then we have an else clause here, right? The else part of this if else is going to have to take place if in fact our response from Redis did have a $-1. In other words, it was null. And in fact, we'll expect that the first time this method is called, because of the fact that we flushed that database. So there won't be any data in there.
So the else says, OK, look. There's no result from Redis. What does that mean? That means we have to go out to our ordinary DNS server, and make that name to address resolution. So what we're going to do now is we're going to create another value here. We're going to call it mykey, just for clarification. That's going to be equal to FQDN, which we know. We know that.
And then we're going to use another socket interface call. We're going to use gethostbyname against mykey, the FQDN, and send the result to Redis. So once we get the address for that host name, we're going to put that into the Redis database. So we're going to add that key and its value. And then next time around, of course, it'll be there. And we won't have to go to DNS.
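The cache-miss branch might be sketched like this. The talk does not show how the expiry option is applied, so using the EX argument of SET is an assumption, as are the function names. The resolver is passed in as a parameter so the sketch can be exercised without real DNS.

```python
import socket


def build_set_command(key, value, expiry_seconds):
    """Inline SET with an expiry in seconds (EX usage is an assumption)."""
    return ("SET %s %s EX %d\r\n" % (key, value, expiry_seconds)).encode("utf-8")


def resolve_and_store(sock, fqdn, expiry_seconds, resolve=socket.gethostbyname):
    """Resolve `fqdn` via DNS, cache the result in Redis, and return the IP.

    socket.gethostbyname raises socket.gaierror if the name cannot be
    resolved; this sketch lets that propagate to the caller.
    """
    ip = resolve(fqdn)
    sock.sendall(build_set_command(fqdn, ip, expiry_seconds))
    sock.recv(1024)  # read and discard the "+OK" so it does not linger
    return ip
```

In a real parser you would probably want to catch the DNS failure and let the message through without the extra field, rather than have parse fail.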
Here again, we're going to send that with socket.send. And we're going to do the same thing we did earlier. We're going to get a response from Redis about the success or non-success. But we're going to get a response that we're going to ignore. Right? We're going to eat that again and ignore that response. And then we're going to do something here that's strictly unnecessary, as you can see by the comments.
At this point we know what that name to address mapping is. We don't have to now query Redis. But we're going to do this here. I'm doing this as an example. We're going to go ahead and do the query anyway, and go back to Redis. Now that you've got the value, let me just double check. Let me send a get against that key value to Redis. And let's see what it returns to us.
And we're going to look at that result from Redis. We're going to do the splitlines to split that response into a Python list. And then we're going to do the same thing we did before. We're going to take the result essentially, and create that .SDATA.meta.SourceIP macro based on what we just got back from Redis in this case.
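The overall flow, cache first and DNS only on a miss, can be sketched independently of the wire protocol. In this illustration a plain dict stands in for Redis, the class and method names are invented, and the extra "query Redis again after the set" step is left out, since the talk itself describes it as unnecessary.

```python
import socket

SOURCE_IP_FIELD = ".SDATA.meta.SourceIP"


class CacheAsideResolver(object):
    def __init__(self, resolve=socket.gethostbyname):
        self.cache = {}          # stand-in for the Redis database
        self.resolve = resolve   # DNS lookup function, injectable for testing
        self.dns_lookups = 0     # counts how often we had to go to DNS

    def lookup(self, fqdn):
        ip = self.cache.get(fqdn)
        if ip is None:
            # Cache miss: ask DNS once, then remember the answer.
            self.dns_lookups += 1
            ip = self.resolve(fqdn)
            self.cache[fqdn] = ip
        return ip

    def enrich(self, log_message, fqdn):
        # Add the resolved address as a new name-value pair on the message.
        log_message[SOURCE_IP_FIELD] = self.lookup(fqdn)
        return True  # a parser returns True to keep the message
```

Calling enrich twice for the same host name leaves dns_lookups at 1, which is the point of the cache: only the first event from each machine pays for a DNS round trip.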
And of course, what'll happen is when you first start up syslog-ng, you'll be doing a lot of this. You'll be doing a lot of reaching out to the DNS server until you build up your Redis cache, and eventually nearly all the values are in Redis. And from that point on, you'll be skipping having to go to DNS.
OK, so that's basically it. I mean, to really learn more, here are some links that you can go to. All of the information I've gone over is available in the administration guide. And it will be much more detailed, especially some of the things I didn't cover in very great detail.
For instance, I mentioned to you that your Python code does not necessarily have to be inline in your primary syslog-ng.conf file. You can put it in separate files in whatever directory or folder you like, and then include them.
There are a couple of other things you have to make sure of. You have to make sure that the directories where your code is going to be stored are on the Python path that syslog-ng has when it starts up, so the code can be found. There are a couple of other minor things you have to take care of, but again, that's all documented very clearly in the administration guide.
So now what I'm going to do is I'm going to leave the PowerPoint presentation. And I'm just going to go into a live version, a live running version of my Store Box, just to show you what the results of all this are. So we'll do that right now.
OK, now let's take a look at how this all comes together at the destination. So as I mentioned before, I have Windows events coming from two separate Windows endpoints. They're being collected by a syslog-ng Premium Edition relay running our WEC module. So the Windows events are coming in their native XML format. WEC on the relay is parsing out that XML and converting all of the name value pairs into what can be used in syslog-ng as ordinary syslog-ng macros.
We're then taking those macros and sending them to the Store Box using the RFC 5424 syslog message protocol. And so all of those macros that have been extracted from the events are now going to be sent as the structured data part of those RFC 5424 messages. And of course, the other thing we're doing on the relay, besides letting WEC do the automatic parsing of the XML, is adding our own parser, the address resolver parser.
That's going to say, hey, I'm going to take the FQDN that I'm getting from my Windows Event. I'm going to use that and find the mapping to the actual IP address. And that's going to take place in my parser via that Redis database. And of course, if Redis doesn't have the value, it'll go out to DNS, the ordinary DNS server, find it, and add that key value pair to Redis so that subsequent lookups can happen very, very quickly from that in-memory database.
So here's what we have on the syslog-ng Store Box. So what I have here is the search page of the Store Box. And what I'm doing is looking at a specific what we call log space on the Store Box, which is the analog of a log directory. I could have many of these. I have one specifically for my Windows events. So when I want to search just Windows events, I'll have a repository or bucket that only contains those.
And this is live, so this is actually running now. And so what I could do is I could search over the last 15 minutes, show what's come in, I'm doing this without any search criteria. Of course, I could put search terms up here, so that I could limit my display to only those messages or those events that match those search terms. But here I'm just going to show everything that's come in. You get a nice histogram here that shows the rate at which messages are flowing in within the period I designated.
What I'm doing here is I'm using this fast path. I'm just showing the last 15 minutes of messages. And the display is very nice because it gives you a columnar format, a table format with various pieces of information, again, that syslog-ng has parsed out for us. And this is customizable. I can customize these columns to show whichever name-value pairs I think are important for me to display here right up front to make it convenient for me.
And the way you decide that is by-- I'll show you exactly how you can do this. I'll go to a specific message. If I go all the way to the right here under the Message column, click the greater than sign, I'll get the details behind that particular message. You'll see that these are Windows messages. They're multi-line messages. But they're nicely formatted, so you can read them very easily pretty much the way you'd see them in the Event Viewer on Windows.
But one thing you'll notice is at the very bottom, and let me do one thing here, to go full screen. At the very bottom of the display, you'll see a very large tile here that has dynamic columns. And actually what this is, it's a comprehensive list of essentially every single name-value pair that syslog-ng has been able to extract from the Windows event. You'll notice they're all prefixed by .SDATA because that's the way I defined it in the syslog-ng.conf file on my relay.
I wanted to add that prefix. That ensures that it gets sent again in that structured data part of an RFC 5424 message. It does make it available as these so-called dynamic columns. And you'll see here that some names have, to their left, a gray circle with a plus sign in it. If I click on that, I can add that particular name-value pair to the display that we were just on.
And in fact, where you see the blue ones with the minus sign, those are already added. And you'll see that here we have one called source IP, and that's exactly what I called the macro that I got from my Python parser. Remember, we took the fully qualified domain name, asked Redis for the IP address of that FQDN, and then we added it to that group of data that's going to get sent to the Store Box as structured data.
So I've already put this into my column display, as you can see by the blue circle. So if I go back now, again, just to highlight that, you'll see that I arranged it so I can now see that piece of information that I got from my parser in the second column. I put it right after the timestamp, and right before the fully qualified domain name. So right up front here, I can quickly see both the FQDN and its corresponding IP address. And again, that's the purpose of that particular parser in my case.
And of course, there are many, many things about this display that are very useful, besides obviously giving you a very clean way to look at your collected messages and a quick and easy way to search. You can do many things like, for instance, I have a column here with the Windows Event ID. Which again, I can optionally have this column or not have it. I chose to put this into my display because I think it's convenient to be able to see right on the initial search page what the event ID is for any particular message.
And another thing I could do here is I could say, hey, I'd like to see the range of event IDs that I've got. I can click on this pie chart icon to the left of the column heading and it'll show me the statistics. It'll say, OK, well, in that 15-minute period that you selected, given your search criteria-- but I'm looking at all of them, I'm not limiting the search in any other way-- here are all the different event IDs you've seen.
And now if I wanted to say, hey, I'd like to only look at the ones that had event ID 4624, I can hover over that, and when it gets highlighted, I'll left-click on it. That'll add it as a search criterion with the proper syntax. I can redo the search, and now my display will only have the events that have that particular event ID, so a very nice way to be able to monitor what's going on in your environment. I'll clear that.
And over here is the source IP. We'll look at that. We'll do the same thing with statistics here, and we'll see the two machines. As you recall, in my environment I have the two Windows machines. 12 is a server, 72 is the Windows 10 desktop. But again, here is the end result of all our processing. Of course, you can do many, many things with Python custom parsers, sources, destinations, template functions to do much more complex things and maybe much more useful things, again, as I mentioned upfront in my presentation.
This is a little bit artificial because syslog-ng can natively do DNS lookups in either direction. So probably you could have done this with a native built-in feature instead of going out to Python to do it. But again, I wanted to show you a full example of how you would implement Python through your syslog-ng configuration. And it is very, very straightforward, very easy to do.
So again, I hope you've enjoyed this presentation and this little mini demo. And of course, if you want more information, please go to syslog-ng.com where you'll have a comprehensive list of all the documentation, both for the syslog-ng Store Box and for syslog-ng Premium Edition. The Premium Edition Administration Guide will again provide full detail on creating your own custom Python sources, destinations, and other syslog-ng blocks.
So again, Thank you very much.
[MUSIC PLAYING]