Posts

Heating up the Data Pipeline (Part 4)

In this last part of the "Heating up the Data Pipeline" blog series, we will go through some potentially useful NiFi dataflows. Previous parts: Part 1, Part 2, Part 3. Updating Splunk Lookup Files with SQL Query Data The following simple workflow pulls data from a SQL database over a JDBC connection. The ExecuteSQL processor returns data in Avro format. The results are then sent through the ConvertRecord processor to convert from Avro into CSV. As Splunk does not allow CSV files to be uploaded directly, we have to put the data into a spool directory on the Splunk server. In this case Splunk resides on the same server, so we can use the PutFile processor. We could also use the PutSFTP processor to transfer the file to a remote server. Finally, we assemble the proper POST request and invoke the REST endpoint. Splunk Event Copier Sometimes you want to copy a subset of events from a production system t...
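Outside NiFi, the Avro-to-CSV step performed by ConvertRecord amounts to serializing query rows into lookup-file form. A minimal Python sketch of that conversion, assuming hypothetical column names (`host`, `owner`) that would in practice come from the SQL query:

```python
import csv
import io

def records_to_csv(records, fieldnames):
    """Serialize a list of row dicts (e.g. rows returned by a SQL query)
    into CSV text suitable for a Splunk lookup file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Illustrative rows; the real column names depend on the SQL query.
rows = [
    {"host": "web01", "owner": "ops"},
    {"host": "db01", "owner": "dba"},
]
print(records_to_csv(rows, ["host", "owner"]))
```

The resulting text is what would land in the spool directory via PutFile (or PutSFTP for a remote Splunk server).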

Heating up the Data Pipeline (Part 3)

Welcome back to the "Heating up the Data Pipeline" blog series. In part 1 we talked about how to route data from Splunk to a 3rd-party system. In part 2 we walked through a simple dataflow that passes data collected from Splunk forwarders through Apache NiFi back to Splunk over the HTTP Event Collector. In this part, we will look at a more complex use case, where we route events to an index based on the sending host's classification. The classification is looked up from a CSV file. We will make use of Apache NiFi's new record-oriented data handling capabilities, which will initially look a bit more complicated, but once you grasp them, they will make further use cases easier and faster to build. High-Level Dataflow We will again start with our ListenTCP input, but this time we will send the data to another processor group. The processor group will again emit events suitable for Splunk HEC. Note that we have increased the Max Batch Size from 1 ...
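The routing decision itself is simple to express outside NiFi. A hedged sketch of the lookup logic, with made-up classifications and index names purely for illustration (in the actual flow, a NiFi record lookup service reads the CSV):

```python
import csv
import io

# Hypothetical classification file; columns and values are illustrative.
LOOKUP_CSV = """host,classification
web01,dmz
db01,internal
"""

def load_classifications(csv_text):
    """Build a host -> classification map from CSV text."""
    return {row["host"]: row["classification"]
            for row in csv.DictReader(io.StringIO(csv_text))}

def index_for_host(host, table, default_class="main"):
    """Pick a target index from the sending host's classification,
    falling back to a default for unclassified hosts."""
    return "idx_" + table.get(host, default_class)

table = load_classifications(LOOKUP_CSV)
print(index_for_host("web01", table))    # classified host
print(index_for_host("unknown", table))  # falls back to default
```

The fallback for unknown hosts mirrors what a default route would do in the flow; the real classification file and index naming scheme are up to the deployment.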

Heating up the Data Pipeline (Part 2)

In Part 1 we went through how to route events from Splunk to a 3rd-party system without losing metadata. Now I'll show you how events can be transformed using Apache NiFi and sent back to Splunk through the HTTP Event Collector. Note: the following is not step-by-step documentation. To learn how to use Apache NiFi you should read the Getting Started Guide. Simple Pass-Through Flow As a first exercise we will create a simple flow that only passes data through NiFi, without applying any complex transformations. The following picture shows a high-level NiFi flow that receives events in our custom uncooked event format with a TCP listener, then sends the data on to a transformation "black box" (aka processor group), which emits events in a format that can be ingested by a Splunk HTTP Event Collector input. Apache NiFi currently provides a rapidly growing number of processors (currently 266), which can be used for data ingestion, transfo...
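For context, an event bound for the HTTP Event Collector is a small JSON document: the raw event plus optional metadata keys such as `host`, `sourcetype`, `index`, and `time`. A minimal sketch of assembling one such payload (field values here are made up for illustration):

```python
import json

def hec_event(event, host=None, sourcetype=None, index=None, time=None):
    """Build one Splunk HTTP Event Collector event body.
    Only metadata keys that are actually set are included."""
    payload = {"event": event}
    for key, value in (("host", host), ("sourcetype", sourcetype),
                       ("index", index), ("time", time)):
        if value is not None:
            payload[key] = value
    return json.dumps(payload)

# The assembled body is POSTed to the collector endpoint
# (/services/collector/event) with an "Authorization: Splunk <token>" header.
print(hec_event("error: disk full", host="web01", sourcetype="syslog"))
```

This is the format the transformation processor group needs to emit so Splunk can ingest the events.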

Heating up the Data Pipeline (Part 1)

Pre-Processing Data I often hear the question from our customers how data can be transformed prior to indexing in Splunk. Damien from Baboonbones has done a tremendous job creating add-ons that provide custom inputs for Splunk. Most of his custom inputs provide the means to pre-process data by allowing custom event handlers to be written. Sometimes, though, you want to pre-process data that gets collected from Splunk's standard input types, like file monitors, Windows Event Logs, scripted inputs, etc. Also, not everyone is capable of writing custom event handlers. A further requirement: these customers have rolled out a large number of Splunk Universal Forwarders and do not want to install another agent. To summarize, a solution capable of pre-processing data should be easy to use, be easily integrated, and be built on top of their existing architecture. How to Plumb Splunk Pipelines Splunk has its own fittings to connect a Univer...