Collect, Transform, and Deliver Logs

Preface

When an incident occurs and an investigation is needed, the first thing you’ll want is access to any logs associated with those events. Most organizations either log too much or don’t log enough. Building a good logging infrastructure that is resilient, captures the right data, and gives analysts an intuitive platform to do their work is one of the most important projects in security engineering.

I first got exposure to logs as a software engineer, where I worked on writing code that logged all API-related transactions. Then came my next gig, where I was on a team responsible for building my company’s in-house SIEM, handling over 1.2 million EPS (events per second). To be absolutely honest, I first thought that was the most boring job, so I steered far away from the data ingestion and parsing pipeline. Because of that, I never worked on the data pipeline directly, but I had a brilliant team with a ton of smart co-workers to learn from, and that sparked my interest in logging. Once I started to understand its importance and the intricacies that come with building it, it became a challenging engineering problem to solve, and that fascinated me! So I signed up for SANS’s SIEM class this past week to dive deeper into the topic and had a great time! Again, the quote “If you’re bored, you’re probably not paying enough attention” could not be any truer… Here I am, two years later, coming full circle to face my old friend - logs!

Log collection

In order to have logs, you gotta collect them. Ok - but what logs? From where? How? Let’s dive in!

If your company is boujie enough to pay for an EDR platform that comes with its own log-collection agent, then you’re probably cool. But in case your company isn’t super invested in “security” yet… then you’ll just have to make the most of what you have!

Windows

For Windows:

  • Windows Event Forwarding
  • Windows Event Collector
  • P.S.: These are not on by default - you gotta turn them on to use them.

Types of Logs to collect:

  • Event Logs: Security, System, Application…etc
  • PowerShell Script Block logging

There are many excellent blogs out there outlining exactly how to set this up. All I have to offer is a summary, but I’ll point you to the resources so you can execute on it if you truly desire to!
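
Just to make the hand-off concrete, here is a minimal sketch of what the receiving side could look like if you run a shipper such as Winlogbeat on your collector box and point it at Logstash. The port number and index name below are my own assumptions for illustration, not something prescribed by Windows Event Forwarding:

input {
  beats {
    port => 5044                 # hypothetical port your Winlogbeat shipper points at
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch"
    index => "windows-events"    # hypothetical index name
  }
}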

Linux

Most of your logs live in /var/log/*.log. Go ahead and choose what you might need. Then pick a log daemon to ship them - it’s already running, you just gotta send it somewhere (see the sketch after the list):

  • syslog-ng
  • rsyslog
  • auditd (with rules managed via auditctl)
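
If you go the syslog route, the receiving end can be a plain Logstash syslog input. This is only a sketch - the listening port, the index name, and the forwarding rule shown in the comment are my assumptions for illustration:

input {
  syslog {
    port => 5514    # e.g. rsyslog could forward here with a rule like: *.* @@logstash-host:5514
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch"
    index => "linux-syslog"    # hypothetical index name
  }
}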

Mac

For Mac, some of it is similar to Linux. Here are a few important locations to gather those logs:

* System Log Folder: /var/log
* System Log: /var/log/system.log
* Mac Analytics Data: /var/log/DiagnosticMessages
* System Reports: /Library/Logs/DiagnosticReports
* System Application Logs: /Library/Logs
* User Reports: ~/Library/Logs/DiagnosticReports (in other words, /Users/NAME/Library/Logs/DiagnosticReports)
* User Application Logs: ~/Library/Logs (in other words, /Users/NAME/Library/Logs)

Agent-based?

  • OSQuery

Log Transformation

Cool - you’ve identified some logs. You’ve figured out how to collect them. But they look kind of ugly… or maybe there are fields that you don’t need and will never use, so why keep them when all they do is eat up your storage space and bandwidth? Let’s fix that!

You can transform logs with filters. For example:

filter {
  json {
    source => "message"          # parse the JSON blob sitting in the message field into real fields
    remove_field => "message"    # then drop the raw blob
  }
  date {
    match => [ "timestamp", "ISO8601" ]    # use the event's own timestamp as @timestamp
    remove_field => "timestamp"            # and drop the original field
  }
}
  • This filter parses the JSON blob in the message field into proper fields, sets the event time from the timestamp field, and then removes the original message and timestamp fields
  • This filter block goes in between your input and output blocks!
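
And if all you want is to drop a few fields you know you’ll never use, the mutate filter does that directly. A minimal sketch - the field names here are made up purely for illustration:

filter {
  mutate {
    # hypothetical noisy fields that nobody ever queries
    remove_field => [ "agent_version", "host_os_build" ]
  }
}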

Filters

Your logs probably come in a bunch of different formats. Your Windows event logs are probably in ‘.evtx’ format. Your Linux logs are plain text files under /var/log. Well… maybe you also got some web logs that are in JSON… Or someone just sent you a giant CSV and expects you to make sense of it all… Don’t fret - there are parsers for it (most of the time!)

A few types of parsers (a couple of them are sketched out after the list):

  • csv
  • json
  • kv
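
Here’s a rough sketch of what the csv and kv ones look like as Logstash filters. You’d normally pick one per log source, not both at once, and the column names and key=value layout below are made up for illustration:

filter {
  # hypothetical CSV export with three known columns
  csv {
    separator => ","
    columns => [ "timestamp", "src_ip", "action" ]
  }
  # hypothetical "key=value key=value" style line sitting in the message field
  kv {
    source => "message"
    field_split => " "
  }
}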

Grok

Since I’ll be working mainly with the ELK stack for this task, I’ll touch a bit on Grok. Grok is a way to match a line against a regex (regular expression), map specific parts of the line into dedicated fields, and perform actions based on this mapping.

A great resource for looking at some example patterns, as well as playing with sample data, is here.
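
To make that concrete, here’s a small grok sketch that pulls a few fields out of a web-access-style line. The pattern names (IPORHOST, WORD, and so on) are standard grok patterns, but the line layout itself is just an illustration:

filter {
  grok {
    # would match a line like: 10.0.0.5 GET /login 200
    match => { "message" => "%{IPORHOST:client_ip} %{WORD:verb} %{URIPATHPARAM:request} %{NUMBER:status}" }
  }
}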

Deliver Logs

Alright, the hard part is over. You wanted some logs, you got your logs! They were a little too much, so you cleaned them up. And now they’re ready to be ingested. Let’s send them to your favorite platform!

How can you send logs? You can use any of these:

  • Logstash, NXLog, Fluentd

For this, I will pick Logstash and send my logs to Elasticsearch because it’s free and super easy to stand up for demo purposes. But since we live in a free market, you have many options: Elasticsearch, Splunk, Graylog…etc. Pick whichever you want to use - it’s your choice and your decision. Do some research and weigh the pros and cons of each platform against the needs of your company (cost/size/speed/efficiency…etc).

P.S: I wouldn’t recommend sending a ton of log files straight into your ES index right off the bat - pick a queueing system such as RabbitMQ/ZeroMQ/Kafka/Redis so you can have better control over your log ingestion pipeline. Again, I’m sending it straight to ES in this part for demo purposes ;) (Totally a “do as I say, not as I do” moment…)
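
If you do put a queue in the middle, the pipeline just splits into a shipper stage and an indexer stage. Here’s a rough sketch using Kafka - the broker address, topic name, and index name are all assumptions for illustration, and in practice the two stages would live in separate pipeline config files:

# Stage 1: shipper pipeline - buffer raw events in a Kafka topic
output {
  kafka {
    codec => json                        # keep the event structure intact on the topic
    bootstrap_servers => "kafka:9092"    # hypothetical broker address
    topic_id => "raw-logs"               # hypothetical topic name
  }
}

# Stage 2: indexer pipeline - pull from the topic and index into Elasticsearch
input {
  kafka {
    codec => json
    bootstrap_servers => "kafka:9092"
    topics => [ "raw-logs" ]
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch"
    index => "buffered-logs"             # hypothetical index name
  }
}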

P.P.S: As a courtesy reminder, please remember to secure your Elasticsearch instance. On the paid version, there is an option to lock it down, but not so much on the free version (as of this writing anyway) - so read up on some blogs about how to secure your ES instance before you proceed to forward all of your sensitive logs to the public inter-webz, please! :)

Putting it all together

Sending logs from all files in the /var/log folder

input {
  file {
    path => "/var/log/*.log"    # tail every .log file in /var/log
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch"
    index => "var-logs"         # everything lands in the var-logs index
  }
}
  • Run: logstash -f /fullpath/to-file-above/var-logs.conf
  • Now, anytime there are new log lines in /var/log, Logstash will pick them up!

Send some ‘json’ logs over a TCP port

For the example config below, you can see that I’ve defined the input and output.

  • For input, it will ingest logs over TCP port 6000 and apply the json codec to parse out the JSON data I pass to it (for example, sent over netcat on that same port or something).
  • For output, it will send my logs to Elasticsearch under the “windows-log” index.
input {
  tcp {
    port => 6000     # listen for events on TCP 6000
    codec => json    # decode each incoming event as JSON
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch"
    index => "windows-log"
  }
}

In order for it to work, all I have to do is:

* logstash -f /fullpath/to-file-above/windows.conf
* nc 127.0.0.1 6000 -q 1 < /fullpath/to-my/windows.json

Logstash will run the configuration file above, and nc will send the windows.json log file over port 6000, which I’ve defined in my configuration file.

Wrapping it up and next steps

Once you understand the mechanics behind setting up a logging infrastructure, the next step is to make sense of the logs. Ingest some logs, pick a time range, and try to put together a story of what happened. Or… if you have to produce some type of report and you’re too lazy to do it the manual way, just ingest the logs, create a nice chart/graph in Kibana, and send that over as the report! I hope this was helpful for those who are new to the logging world!