All Articles

Sending syslog logs to an S3 bucket

Preface

In my last post, I touched a bit on collecting and sending logs to an Elasticsearch instance. This will be a quick post on how to use Fluentd to forward syslog messages to an S3 bucket.

Set up

  • Install awscli
  • Download & Install Fluentd
  • Setup your S3 Bucket, Instance Profile, and IAM Role

    • IAM role = write-only access to S3
    • Allow EC2 to assume the role
    • Attach the IAM instance profile to the EC2 instance

Install Fluentd

  • Choose the appropriate version for your Ubuntu release and install it from the Apt repository
  • td-agent is the stable distribution of Fluentd
  • The Fluentd documentation covers the configuration syntax in more detail
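The install step looks roughly like the following (the exact script name depends on your Ubuntu release and the td-agent major version; check the Fluentd installation docs for the current URL):

```
# Treasure Data publishes per-release install scripts for td-agent, e.g.:
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent3.sh | sh

# Confirm the service is registered:
sudo systemctl status td-agent.service
```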

Send your logs to S3 bucket

Consume your syslog messages locally and send them to an S3 bucket:

  • Edit /etc/td-agent/td-agent.conf:
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system
</source>

<match system.**>
  @type s3

  <assume_role_credentials>
    duration_seconds 3600
    role_arn arn:aws:iam::<AWS-ACCOUNT-ID>:role/<ORG-PREFIX>S3WriteSecurityData
    role_session_name "#{Socket.gethostname}"
  </assume_role_credentials>

  s3_bucket <AWS-ACCOUNT-ID>-security-us-<YOUR-REGION>
  s3_region <YOUR-REGION>

  path syslog/
  store_as gzip

  <format>
    @type json
  </format>

  <buffer tag,time>
    @type file
    path /var/log/td-agent/buffer/s3
    timekey 3600
    timekey_wait 60m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>
  • timekey 3600 partitions the buffer into 1-hour chunks
  • timekey_use_utc true sets the partition timestamps to UTC

The configuration above takes your syslog input over port 5140 and tags it as system logs, assumes the IAM role you created, formats the records as JSON, compresses them with gzip, and ships them to your S3 bucket.
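With these settings, fluent-plugin-s3 names each uploaded object from its key format (by default roughly `%{path}%{time_slice}_%{index}.%{file_extension}`, per the plugin docs), so the hourly gzip chunks should land under the `syslog/` prefix looking something like:

```
s3://<AWS-ACCOUNT-ID>-security-us-<YOUR-REGION>/syslog/2020040313_0.gz
s3://<AWS-ACCOUNT-ID>-security-us-<YOUR-REGION>/syslog/2020040314_0.gz
```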

Next, configure rsyslog to forward messages to the local Fluentd daemon by adding these two lines to the bottom of /etc/rsyslog.d/50-default.conf:

# Send log messages to Fluentd
*.* @127.0.0.1:5140

So the whole file should look something like this:

#  Default rules for rsyslog.
#
#                       For more information see rsyslog.conf(5) and /etc/rsyslog.conf

#
# First some standard log files.  Log by facility.
#
auth,authpriv.*                 /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
#cron.*                         /var/log/cron.log
#daemon.*                       -/var/log/daemon.log
kern.*                          -/var/log/kern.log
#lpr.*                          -/var/log/lpr.log
mail.*                          -/var/log/mail.log
#user.*                         -/var/log/user.log

#
# Logging for the mail system.  Split it up so that
# it is easy to write scripts to parse these files.
#
#mail.info                      -/var/log/mail.info
#mail.warn                      -/var/log/mail.warn
mail.err                        /var/log/mail.err

#
# Some "catch-all" log files.
#
#*.=debug;\
#       auth,authpriv.none;\
#       news.none;mail.none     -/var/log/debug
#*.=info;*.=notice;*.=warn;\
#       auth,authpriv.none;\
#       cron,daemon.none;\
#       mail,news.none          -/var/log/messages

#
# Emergencies are sent to everybody logged in.
#
*.emerg                         :omusrmsg:*

#
# I like to have messages displayed on the console, but only on a virtual
# console I usually leave idle.
#
#daemon,mail.*;\
#       news.=crit;news.=err;news.=notice;\
#       *.=debug;*.=info;\
#       *.=notice;*.=warn       /dev/tty8

# Send log messages to Fluentd
*.* @127.0.0.1:5140
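A note on the forwarding line: in rsyslog, a single `@` means UDP, which matches the default transport of Fluentd's syslog source; `@@` would forward over TCP instead, in which case the Fluentd source has to be switched to TCP as well:

```
# UDP (matches the Fluentd syslog source default):
*.* @127.0.0.1:5140
# TCP variant (the Fluentd source must be configured for TCP too):
# *.* @@127.0.0.1:5140
```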

Start the agent, restart your pipeline, and check on its status:

$ sudo systemctl start td-agent.service
$ sudo systemctl restart rsyslog.service
$ sudo systemctl status td-agent.service

Make sure that it is running properly:

$ sudo systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/lib/systemd/system/td-agent.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-04-03 13:37:03 UTC; 15min ago
...
$ sudo tail -f /var/log/td-agent/td-agent.log
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-s3' version '1.2.0'
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-td' version '1.0.0'
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.4'
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.2.4'
2020-04-03 13:40:01 +0000 [info]: gem 'fluentd' version '1.7.4'
2020-04-03 13:40:01 +0000 [info]: adding match pattern="system.**" type="s3"
2020-04-03 13:40:01 +0000 [info]: adding source type="syslog"
2020-04-03 13:40:01 +0000 [info]: #0 starting fluentd worker pid=3272 ppid=1388 worker=0
2020-04-03 13:40:01 +0000 [info]: #0 listening syslog socket on 0.0.0.0:5140 with udp
2020-04-03 13:40:01 +0000 [info]: #0 fluentd worker is now running worker=0

If it does not show that it is running or is erroring out, verify that you have no syntax errors in your configuration file and the IAM role is properly attached to your instance.
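A quick way to catch syntax errors is Fluentd's dry-run mode, which parses the configuration without actually starting the pipeline:

```
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf
```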

To view the data in your S3 bucket, open an object in the AWS console, go to Select From, and perform an SQL query with these settings:

  • File format: JSON
  • JSON type: JSON lines
  • Compression: GZIP (since we used store_as gzip)
select * from s3object s
where s.ident = 'sshd'
LIMIT 4;

And you should be able to see some logs!