Preface
In my last post, I touched briefly on collecting and sending logs to an Elasticsearch instance. This will be a quick post on how to use Fluentd to forward syslog messages to an S3 bucket.
Set up
- Install awscli
- Download & install Fluentd
- Set up your S3 bucket, instance profile, and IAM role
- IAM role = write-only access to S3
- Allow EC2 to assume the role
- Attach the IAM instance profile to the EC2 instance
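As a rough sketch, the trust policy that lets EC2 assume the role, and a minimal write-only permissions policy, might look like the following (the bucket name is a placeholder matching the Fluentd config later in this post):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::<AWS-ACCOUNT-ID>-security-us-<YOUR-REGION>/*"
    }
  ]
}
```

Attach the first document as the role's trust relationship and the second as its inline or managed policy, then add the role to an instance profile.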
Install Fluentd
- Choose the appropriate version for your Ubuntu release and install it from the Apt repository. td-agent is the stable distribution of Fluentd.
- To read more about the configuration syntax, see the Fluentd documentation.
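For example, on Ubuntu 18.04 (Bionic) the td-agent 3 install script can be run like this — check the Fluentd install docs for the script matching your release:

```sh
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent3.sh | sh
```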
Send your logs to an S3 bucket
Consume your syslog messages locally and send them to an S3 bucket:
- Edit the Fluentd configuration file (/etc/td-agent/td-agent.conf):
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system
</source>

<match system.**>
  @type s3
  <assume_role_credentials>
    duration_seconds 3600
    role_arn arn:aws:iam::<AWS-ACCOUNT-ID>:role/<ORG-PREFIX>S3WriteSecurityData
    role_session_name "#{Socket.gethostname}"
  </assume_role_credentials>
  s3_bucket <AWS-ACCOUNT-ID>-security-us-<YOUR-REGION>
  s3_region <YOUR-REGION>
  path syslog/
  store_as gzip
  <format>
    @type json
  </format>
  <buffer tag,time>
    @type file
    path /var/log/td-agent/buffer/s3
    timekey 3600
    timekey_wait 60m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>
timekey 3600 partitions the output into one-hour chunks; timekey_use_utc true sets the partition timezone to UTC.
The configuration above takes syslog input on port 5140 and tags it as system logs, assumes the IAM role you created, formats the events as JSON, gzips them, and sends them to your S3 bucket.
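Before restarting anything, you can ask td-agent to parse the configuration without starting a worker — a quick sanity check using fluentd's --dry-run flag:

```sh
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf
```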
Next, configure rsyslog to forward messages to the local Fluentd daemon by adding these two lines to the bottom of /etc/rsyslog.d/50-default.conf
:
# Send log messages to Fluentd
*.* @127.0.0.1:5140
The whole file should then look something like this:
# Default rules for rsyslog.
#
# For more information see rsyslog.conf(5) and /etc/rsyslog.conf
#
# First some standard log files. Log by facility.
#
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none -/var/log/syslog
#cron.* /var/log/cron.log
#daemon.* -/var/log/daemon.log
kern.* -/var/log/kern.log
#lpr.* -/var/log/lpr.log
mail.* -/var/log/mail.log
#user.* -/var/log/user.log
#
# Logging for the mail system. Split it up so that
# it is easy to write scripts to parse these files.
#
#mail.info -/var/log/mail.info
#mail.warn -/var/log/mail.warn
mail.err /var/log/mail.err
#
# Some "catch-all" log files.
#
#*.=debug;\
# auth,authpriv.none;\
# news.none;mail.none -/var/log/debug
#*.=info;*.=notice;*.=warn;\
# auth,authpriv.none;\
# cron,daemon.none;\
# mail,news.none -/var/log/messages
#
# Emergencies are sent to everybody logged in.
#
*.emerg :omusrmsg:*
#
# I like to have messages displayed on the console, but only on a virtual
# console I usually leave idle.
#
#daemon,mail.*;\
# news.=crit;news.=err;news.=notice;\
# *.=debug;*.=info;\
# *.=notice;*.=warn /dev/tty8
# Send log messages to Fluentd
*.* @127.0.0.1:5140
Start the agent, restart your pipeline, and check on its status:
$ sudo systemctl start td-agent.service
$ sudo systemctl restart rsyslog.service
$ sudo systemctl status td-agent.service
Make sure that it is properly running:
$ sudo systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
Loaded: loaded (/lib/systemd/system/td-agent.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2020-04-03 13:37:03 UTC; 15min ago
...
$ sudo tail -f /var/log/td-agent/td-agent.log
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-s3' version '1.2.0'
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-td' version '1.0.0'
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.4'
2020-04-03 13:40:01 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.2.4'
2020-04-03 13:40:01 +0000 [info]: gem 'fluentd' version '1.7.4'
2020-04-03 13:40:01 +0000 [info]: adding match pattern="system.**" type="s3"
2020-04-03 13:40:01 +0000 [info]: adding source type="syslog"
2020-04-03 13:40:01 +0000 [info]: #0 starting fluentd worker pid=3272 ppid=1388 worker=0
2020-04-03 13:40:01 +0000 [info]: #0 listening syslog socket on 0.0.0.0:5140 with udp
2020-04-03 13:40:01 +0000 [info]: #0 fluentd worker is now running worker=0
If it does not show that it is running or is erroring out, verify that you have no syntax errors in your configuration file and the IAM role is properly attached to your instance.
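To sanity-check the pipeline end to end, you can emit a test message through syslog and watch for a buffer chunk to appear (the buffer path below assumes the configuration above):

```sh
logger -p auth.info "fluentd-to-s3 test message"
ls /var/log/td-agent/buffer/s3
```

The chunk will be uploaded to S3 once the timekey window closes and timekey_wait elapses.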
To view the data in your S3 bucket, you can use S3 Select (Select from in the console) and run a SQL query:
- File format - JSON
- JSON type - JSON lines
- Compression - GZIP (in my case)
select * from s3object s
where s.ident = 'sshd'
LIMIT 4;
And you should be able to see some logs!
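You can also confirm from the command line that objects are landing in the bucket (the bucket name and path are the placeholders from the config above):

```sh
aws s3 ls s3://<AWS-ACCOUNT-ID>-security-us-<YOUR-REGION>/syslog/ --recursive
```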