Fluentd vs. Logstash for OpenStack Log Management

Fluentd vs. Logstash
Masaki Matsushita
NTT Communications

About Me
● Masaki MATSUSHITA
● Software Engineer at
○ We are providing Internet access here!
● Github: mmasaki Twitter: @_mmasaki
● 16 Commits in Liberty
○ Trove, oslo_log, oslo_config
● CRuby Commiter
○ 100+ commits for performance improvement
2

What are Log Collectors?
● Provide pluggable and unified logging layer
Without Log Collectors With Log Collectors
Images from http://fluentd.org/ 3

Input, Filter and Output
4
Input Plugins
tail
syslog
Filter Plugins
grep
hostname
Output Plugins
InfluxDB
Elasticsearch
● They are implemented as plugins
● Can be replaced easily
Log FIles
Components

Two Popular Log Collectors
● Fluentd
○ Written in CRuby
○ Used in Kubernetes
○ Maintained by Treasure Data Inc.
● Logstash
○ Written in JRuby
○ Maintained by elastic.co
● They have similar features
● Which one is better for you? 5

Agenda
● Comparisons
○ Configuration
○ Supported Plugins
○ Performance
○ Transport Protocol
● Integrate OpenStack with Fluentd/Logstash
○ Considering High Availability 6

Configuration: Fluentd
● Every inputs are tagged
● Logs will be routed by tag
nova-api.log
(tag: openstack.nova)
cinder-api.log
(tag: openstack.cinder)
<match openstack.nova>
<match openstack.cinder>
Filter/Route
7

Fluentd Configuration: Input
<source>
@type tail
path /var/log/nova/nova-api.log
tag openstack.nova
</source>
Example of tailing nova-api log
● Every inputs will be tagged
8

Fluentd Configuration: Output
<match openstack.nova> # nova related logs
@type elasticsearch
host example.com
</match>
<match openstack.*> # all other OpenStack related logs
@type influxdb
# …
</match>
Routed by tag
(First match is priority)
Wildcards can be used
9

Fluentd Configuration: Copy
<match openstack.*>
@type copy
<store>
@type influxdb
</store>
<store>
@type elasticsearch
</store>
</match>
Copy plugin enables multiple
outputs for a tag
Copied Output
tag: openstack.*
10

Logstash Configuration
● No tags
● All inputs will be aggregated
● Logs will be scattered to outputs
nova-api.log
cinder-api.log
Filter/Aggregate
aggregated logs
11

Logstash Configuration
input {
file { path => “/var/log/nova/*.log” }
file { path => “/var/log/cinder/*.log” }
}
output {
elasticsearch { hosts => [“example.com”] }
influxdb { host => “example.com”... }
}
12

Case 1: Separated Streams
Input1
Input2
Input3
Output2
Output3
Output1
● Handle multiple streams separately
13

Case 1: Separated Streams
Fluentd: Simple matching by tag
<match input.input1>
@type output1
</match>
@type output2
</match>
@type output3
</match>
Logstash: Conditional Outputs
output {
if [type] == “input1” {
output1 {}
} else if [type] == “input2” {
output2 {}
} else if [type] == “input3” {
output3 {}
}
}
Need to split aggregated logs
14

Case 2: Aggregated Streams
Input1
Input2
Input3
Output2
Output3
Output1
● Streams will be aggregated and scattered
15

Case 2: Aggregated Streams
Fluentd: Copy plugins is needed
<match input.*>
@type copy
<store>
@type output1
</store>
<store>
@type output2
</store>
<store>
@type output3
</store>
</match>
Logstash: Quite simple
output {
output1 {}
output2 {}
output3 {}
}
16

Configuration
● Fluentd
○ Routed by simple tag matching
○ Suited to handle log streams separately
● Logstash
○ Logs are aggregated
○ Suited to handle logs in gather-scatter style
17

Plugins
● Both provide many plugins
○ Fluentd: 300+, Logstash: 200+
● Popular plugins are bundled with Logstash
○ They are maintained by the Logstash project
● Fluentd contains only minimal plugins
○ Most plugins are maintained by individuals
● Plugins can be installed easily by one command
18

Performance
● Depends on circumstances
● More than enough for OpenStack logs
○ Both can handle 10000+ logs/s
● Applying heavy filters is not a good idea
● CRuby is slow because of GVL?
○ GVL: Global VM (Interpreter) Lock
○ It’s not true for IO bound loads
19

GVL on IO bound loads
● IO operation can be performed in parallel
20
Thread 1 Thread 2
Idle :
User Space:
Kernel Space:
Actual Read/Write
Ruby Code Execution
GVL Released/
Acquired
IO operations
in parallel

Transport Protocol
● Both collectors have their own transport protocol.
○ Failure Detection and Fallback
● Logstash: Lumberjack protocol
○ Active-Standby only
● Fluentd: forward protocol
○ Active-Active (Load Balancing), Active-Standby
○ Some additional features
21

Logstash Transport: lumberjack
● Active-Standby lumberjack { #config@source
hosts => [
“primary”,
“secondary”
]
port => 1234
ssl_certificate => …
}
primary
secondary
source
secondary is used
when primary fails
Fail
Fallback
22

Fluentd Transport: forward
● Active-Active
(Load Balancing)
<match openstack.*>
type forward
<server>
host dest1
</server>
<server>
host dest2
</server>
</match>
source dest1
dest2
Equally balanced
outputs
23

● Active-Standby <match openstack.*>
type forward
<server>
host primary
</server>
<server>
host secondary
standby
</server>
</match>
primary
secondary
source
Fail
Fallback
24

● Weighted Load Balancing
<match openstack.*>
type forward
<server>
host dest1
weight 60
</server>
<server>
host dest2
weight 40
</server>
</match>
source dest1
dest2
60%
40%
25

● At-least-one Semantics
(may affect performance)
<match openstack.*>
type forward
require_ack_response
<server>
host dest
</server>
</match>
destsource
send logs
ACK
Logs are re-transmitted
until ACK is received
26

Transport Protocol
● Both can be configured as Active-Standby mode.
● Fluentd has great features:
○ Active-Active Mode (Load Balancing)
○ At-least-one Semantics
○ Weighted Load Balancing
27

Forwarders
● Fluentd/Logstash have their own “forwarders”
○ Lightweight implementation written in Golang
○ Low memory consumption
○ One binary: Less dependent and easy to install
28
Node
Tail log files
Forwarder
Log AggregatorForward/
Lumberjack
Protocol

Forwarders: Config Example
fluentd-forwarder:
[fluentd-forwarder]
to = fluent://fluentd1:24224
to = fluent://fluentd2:24224
logstash-forwarder:
"network": {
"servers": [
"logstash1:5043",
"logstash2:5043"
]
}Always send logs to both servers.
Pick one active server and send logs only to it.
Fallback to another server on failure. 29

Integration with OpenStack
● Tail log files by local Fluentd/Logstash
○ must parse many form of log files
● Rsyslog
○ installed by default in most distribution
○ can receive logs in JSON format
● Direct output from oslo_log
○ oslo_log: logging library used by components
○ Logging without any parsing 30

Log
Aggregators
OpenStack nodes
Tail Log Files
31
Tail log files
Forward Protocol
dest1
dest2

Tail Log Files
• Must handle many log files…
syslog
kern.log
apache2/access.log
apache2/error.log
keystone/keystone-all.log
keystone/keystone-manage.log
keystone/keystone.log
cinder/cinder-api.log
cinder/cinder-scheduler.log
neutron/neutron-server.log
neutron/neutron-server.log
nova/nova-api.log
nova/nova-conductor.log
nova/nova-consoleauth.log
nova/nova-manage.log
nova/nova-novncproxy.log
nova/nova-scheduler.log
mysql/error.log
mysql/mysql-slow.log
mysql.log
mysql.err
nova/nova-compute.log
nova/nova-manage.log...
32

Tail Log Files
• But you can use wildcard
Fluentd:
<source>
type tail
path /var/log/nova/*.log
tag openstack.nova
</source>
Logstash:
input {
file {
path => [“/var/log/nova/*.log”]
}
}
33

Parse Text Log
● Welcome to regular expression hell!
<source>
type tail # or syslog
path /var/log/nova/nova-api.log
format /^(?<asctime>.+) (?<process>d+) (?<loglevel>w+) (?
<objname>S+)( [(-|(?<request_id>.+?) (?<user_identity>.+))])?
((?<remote>S*) "(?<method>S+) (?<path>[^"]*) S*?" status: (?
<code>d*) len: (?<size>d*) time: (?<res_time>S)|(?<message>.
*))/
</source>
34

Log
Aggregators
OpenStack nodes
Rsyslog
35
via /dev/log
Syslog Protocol
(TCP or UDP)
rsyslog

Rsyslog: Logging.conf
● Logging Configuration in detail
● Handler: Syslog, Formatter: JSON
# /etc/{nova,cinder…}/logging.conf
[handler_syslog]
class = handlers.SysLogHandler
args = ('/dev/log', handlers.SysLogHandler.LOG_LOCAL1)
formatter = json
[formatter_json]
class = oslo_log.formatters.JSONFormatter 36

Example Output: JSONFormatter
{
"levelname": "INFO",
"funcname": "start",
"message": "Starting conductor node (version 13.0.0)",
"msg": "Starting %(topic)s node (version %(version)s)",
"asctime": "2015-09-29 18:29:57,690",
"relative_created": 2454.8499584198,
"process": 25204,
"created": 1443518997.690932,
"thread": 140119466896752,
"name": "nova.service",
"process_name": "MainProcess",
"thread_name": "GreenThread-1",
...
37

Syslog Facilities
● Assignment of local0..7 Facilities for components
● Logs are tagged as like “syslog.local0” in Fluentd
● Example:
○ local0: Keystone
○ local1: Nova
○ local2: Cinder
○ local3: Neutron
○ local4: Glance
38

Rsyslog: Config@OpenStack nodes
● Active-Standby Configuration
# /etc/rsyslog.d/rsyslog.conf
user.* @@primary:5140
$ActionExecOnlyWhenPreviousIsSuspended on
&@@secondary:5140
39

Rsyslog: Config@Aggregator
Fluentd:
<source>
type syslog
port 5140
protocol_type tcp
format json
tag syslog
</source>
Logstash:
input {
syslog {
codec => json
port => 5140
}
} Listen on both TCP and UDP
Specify TCP or UDP 40

Rsyslog: Config@Aggregator
Fluentd:
<source>
type syslog
port 5140
protocol_type tcp
format json
tag syslog
</source>
Logstash:
input {
syslog {
codec => json
port => 5140
}
}
41

Log
AggregatorsOpenStack nodes
42
via FluentHandler
Forward Protocol
Direct output from oslo_log
Local Fluentd for buffering/load balancing
(Logstash also can be used)

Direct output from oslo_log
# logging.conf:
[handler_fluent]
class = fluent.handler.FluentHandler # fluent-logger
formatter = fluent
args = (’openstack.nova', 'localhost', 24224)
[formatter_fluent]
class = fluent.handler.FluentFormatter # our Blueprint
43
Format logs as Dictionary

Our BP in oslo_log: FluentFormatter
{
"hostname":"allinone-vivid",
"extra":{"project":"unknown","version":"unknown"},
"process_name":"MainProcess",
"module":"wsgi",
"message":"(4132) wsgi starting up on http://0.0.0.0:8774/",
"filename":"wsgi.py",
"name":"nova.osapi_compute.wsgi.server",
"level":"INFO",
"traceback":null,
"funcname":"server",
"time":"2015-10-15 10:09:12,255"
}
Don’t need to parse!
44

Conclusion
● Log Handling
○ Fluentd: Logs are distinguished by tag
○ Logstash: No tags. Logs are aggregated
● Transport Protocol
○ Both supports active-standby mode
○ Fluentd supports some additional features
■ Client-side load balancing (Active-Active)
■ At-least-one semantics
■ Weighted load balancing 45

Conclusion
● Integration with OpenStack
○ Tail log files: regular expression hell
○ Rsyslog: No agents are needed
○ Direct output from oslo_log w/o any parsing
○ Review is welcome for our Blueprint
(oslo_log: fluent-formatter)
46

Thank you!
Please visit our booth!
Robot Racing over WebRTC! →

Fluentd vs. Logstash for OpenStack Log Management

In this document

More Related Content

What's hot

Viewers also liked

Similar to Fluentd vs. Logstash for OpenStack Log Management

More from NTT Communications Technology Development

Recently uploaded

Fluentd vs. Logstash for OpenStack Log Management