Telegraf

Telegraf is a server-based agent for collecting all kinds of metrics for further processing. It is a piece of software that you can install anywhere in your infrastructure, and it reads metrics from specified sources, typically application logs, events, or data outputs.

Step 1: Install Telegraf

Add the InfluxDB GPG key to the trusted apt keys:

# wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -

Add the InfluxDB package repository to the apt sources in /etc/apt/sources.list.d/influxdb.list:

# echo "deb https://repos.influxdata.com/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

Refresh the package index and install the telegraf software package:

# apt update
# apt install -y telegraf

Copy the Telegraf sample configuration file and set appropriate permissions on it (as it will contain passwords). Alternatively, add only the sections you need from the sample file.

# cp /etc/telegraf/telegraf.conf.sample /etc/telegraf/telegraf.conf
# chmod 640 /etc/telegraf/telegraf.conf
# chown root:telegraf /etc/telegraf/telegraf.conf
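
Because the configuration file will hold passwords, it can be worth confirming that the mode and ownership really are what the commands above intended. The following is a minimal sketch, assuming GNU stat from coreutils (as shipped on Debian and Ubuntu); the helper name check_perms is hypothetical and not part of Telegraf.

```shell
# Print "ok" if FILE has the expected octal mode and owner:group;
# otherwise report what was found and return a non-zero status.
# Assumes GNU stat (coreutils). check_perms is an illustrative helper name.
check_perms() (
  file=$1 want_mode=$2 want_owner=$3
  actual=$(stat -c '%a %U:%G' "$file") || exit 1
  if [ "$actual" = "$want_mode $want_owner" ]; then
    echo "ok"
  else
    echo "unexpected: $actual (wanted $want_mode $want_owner)"
    exit 1
  fi
)
```

For the file above, this would be called as: check_perms /etc/telegraf/telegraf.conf 640 root:telegraf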

Step 2: Define Basic Configuration

Define some basic Telegraf settings by editing /etc/telegraf/telegraf.conf:

[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Collection offset is used to shift the collection by the given amount.
  ## This can be used to avoid many plugins querying constrained devices
  ## at the same time by manually scheduling them in time.
  # collection_offset = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Log at debug level.
  # debug = false
  ## Log only error level messages.
  # quiet = true
  ## Collected metrics are rounded to the precision specified. Precision is
  ## specified as an interval with an integer + unit (e.g. 0s, 10ms, 2us, 4s).
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  ##
  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s:
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ##
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  precision = "0s"

  ## Log target controls the destination for logs and can be one of "file",
  ## "stderr" or, on Windows, "eventlog".  When set to "file", the output file
  ## is determined by the "logfile" setting.
  logtarget = "file"

  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  logfile = "/var/log/telegraf/telegraf.log"

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  # logfile_rotation_interval = "0h"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  logfile_rotation_max_size = "10MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  logfile_rotation_max_archives = 5

  ## Pick a timezone to use when logging or type 'local' for local time.
  ## Example: America/Chicago
  log_with_timezone = "Europe/Amsterdam"

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

  ## Method of translating SNMP objects. Can be "netsnmp" which
  ## translates by calling external programs snmptranslate and snmptable,
  ## or "gosmi" which translates using the built-in gosmi library.
  ## netsnmp will be deprecated in future releases
  snmp_translator = "gosmi"
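
To make two of the settings above more concrete, here is a rough sketch of the arithmetic behind round_interval and metric_buffer_limit. It is an illustration under stated assumptions, not Telegraf's actual implementation: the function names are hypothetical, timestamps are plain epoch seconds, and the number of metrics produced per interval is something you would have to estimate for your own host.

```shell
# Next collection time (epoch seconds) when round_interval = true:
# collections are aligned to whole multiples of the interval.
next_collection() {
  now=$1 interval=$2
  rem=$(( now % interval ))
  if [ "$rem" -eq 0 ]; then
    echo "$now"
  else
    echo $(( now - rem + interval ))
  fi
}

# Rough number of seconds an output can be unreachable before the buffer
# is full, given metric_buffer_limit, an assumed number of metrics
# gathered per interval, and the interval in seconds.
buffer_outage_seconds() {
  limit=$1 per_interval=$2 interval=$3
  echo $(( limit / per_interval * interval ))
}
```

With interval = "10s", next_collection 1003 10 yields 1010. With metric_buffer_limit = 10000 and an assumed 500 metrics per 10 second interval, buffer_outage_seconds 10000 500 10 yields 200, so roughly three minutes of output downtime could be bridged before metrics are dropped.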
 

Step 3: Output InfluxDB

Generate a host certificate and key for this specific service and place them in /etc/telegraf/ssl with appropriate permissions:

# mkdir /etc/telegraf/ssl

# cp <service-key> /etc/telegraf/ssl/telegraf.key
# cp <service-cert> /etc/telegraf/ssl/telegraf.pem

# chown root:telegraf /etc/telegraf/ssl/telegraf.key
# chown root:telegraf /etc/telegraf/ssl/telegraf.pem

# chmod 640 /etc/telegraf/ssl/telegraf.key
# chmod 644 /etc/telegraf/ssl/telegraf.pem
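
A mismatched key and certificate is a common cause of TLS failures, so it may help to confirm that the two files belong together before pointing Telegraf at them. The sketch below assumes the openssl command-line tool is installed; the helper name cert_matches_key is hypothetical.

```shell
# Succeeds when the public key embedded in CERT equals the public key
# derived from KEY, i.e. the two files form a pair.
# Assumes the openssl CLI; cert_matches_key is an illustrative helper name.
cert_matches_key() (
  cert=$1 key=$2
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || exit 1
  key_pub=$(openssl pkey -in "$key" -pubout) || exit 1
  [ "$cert_pub" = "$key_pub" ]
)
```

For the files above: cert_matches_key /etc/telegraf/ssl/telegraf.pem /etc/telegraf/ssl/telegraf.key && echo "pair matches"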

Define an InfluxDB output database for Telegraf: edit /etc/telegraf/telegraf.conf and define the InfluxDB section:

[[outputs.influxdb]]
#   ## The full HTTP or UDP URL for your InfluxDB instance.
#   ##
#   ## Multiple URLs can be specified for a single cluster, only ONE of the
#   ## urls will be written to each interval.
#   # urls = ["unix:///var/run/influxdb.sock"]
#   # urls = ["udp://127.0.0.1:8089"]
  urls = ["https://influxdb.mydomain.org:8086"]
#
#   ## The target database for metrics; will be created as needed.
#   ## For UDP url endpoint database needs to be configured on server side.
  database = "telegraf"
#
#   ## The value of this tag will be used to determine the database.  If this
#   ## tag is not set the 'database' option is used as the default.
#   # database_tag = ""
#
#   ## If true, the 'database_tag' will not be included in the written metric.
#   # exclude_database_tag = false
#
#   ## If true, no CREATE DATABASE queries will be sent.  Set to true when using
#   ## Telegraf with a user without permissions to create databases or when the
#   ## database already exists.
  skip_database_creation = true
#
#   ## Name of existing retention policy to write to.  Empty string writes to
#   ## the default retention policy.  Only takes effect when using HTTP.
  retention_policy = "internal_14d"
#
#   ## HTTP Basic Auth
  username = "telegraf"
  password = "T3CMGTWb7xBzb2aW"
#
#   ## HTTP User-Agent
  user_agent = "telegraf"
#
#   ## TLS Config for use on HTTP connections.
  tls_ca = "/etc/ssl/certs/ca-certificates.crt"
  tls_cert = "/etc/telegraf/ssl/telegraf.pem"
  tls_key = "/etc/telegraf/ssl/telegraf.key"
#   ## Use TLS but skip chain & host verification
  insecure_skip_verify = false
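
Before relying on this output section, it can be useful to check that the InfluxDB endpoint is reachable with the same CA bundle and client certificate. The following sketch assumes curl is installed and reuses the file paths from the configuration above; the helper name influx_ping_status is hypothetical. InfluxDB answers its /ping endpoint with HTTP status 204 when it is up.

```shell
# Print the HTTP status code returned by the InfluxDB /ping endpoint,
# using the same CA bundle and client certificate as the Telegraf output.
# Assumes curl; influx_ping_status is an illustrative helper name.
influx_ping_status() {
  base_url=$1
  curl -sS -o /dev/null -w '%{http_code}' \
    --cacert /etc/ssl/certs/ca-certificates.crt \
    --cert /etc/telegraf/ssl/telegraf.pem \
    --key /etc/telegraf/ssl/telegraf.key \
    "$base_url/ping"
}
```

Calling influx_ping_status https://influxdb.mydomain.org:8086 should print 204 if the server is up and the TLS settings are accepted. Run it as a user that can read the key file.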

Step 4: Input Collection: InfluxDB

The following step describes how to collect metrics about an InfluxDB instance. Create a new collection file /etc/telegraf/telegraf.d/telegraf-input-influxdb.conf:

# Read InfluxDB-formatted JSON metrics from one or more HTTP endpoints
[[inputs.influxdb]]
  ## Works with InfluxDB debug endpoints out of the box,
  ## but other services can use this format too.
  ## See the influxdb plugin's README for more details.

  ## Multiple URLs from which to read InfluxDB-formatted JSON
  ## Default is "http://localhost:8086/debug/vars".
  urls = [
    "https://grafana.ops.cyberfront.org:8086/debug/vars"
  ]

  ## Username and password to send using HTTP Basic Authentication.
  username = "influxadmin"
  password = "fJJzkkdlsjfklMnb2"

  ## Optional TLS Config
  tls_ca = "/etc/ssl/certs/ca-certificates.crt"
  tls_cert = "/etc/telegraf/ssl/telegraf.pem"
  tls_key = "/etc/telegraf/ssl/telegraf.key"
  ## Use TLS but skip chain & host verification
  insecure_skip_verify = false

  ## http request & header timeout
  timeout = "5s"

Restart the telegraf service

# systemctl restart telegraf
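
If the new input does not seem to produce data after the restart, Telegraf can run a single input once and print the gathered metrics to stdout instead of sending them to the outputs. The wrapper below is a small sketch around the --config, --config-directory, --input-filter and --test options of the telegraf binary; the function name telegraf_test_input is hypothetical, and the paths are the ones used in this article.

```shell
# Run one input plugin once and print its metrics to stdout
# rather than writing them to the configured outputs.
# telegraf_test_input is an illustrative helper name.
telegraf_test_input() {
  plugin=$1
  telegraf --config /etc/telegraf/telegraf.conf \
    --config-directory /etc/telegraf/telegraf.d \
    --input-filter "$plugin" --test
}
```

For this step the call would be telegraf_test_input influxdb, ideally run as the telegraf user (for example via sudo -u telegraf) so that file permissions are exercised the same way as in the service.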

Step 5: Input Collection: Apex Neptune

The following step describes how to collect metrics from an Apex Neptune aquarium computer. For more information visit this url, as the plugin is part of the default Telegraf sample configuration file. Create a new collection file /etc/telegraf/telegraf.d/telegraf-input-apex.conf:

[[inputs.neptune_apex]]
servers = [ "http://<neptune-apex-ip-address>" ]
#response_timeout = "5s"

###############################################################################
#                            PROCESSOR PLUGINS                                #
###############################################################################

[[processors.enum]]
 namepass = ["neptune_apex"]
 [[processors.enum.mapping]]
  field = "state"
  dest = "state_code"
  default = 0
  [processors.enum.mapping.value_mappings]
   AON = 1
   ON  = 2
   AOF = 3
   OFF = 4

Restart the telegraf service

# systemctl restart telegraf
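
The processors.enum section in the configuration above translates the textual state field into a numeric state_code field, which is easier to graph and alert on. As a plain illustration of that lookup (not how Telegraf implements it), the mapping behaves like the following shell function; the name state_code is borrowed from the dest setting.

```shell
# Mimic the value_mappings table: known states map to their code,
# anything else falls back to the configured default (0).
state_code() {
  case $1 in
    AON) echo 1 ;;
    ON)  echo 2 ;;
    AOF) echo 3 ;;
    OFF) echo 4 ;;
    *)   echo 0 ;;
  esac
}
```

For example, state_code AOF prints 3, while an unmapped value prints the default 0.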

Step 6: Input Collection: iDRAC

The following step describes how to collect Dell iDRAC metrics over SNMP. A Grafana dashboard and other information can be found at this url. Create a new collection file /etc/telegraf/telegraf.d/telegraf-input-idrac.conf:

#
#   ===============================================================
#   iDRAC
#   ===============================================================


[[processors.regex]]
  [[processors.regex.fields]]
    key = "log-dates"
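    ## Named groups match the placeholders used in "replacement"; the trailing
    ## fraction and UTC-offset groups are matched but not reused.
    pattern = "^(?P<YYYY>\\d{4})(?P<MM>\\d{2})(?P<DD>\\d{2})(?P<HH>\\d{2})(?P<mm>\\d{2})(?P<ss>\\d{2})\\.(\\d{6})([-+]\\d{3,4})$"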
    replacement = "${YYYY}-${MM}-${DD} ${HH}:${mm}:${ss}"

[[inputs.snmp]]
  agents = [ "esx-01-idrac:161" , "esx-02-idrac:161" , "esx-03-idrac:161" ]
  version = 1
  community = "cybro"
  name = "idrac-hosts"

  [[inputs.snmp.field]]
     name = "system-name"
     oid  = ".1.3.6.1.2.1.1.5.0"
     is_tag = true

  [[inputs.snmp.field]]
     name = "system-osname"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.6.0"

  [[inputs.snmp.field]]
     name = "system-osversion"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.14.0"

  [[inputs.snmp.field]]
     name = "system-model"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.12.0"

  [[inputs.snmp.field]]
     name = "idrac-url"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.1.6.0"

  [[inputs.snmp.field]]
     name = "power-state"
     oid  = ".1.3.6.1.4.1.674.10892.5.2.4.0"

  [[inputs.snmp.field]]
     name = "system-uptime"
     oid  = ".1.3.6.1.4.1.674.10892.5.2.5.0"

  [[inputs.snmp.field]]
     name = "system-servicetag"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.2.0"

  [[inputs.snmp.field]]
     name = "system-globalstatus"
     oid  = ".1.3.6.1.4.1.674.10892.5.2.1.0"

  [[inputs.snmp.table]]
     name = "idrac-hosts"
     inherit_tags = [ "system-name" , "disks-name" ]

    [[inputs.snmp.table.field]]
       name = "bios-version"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.50.1.8"

    [[inputs.snmp.table.field]]
       name = "raid-batterystate"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4"

    [[inputs.snmp.table.field]]
       name = "intrusion-sensor"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.70.1.6"

    [[inputs.snmp.table.field]]
       name = "disks-mediatype"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.35"

    [[inputs.snmp.table.field]]
       name = "disks-state"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4"

    [[inputs.snmp.table.field]]
       name = "disks-predictivefail"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.31"

    [[inputs.snmp.table.field]]
       name = "disks-capacity"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.11"

    [[inputs.snmp.table.field]]
       name = "disks-name"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.2"
       is_tag = true

    [[inputs.snmp.table.field]]
       name = "memory-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.200.10.1.27"

    [[inputs.snmp.table.field]]
       name = "storage-status"
       oid = ".1.3.6.1.4.1.674.10892.5.2.3"

    [[inputs.snmp.table.field]]
       name = "temp-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.200.10.1.63"

    [[inputs.snmp.table.field]]
       name = "psu-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.200.10.1.9"

    [[inputs.snmp.table.field]]
       name = "log-dates"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.8"

    [[inputs.snmp.table.field]]
       name = "log-entry"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.5"

    [[inputs.snmp.table.field]]
       name = "log-severity"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.7"

    [[inputs.snmp.table.field]]
       name = "log-number"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.2"
       is_tag = true

    [[inputs.snmp.table.field]]
       name = "nic-name"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.30"
       is_tag = true

    [[inputs.snmp.table.field]]
       name = "nic-vendor"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.7"

    [[inputs.snmp.table.field]]
       name = "nic-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.4"

    [[inputs.snmp.table.field]]
       name = "nic-current_mac"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.15"
       conversion = "hwaddr"

  [[inputs.snmp.field]]
     name = "fan1-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.1"

  [[inputs.snmp.field]]
     name = "fan2-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.2"

  [[inputs.snmp.field]]
     name = "fan3-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.3"

  [[inputs.snmp.field]]
     name = "fan4-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.4"

  [[inputs.snmp.field]]
     name = "fan5-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.5"

  [[inputs.snmp.field]]
     name = "fan6-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.6"

  [[inputs.snmp.field]]
     name = "inlet-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.1"

  [[inputs.snmp.field]]
     name = "exhaust-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.2"

  [[inputs.snmp.field]]
     name = "cpu1-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.3"

  [[inputs.snmp.field]]
     name = "cpu2-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.4"

  [[inputs.snmp.field]]
     name = "cmos-batterystate"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.600.50.1.6.1.1"

  [[inputs.snmp.field]]
     name = "system-watts"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.600.30.1.6.1.3"
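
The processors.regex section at the top of this file is meant to rewrite the compact log-dates value into a readable timestamp. The same transformation can be tried out on the command line with sed, which is handy when adjusting the pattern. This is a sketch only: the helper name format_log_date and the sample value are made up for illustration, and sed uses numbered groups where the Telegraf pattern uses named ones.

```shell
# Rewrite a compact "YYYYMMDDHHmmss.ffffff+ooo" string as
# "YYYY-MM-DD HH:mm:ss". Input that does not match is printed unchanged.
# format_log_date is an illustrative helper name.
format_log_date() {
  printf '%s\n' "$1" | sed -E \
    's/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})\.[0-9]{6}[-+][0-9]{3,4}$/\1-\2-\3 \4:\5:\6/'
}
```

For the made-up sample value 20230115103000.000000-360, the function prints 2023-01-15 10:30:00.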

Restart the telegraf service

# systemctl restart telegraf
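
If no iDRAC metrics show up, a quick way to separate SNMP problems from Telegraf problems is to query one agent directly. The sketch below assumes the net-snmp command-line tools (the snmp package on Debian and Ubuntu) are installed; the helper name snmp_sysname is hypothetical. It asks for the same sysName OID that the configuration uses for the system-name tag, with SNMP v1 to match version = 1.

```shell
# Query sysName (.1.3.6.1.2.1.1.5.0) from one agent with SNMP v1,
# matching the version and community used by the Telegraf input.
# Assumes net-snmp's snmpget; snmp_sysname is an illustrative helper name.
snmp_sysname() {
  agent=$1 community=$2
  snmpget -v1 -c "$community" -Oqv "$agent" .1.3.6.1.2.1.1.5.0
}
```

For the first host above: snmp_sysname esx-01-idrac:161 cybro. A timeout here points to a network, community string, or iDRAC SNMP setting issue rather than to the Telegraf configuration.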