Heka Debugging

Материал из noname.com.ua
Перейти к навигацииПерейти к поиску

Heka Debugging

From time to time something went wrong with any software so there are a few words about debugging.

Save data to file

The easiest way to debug is to configure FileOutput and save all data in debug file.
To encode output is used RstEncoder

To enable such debugging it is necessary to add config file, e.g /etc/lma_collector/00_debug.toml
Attention: this configuration is dangerous because it will create I/O load. Please be careful with debugging on production!

[RstEncoder]

[output_file]
type = "FileOutput"
message_matcher = "Fields[aggregator] == NIL"
path = "/var/log/heka-debug.log"
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "RstEncoder"

Example of output:


tail -100 /var/log/heka-debug.log
... <SKIP> ...
:Timestamp: 2016-01-31 16:58:42.232999936 +0000 UTC
:Type: metric
:Hostname: node-6
:Pid: 5505
:Uuid: 2a73fa6f-21fc-40b0-acf9-7fe96bcb9b80
:Logger: collectd
:Payload: {"type":"gauge","values":[0],"type_instance":"q-agent-notifier-dvr-update_fanout_57e5b132e51840ebb6.messages","meta":{"0":true},"dsnames":["value"],"plugin":"rabbitmq_info","time":1454259522.233,"interval":10,"host":"node-6","dstypes":["gauge"],"plugin_instance":""}
:EnvVersion:
:Severity: 6
:Fields:
    | name:"type" type:string value:"gauge"
    | name:"source" type:string value:"rabbitmq_info"
    | name:"openstack_release" type:string value:"2015.1.0-7.0"
    | name:"deployment_mode" type:string value:"ha_compact"
    | name:"openstack_roles" type:string value:"primary-controller"
    | name:"queue" type:string value:"q-agent-notifier-dvr-update_fanout_57e5b132e51840ebb6"
    | name:"tag_fields" type:string value:"queue"
    | name:"openstack_region" type:string value:"RegionOne"
    | name:"name" type:string value:"rabbitmq_queue_messages"
    | name:"hostname" type:string value:"node-6"
    | name:"value" type:double value:0
    | name:"deployment_id" type:string value:"3"
    | name:"environment_label" type:string value:"test2"
    | name:"interval" type:double value:10


:Timestamp: 2016-01-31 16:58:42 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: bbc94442-67c8-4445-acf8-341b14c2d43f
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:1212
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08


:Timestamp: 2016-01-31 16:58:42 +0000 UTC
:Type: log
:Hostname: node-6
:Pid: 11905
:Uuid: 36cfae57-cb3e-4777-a886-60a351fb230c
:Logger: pacemaker
:Payload: cib_process_ping: 	Reporting our current digest to node-6.domain.tld: 70954c66c71f324b3851af4d92ef93dd for 0.90.152 (0x1c12960 0)

:EnvVersion:
:Severity: 7
:Fields:
    | name:"deployment_mode" type:string value:"ha_compact"
    | name:"environment_label" type:string value:"test2"
    | name:"openstack_region" type:string value:"RegionOne"
    | name:"deployment_id" type:string value:"3"
    | name:"severity_label" type:string value:"DEBUG"
    | name:"programname" type:string value:"cib"
    | name:"openstack_roles" type:string value:"primary-controller"
    | name:"openstack_release" type:string value:"2015.1.0-7.0"

Debug data filtering

To avoid I/O overload if is possible to filter messages. To filter data only from test_decoder and save it into the /var/log/heka-debug-test-decoder.log file is used the following configuration:

01_debug_test_decoder_data.toml

[output_file]
type = "FileOutput"
message_matcher = "Type == 'test_decoder'"
path = "/var/log/heka-debug-test-decoder.log"
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "RstEncoder"

Remark: depends on your current Heka configuration you may need to enable RstEncoder. In this example it is already enabled in 00_debug.toml file.

Example of output:

:Timestamp: 2016-01-31 17:07:16 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: 38ffa77b-8f5c-4f73-aba5-f5de1f15f0b8
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:29
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08


:Timestamp: 2016-01-31 17:07:17 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: 1545163c-cc97-46aa-aba8-431ab6ca3338
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:30
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08


:Timestamp: 2016-01-31 17:07:18 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: 5da75795-63fd-48f8-ac52-44db368f54ed
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:31
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08

If Heka can't start

If Heka is not able to start it is possible to run it manually:

  • Disable resorce in CRM
    • Run crm resource command:
#crm resource crm(live)resource# list ... <SKIP> ... Clone Set: clone_lma_collector [lma_collector] Started: [ node-6.domain.tld ]
    • Stop lma_ollector resource:
#crm(live)resource# stop lma_collector crm(live)resource# stop clone_lma_collector
    • Check lma_collector resource status. It should be stopped.
#crm(live)resource# list Clone Set: clone_lma_collector [lma_collector] Stopped: [ node-6.domain.tld ]

Now it is safe to run Heka manually.

  • Create screen session:
#screen


  • In screeen section switch into heka user
#sudo -H -u heka /bin/bash
$hekad -config /etc/lma_collector/ 2016/01/31 17:23:39 Error reading config: open /etc/lma_collector/01_debug_test_decoder_data.toml: permission denied


  • As you can see in example above error is related to file permissions:
# ls -lsa /etc/lma_collector/01_debug_test_decoder_data.toml 4 -rwx------ 1 root root 199 Jan 31 17:06 /etc/lma_collector/01_debug_test_decoder_data.toml
  • File is owned by root and can't be read by heka. To fix it it is nessesary to fix owner or permittions
#chown heka:heka /etc/lma_collector/01_debug_test_decoder_data.toml
  • Start heka again:
$ hekad -config /etc/lma_collector/ 2016/01/31 17:28:13 Pre-loading: [RstEncoder] 2016/01/31 17:28:13 Pre-loading: [output_file] 2016/01/31 17:28:13 Pre-loading: [output_file] 2016/01/31 17:28:13 Pre-loading: [openstack_error_amqp] 2016/01/31 17:28:13 Pre-loading: [openstack_info_amqp] 2016/01/31 17:28:13 Pre-loading: [openstack_warn_amqp] 2016/01/31 17:28:13 Pre-loading: [collectd_decoder] 2016/01/31 17:28:13 Pre-loading: [http-check_decoder] 2016/01/31 17:28:13 Pre-loading: [keystone_7_0_decoder] 2016/01/31 17:28:13 Pre-loading: [keystone_wsgi_decoder] 2016/01/31 17:28:13 Pre-loading: [mm_decoder] 2016/01/31 17:28:13 Pre-loading: [mysql_decoder] 2016/01/31 17:28:13 Pre-loading: [notification_decoder] 2016/01/31 17:28:13 Pre-loading: [openstack_decoder] 2016/01/31 17:28:13 Pre-loading: [ovs_decoder] 2016/01/31 17:28:13 Pre-loading: [pacemaker_decoder] 2016/01/31 17:28:13 Pre-loading: [rabbitmq_decoder] 2016/01/31 17:28:13 Pre-loading: [swift_decoder] 2016/01/31 17:28:13 Pre-loading: [system_decoder] 2016/01/31 17:28:13 Pre-loading: [elasticsearch_encoder] 2016/01/31 17:28:13 Pre-loading: [influxdb_encoder] 2016/01/31 17:28:13 Pre-loading: [nagios_afd_nodes_encoder] 2016/01/31 17:28:13 Pre-loading: [nagios_afd_nodes_encoder_debug] 2016/01/31 17:28:13 Pre-loading: [nagios_gse_global_clusters_encoder] 2016/01/31 17:28:13 Pre-loading: [nagios_gse_node_clusters_encoder] 2016/01/31 17:28:13 Pre-loading: [afd_api_backends_filter] 2016/01/31 17:28:13 Pre-loading: [afd_api_endpoints_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_controller_cpu_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_controller_log-fs_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_controller_root-fs_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_mysql-nodes_mysql-fs_filter] 2016/01/31 17:28:13 Pre-loading: [afd_service_apache_worker_filter] 2016/01/31 17:28:13 Pre-loading: [afd_service_cinder-api_http_errors_filter] 2016/01/31 17:28:13 Pre-loading: [afd_service_glance-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_heat-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_keystone-admin-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_keystone-public-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_mysql_node-status_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_neutron-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_nova-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_rabbitmq_disk_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_rabbitmq_memory_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_rabbitmq_queue_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_swift-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_workers_filter] 2016/01/31 17:28:14 Pre-loading: [gse_global_filter] 2016/01/31 17:28:14 Pre-loading: [gse_node_filter] 2016/01/31 17:28:14 Pre-loading: [gse_service_filter] 2016/01/31 17:28:14 Pre-loading: [heka_monitoring_filter] 2016/01/31 17:28:14 Pre-loading: [http_metrics_filter] 2016/01/31 17:28:14 Pre-loading: [influxdb_accumulator_filter] 2016/01/31 17:28:14 Pre-loading: [influxdb_annotation_filter] 2016/01/31 17:28:14 Pre-loading: [instance_state_filter] 2016/01/31 17:28:14 Pre-loading: [resource_creation_time_filter] 2016/01/31 17:28:14 Pre-loading: [service_heartbeat_filter] 2016/01/31 17:28:14 Pre-loading: [collectd_httplisten] 2016/01/31 17:28:14 Pre-loading: [http-check_httplisten] 2016/01/31 17:28:14 Pre-loading: [aggregator_tcpinput] 2016/01/31 17:28:14 Pre-loading: [keystone_7_0_logstreamer] 2016/01/31 17:28:14 Pre-loading: [keystone_wsgi_logstreamer] 2016/01/31 17:28:14 Pre-loading: [test_input] 2016/01/31 17:28:14 Pre-loading: [mysql_logstreamer] 2016/01/31 17:28:14 Pre-loading: [openstack_7_0_logstreamer] 2016/01/31 17:28:14 Pre-loading: [openstack_dashboard_logstreamer] 2016/01/31 17:28:14 Pre-loading: [ovs_logstreamer] 2016/01/31 17:28:14 Pre-loading: [pacemaker_logstreamer] 2016/01/31 17:28:14 Pre-loading: [rabbitmq_logstreamer] 2016/01/31 17:28:14 Pre-loading: [swift_logstreamer] 2016/01/31 17:28:14 Pre-loading: [system_logstreamer] 2016/01/31 17:28:14 Pre-loading: [aggregator_decoder] 2016/01/31 17:28:14 Pre-loading: [aggregator_tcpoutput] 2016/01/31 17:28:14 Pre-loading: [DashboardOutput] 2016/01/31 17:28:14 Pre-loading: [elasticsearch_output] 2016/01/31 17:28:14 Pre-loading: [influxdb_output] 2016/01/31 17:28:14 Pre-loading: [nagios_afd_nodes_output] 2016/01/31 17:28:14 Pre-loading: [nagios_afd_nodes_output_file] 2016/01/31 17:28:14 Pre-loading: [nagios_gse_global_clusters_output] 2016/01/31 17:28:14 Pre-loading: [nagios_gse_node_clusters_output] 2016/01/31 17:28:14 Pre-loading: [aggregator_flag_decoder] 2016/01/31 17:28:14 Pre-loading: [openstack_splitter] 2016/01/31 17:28:14 Pre-loading: [rabbitmq_splitter] 2016/01/31 17:28:14 Pre-loading: [ProtobufDecoder] 2016/01/31 17:28:14 Loading: [ProtobufDecoder] 2016/01/31 17:28:14 Pre-loading: [ProtobufEncoder] 2016/01/31 17:28:14 Loading: [ProtobufEncoder] 2016/01/31 17:28:14 Pre-loading: [TokenSplitter] 2016/01/31 17:28:14 Loading: [TokenSplitter] 2016/01/31 17:28:14 Pre-loading: [HekaFramingSplitter] 2016/01/31 17:28:14 Loading: [HekaFramingSplitter] 2016/01/31 17:28:14 Pre-loading: [NullSplitter] 2016/01/31 17:28:14 Loading: [NullSplitter] 2016/01/31 17:28:14 Loading: [collectd_decoder] 2016/01/31 17:28:14 Loading: [http-check_decoder] 2016/01/31 17:28:14 Loading: [keystone_7_0_decoder] 2016/01/31 17:28:14 Loading: [keystone_wsgi_decoder] 2016/01/31 17:28:14 Loading: [mm_decoder] 2016/01/31 17:28:14 Loading: [mysql_decoder] 2016/01/31 17:28:14 Loading: [notification_decoder] 2016/01/31 17:28:14 Loading: [openstack_decoder] 2016/01/31 17:28:14 Loading: [ovs_decoder] 2016/01/31 17:28:14 Loading: [pacemaker_decoder] 2016/01/31 17:28:14 Loading: [rabbitmq_decoder] 2016/01/31 17:28:14 Loading: [swift_decoder] 2016/01/31 17:28:14 Loading: [system_decoder] 2016/01/31 17:28:14 Loading: [aggregator_flag_decoder] 2016/01/31 17:28:14 Loading: [aggregator_decoder] 2016/01/31 17:28:14 Loading: [RstEncoder] 2016/01/31 17:28:14 Loading: [elasticsearch_encoder] 2016/01/31 17:28:14 Loading: [influxdb_encoder] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_encoder] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_encoder_debug] 2016/01/31 17:28:14 Loading: [nagios_gse_global_clusters_encoder] 2016/01/31 17:28:14 Loading: [nagios_gse_node_clusters_encoder] 2016/01/31 17:28:14 Loading: [openstack_splitter] 2016/01/31 17:28:14 Loading: [rabbitmq_splitter] 2016/01/31 17:28:14 Loading: [openstack_error_amqp] 2016/01/31 17:28:14 Loading: [openstack_info_amqp] 2016/01/31 17:28:14 Loading: [openstack_warn_amqp] 2016/01/31 17:28:14 Loading: [collectd_httplisten] 2016/01/31 17:28:14 Loading: [http-check_httplisten] 2016/01/31 17:28:14 Loading: [aggregator_tcpinput] 2016/01/31 17:28:14 Loading: [keystone_7_0_logstreamer] 2016/01/31 17:28:14 Loading: [keystone_wsgi_logstreamer] 2016/01/31 17:28:14 Loading: [test_input] 2016/01/31 17:28:14 Loading: [mysql_logstreamer] 2016/01/31 17:28:14 Loading: [openstack_7_0_logstreamer] 2016/01/31 17:28:14 Loading: [openstack_dashboard_logstreamer] 2016/01/31 17:28:14 Loading: [ovs_logstreamer] 2016/01/31 17:28:14 Loading: [pacemaker_logstreamer] 2016/01/31 17:28:14 Loading: [rabbitmq_logstreamer] 2016/01/31 17:28:14 Loading: [swift_logstreamer] 2016/01/31 17:28:14 Loading: [system_logstreamer] 2016/01/31 17:28:14 Loading: [afd_api_backends_filter] 2016/01/31 17:28:14 Loading: [afd_api_endpoints_filter] 2016/01/31 17:28:14 Loading: [afd_node_controller_cpu_filter] 2016/01/31 17:28:14 Loading: [afd_node_controller_log-fs_filter] 2016/01/31 17:28:14 Loading: [afd_node_controller_root-fs_filter] 2016/01/31 17:28:14 Loading: [afd_node_mysql-nodes_mysql-fs_filter] 2016/01/31 17:28:14 Loading: [afd_service_apache_worker_filter] 2016/01/31 17:28:14 Loading: [afd_service_cinder-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_glance-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_heat-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_keystone-admin-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_keystone-public-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_mysql_node-status_filter] 2016/01/31 17:28:14 Loading: [afd_service_neutron-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_nova-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_rabbitmq_disk_filter] 2016/01/31 17:28:14 Loading: [afd_service_rabbitmq_memory_filter] 2016/01/31 17:28:14 Loading: [afd_service_rabbitmq_queue_filter] 2016/01/31 17:28:14 Loading: [afd_service_swift-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_workers_filter] 2016/01/31 17:28:14 Loading: [gse_global_filter] 2016/01/31 17:28:14 Loading: [gse_node_filter] 2016/01/31 17:28:14 Loading: [gse_service_filter] 2016/01/31 17:28:14 Loading: [heka_monitoring_filter] 2016/01/31 17:28:14 Loading: [http_metrics_filter] 2016/01/31 17:28:14 Loading: [influxdb_accumulator_filter] 2016/01/31 17:28:14 Loading: [influxdb_annotation_filter] 2016/01/31 17:28:14 Loading: [instance_state_filter] 2016/01/31 17:28:14 Loading: [resource_creation_time_filter] 2016/01/31 17:28:14 Loading: [service_heartbeat_filter] 2016/01/31 17:28:14 Loading: [output_file] 2016/01/31 17:28:14 Loading: [output_file] 2016/01/31 17:28:14 Loading: [aggregator_tcpoutput] 2016/01/31 17:28:14 Loading: [DashboardOutput] 2016/01/31 17:28:14 Loading: [elasticsearch_output] 2016/01/31 17:28:14 Loading: [influxdb_output] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_output] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_output_file] 2016/01/31 17:28:14 Loading: [nagios_gse_global_clusters_output] 2016/01/31 17:28:14 Loading: [nagios_gse_node_clusters_output] 2016/01/31 17:28:14 Starting hekad... 2016/01/31 17:28:14 Output started: output_file 2016/01/31 17:28:14 Output started: aggregator_tcpoutput 2016/01/31 17:28:14 Output started: DashboardOutput 2016/01/31 17:28:14 Output started: influxdb_output 2016/01/31 17:28:14 Output started: nagios_afd_nodes_output 2016/01/31 17:28:14 Output started: elasticsearch_output 2016/01/31 17:28:14 Output started: nagios_afd_nodes_output_file 2016/01/31 17:28:14 Output started: nagios_gse_global_clusters_output 2016/01/31 17:28:14 Output started: nagios_gse_node_clusters_output 2016/01/31 17:28:14 Filter started: afd_api_endpoints_filter 2016/01/31 17:28:14 Filter started: afd_node_controller_root-fs_filter 2016/01/31 17:28:14 Filter started: afd_service_cinder-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_keystone-admin-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_mysql_node-status_filter 2016/01/31 17:28:14 Filter started: afd_service_nova-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_rabbitmq_memory_filter 2016/01/31 17:28:14 Filter started: gse_service_filter 2016/01/31 17:28:14 Filter started: afd_service_rabbitmq_disk_filter 2016/01/31 17:28:14 Filter started: influxdb_annotation_filter 2016/01/31 17:28:14 Filter started: influxdb_accumulator_filter 2016/01/31 17:28:14 Filter started: afd_api_backends_filter 2016/01/31 17:28:14 Filter started: afd_node_controller_cpu_filter 2016/01/31 17:28:14 Filter started: afd_service_swift-api_http_errors_filter 2016/01/31 17:28:14 Filter started: service_heartbeat_filter 2016/01/31 17:28:14 Filter started: afd_service_apache_worker_filter 2016/01/31 17:28:14 Filter started: afd_service_heat-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_neutron-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_rabbitmq_queue_filter 2016/01/31 17:28:14 Filter started: http_metrics_filter 2016/01/31 17:28:14 Filter started: afd_workers_filter 2016/01/31 17:28:14 Filter started: heka_monitoring_filter 2016/01/31 17:28:14 Filter started: instance_state_filter 2016/01/31 17:28:14 Filter started: resource_creation_time_filter 2016/01/31 17:28:14 Filter started: afd_node_controller_log-fs_filter 2016/01/31 17:28:14 Filter started: afd_node_mysql-nodes_mysql-fs_filter 2016/01/31 17:28:14 Filter started: gse_node_filter 2016/01/31 17:28:14 Filter started: afd_service_glance-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_keystone-public-api_http_errors_filter 2016/01/31 17:28:14 Filter started: gse_global_filter 2016/01/31 17:28:14 MessageRouter started. 2016/01/31 17:28:14 Input started: test_input 2016/01/31 17:28:14 Input started: openstack_7_0_logstreamer 2016/01/31 17:28:14 Input started: pacemaker_logstreamer 2016/01/31 17:28:14 Input started: rabbitmq_logstreamer ... <SKIP> ...

So heka was successfully started and you can see debug messages.
To detach from screen use ctr-a d shortcut key combination.