YDB Syslog
Overview
YDBSyslog is a YottaDB plugin to capture syslog data in a YottaDB database, to allow for more sophisticated analytics, forensics, and troubleshooting, for example by using Octo. Furthermore, by consolidating the syslogs of several systems in a single database, queries can run on data that cuts across multiple systems, e.g., to investigate concurrent events.
It operates in two modes to ingest data in the journalctl --output=export format:
- By running journalctl --follow in a PIPE device, YDBSyslog can continuously ingest syslog entries in real time.
- Reading a journalctl export from stdin. Reading from journalctl --output=export --follow in a pipe is effectively the same as reading from a PIPE device using the --follow option.
YDBSyslog can output DDL that Octo accepts, allowing the syslog database to be queried using SQL.
Quickstart
As a YottaDB plugin, YDBSyslog requires YottaDB. You can install YottaDB and YDBSyslog together:
mkdir /tmp/tmp ; wget -P /tmp/tmp https://gitlab.com/YottaDB/DB/YDB/raw/master/sr_unix/ydbinstall.sh
cd /tmp/tmp ; chmod +x ydbinstall.sh
sudo ./ydbinstall.sh --utf8 --syslog
Although you can omit the --utf8 option if you do not want UTF-8 support installed, we recommend installing UTF-8 support as syslogs can include UTF-8 characters. If you already have YottaDB installed, you can install or reinstall the YDBSyslog plugin without reinstalling YottaDB:
sudo $ydb_dist/ydbinstall --syslog --plugins-only --overwrite-existing
Installation
If you don't use the Quickstart method, you can install YDBSyslog from source. In addition to YottaDB and its requirements, YDBSyslog requires cmake, git, make, and pkg-config. Clone the YDBSyslog repository, and then install the plugin, using the following commands:
git clone https://gitlab.com/YottaDB/Util/YDBSyslog.git YDBSyslog-master
cd YDBSyslog-master
mkdir build && cd build
cmake ..
make && sudo make install
Usage
The most common usage of YDBSyslog is to run %YDBSYSLOG from the shell.
yottadb -run %YDBSYSLOG op [options]
Where op and [options] are:
- help - Output options to use this program.
- ingestjnlctlcmd [options] - Run the journalctl --output=export command in a PIPE. Options are as follows; all options may be omitted.
  - --boot [value] - --boot is mutually exclusive with --follow. There are several cases of value:
    - If omitted, the --boot parameter is omitted when invoking journalctl. This ingests the syslog from the current boot.
    - If a hex string prefixed with 0x, the string sans prefix is passed to journalctl --boot.
    - If a decimal number, it is passed unaltered to journalctl --boot.
    - If a case-independent all, that option is passed to journalctl --boot.
  - --follow is mutually exclusive with --boot. The --follow option is used to invoke journalctl --follow, and results in %YDBSYSLOG running as a daemon to continuously ingest the syslog exported by journalctl.
  - --moreopt indicates that the rest of the command line should be passed verbatim to the journalctl command as additional options. See the Linux command man journalctl for details. YDBSyslog does no error checking of these additional options.
- ingestjnlctlfile - Read journalctl --output=export formatted data from stdin.
- octoddl - Output an Octo DDL to allow analysis of syslog data using SQL. If the database combines syslog data from multiple systems, Octo SQL queries can span systems.
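As an illustration of the --boot rules above, the following sketch maps a value to the argument that journalctl would receive. The function boot_arg is hypothetical and not part of YDBSyslog. The --boot=VALUE spelling, the lowercasing of all, and the rejection of any other value are assumptions made for the example, and the omitted-value case is deliberately left out.

```shell
#!/bin/sh
# boot_arg VALUE: print the journalctl argument implied by the --boot rules.
# Hypothetical helper for illustration only; not part of YDBSyslog.
boot_arg() {
    value=$1
    case "$value" in
        [aA][lL][lL])
            # "all" in any letter case (assumption: passed on in lower case)
            printf '%s\n' "--boot=all" ;;
        0x?*)
            # hex string: drop the 0x prefix and pass the rest as the boot ID
            printf '%s\n' "--boot=${value#0x}" ;;
        ''|*[!0-9+-]*)
            # anything else that is not a (signed) decimal number is rejected here
            echo "unsupported --boot value: $value" >&2
            return 1 ;;
        *)
            # decimal number: passed unaltered
            printf '%s\n' "--boot=$value" ;;
    esac
}
```

For example, boot_arg 0x1f2e prints --boot=1f2e, and boot_arg -1 prints --boot=-1.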
The following M entryrefs can be called directly from programs.
- INGESTJNLCTLCMD^%YDBSYSLOG(boot,follow,moreopt) runs journalctl --output=export in a PIPE device. Parameters are:
  - boot is the parameter for the --boot command line option of journalctl. There are several cases:
    - If unspecified or the empty string, the --boot option is omitted.
    - If a hex string prefixed with "0x", the string sans prefix is passed to journalctl as the value.
    - If a decimal number, it is passed unaltered to journalctl.
    - If a case-independent "all", that option is passed to journalctl.
  - If follow is non-zero, INGESTJNLCTLCMD follows journalctl, continuously logging syslog output in the database. boot and follow are mutually exclusive.
  - moreopt is a string intended to be passed verbatim to the journalctl command. See the Linux command man journalctl for details. INGESTJNLCTLCMD does no error checking of these additional options.
- INGESTJNLCTLFILE^%YDBSYSLOG reads journalctl --output=export formatted data from stdin.
- OCTODDL^%YDBSYSLOG([scanflag]) generates the DDL that can be fed to Octo to query the ingested syslog data using SQL. If scanflag evaluates to 1, the routine scans the database for additional fields beyond those identified in the code.
Data are stored in nodes of ^%ydbSYSLOG with the following subscripts, which are reverse engineered from the __CURSOR field of the journalctl export format. While __CURSOR is documented as opaque, reverse engineering provides a more compact database and faster access:
- Cs - a UUID for a large number of syslog records.
- Cb - evidently a boot UUID.
- Ci - evidently the record number in a syslog.
- Ct - evidently the number of microseconds since the UNIX epoch.
- Cm - evidently a monotonic timestamp since boot.
- Cx - a UUID that is unique to each syslog entry.
Fields that journalctl has been found to flag as binary, e.g., "MESSAGE" and "SYSLOG_RAW", have an additional, seventh, subscript, the tag for the field.
Note that since querying syslog entries is content based (e.g., the USER_ID field) and not by the subscripts, if the reverse engineering of __CURSOR is imperfect, or if a future systemd-journald changes the fields, it will not affect the correctness of queries; it will only incrementally increase database size and consequently access time (smaller databases are marginally faster).
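To see where these subscripts come from, the sketch below splits a __CURSOR value into its semicolon-separated key=value parts. The cursor shown is made up and the helper cursor_field is hypothetical. The layout (keys s, i, b, m, t, x, with the numeric parts in hexadecimal) is what current journalctl versions appear to emit, not a documented interface. How YDBSyslog itself converts or stores these values is not shown here.

```shell
#!/bin/sh
# A made-up cursor in the layout that journalctl currently appears to emit.
cursor='s=0123456789abcdef0123456789abcdef;i=1a2b;b=fedcba9876543210fedcba9876543210;m=3c4d5e;t=5f1e2d3c4b5a6;x=9f8e7d6c5b4a3f2e'

# cursor_field CURSOR KEY: print the value stored under a one-letter KEY.
# Hypothetical helper for illustration only; not part of YDBSyslog.
cursor_field() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n "s/^$2=//p"
}

# Print each part next to the subscript name used in this document.
for key in s b i t m x; do
    printf 'C%s %s\n' "$key" "$(cursor_field "$cursor" "$key")"
done

# The i part looks like a hexadecimal record number; show it in decimal.
printf '%d\n' "0x$(cursor_field "$cursor" i)"
```

Because the cursor is documented as opaque, a script like this is only suitable for inspection and should not be relied on across systemd versions.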
The numerous fields exported by journalctl are not well documented. Systemd Journal Export Formats is helpful, as is man systemd.journal-fields. However, outside the source code, there does not appear to be a comprehensive list of all fields. The fields listed in the _YDBSYSLOG.m source code were captured from a couple dozen Linux systems running releases and derivatives of Arch Linux, Debian GNU/Linux, Red Hat Enterprise Linux, SUSE Linux Enterprise, and Ubuntu. Even if journalctl exports additional fields not identified, %YDBSYSLOG captures them, and generates reasonable DDL entries for them.
Should you find additional entries not identified by the _YDBSYSLOG.m source code, please create an Issue or a Merge Request in the YottaDB project.
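One way to check which field names your own systems export is to list the distinct names in an export stream. The sketch below is a hypothetical helper, not part of YDBSyslog. It assumes text fields appear as NAME=value lines and ignores fields that journalctl serializes in binary form, so treat its output as a starting point, not a complete list.

```shell
#!/bin/sh
# list_fields: read journalctl export-format data on stdin and print each
# distinct field name once. Hypothetical helper, not part of YDBSyslog.
# Assumption: only NAME=value text lines are considered; binary-serialized
# fields, and any payload lines that happen to look like NAME=value, are
# not handled specially.
list_fields() {
    awk -F= '/^[A-Z_][A-Z0-9_]*=/ { seen[$1] = 1 }
             END { for (name in seen) print name }' | LC_ALL=C sort
}

# Two sample entries, separated by blank lines as in the export format.
printf '__CURSOR=s=abc;i=1\nMESSAGE=hello\n_PID=42\n\n__CURSOR=s=abc;i=2\nMESSAGE=bye\n\n' | list_fields
```

On a live system, journalctl --output=export --boot | list_fields lists the text field names from the current boot, which can then be compared with those in _YDBSYSLOG.m.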
Syslog from multiple systems
Although there are many ways to script gathering data from multiple systems using %YDBSYSLOG, the program UseYDBSyslog is a sample script you can use. After reading the comments in the file UseYDBSyslog.txt:
- Edit the file UseYDBSyslog.txt to replace the sample loghost name, server names, and starting TCP port with the specific values for your environment.
- Save the file as UseYDBSyslog.m on the loghost and on each server in a location where YottaDB can execute it.
- To use it, first start it on the loghost, and then on each server, and confirm that the two port numbers reported by the loghost for each server match those the server reports.
- To collect all syslogs from all servers, initially, start it with yottadb -run %XCMD 'do ^UseYDBSyslog(1)'. Subsequently, a simple yottadb -run UseYDBSyslog suffices to capture syslogs from the current boot.
- To collect all syslogs from all servers starting at a specific time, pass the time as the third parameter, e.g., yottadb -run %XCMD 'do ^UseYDBSyslog(,,,"--since=""2023-08-13 14:04""")'.
The default configuration of UseYDBSyslog creates an unjournaled database that uses the MM access method. If you use journaling for recoverability, remember to monitor space used by prior generation journal files, and to delete those old journal files when they are no longer needed.