5 years of vaporware
These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author's work.
This material has not been peer reviewed and is presented here as-is with the permission of …
The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
It is not my fault! It is your fault! It is not their fault!
Developer (not manager)
◦ Not working with Nagios
Accidentally ended up in our NOC
◦ Hated BB so we migrated to Nagios
2003: The birth of NSClient++
◦ NSClient sucked (Broke Exchange)
◦ NRPE_NT was too much work
2004: The open-sourcing of NSClient++
◦ “just for fun”
2007: The rebirth of NSClient++
◦ Got a lot of emails and hits on the webpage
2011: The Present
◦ 0.3.9 released last May
◦ 0.4.0 released as alpha
Windows Monitoring and NSClient++
◦ Quick Introduction
What’s new in 0.3.9
◦ Scheduled Tasks
◦ Crash Handling
What’s new in 0.4.0
◦ New core
◦ Unix support
◦ New settings subsystem
◦ New protocol
◦ Python Scripting
The end of NSClient++!
What is NSClient?
◦ A (pretty old) program
A (pretty limited) protocol
◦ A (pretty incorrect) concept
What is it not?
NSClient++ was written as a replacement for pNSClient
But it has evolved much since then
Decentralized or centralized
Active or Passive
Can monitor “anything” (including your application)
Can perform “tasks” (fix your problems)
Generally complex to use and limited on “standard” hardware
Old, outdated and usually limited functionality
◦ “Agentless” WMI
Enforces centralized and active monitoring
◦ I am biased, so might not want to take my word for it...
Protocol   Method   Encryption   Auth   Payload (bytes)   M. args   M. cmds   HTTP
NSClient   Active   No           Yes    No                Yes       No        No
NRPE       Active   No           No     1024              Yes       No        No
NSCA       Passive  Yes          Yes    512               Yes       Yes       No
NRDP       Passive  Yes          Yes    ∞                 Yes       Yes       Yes
NSCP       Active   Yes          Yes    ∞                 Yes       Yes       Yes
DNSCP      MQ       No           Yes    ∞                 Yes       Yes       No
check_mk   Active   ?            No     ∞                 No        Yes       No
◦ Around 75,000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design (use what you need)
◦ Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …
◦ Unix: Linux/Debian (probably many/most others as well)
◦ 0.3.9 with 0.4.0 in beta
Most features require NRPE or NSCA (or NSCP)
Documentation online (WIKI)
Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
◦ Impossible to estimate any figures
Please, Help out!
◦ Add documentation
◦ Report problems
◦ Share ideas, thoughts, etc…
NSClient++ is a command line program!
◦ nsclient++ -start (net start nsclientpp)
◦ nsclient++ -stop (net stop nsclientpp)
◦ nsclient++ -test
Is your friend!
◦ notepad nsc.ini
1. Local (nsclient++ -test)
2. From CLI (check_nrpe ...)
3. From Nagios (add command)
Works with “anything”
◦ Including many non-Nagios-based systems
New command line syntax!
◦ nscp --service --start
◦ nscp --service --stop
◦ nscp --help
Testing nscp --test
Is your friend!
◦ nscp --test
◦ nscp --settings-help
◦ nscp --settings --migrate-to ini
◦ nscp --settings --set …
◦ nscp --client --module PythonScript --command
execute-and-load-python --script test.py --install
Major simplification to the disk/file checker
◦ CheckFile (removed)
◦ CheckFile2 (deprecated)
◦ CheckFiles (replaces above)
Volume support (for real this time)
Scheduled task checks
A bunch of new commands
Bug fixes and many more things…
We have recruited a new member to the team!
A girl actually…
…Still a bit wet behind the ears…
◦ Powerful interface!
◦ Simple to use!
◦ out-of-the-box solution!
(on which you can expand)
◦ Nothing! Really, I mean it!
…and then… yesterday…
◦ …in the bar…
◦ …all hopes shattered…
◦ …apparently it is still too complicated…
Same syntax as was introduced for the event log last year
Based on SQL WHERE clauses
◦ generated > -2d AND severity = 'error'
◦ size > 5k
◦ size > 5k OR size < 1k
◦ size > 5k AND written > -2d
◦ (size > 5k OR size < 1k) AND written > -2d
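The 5k and -2d values in these filters are size and time literals that have to become plain numbers before any comparison can run. A minimal Python sketch, assuming k/m/g are 1024-based multiples and that a negative duration means "that long before now"; the slides confirm neither, and parse_size and parse_relative_time are hypothetical names:

```python
from datetime import datetime, timedelta

# Assumption: size suffixes are 1024-based and case-insensitive.
SIZE_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
# Assumption: time suffixes s/m/h/d/w, counted relative to "now".
TIME_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_size(text):
    """'5k' -> 5120 bytes (hypothetical helper)."""
    text = text.strip().lower()
    number = text.rstrip("kmg")
    return int(float(number) * SIZE_UNITS[text[len(number):]])


def parse_relative_time(text, now):
    """'-2d' -> the timestamp two days before `now` (hypothetical helper)."""
    text = text.strip().lower()
    amount, unit = float(text[:-1]), text[-1]
    return now + timedelta(**{TIME_UNITS[unit]: amount})


if __name__ == "__main__":
    now = datetime(2011, 2, 10, 12, 0)
    cutoff = parse_relative_time("-2d", now)
    last_written = datetime(2011, 2, 9, 8, 30)
    # "size > 5k AND written > -2d" for a 6 KiB file written yesterday:
    print(6144 > parse_size("5k") and last_written > cutoff)  # True
```

With these assumptions, "written > -2d" reads as "written after the moment two days ago", i.e. written within the last two days.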
filename Name of the file
path Path of the file
size Size of the file
accessed When the file was last accessed
written When the file was last written
creation When the file was created
version The exe file version (slow)
line_count Number of lines in the file (slow)
Operator Safe Meaning
= eq Equality
!= ne Not equal
> gt Greater than
< lt Less than
=> ge Greater than or equal
=< le Less than or equal
like String similarity (substring matching)
not like Opposite of like
regexp Regular expression matching
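To show how a WHERE-style filter can be evaluated over the keywords and operators listed above, here is a small recursive-descent sketch in Python. It is not NSClient++'s own parser: it assumes AND binds tighter than OR (as in SQL), treats "like" as substring matching as the table says, and only accepts numbers and quoted strings as values (no 5k or -2d literals). compile_filter and FilterParser are hypothetical names.

```python
import operator
import re

# Operators from the table; the word forms ("eq", "gt", ...) are the "safe" aliases.
OPS = {
    "=": operator.eq, "eq": operator.eq,
    "!=": operator.ne, "ne": operator.ne,
    ">": operator.gt, "gt": operator.gt,
    "<": operator.lt, "lt": operator.lt,
    "=>": operator.ge, "ge": operator.ge,
    "=<": operator.le, "le": operator.le,
    "like": lambda value, pattern: str(pattern) in str(value),
    "not like": lambda value, pattern: str(pattern) not in str(value),
    "regexp": lambda value, pattern: re.search(str(pattern), str(value)) is not None,
}

# Parentheses, quoted strings, symbolic operators, words and numbers; whitespace is skipped.
TOKEN_RE = re.compile(r"\(|\)|'[^']*'|!=|=>|=<|=|>|<|[A-Za-z_]\w*|-?\d+(?:\.\d+)?")


class FilterParser:
    """Recursive-descent parser producing a predicate over a dict of keyword -> value."""

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos].lower() if self.pos < len(self.tokens) else None

    def take(self):
        if self.pos >= len(self.tokens):
            raise ValueError("unexpected end of filter")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def parse(self):
        node = self.parse_or()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.pos]!r}")
        return node

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "or":
            self.take()
            node = _either(node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_factor()
        while self.peek() == "and":
            self.take()
            node = _both(node, self.parse_factor())
        return node

    def parse_factor(self):
        if self.peek() == "(":
            self.take()
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        return self.parse_comparison()

    def parse_comparison(self):
        field = self.take()
        op = self.take().lower()
        if op == "not":  # two-word operator: "not like"
            op = "not " + self.take().lower()
        if op not in OPS:
            raise ValueError(f"unknown operator {op!r}")
        token = self.take()
        value = token[1:-1] if token.startswith("'") else float(token)
        compare = OPS[op]
        return lambda row: compare(row[field], value)


def _either(left, right):
    return lambda row: left(row) or right(row)


def _both(left, right):
    return lambda row: left(row) and right(row)


def compile_filter(text):
    return FilterParser(text).parse()


if __name__ == "__main__":
    big_or_tiny_logs = compile_filter("(size > 5120 OR size < 1024) AND filename like 'log'")
    files = [{"filename": "a.log", "size": 6144}, {"filename": "b.log", "size": 2048}]
    print([f["filename"] for f in files if big_or_tiny_logs(f)])  # ['a.log']
```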
path The root path to use
pattern The file pattern to use
filter Define the filter (there can only be one)
warn How many hits constitute a warning state.
warn=>5, warn==5, warn=!=5
crit How many hits constitute a critical state.
truncate Length of returned data.
Since NRPE/NSCA have limited capacity, this is important. (Will be deprecated in 0.4.0)
syntax How to format the return data
master-syntax How to format the “message string”
debug=true Displays a lot more information in the logfile/console
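Taken together, these options form a small pipeline: the filter selects hits, warn/crit turn the hit count into a state, syntax and master-syntax build the message, and truncate cuts it to fit the transport. A Python sketch under stated assumptions: the text after the first "=" in warn=>5 is read as an operator plus a count (so warn=>5 means "more than 5"), and the %filename%, %size%, %status% and %list% placeholders are hypothetical, not documented NSClient++ syntax.

```python
import operator
import re

# Assumed grammar for warn/crit values: an operator from the table, then a hit count.
THRESHOLD_OPS = {"=>": operator.ge, "=<": operator.le, "!=": operator.ne,
                 "=": operator.eq, ">": operator.gt, "<": operator.lt}
THRESHOLD_RE = re.compile(r"(=>|=<|!=|=|>|<)(\d+)")


def split_option(option):
    """'warn=>5' -> ('warn', '>5'): only the first '=' separates name and value."""
    name, _, value = option.partition("=")
    return name, value


def parse_threshold(value):
    match = THRESHOLD_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"bad threshold: {value!r}")
    compare, limit = THRESHOLD_OPS[match.group(1)], int(match.group(2))
    return lambda hits: compare(hits, limit)


def check_state(hits, warn=None, crit=None):
    """Critical wins over warning; if no threshold matches, the state is OK."""
    if crit and parse_threshold(crit)(hits):
        return "CRITICAL"
    if warn and parse_threshold(warn)(hits):
        return "WARNING"
    return "OK"


def render(matches, state, syntax, master_syntax, truncate=0):
    """Expand %key% per hit (syntax), then %status%/%list% (master-syntax), then truncate."""
    def expand(item):
        return re.sub(r"%(\w+)%", lambda m: str(item[m.group(1)]), syntax)
    message = master_syntax.replace("%status%", state)
    message = message.replace("%list%", ", ".join(expand(item) for item in matches))
    return message[:truncate] if truncate > 0 else message


if __name__ == "__main__":
    matches = [{"filename": "a.log", "size": 6144}, {"filename": "c.log", "size": 8192}]
    _, warn = split_option("warn=>1")
    _, crit = split_option("crit=>5")
    state = check_state(len(matches), warn, crit)
    print(render(matches, state, "%filename%: %size%", "%status%: %list%"))
    # WARNING: a.log: 6144, c.log: 8192
```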
CheckDriveSize … CheckAll=volumes …
Other new features
◦ Added a new option to ignore drives which are not readable (like the Office 2010 Q: drive)
◦ Added magic modifiers (from check_mk)
Works the “same” as CheckEventLog
◦ "filter=exit_code ne 0"
Works on Windows NT4 and beyond
But cannot check “new” tasks (from Vista and beyond)
Works on Windows Vista and beyond
Has fewer filter keywords
title Task name
application The application
comment Retrieves the comment for the work item.
parameters Retrieves the command-line parameters of a task.
working_directory Retrieves the working directory of the task.
exit_code Retrieves the last exit code returned by the executable associated with the work item on its last run.
max_run_time Retrieves the maximum length of time the task can run.
status Retrieves the status of the work item. Possible values include: ready, running, not_scheduled, has_not_run, disabled, has_more_runs, no_valid_triggers.
most_recent_run_time Retrieves the most recent time the work item began running.
"filter=exit_code ne 0"
"filter=status = 'running' AND most_recent_run_time < -30m"
WARNING:test.job (2011-02-10 23:14:35)
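In plain Python, the two scheduled-task filters above amount to the predicates below. This is an illustrative sketch, not NSClient++ code: the task records are made-up sample data keyed by the keywords from the table, and "-30m" is assumed to mean "30 minutes before now".

```python
from datetime import datetime, timedelta

# Made-up task records using the keyword names from the table above.
TASKS = [
    {"title": "test.job", "status": "running", "exit_code": 0,
     "most_recent_run_time": datetime(2011, 2, 10, 23, 14, 35)},
    {"title": "backup.job", "status": "ready", "exit_code": 1,
     "most_recent_run_time": datetime(2011, 2, 10, 22, 0, 0)},
]


def failed(tasks):
    """filter=exit_code ne 0"""
    return [t for t in tasks if t["exit_code"] != 0]


def running_too_long(tasks, now, limit=timedelta(minutes=30)):
    """filter=status = 'running' AND most_recent_run_time < -30m"""
    cutoff = now - limit  # assumption: "-30m" means "30 minutes before now"
    return [t for t in tasks
            if t["status"] == "running" and t["most_recent_run_time"] < cutoff]


def describe(task):
    """Format a hit like the sample output: 'test.job (2011-02-10 23:14:35)'."""
    return f"{task['title']} ({task['most_recent_run_time']:%Y-%m-%d %H:%M:%S})"


if __name__ == "__main__":
    now = datetime(2011, 2, 11, 0, 0, 0)
    print(["WARNING:" + describe(t) for t in running_too_long(TASKS, now)])
    print([t["title"] for t in failed(TASKS)])
```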
CPU Load past 5 minutes, 80/90% bounds
CPU Load past 5 minutes, custom bounds
Memory utilization (all) 80/90% bounds.
Memory utilization (all), custom bounds
All fixed drives
All fixed drives, ignore any problematic drives
All volumes, ignore any problematic drives
Check the size of a given file (filename, size)
Check the age of a given file
Check for errors in the event log
No scheduled jobs have failed
No task has been running for longer than a given time.
Check if a given task succeeded
Check that updates are applied
All services in “sensible state”
All services in “sensible state” (exclude various services)
A process must be running
A process must not be running
A process must not have more than X instances
A process must not be hung
Using Google Breakpad
◦ same as Google Chrome, Mozilla Firefox, etc
Three options (not mutually exclusive)
1. Send crash dumps to crash.nsclient.org
Server can be changed if you want to use an internal server or a proxy server.
2. Store crash dumps for analysis
Will also be checked with check_nscp
3. Restart service
◦ Fixed problems with sending “many” results back
◦ Added support for large payloads
◦ Added “check_nscp” to check the health of NSClient++
◦ Added a new check for running other checks “with a timeout”
◦ Added new negate check (to negate the result of another check)
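The timeout and negate checks wrap another check rather than measuring anything themselves. A hedged Python sketch of both ideas: the timeout wrapper reports UNKNOWN by default (an assumption; CRITICAL would be an equally valid choice), and negate follows the defaults of the Nagios "negate" plugin (OK and CRITICAL swapped, WARNING and UNKNOWN unchanged), which may differ from what NSClient++ actually does.

```python
import concurrent.futures
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_with_timeout(check, timeout, on_timeout=UNKNOWN):
    """Run `check` (returning (code, message)) and give up after `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return on_timeout, f"Check timed out after {timeout}s"
    finally:
        # Don't block on a hung check; the worker thread is abandoned, not killed.
        pool.shutdown(wait=False)


# Like the Nagios "negate" plugin's defaults: swap OK and CRITICAL, keep the rest.
NEGATED = {OK: CRITICAL, CRITICAL: OK}


def negate(result):
    code, message = result
    return NEGATED.get(code, code), message


if __name__ == "__main__":
    def slow_check():
        time.sleep(0.5)
        return OK, "slow but fine"

    print(run_with_timeout(slow_check, 0.1))  # (3, 'Check timed out after 0.1s')
    print(negate((CRITICAL, "process not running")))  # (0, 'process not running')
```

A thread-based timeout cannot stop a check that really hangs; running checks in a separate process would be the stronger (and heavier) alternative.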
All filters (read CheckEventLog et al)
◦ Many fixes and additions (regular expressions)
◦ Added support for checking if processes have “hung”
◦ Added it to many places where it was intermittently missing
What's to come?
• 0.3.9
• Last 0.3.x (v2)
• Core switch
• Linux support
• New Windows checks
• True passive
• Distributed monitoring
Brand new core based upon libraries
◦ Things should *work* not just “work”
◦ More modular and extensible
◦ Both as a client and server
New settings subsystem
◦ Registry, improved ini support, http, etc
◦ NSCP (HTTP(s), MQ, Native)
◦ Many new things in this area (including MQ)
◦ Primary goal (for me) is to create “unit-test”
◦ WiX 3.5, more customizable
◦ Monitoring solutions for “standard things”
New Windows check subsystem
◦ More modern and less arcane (no NT4 support)
◦ Remote checking
.Net plugin support
◦ Possibly internal VBA scripting support
Metrics cache and aggregation
◦ Lightweight version of CEP (complex event processing)
◦ “crit=cpu > 80% AND transactions_per_sec < 10”
Filter-like API (in addition to options)
◦ “warn=any drive > 90% OR c: > 80%”
◦ Allow NSCP to upgrade itself
“port” of the “standard plugins”?
◦ Run your favorite check_xxx from inside NSClient++
◦ Run CheckCPU on unix machines?
◦ A nice little program (systray)
Let me know what you would like to see!
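The metrics cache and the filter-like thresholds listed above are roadmap ideas, not a shipped API. As a rough sketch of what the two example rules could mean, the Python below keeps a sliding window of samples per metric and evaluates both rules against window averages; the window size, the use of averages, and metric names such as "drive/c:" are all assumptions.

```python
from collections import defaultdict, deque


class MetricsCache:
    """Keeps the last `window` samples of each metric (hypothetical design)."""

    def __init__(self, window):
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def add(self, name, value):
        self.samples[name].append(value)

    def average(self, name):
        values = self.samples.get(name)
        return sum(values) / len(values) if values else 0.0

    def any_average_above(self, prefix, limit):
        """True if any metric whose name starts with `prefix` averages above `limit`."""
        return any(self.average(name) > limit
                   for name in self.samples if name.startswith(prefix))


def crit_rule(cache):
    # crit=cpu > 80% AND transactions_per_sec < 10
    return cache.average("cpu") > 80 and cache.average("transactions_per_sec") < 10


def warn_rule(cache):
    # warn=any drive > 90% OR c: > 80%
    return cache.any_average_above("drive/", 90) or cache.average("drive/c:") > 80


if __name__ == "__main__":
    cache = MetricsCache(window=3)
    for value in (10, 85, 90, 95):  # the oldest sample (10) falls out of the window
        cache.add("cpu", value)
    for value in (5, 6, 7):
        cache.add("transactions_per_sec", value)
    cache.add("drive/c:", 85)
    cache.add("drive/d:", 50)
    print(cache.average("cpu"), crit_rule(cache), warn_rule(cache))  # 90.0 True True
```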
Brand new core
This is why it was so long in the making
◦ Merging each new version took forever!
New internal protocol
◦ Removed all internal “limits” (think buffer sizes)
◦ Allows many new features
◦ Allows much more advanced internal scripts
◦ Allows for “non NRPE based checks”
A lot of new bugs?
◦ This is the scary part (for me)
but my testing has shown it seems very stable
◦ Since no one seems to like to program on Windows
I brought NSClient++ to “unix”
◦ Because I can
With the new core comes portability
So, perhaps the better question was:
Will NOT be supported for some time though
◦ Unless someone wants to help out
Hierarchical settings subsystem
◦ allow arguments=false
◦ [NRPE Server]
Why did I do this?
◦ Because it was fun
◦ Number of options has started to explode
◦ Simpler to use the registry (as well as xml?)
Since settings have URLs
Allows extensions (not via plugins though)
◦ Maybe in the future:
You can mix and match:
Which in turn includes
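A rough model of settings that can be mixed and matched across sources: each source is a flat map of "section/key" values, a source may include others, and the first source in the chain wins. The [NRPE Server] section and the allow arguments key come from the slides; the source names (registry://main, ini://nsclient.ini), the include mechanism and the override order are hypothetical, and the registry is simulated with a plain dict.

```python
import configparser
from collections import ChainMap

# Section and key from the slides; port 5666 (the NRPE default) is added for the example.
INI_TEXT = """
[NRPE Server]
allow arguments = false
port = 5666
"""


def load_ini(text):
    """Flatten an ini document into {'section/key': value}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {f"{section}/{key}": value
            for section in parser.sections()
            for key, value in parser.items(section)}


# Hypothetical sources; a dict stands in for the registry.
SOURCES = {
    "registry://main": {"values": {"NRPE Server/allow arguments": "true"},
                        "includes": ["ini://nsclient.ini"]},
    "ini://nsclient.ini": {"values": load_ini(INI_TEXT)},
}


def build_chain(name, sources, seen=None):
    """Depth-first list of value maps: a source comes before the sources it includes."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against include cycles
        return []
    seen.add(name)
    source = sources[name]
    chain = [source["values"]]
    for child in source.get("includes", []):
        chain.extend(build_chain(child, sources, seen))
    return chain


settings = ChainMap(*build_chain("registry://main", SOURCES))

if __name__ == "__main__":
    print(settings["NRPE Server/allow arguments"], settings["NRPE Server/port"])  # true 5666
```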
Ability to load the same plugin twice.
Normal (default alias is python)
Multiple modules (define two aliases foo and bar)
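Loading the same plugin twice works if every loaded instance is keyed by its alias and reads only the settings under that alias. A hypothetical Python sketch of that bookkeeping; PythonScript here is a stand-in class, not the real module, and the settings layout is assumed:

```python
class PythonScript:
    """Stand-in plugin: each instance reads only the settings under its own alias."""
    default_alias = "python"

    def __init__(self, alias, settings):
        self.alias = alias
        self.scripts = list(settings.get("scripts", []))


def load_module(loaded, plugin, settings, alias=None):
    """Load `plugin` under `alias` (or its default); one plugin may be loaded many times."""
    alias = alias or plugin.default_alias
    if alias in loaded:
        raise ValueError(f"alias {alias!r} is already in use")
    loaded[alias] = plugin(alias, settings.get(alias, {}))
    return loaded[alias]


# Hypothetical settings, grouped by alias.
EXAMPLE_SETTINGS = {
    "python": {"scripts": ["test.py"]},
    "foo": {"scripts": ["foo.py"]},
    "bar": {"scripts": ["bar.py"]},
}

if __name__ == "__main__":
    normal = {}
    load_module(normal, PythonScript, EXAMPLE_SETTINGS)  # default alias "python"
    multiple = {}
    load_module(multiple, PythonScript, EXAMPLE_SETTINGS, "foo")
    load_module(multiple, PythonScript, EXAMPLE_SETTINGS, "bar")
    print({alias: module.scripts for alias, module in multiple.items()})
    # {'foo': ['foo.py'], 'bar': ['bar.py']}
```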
◦ If you are “still” using check_nt:
◦ If you are using NSCA:
◦ If you want to use all new features
How do I change?
◦ It is pretty simple…
nscp --settings --migrate-to ini
nscp --settings --migrate-to registry
[Figure: Windows computer and Nagios server; each check (CPU, Disk, Mem, ...) needs its own fork on the Nagios server]
[Figure: Windows computer and Nagios server with the new protocol]
Allows more than one command to be sent
Used internally for plugins
Supports both passive and active checks
Supports configuration, management, etc…
But will also support:
◦ Multiple locales (based on UTF)
◦ Unlimited payloads (soft configurable)
◦ Support real performance data (not strings)
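The protocol's main differences from NRPE are several commands per message, no fixed payload limit, and performance data carried as numbers instead of a formatted string. The Python sketch below illustrates only those properties: it uses JSON for readability and does not reproduce the actual NSCP wire format, and the command names and handlers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PerfData:
    label: str
    value: float  # a real number, not a pre-formatted string
    unit: str = ""
    warning: Optional[float] = None
    critical: Optional[float] = None


@dataclass
class Result:
    command: str
    status: int
    message: str
    perf: List[PerfData] = field(default_factory=list)


def encode_request(commands):
    """Several (command, arguments) pairs travel in one message."""
    return json.dumps({"queries": [{"command": c, "arguments": a} for c, a in commands]})


def handle_request(payload, handlers):
    """Server side: run every query and answer with a single message."""
    queries = json.loads(payload)["queries"]
    results = [handlers[q["command"]](q["arguments"]) for q in queries]
    return json.dumps({"results": [asdict(r) for r in results]})


def decode_results(payload):
    return [Result(r["command"], r["status"], r["message"],
                   [PerfData(**p) for p in r["perf"]])
            for r in json.loads(payload)["results"]]


# Hypothetical handlers standing in for real checks.
HANDLERS = {
    "check_cpu": lambda args: Result("check_cpu", 0, "CPU load ok",
                                     [PerfData("total", 12.5, "%", 80.0, 90.0)]),
    "check_mem": lambda args: Result("check_mem", 1, "Memory high",
                                     [PerfData("used", 85.0, "%", 80.0, 90.0)]),
}

if __name__ == "__main__":
    request = encode_request([("check_cpu", []), ("check_mem", [])])
    for result in decode_results(handle_request(request, HANDLERS)):
        print(result.command, result.status, [(p.label, p.value, p.unit) for p in result.perf])
```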
[Figure: brokers (real-time broker, check broker) between agents and servers: XXX Agent to XXX Server, NSCA Agent to NSCA Server, SYSLOG Agent to SysLog Server]
An extension of the passive checks
◦ “Something” can send notification events
◦ “Something” can receive notification events
◦ Agents can forward notification events
◦ Replaces NSCAListener module
Not a one-to-one mapping.
◦ Multiple consumers
◦ Multiple producers
◦ Passive plugins (other than the built-in NSCA)
◦ Script and rule based routing
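Since routing is not a one-to-one mapping, it can be pictured as a small publish/subscribe router: any number of producers publish events, every consumer whose rule matches receives them, and a consumer can itself forward to another router (an agent forwarding notification events upstream). This is a conceptual Python sketch; Router, the rule functions and the event fields are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]


class Router:
    """Producers call publish(); every consumer whose rule matches receives the event."""

    def __init__(self):
        self.routes: List[Tuple[Callable[[Event], bool], Callable[[Event], object]]] = []

    def subscribe(self, rule, consumer):
        self.routes.append((rule, consumer))

    def publish(self, event):
        matched = [consumer for rule, consumer in self.routes if rule(event)]
        for consumer in matched:
            consumer(event)
        return len(matched)  # number of consumers that received the event


if __name__ == "__main__":
    central_inbox, syslog = [], []
    upstream = Router()
    upstream.subscribe(lambda e: True, central_inbox.append)

    agent = Router()
    # Forward only problems upstream (an agent forwarding notification events)...
    agent.subscribe(lambda e: e["status"] >= 2, upstream.publish)
    # ...while a second consumer logs every event.
    agent.subscribe(lambda e: True, lambda e: syslog.append(f"{e['command']}: {e['message']}"))

    agent.publish({"command": "check_cpu", "status": 2, "message": "CPU at 95%"})
    agent.publish({"command": "check_disk", "status": 0, "message": "C: 40% used"})
    print(len(central_inbox), syslog)
```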
Built-in python scripting
Has full API support
◦ Can build “modules” in Python
◦ Can access settings
◦ Can do “anything”
Primarily used by me for unit-testing
Requires a working python install
The king is dead, long live the king! (Le Roi est mort, vive le Roi!)
0.4.x (ish) will be the last “Windows”
The idea is to make it more:
◦ A platform/client/server for distributed monitoring
Regardless of os/system
Regardless of Monitoring solutions
◦ It will still work just fine as a “Windows Monitoring
◦ But in addition to this you will be able to do more.