
Y!RTG

bill fumerola@yahoo-inc.com's page



download yrtg.patch

questions
 
answers     
 
what is yrtg? yrtg is a patch that contains Yahoo!'s modifications to the nifty rtg poller. the requirements for this project arose from our need to support snmp polling a variety of datacenter networks with diverse sizes. rtg is lightweight enough to use in our small installations and powerful enough to poll our large datacenters (even with small poll intervals). most of the modifications were done to fix scaling issues we ran into along the way.

yrtg is still a work in progress but it is currently deployed in production polling hundreds of thousands of targets on thousands of devices.
what is yrtg not? yrtg is not officially supported. yrtg definitely is not a Yahoo! product. yrtg is not versioned much beyond the timestamps on the patches. yrtg is not guaranteed in any way. yrtg is not a standalone product.
so i would be running what Yahoo! runs? more or less.

active development is done on an internal branch and then those changes are integrated (using perforce's baseless merge) into a public branch after completion. additionally, the private version has some uninteresting changes to some compile time defaults done for packaging and tuning.
what value does yrtg add to rtg?
  • PERFORMANCE:
  • "sqlbufs": sql statements are buffered per-table and multiple rows are inserted in bulk instead of per-target. these buffers are only flushed when either the poll interval has ended or the per-packet limit (dynamically obtained from mysql) is reached. mysqld disk access and cpu usage are significantly reduced. (N.B.: mysql only).
  • per-thread mysql connections have been removed. this allows for more poller threads without consuming excessive resources on the mysql server.
  • INSERT DELAYED is used to spend less time blocked in a mysql_query() call. a blocked call can prevent the next poll interval from starting (and cause the snowball effect which results from that).
  • per-device snmp sessions are created once when hosts are read in instead of each time a target is polled.
  • FNV hash is used for even distribution of targets across the hash buckets. if needed in the future, a 1:1 bucket:thread ratio could be used which would reduce lock contention on threads acquiring a target.
  • the target hash size is now a runtime tunable (-s [buckets]) instead of a compile time #define.
  • a plumber thread runs to flush sqlbufs to the database in the background. buffers can hold two rounds of data for as many targets as point at the table. buffers that contain an entire poll round of data (or more) are flushed (or in the case of a downed db, resized) synchronously at the end of a poll round. buffers with less than a round of data are flushed in the background. (N.B.: mysql only).
  • snmp OID strings are compiled for the snmp library once on target insertion instead of each time a target is polled.
  • STABILITY:
  • targets that report a hard failure condition or that time out twice in a row are removed to prevent them from stalling the poll round.
  • only one snmp query is sent to a host at a time. this avoids snmp timeouts and avoids stressing the target's cpu. increasing the thread count is now a bit less hazardous to target hosts. exception: a host that is configured more than once in targets.cfg (possibly to use a different snmp version or community) can still receive concurrent queries.
  • if the database server is down at the time of data insertion, rtgpoll will buffer the inserts until the connection comes back. it will retry once per poll interval. it will try indefinitely until the poller process runs out of memory at which point it frees the buffers. (N.B.: mysql only).
  • FEATURES:
  • snmp port number is configurable per-host instead of as a global value.
  • snmp query timeout length and retry counts can now be changed from the net-snmp default and are configurable per-host.
  • mysql's tcp port (DB_Port) and unix domain socket (DB_Socket) are configurable in rtg.conf.
  • CODE & STYLE CLEANUP:
  • various buglets (example: predicates which evaluated to impossible or guaranteed conditions) have been squashed and other minor nits have been picked.
  • linked list macros were stolen from FreeBSD's <sys/queue.h> and have been used to replace several handrolled structures.
  • all freshly written and/or standalone code (rtgsqlbuf.c, rtghash.c) uses a style resembling KNF from FreeBSD's style(9) guide.
    for changes to original rtg code, existing style patterns were preserved when it was possible to ascertain them.
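
to make the FNV bucket selection mentioned in the performance list concrete, here is a minimal sketch. it assumes the 32-bit FNV-1a variant and a "host:oid" style key; the patch may use a different FNV variant or key layout, and the names fnv1a_32() and target_bucket() are made up for this example, not taken from rtghash.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV constants from the published FNV parameters. */
#define FNV32_OFFSET_BASIS	2166136261u
#define FNV32_PRIME		16777619u

/* hash a NUL-terminated key with 32-bit FNV-1a (assumed variant). */
static uint32_t
fnv1a_32(const char *key)
{
	uint32_t h = FNV32_OFFSET_BASIS;

	for (; *key != '\0'; key++) {
		h ^= (uint8_t)*key;	/* fold in one byte */
		h *= FNV32_PRIME;	/* then multiply by the prime */
	}
	return (h);
}

/*
 * map a key onto one of nbuckets buckets. nbuckets stands in for the
 * runtime-tunable hash size (-s); the function name is hypothetical.
 */
static size_t
target_bucket(const char *key, size_t nbuckets)
{
	assert(nbuckets > 0);
	return (fnv1a_32(key) % nbuckets);
}
```

because the bucket count is only used in the final modulo, changing it at startup needs no recompile, which is what moving from a #define to a runtime option buys.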
are other features being planned?
    i am no longer working on yrtg. the ideas that were on the list:

  • pre-allocate snmp pdus
  • coalesce pdus going to the same device
  • insert internal poll statistics into a database table per-round
  • add support for polling snmp tables with a single clause in the targets file. could be configured using the root oid of the table and an index:rtgid map that could be shared between tables referencing the same index (like IF-MIB/ifXTable).
  • error/warning/syslog/fprintf() consistency
  • state cleanup to the targets.cfg lexer
  • pool of database handles of a configurable count
  • support for connecting to multiple database servers
how do i download and install yrtg? after downloading yrtg.patch, apply it by executing:
$ cd /path/to/rtg && patch < /path/to/yrtg.patch
following that, run 'autoreconf' (or 'automake' and 'autoconf') in the top level directory. compilation, installation and usage are the same as the standard rtg once the patch is applied and 'autoreconf' run.
why is the patch so huge? the yrtg changes touch every part of rtgpoll. the two largest changes in the patch come from the target hash code being completely rewritten and adding an sql buffering system. in addition, a lot of code cleanup was done in the process of all the changes.
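
for a feel of what the sql buffering does, here is a rough sketch of per-table buffering with a flush when the size limit is hit. everything in it is an assumption made for illustration: the struct layout, the function names, the column names (id, dtime, counter) and the tiny SQLBUF_MAX limit. the real code in rtgsqlbuf.c obtains its limit from mysql and sends the query on flush; this sketch only counts flushes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SQLBUF_MAX 128	/* stand-in for the server's per-packet limit */

struct sqlbuf {
	char	query[SQLBUF_MAX];	/* "INSERT ... VALUES (..),(..)" */
	size_t	len;			/* bytes used, excluding the NUL */
	size_t	preamble_len;		/* length of the fixed INSERT prefix */
	int	rows;			/* rows currently buffered */
	int	flushes;		/* flush count, for illustration only */
};

/* start a buffer for one table with its fixed INSERT preamble. */
static void
sqlbuf_init(struct sqlbuf *sb, const char *table)
{
	int n;

	n = snprintf(sb->query, sizeof(sb->query),
	    "INSERT INTO %s (id,dtime,counter) VALUES ", table);
	assert(n > 0 && (size_t)n < sizeof(sb->query));
	sb->len = sb->preamble_len = (size_t)n;
	sb->rows = 0;
	sb->flushes = 0;
}

/* a real flush would send sb->query to the database; here we only reset. */
static void
sqlbuf_flush(struct sqlbuf *sb)
{
	if (sb->rows == 0)
		return;
	sb->flushes++;
	sb->len = sb->preamble_len;
	sb->query[sb->len] = '\0';	/* keep the preamble, drop the rows */
	sb->rows = 0;
}

/* append one row, flushing first if it would not fit. */
static void
sqlbuf_add(struct sqlbuf *sb, int id, long dtime, unsigned long long counter)
{
	char row[64];
	int n;

	n = snprintf(row, sizeof(row), "(%d,%ld,%llu)", id, dtime, counter);
	assert(n > 0 && (size_t)n < sizeof(row));

	/* +1 for the separating comma when rows are already buffered. */
	if (sb->len + (size_t)n + (size_t)(sb->rows > 0) >= sizeof(sb->query))
		sqlbuf_flush(sb);
	assert(sb->len + (size_t)n < sizeof(sb->query));

	if (sb->rows > 0)
		sb->query[sb->len++] = ',';
	memcpy(sb->query + sb->len, row, (size_t)n + 1);	/* copies the NUL */
	sb->len += (size_t)n;
	sb->rows++;
}
```

a poller thread would call sqlbuf_add() once per polled target and sqlbuf_flush() at the end of the poll interval, so the database sees one multi-row insert per table instead of one insert per target.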
do you have a patch just for feature X? nope. at one point i was keeping individual branches for each major project, but i now just maintain one public branch used to generate the patch. i have several reasons for doing this: avoiding overly complex patch dependencies, spending less time repeatedly merging in both directions, and lack of access to the upstream cvs.
what if the patch doesn't apply cleanly? make sure your rtg directory is from a recent cvs (not a packaged release). if the patch fails with an up-to-date cvs tree, please report it to me as a bug. check out the source code circa 2005, which was when the patch was generated.
who do i report a bug or crash to? if possible, it would simplify debugging to determine whether the bug also exists in a clean rtg installation without the patch. if so, report details of the bug to the rtg mailing list. if you discover that the bug is yrtg-specific, please send me mail and i'll fix it.

when reporting a bug (to either the list or to me), please provide lots of juicy details (such as configure/compile logs, stdout/stderr, syslog'd messages, targets.cfg, rtg.conf file). if reporting a crash, it would be helpful to recompile with debug symbols and send gdb backtraces.
what about feature requests? in addition to the rtg mailing list, you can always drop me a line with new ideas. no promises that i can write every requested feature, but suggestions are more than welcome. also, i only use and write code for rtgpoll, not rtgplot.
why isn't there postgresql support for sqlbuf? nothing at all against pgsql, but i don't use it. sorry. the sqlbuf code is fairly clean and adding support for pgsql shouldn't actually be much work. an ambitious hacker who was interested in writing support for this would only need to read rtgsqlbuf.c, find and eradicate any mysqlisms in the generic functions, and write two database dependent functions.

want to take it on?
write one function (see: sqlbuf_mysql_cfg()) that calculates the size of: the largest allowed query, the initial preamble to each insert query per-table, the maximum size of a "values" string appended per-table. the other function (see: sqlbuf_mysql_flush()) is the code needed to flush the sqlbuf out to the database and handle any errors that could arise.
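
as a starting point, the three sizes described above could be worked out roughly like this. this is only a sketch under stated assumptions: the struct, the function names and the worst-case row layout (a 32-bit id plus two 64-bit unsigned values) are invented for the example and are not the interface of sqlbuf_mysql_cfg(); read rtgsqlbuf.c for the real one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* widest row this sketch expects, with its trailing comma separator. */
#define WORST_ROW "(-2147483648,18446744073709551615,18446744073709551615),"

/* hypothetical container for the three sizes a backend must report. */
struct sqlbuf_limits {
	size_t	max_query;	/* largest query the server accepts */
	size_t	preamble_len;	/* bytes in the per-table INSERT preamble */
	size_t	max_values_len;	/* worst-case bytes for one "values" string */
};

/* fill in the limits for one table; server_max comes from the database. */
static void
sqlbuf_example_cfg(struct sqlbuf_limits *lim, const char *preamble,
    size_t server_max)
{
	lim->max_query = server_max;
	lim->preamble_len = strlen(preamble);
	lim->max_values_len = sizeof(WORST_ROW) - 1;	/* drop the NUL */
}

/* how many worst-case rows fit in one query, leaving room for the NUL. */
static size_t
sqlbuf_rows_per_query(const struct sqlbuf_limits *lim)
{
	assert(lim->max_values_len > 0);
	if (lim->max_query <= lim->preamble_len + 1)
		return (0);
	return ((lim->max_query - lim->preamble_len - 1) /
	    lim->max_values_len);
}
```

sizing against the worst-case row means a buffer never has to be checked byte by byte while rows are appended; the cost is a little unused space in each query.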

Copyright © 2004-2005 Yahoo! Inc. All rights reserved.