Provide a means for storing a history of DNS/Name changes for the IP Addresses extracted from web log files. The main goal is that multiple analyses of older log files do not require re-lookups of IP Address to FQDN, and that each lookup stays accurate as it was then, not as it is now.
Latest Production Release is version 1.3 dnshistory-1.3.tar.gz
- Save on disk space! Estimates for one system I look after show that using DNSHistory instead of dnstran saves around 4 GB of disk space a year. Given the cost of high-performance SCSI drives, that translates into not-insignificant dollars!
- Accuracy. dnstran has a distressing tendency to translate parts of logfiles that shouldn't be translated. Additionally, with "cache"-style translators you will get different results if you ever need to re-run the analysis in years to come.
- Accessible. DNSHistory can be pipelined with other tools. You're not tied to a single product to do log analysis. egrep, gawk and cut can do a lot of simple fast analysis.
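As a sketch of the kind of quick pipeline analysis meant here, the following counts hits per resolved name from a translated Common Log Format file. The sample lines and the /tmp path are hypothetical stand-ins for what `dnshistory -T` output might look like:

```shell
# Create a small stand-in for translated web log output (hypothetical data;
# in practice this would come from: dnshistory -T -d dnsdb.db -f access.log.gz)
cat <<'EOF' > /tmp/translated.log
host1.example.com - - [25/Apr/2005:10:00:00 +0000] "GET / HTTP/1.0" 200 1024
host2.example.com - - [25/Apr/2005:10:00:01 +0000] "GET /a HTTP/1.0" 200 512
host1.example.com - - [25/Apr/2005:10:00:02 +0000] "GET /b HTTP/1.0" 404 0
EOF

# Count hits per FQDN: field 1 of each Common Log Format line is the host.
awk '{ hits[$1]++ } END { for (h in hits) print hits[h], h }' /tmp/translated.log | sort -rn
```

The top line of output here would be the busiest host (host1.example.com, with 2 hits); the same one-liner scales to real translated logs unchanged.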
With version 1.3, DNSHistory can also process squid, ftp xferlog and iptables log files.
dnshistory currently has six modes of operation:
- Do Lookups. The default mode. Given a web log file, dnshistory will perform DNS reverse lookups on each unique IP Address and store the results in a history database.
- Do Translations. Given a raw web log file, dnshistory will make use of a previously created history database and send to STDOUT the same web log but with addresses replaced by the Fully Qualified Domain Name as previously looked up.
- Do Recombining. Given two web log files, one raw and one previously translated (e.g. by using dnstran), create a history database from the values in these separate log files.
- Do Dump. Dump a given history database to STDOUT.
- Do Import. Import a previously dumped history into a new database.
- Show History. Given one or more IP Addresses on the command line, display their history from the database.
It's quite possible that most users would only ever use the first two modes.
The lookups make use of threads for near maximum speed, and use the standard resolution libraries on a system. Thus hosts files, NIS, LDAP and other name resolution methods should work transparently. Unfortunately most other tools ignore local name resolution methods in favour of DNS lookups only.
It is strongly recommended that a DNS server be "nearby" for massive raw lookups; preferably not a forwarding server, or your upstream provider will not like you.
dnshistory can read .gz files. Any input sent via STDIN is currently assumed to not be gz encoded.
dnshistory assumes that the logs being sent are already sorted into oldest --> most_recent date/time order.
A Berkeley DB database is used to store the history, which may also reduce the memory footprint within a run.
dnshistory is released under the GNU General Public License.
Testing has shown that there are diminishing returns by increasing the number of threads. Most particularly, the accuracy of results rapidly decreases. This will of course depend on a multitude of factors. When in doubt, trial multiple runs and vary the maximum number of threads created. This should help determine an optimal figure for your configuration.
Using a raw, no cache, DNS/BIND server, 1 retry (-l 2) and a 1 second retry wait (-w 1), 100 threads maximum, on a 556,000 line log file with 7200 unique IP Addresses takes about 2 minutes with near perfect accuracy.
Changes from v1.2 ==> v1.3
- Process squid, ftp xferlog and iptables log files
- The log type can be set or auto detected
There are no changes of any significance with the upgrade from v1.3-beta1.
The latest version is 1.3, released the 31st January 2007.
The source code for dnshistory can be found here: dnshistory-1.3.tar.gz.
Feedback on the use of, or new feature requests for dnshistory are most welcome. Contact: firstname.lastname@example.org.
This package requires three additional libraries:
- Berkeley DB. Built with DB4, it probably won't compile or work with earlier versions.
- Perl-compatible regular expression library. Built against version 4.5.
- Zlib Compression library. Built against version 1.2.2.
You will also need the PThreads library. As this is (usually?) part of libc this should be fine.
$ dnshistory -d dnsdb.db -f access.20050425.log.gz
At its simplest: this takes all unique IP Addresses from the compressed log file, performs the required lookups, and stores the results in dnsdb.db in the current directory.
$ dnshistory -v -d /dev/shm/dnsdb.db -f access.log.gz
Add a touch of verbosity with a single -v to get some simple feedback. Here we hit the maximum default number of threads doing name lookups. An immediate repeat of the same log file would possibly see this number reduced, due to the caching effects of the local DNS server.
We can also see that we had a few names that did not convert to FQDNs (1548). And that we had 27 successful retries.
$ dnshistory -T -d /dev/shm/dnsdb.db -f access.log.gz | webalizer ....
Following on from the previous command, we now run dnshistory in "DoTranslate" mode, where the IP Addresses in the log file are replaced by the names stored in the database file dnsdb.db. The output can be sent anywhere; here we pipe it into the webalizer log analysis tool.
$ DB=dnsdb.2004 ; DIR=/export/logs/2004 ; for i in `seq -w 1 1 52` ;
Doing a Recombine action. This command line assumes we have the original, raw logs as access2004-??.ServerName.log.gz and the previously translated logs as access2004-??.log.gz for each week.
Zmergelog is used to merge the original individual server logs in this example; its output is fed into dnshistory.
This entire command will process the entire year's worth of logs, and save a copy of the database after each week, in case of failures.
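The checkpoint-after-each-week pattern can be sketched in isolation. Here a placeholder `echo` stands in for the real zmergelog | dnshistory pipeline (whose exact recombine options are not shown above), and the database is a plain file:

```shell
# Weekly-loop sketch: process 52 weeks and checkpoint the database after each.
# The echo line is a hypothetical stand-in for the real processing step.
DB=/tmp/dnsdb.2004
: > "$DB"                             # start with an empty "database"
for i in $(seq -w 1 1 52) ; do
    echo "processed week $i" >> "$DB" # stand-in for zmergelog | dnshistory
    cp "$DB" "$DB.week$i"             # checkpoint copy, in case of failures
done
wc -l < "$DB"                         # one line per week processed
```

Note that `seq -w 1 1 52` zero-pads the week numbers (01..52), which is what makes the `access2004-??` filename globbing line up.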
$ dnshistory -S -d dnsdb.db 127.0.0.1,192.168.1.254,10.10.10.10
Doing a ShowHistory against three IP Addresses, demonstrating the three styles of result: a successful lookup, a lookup that did not resolve to a name, and lastly an IP Address that has not been recorded in the database.
$ dnshistory -D -d dnsdb.db > dnsdb.dump
$ dnshistory -I dnsdb.dump -d dnsdb-new.db
A combined dump/import process. Normally you should be able to get away with using the Berkeley DB db_dump/db_load commands; however, I personally had no success doing so when transferring data from an x86 server to its x86_64 upgrade, hence the import mode. FWIW, the dump is semi-useful for advanced log analysis: you can quickly see just what groups and ranges of IP Addresses are hitting your website. Judicious use of cut, uniq and sort can be mildly revealing.
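As an illustration of that kind of cut/sort/uniq analysis, the snippet below groups dumped addresses by their first two octets. The dump lines here are hypothetical stand-ins (the real -D output format may differ; adjust the cut fields to match yours):

```shell
# Hypothetical dump lines: "IP timestamp name", space separated.
cat <<'EOF' > /tmp/dnsdb.dump
10.10.1.5 1106438400 crawler1.example.net
10.10.1.9 1106438460 crawler2.example.net
192.168.7.3 1106438520 unresolved
EOF

# Which ranges hit the site most? Take the IP column, keep its first two
# octets, then count and rank the groups.
cut -d' ' -f1 /tmp/dnsdb.dump | cut -d. -f1-2 | sort | uniq -c | sort -rn
```

With this sample the top-ranked line is the 10.10 group with a count of 2, showing at a glance which range is most active.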