Books by Bill Inmon

This is a note for Future John.

I am reading Fundamentals of Data Engineering, which says that alongside Ralph Kimball, Bill Inmon had a lot to do with data warehousing in the early days. I’m going to try to read a few of his books, many of which are available in Kindle (and Audible) format from Amazon.

Subversion @ GitHub

I wanted to use Subversion to check out one of my GitHub repo branches, because an svn checkout only downloads the files it needs, not a full copy of every file ever added. But I discovered that GitHub sunset Subversion integration earlier this year. Sad face. Still, I suppose the economics justify that decision. As a consequence of my research, which was a bit sketchy because there are still heaps of documentation out there referring to GitHub features which no longer exist, I did happen to learn about:

Mail log IP address count

The following monster will parse the mail log and report on unique host connections along with a count.

# Count connections per client IP from 'connect from unknown[IP]' lines.
grep ' connect from unknown' /var/log/mail.log | \
  awk '{ print $8 }' | \
  sed -n 's/.*\[\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\].*/\1/p' | \
  awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' | \
  sort -n
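
For Future John, here is a rough sketch of the same idea wrapped in a function that reads from stdin, so it can be tried on a few sample lines before pointing it at the real log. It swaps the awk counting array for sort and uniq -c, which is a different technique from the monster above but should give the same report. The function name count_connect_ips is made up for this note, and the sketch assumes Postfix-style lines where the client appears as unknown[IP].

```shell
# Hypothetical helper (name invented for this note): reads mail log lines
# on stdin and prints "<count> <ip>" per client IP, lowest count first.
# Assumes Postfix-style lines containing "connect from unknown[IP]".
count_connect_ips() {
  # Keep only connections from hosts logged as "unknown".
  grep ' connect from unknown\[' |
    # Pull the dotted-quad IPv4 address out of the square brackets.
    sed -n 's/.*\[\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\].*/\1/p' |
    # Group identical addresses together, then count each group.
    sort |
    uniq -c |
    # Order by count, then trim the padding that uniq -c adds.
    sort -n |
    awk '{ print $1, $2 }'
}
```

To run it against the real log: count_connect_ips < /var/log/mail.log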

Data formats including YAML

I finally got around to reading about YAML on Wikipedia. Worth doing if you use YAML anywhere and haven’t read the YAML page yet.

The Wikipedia article links to the official website: yaml.org, which is good fun. It’s written in YAML! :D

If you’re interested in data formats and markup languages here’s some other reading on the subject:

CRC

I watched Ben Eater’s videos How do CRCs work? and Checksums and Hamming distance (the full course is Learn about error detection) then found the Wikipedia page: Cyclic redundancy check.

Also of interest: