Friday, August 29, 2008

The Music Project

I was talking to my friend Brian Goetz recently, and he reminded me of a blog entry he posted a while back. He's digitizing his entire music collection, and he's done all the research. This is appealing to me because I recently switched over to zero reliance on the shiny plastic disks once I've ripped them. All my music consumption is now electronic, mostly via iPod (the car was the last frontier, and now that's switched over as well). I have a large-ish CD collection (~1500 CDs), so translating them to electronic form is no small feat. And I only have to do it once for the rest of my life.

Based on my conversations with Brian, I realized the following:
  • Ripping and encoding are completely separate activities
  • The most important thing is to get a reliable, error-corrected rip
  • You don't want to get stuck with a proprietary format (as much as I love Apple, I don't want my music collection tied up in one of their formats forever)
  • Lossless is the way to go, so that you don't lose any data in this process
  • Cross-encoding allows you to rip your music in a "core" format and selectively support proprietary formats and reduced file sizes (and lossiness); there's a small sketch of the idea right after this list
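To make that last point concrete, cross-encoding just means deriving whatever convenience formats you need from the canonical lossless files. Here's a rough illustration only -- this is not what Max does internally, it assumes you have something like ffmpeg with an Apple Lossless encoder installed, and the path is made up:

# Illustration of cross-encoding: derive an Apple Lossless (M4A) copy from a
# canonical FLAC file without ever touching the original bits.
# Assumes an ffmpeg build with an ALAC encoder; the path is a placeholder.
source = "Some Artist/Some Album/01 Some Track.flac"
system("ffmpeg", "-i", source, "-acodec", "alac", source.sub(/\.flac$/, ".m4a"))
# A lossy MP3 or AAC copy for smaller players could be derived the same way,
# again without touching the FLAC masters.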
His original entry is about his Windows- and Linux-based infrastructure. I replicated the same setup (with minor differences) on the Mac.

A Home

First, you need a home for all these files. I recently got a network attached storage device to hold our growing collection of digital photos. I'm my family's tech support, and if anything ever happens to those photos, I'm a dead man! After doing a lot of research on quality and Mac compatibility (my household is 100% Mac now), I ended up with the NETGEAR RND4250 ReadyNAS, which comes with 500 GB of usable storage (two mirrored 500 GB drives) and a maximum capacity of 2 TB. I ended up bumping it up to a terabyte when I realized how big the combination of music + pictures would be, and the Netgear handled it beautifully.

Ripping

I didn't realize this, but most rippers don't take advantage of the error correction bits on the CD, and they are wimps: they give up way too easily. The trick is to find a ripper that is relentless and tries its hardest to get all the bits off the CD. The ripper/encoder I ended up using is Max, which supports Leopard in its latest (allegedly) unstable release, and previous Mac OS X versions in its previous (stable) release. I say "allegedly" unstable because I used it a lot and it was rock solid for me on Leopard. One of the nice things about Max is its support for different rippers: fast & carefree, or paranoid. The latter is the one I want, based on the cdparanoia project. You can configure this ripper to never give up until it gets a clean read of the disc. Several of my CDs trundled for 6-10 hours before Max finally reported success, including a couple that I'd given up for dead.
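For the curious, here's roughly what the paranoid approach looks like if you drive cdparanoia directly from the command line instead of through Max (a sketch only; the output directory is a placeholder, and Max's own settings may differ):

# Sketch of a "never give up" rip using cdparanoia itself (the engine Max's
# paranoid ripper is based on). -B rips each track to its own file; -z
# (--never-skip) keeps retrying a bad sector forever instead of skipping it.
# "1-" means every track on the disc.
require 'fileutils'

rip_dir = "/Volumes/Music/incoming"   # placeholder -- wherever the raw rips go
FileUtils.mkdir_p rip_dir
Dir.chdir(rip_dir) do
  system("cdparanoia", "-B", "-z", "1-") or abort "cdparanoia reported a failure"
end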

Encoding

OK, so now that I have a good rip, I need to encode it (this post makes it sound like separate steps, but Max handles both for you). As I stated earlier, I don't want to get trapped by a specific format. I ended up (like Brian) choosing FLAC. FLAC is an open-standard encoding for music that offers lossless compression, which is what I wanted. The FLAC spec also allows for more aggressive compression without loss of data, depending on your patience: higher compression levels take longer to encode but have no impact on playback. I chose the most aggressive level because I have time, hardware (a Mac Pro with 4 processors), and I want to conserve space if possible.

But iTunes (which is how I play and sync my music) doesn't support FLAC. Max to the rescue: it will let you do parallel encoding. I set Max up to encode the ripped music files to both FLAC and Apple Lossless (Apple's format, stored in M4A files). The only downside is that it won't let you choose different directories for the two encodings. The FLAC files I'm placing on a RAID-mirrored network-attached storage drive (remember, I never want to do this again!). So, I ended up writing a little Rake file to automatically copy the files from one place to the other: I rip everything to the RAID drive, then let the script copy the M4A files (preserving the directory structure) to the desktop. The script is here, if anyone wants it (no warranty expressed or implied -- you'll have to change all the directories, and if you use this to erase your hard drive I'll shed a tear for you, but might just laugh).
# DEST is the root of the "convenience" (M4A) tree; change it to your own
# destination directory before running anything.
DEST = "/path/to/your/m4a/library"   # placeholder -- adjust for your setup

# Copy ripped M4A files into DEST, preserving the Artist/Album directory
# structure, and skip anything that already exists there.
task :copy do
  count = 0
  skipped = 0
  FileList["**/*.m4a"].each do |f|
    artist, album = recording_info_based_on f
    if File.exist? "#{DEST}/#{artist}/#{album}/#{File.basename(f)}"
      puts "\tsomething is amiss; I'm skipping: #{f}"
      skipped += 1
    else
      FileUtils.mkdir "#{DEST}/#{artist}" unless File.exist? "#{DEST}/#{artist}"
      FileUtils.mkdir "#{DEST}/#{artist}/#{album}" unless File.exist? "#{DEST}/#{artist}/#{album}"
      puts "#{artist} - #{album} - #{File.basename(f)}"
      count += 1
      FileUtils.cp f, "#{DEST}/#{artist}/#{album}"
    end
  end
  puts "copied #{count} files\nskipped #{skipped} files"
end

# Pull the artist and album out of the last two directory names in the file's path.
def recording_info_based_on filename
  File.expand_path(filename) =~ /.*\/(.*)\/(.*)\/.*/
  return $1, $2
end
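The helper assumes the ripped files live in an Artist/Album/Track layout; a made-up path shows what it extracts:

# Made-up example, just to illustrate: the artist and album are simply the
# grandparent and parent directory names of the track file.
artist, album = recording_info_based_on "Radiohead/OK Computer/01 Airbag.m4a"
puts artist   # => Radiohead
puts album    # => OK Computer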

I also made a Rake task to report any files that exist in the original FLAC directories but are missing from the Apple-format (M4A) copies (just in case something went amiss during a copy, or I screwed up and deleted something by mistake). I want to make sure that the convenience Apple-format files match the canonical source (the FLAC) files. So, this is the "missing" Rake task:

# Report any FLAC file that has no corresponding M4A copy under DEST.
task :report_missings do
  count = 0
  FileList["**/*.flac"].each do |f|
    artist, album = recording_info_based_on f
    dest_file_name = File.basename(f).sub(/\.flac$/, ".m4a")
    unless File.exist? "#{DEST}/#{artist}/#{album}/#{dest_file_name}"
      puts "missing #{f.sub(/\.flac$/, '')}"
      count += 1
    end
  end
  puts "found #{count} missing files"
end

Result

It took me about 2 months of ripping while I was around my computer, running 2 computers (my laptop and desktop) in parallel. In the end, though, I ended up with 453 GB of music files, the FLAC ones safely tucked away on a mirrored drive and the M4A ones on my desktop, ready to be synced to my iPod (or a subset of them, anyway). Now, when I get a new CD, I rip it using Max to the NAS and either copy the files by hand (if it's just one CD) or use the Rake file to move lots en masse. Storage is now dirt cheap, and I've used almost a terabyte of it keeping the music files in two formats. I also recently bought a portable 500 GB drive so that I can keep all my music with me on the road. It's a copy of the desktop M4A files, and it's easy just to mirror the Music directory from the desktop to the portable drive.
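The mirroring itself is nothing fancy; here's the kind of thing that does it (a sketch with placeholder paths -- any sync tool would work just as well):

# Sketch: mirror the desktop M4A tree onto the portable drive.
# --delete makes it an exact mirror, so double-check the paths first.
system("rsync", "-av", "--delete",
       "/Users/me/Music/",           # desktop copy of the M4A files
       "/Volumes/Portable/Music/")   # the portable 500 GB drive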

I achieved my goal: an open archival format that I hope will be around for a very long time, and a convenience version for the way I happen to consume music today. And the shiny disks? I put them all in binders, so that if I ever need one of them (or its sleeve), I can rummage around in the (mostly) alphabetical CD volumes. I didn't put a huge amount of effort into creating an expandable storage scheme that makes it easy to keep them in strict order, because that would take lots of effort and it isn't something I expect to have to do often. If it turns out I go back to them all the time, I'll invest the time then.

Wednesday, August 20, 2008

97 Things Every Software Architect Should Know

A while back, Richard Monson-Haefel was working on a presentation called "10 Things Every Software Architect Should Know", which was a great idea for a talk. To solicit ideas, he posted to several mailing lists where architect-types lurk about, and he got flooded with responses. I was one of the early contributors because I had been thinking about some of this stuff anyway at the time he posted the call. Richard liked the entries so much that he decided to put up a wiki through O'Reilly to publish all these little snippets of advice (or axioms). O'Reilly liked the results so much that they are considering making it a book, sort of like Beautiful Code, but with much shorter (and many more) axioms. I'm not sure where the magic number of 97 came from, but it seems like a reasonable number. The whole "About" story appears on the site, here.

Well, it's out of the shadows now. O'Reilly has moved the site to NearTime (a sparkly hosted wiki site) and it's now public here. Go visit: there is some incredibly good advice here, in little bite-sized chunks. I greatly enjoyed reading the site and soaking up some of the great advice, all derived from real-world slings and arrows. Kudos to Richard for seeing this through.

Tuesday, August 12, 2008

In Praise of Technical Reviewers

It came to my attention recently that I had made a bad assumption about the Productive Programmer book. My understanding (and apparently this is a common one) was that the technical reviewers would get an entry on the title page of the book. Apparently, that's not the case. That's why I didn't put them in the acknowledgements: I assumed they had already been recognized. But they haven't been, so I'm going to rectify that.

First, I've added a special paragraph to the acknowledgements in the 2nd printing of the book, thanking the hard-working technical reviewers. This is a little unusual (generally, nothing changes between printings), but I felt bad about this. The other thing I'm going to do is thank them here. This is the new paragraph in the 2nd printing:

A special thanks goes out to the technical reviewers for this book. Without their hard work and dedication, this book would suffer lots of silly mistakes and confusing explanations. Thanks to Greg Ostravich (who has reviewed every book of mine for the last few years and gotten no recognition, unfortunately), Venkat Subramaniam, David Bock, Nathaniel Schutta, and Matthew McCullough.

Greg gets a special thanks. He's reviewed everything I've written over the past few years, and circumstances keep preventing him from being acknowledged. In the 2006 No Fluff, Just Stuff Anthology (which he reviewed), I was under the same mistaken assumption that the reviewers got a shout out. In the 2007 No Fluff, Just Stuff Anthology (which he reviewed), I specifically wrote a thanks to him and the other reviewers. But, alas, the book came in too long, and several pieces got cut, along with my original introduction to the book (it was replaced by Ted Neward's). Unfortunately, the shout out got axed with the introduction. And, now, clearly demonstrating hope over experience, Greg volunteered to review the Productive Programmer, and the same thing happened. So while I'm thanking the other Productive Programmer technical reviewers, I'm both thanking and begging forgiveness from Greg. Good job, buddy, and unacknowledged for too long.