Sunday, February 23, 2014

Tooling Around With Midnight Commander

If you followed my simple SSH guide, you'll be logging in to your OMV installation and probably landing at a Bash prompt.  For the uninitiated, Bash can be a bit intimidating.  And even once you are fairly comfortable with it, a command-line prompt can be a bit painful for some tasks.  So on most Linux systems, I like to install Midnight Commander.  On Debian, this can be done via apt:

sudo apt-get install mc

Then type mc to run Midnight Commander.  Basically, this is a text-based file manager that can be used from a Linux shell, such as via a PuTTY session.  Most common operations have been given shortcuts on the function keys: F5 copies, F6 renames or moves, F7 creates a directory and F8 deletes.  And the two-panel approach allows you to navigate through directories in one panel while the path in the other panel stays fixed.  Switching from panel to panel is done via the Tab key.  I find this particularly useful when re-organising files, where I may need to search through multiple directories to find various files before copying and consolidating them.

Midnight Commander vs Rsync


An occasional question in Linux forums is what the best and quickest way is to copy files from one directory to another, across different hard drives.  And the most common answer is rsync.

I'll put together a short tutorial on rsync a little bit later, but basically it is a brilliant little application for synchronising the contents of two directories, potentially across different file systems and across networks.  It really is one of the gems in the Linux toolbox.  The real power of rsync is its ability to work incrementally.  That is, it only copies files as required.  So you can set up rsync to copy the contents of one directory across to another, and it will only copy the files needed -- if a copy already exists at the destination (by default, one with the same size and modification time), it won't be copied again (although there are settings to override this).  Furthermore, rsync can work at the block level, so even if a file is present at the destination but out of date, it can be updated by sending just the changed blocks.  This delta transfer applies when copying over a network; when both directories are on the same machine, rsync copies whole files by default.  It is really useful if you have large files with small changes: most backup software will require you to back up the entire file, whereas rsync only needs to send the parts that differ.

So rsync is pretty brilliant.  But in my experience it has a few limitations.  If you are transferring files from one directory to another, empty directory, there is nothing for rsync's differential features to do.  And in these instances, I found rsync to actually be slower than a standard Linux copy command.

Recently I needed to consolidate all my movies, scattered across a couple of different directories on two different 2TB hard drives, into a new, single directory on a 4TB hard drive.  Most of these files are between 1GB and 7GB in size, but there are also hundreds of small resource files, such as thumbnails and nfo files, which are used by XBMC.  Anecdotally, I found the small files copied at a similar pace whether using copy or rsync.  But the larger media files were noticeably slower with rsync.

In the end, I ran a copy across each directory in a single pass (using Midnight Commander).  Then, after all the files were copied, I ran rsync just to make sure Midnight Commander hadn't missed anything.  Over 3TB to 3.5TB of data, shaving a minute or two off each large file can make a reasonably significant difference to the overall time.
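The copy-then-check approach can be sketched as a small shell function. copy_then_verify is a hypothetical helper name, not part of Midnight Commander or rsync; it uses cp -a for the bulk copy (standing in for Midnight Commander's copy) and then asks rsync for a --dry-run, which lists anything it would still transfer without actually copying it.

```shell
#!/usr/bin/env bash
# Sketch: bulk copy first, then use an rsync dry run to check nothing was missed.
# copy_then_verify is a hypothetical helper; rsync must be installed.
set -euo pipefail

copy_then_verify() {
  local src=$1 dest=$2
  mkdir -p "$dest"

  # Plain copy. -a keeps timestamps and permissions, which matters because
  # rsync's default check compares file size and modification time.
  cp -a "$src/." "$dest/"

  # Dry run: list any files rsync would still send, without copying anything.
  local pending
  pending=$(rsync -a --dry-run --itemize-changes "$src/" "$dest/" | grep '^>f' || true)

  if [ -n "$pending" ]; then
    echo "Files still to copy:"
    echo "$pending"
    return 1
  fi
  echo "Copy verified: nothing left for rsync to transfer."
}
```

For example, copy_then_verify /path/to/old/movies /path/to/new/movies (both paths hypothetical). If the copy had reset the timestamps, every file would look out of date to rsync. Adding --checksum would compare actual file contents instead, at the cost of reading every byte on both drives.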


Wednesday, February 19, 2014

SnapRAID + Open Media Vault

Much as I have found Open Media Vault (OMV) to be an excellent choice for managing my NAS, it is not without limitations. At least, when compared with one of its main competitors...

unRAID

For example, out of the box, OMV only supports fairly standard RAID options, with most NAS users using RAID5 or occasionally RAID6. With RAID5/6, you can't really mix and match drives of different sizes: every drive is treated as if it were the size of the smallest one, so the extra capacity on larger drives is wasted. This is generally not an issue in corporate environments, which buy hard drives in large batches. But in my experience, the typical home user cobbles together their first NAS using a mixture of discarded hard drives. As needs (and budget) allow, this is perhaps supplemented with whatever they can pick up on sale from their local computer shop. It's rarely an orderly process. unRAID's proprietary approach accommodates this well. You can mix and match drives of a variety of different sizes, adding new drives to the pool as required. With RAID5, parity information is striped across the entire pool of drives, whereas unRAID keeps the parity information on a single drive. The only requirement is that your parity drive is at least as large as the largest non-parity drive in the array.
One of the other advantages of unRAID's non-striped approach is that reading data only requires a single drive to be spinning (i.e., the one with the data). With RAID5, since your data is striped across multiple drives, reading a file from the array will require all the drives in the array to be spinning. Again, in a corporate environment this is usually not a problem... file servers are expected to run 24/7, and are designed accordingly. They are usually housed in a server room, where noise, heat and power consumption can be dealt with. In a home situation, this may not be viable.
I'm pretty lucky that I have a "man cave", where my NAS can be tucked away on a shelf. The room is pretty cool most days, so hard drive heat is rarely an issue. And it is sufficiently isolated from the rest of the house that the mild hum of a spinning drive doesn't bother anyone. Not everyone is so lucky. About 70% of the activity on my NAS occurs between 6pm and 10pm each evening... i.e., after I get home from work. The rest of the time, it mostly sits idle... with the exception of logging, cron jobs, backups, etc. The RAID5 approach means that all my drives are spinning pretty much 24/7, when they really don't need to be. Is this really a big deal? No idea. But I wanted to explore some alternative options. Of course, I could always just switch to unRAID. But as per a previous post, it has some limitations.

FlexRAID

FlexRAID aims to provide some of the benefits of unRAID, but takes a different approach. Whereas unRAID is a fully-fledged NAS operating system, FlexRAID is an add-on to an existing operating system. Both Linux and Windows are supported. Like unRAID, FlexRAID doesn't stripe your data, which allows you to have a variety of different drives, of different sizes. And, since the drives operate independently, only a single drive will spin up when data is being read. FlexRAID also offers a really nifty benefit over unRAID in that you can add an existing drive to the array and still retain the data on the drive. FlexRAID is proprietary software, with a typical price of US$59, and a free two-week trial is available. Unfortunately, after making an initial attempt to get it set up, I wasn't able to thoroughly test FlexRAID within my two weeks. So my knowledge kinda ends there. At the very least I can say it has a really natty interface :)

SnapRAID

Like FlexRAID, SnapRAID provides parity-based redundancy. And, like FlexRAID, it is available for both Linux and Windows. In fact, being an open source product, it can potentially be compiled for a wide variety of operating systems. With SnapRAID, you nominate the drives to be included in your "array", and one (or more) drives to store the parity information. A sync command is then used to update the parity data. The requirement to update on command is an important distinction between SnapRAID and most other forms of RAID: if a drive fails, you'll only be able to restore its data as it stood at the last sync. Typically, you would schedule syncs according to your needs -- maybe only once a day if your data doesn't change regularly. Of course, the more often your data changes, the more often you'll want to sync. But in most home situations, data on a NAS doesn't change that regularly, so this isn't a significant limitation.
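For reference, a minimal setup might look like the sketch below. It writes a snapraid.conf describing two data drives and one parity drive, plus a cron.d entry that runs a sync every night at 3am. All the mount points, the schedule and the output file names are assumptions for illustration. The keywords (parity, content, data, exclude) are SnapRAID's own, although older releases use disk where newer ones use data. SnapRAID reads /etc/snapraid.conf by default on Linux.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal SnapRAID config and a nightly sync job.
# Every path and the 3am schedule are hypothetical; adjust to your drives.
set -euo pipefail

# Written to the current directory for review; copy it to /etc/snapraid.conf
# when ready. The /var/snapraid directory must exist before the first sync.
cat > ./snapraid.conf <<'EOF'
# Parity drive: must be at least as large as the largest data drive
parity /mnt/parity1/snapraid.parity

# Content files record every protected file and its checksums;
# keeping copies on more than one drive guards against losing them
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content

# Data drives, each with a short name (older releases say "disk" here)
data d1 /mnt/disk1/
data d2 /mnt/disk2/

# Files and directories not worth protecting
exclude *.unrecoverable
exclude /lost+found/
EOF

# /etc/cron.d format: minute hour day-of-month month day-of-week user command.
# /usr/local/bin is where a source build installs by default.
cat > ./snapraid-sync.cron <<'EOF'
0 3 * * * root /usr/local/bin/snapraid sync
EOF
```

If your data changes more often, a more frequent schedule in the cron entry is the only thing that needs to change.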
On the flip side, SnapRAID has a nice little advantage over typical RAID setups: you can restore files even when no drive has failed and nothing is corrupted. In other words, it serves as a handy little "undelete" function. Again, you can only undelete files that existed at the last sync. One of the really nice features of SnapRAID is that it works on existing drives and existing filesystems. This means that you can add parity protection to an existing drive without having to back up your data, reformat the drive and then restore your data. This is something unRAID can't do. Furthermore, if you ever choose to stop using SnapRAID, or want to move your drive to another system, you don't really have to do anything. A drive currently being protected by SnapRAID can be read by another system, assuming the other system can understand the filesystem. There are some limitations; namely, you really need to use either NTFS (for Windows) or ext4 (for *nix).
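The undelete feature is driven by SnapRAID's fix command. Below is a hedged shell sketch that wraps the relevant invocations in small helper functions. The function names and the example path are hypothetical, and it assumes the snapraid binary is on your PATH. snapraid diff lists what has changed since the last sync, -f limits fix to files matching a pattern, and -m limits it to files that have gone missing.

```shell
#!/usr/bin/env bash
# Sketch: SnapRAID "undelete" helpers. Function names are hypothetical;
# snapraid must be on the PATH, and only files present at the last sync
# can be brought back.
set -euo pipefail

# List what has changed (including removed files) since the last sync
show_changes() {
  snapraid diff
}

# Restore files matching a single path or pattern, e.g. one deleted file
restore_matching() {
  snapraid fix -f "$1"
}

# Restore every file that has gone missing since the last sync
restore_all_missing() {
  snapraid fix -m
}
```

For example, restore_matching "Movies/Some Film/movie.nfo" (a hypothetical path). The SnapRAID manual describes exactly how -f patterns are matched, which is worth reading before restoring anything important.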
I've heard some criticisms of SnapRAID on the basis that it is a command-line driven application. This is true -- running SnapRAID, on either *nix or Windows, will typically require you to run some simple commands, none of which are overly complicated. That said, there is a GUI for Windows, called Elucidate. I can't comment on whether it is any good, since I'm not using SnapRAID on a Windows machine.

Installation

I followed the instructions found here. They required me to install some additional packages in order to compile SnapRAID. But if you want, these can be safely removed once SnapRAID is installed. The whole process probably took about 10 minutes in total.
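A from-source build of SnapRAID generally follows the usual configure-and-make routine. The sketch below is an outline under some assumptions of my own: that you've already downloaded a release tarball, that it unpacks into a directory named after the archive, and that Debian's build-essential package covers everything the build needs. build_snapraid is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: build and install SnapRAID from a release tarball on Debian.
# Assumptions: the tarball is already downloaded, unpacks into a directory
# of the same name minus ".tar.gz", and build-essential covers the build deps.
set -euo pipefail

build_snapraid() {
  local tarball=$1
  local dir
  dir=$(basename "$tarball" .tar.gz)

  # Compiler, make and friends (assumed sufficient for SnapRAID)
  sudo apt-get install -y build-essential

  tar -xzf "$tarball"
  # Run the build in a subshell so the caller's working directory is unchanged
  (
    cd "$dir"
    ./configure
    make
    make check          # SnapRAID's own test suite
    sudo make install   # installs under /usr/local by default
  )
}
```

Usage would be something like build_snapraid snapraid-X.Y.tar.gz, where X.Y is whichever version you downloaded. To clear out the build tools afterwards, sudo apt-get remove build-essential followed by sudo apt-get autoremove should remove the compiler packages that came with it.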
Alternatively, Open Media Vault now has a SnapRAID plugin which integrates quite nicely with the OMV web interface. The plugin keeps the SnapRAID binary up to date without needing to compile from source. I've been using this plugin for several weeks now without any significant issues.