Sunday, July 20, 2014

Hiding Services Behind Apache

Some Background

Skip to the bottom of this post if you just want instructions.

When you set up a service like Sickbeard or Couchpotato, you connect to it via an IP/Port address, like say  The part is the IP number, while the 8080 part of the address is the port number.  For simplicity's sake, think of the port number as a little bit like the street number on a physical address.  Each service on your server that listens for incoming traffic needs its own unique port number to distinguish it from other services on the server, just as different houses in a street need individual street numbers to distinguish them from the other houses in the street.

Most web servers, by default, will listen on port 80 (for HTTP traffic) or port 443 (for HTTPS traffic).  So if your server has a standard web interface (like Open Media Vault), port 80 is usually where it will be listening for an incoming connection (unless you change the default settings).  Of course, since port 80 and port 443 are defaults, you'll rarely need to specify them when typing a web address in most browsers.  That is, is functionally the equivalent of

There are a couple of problems with port numbers.

Firstly, as you set up more and more services on your server, trying to remember them all starts to become a headache.  I know of people running sabnzbd, transmission, sickbeard, couchpotato, headphones, htpc-manager, subsonic and calibre... all on the same server.  And of course, there are plenty of other really cool open source server tools out there.  Wouldn't it be a bit easier to be able to use an address like instead of trying to remember that your sickbeard port is 8080?

Secondly, running these services on different ports can cause headaches with firewalls.


For the time being, I'm going to assume you are running your server in a hosted environment, OR you are hosting your server at home and have correctly set up port forwarding on your modem (and, are aware of the potential risks).

So you are outside of your home network, trying to connect to your sickbeard instance, and it keeps failing.  Why?  In a lot of corporate environments, and a lot of public Wi-Fi situations, it's pretty common for non-standard ports to be blocked.  That is, traffic via port 80 and 443 will work, but the 8080 port your sickbeard instance is listening on will be blocked.  Of course, you could change sickbeard's settings so it listens on port 80 instead.  But now your OMV management page won't work.  And what about the other services on your server?

Apache to the Rescue

The Open Media Vault management page is served by a webserver called Apache, and it's pretty cool.  When I said that OMV is listening on port 80, what I really meant is that Apache is listening on port 80.  And when it gets a connection, it serves up the OMV management pages.  However, we can also configure Apache to forward certain requests on to other services (this is known as a reverse proxy).

What does this mean?

Let's look at our scenario above, where we type in the address  Your web browser interprets this as, and sends it off to your server.  Since this request comes in via port 80, Apache picks it up and tries to interpret the remainder of the address, "/sickbeard".  Now, Apache hasn't been configured to understand this part of the address, so you'll get an HTTP 404 error.

Alternatively, you could instruct Apache to forward (proxy) this request to the correct location (or more specifically in our case, the correct port).

The advantage of doing this is that the request is transmitted via port 80.  It only gets forwarded to port 8080 after it has been received by Apache.  That is, after it has passed through firewalls (and after it has been port-forwarded by your modem).


There are a variety of ways of doing this, depending on the service and your server configuration.  These instructions are for Open Media Vault (which is based on Debian Squeeze), but they will probably work on other installations of Apache on Debian.  If you're not sure, back up any config files before editing.  If you are using Nginx, you'll need to follow different instructions.

Go to the directory


And find the file proxy.conf.  If it's not there, you'll need to create it.  In the file, insert the following code:

<Location /sickbeard>
order deny,allow
deny from all
allow from all
ProxyPass http://localhost:8080/sickbeard
ProxyPassReverse http://localhost:8080/sickbeard
</Location>

Of course, replace port 8080 with whatever port your instance of sickbeard is running on.  If you have other services, like couchpotato, just add additional sections to the file, like:

<Location /cp>
order deny,allow
deny from all
allow from all
ProxyPass http://localhost:8082/cp
ProxyPassReverse http://localhost:8082/cp
</Location>

Again, replace port 8082 with whatever port your instance of couchpotato is running on.
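If you proxy several services, writing these stanzas by hand gets repetitive. The following is a minimal POSIX shell sketch (the `proxy_stanza` and `proxy_conf` names and the `BACKEND_HOST` variable are hypothetical) that prints one `<Location>` stanza per `path:port` pair, each with a `ProxyPass` line (which forwards the request) and a `ProxyPassReverse` line (which rewrites redirect headers coming back), closed by `</Location>`. It keeps the Apache 2.2-style `order`/`deny`/`allow` lines used in this post (Apache 2.4 replaces them with `Require all granted`), and it assumes `mod_proxy` and `mod_proxy_http` are enabled (on Debian, `sudo a2enmod proxy proxy_http`). It only prints text; it does not edit any files.

```shell
#!/bin/sh
# Hypothetical helper: prints Apache <Location> proxy stanzas so they can be
# reviewed before being pasted into proxy.conf. It does not edit any files.

# Host the services listen on. Set BACKEND_HOST to the server's IP address
# if "localhost" does not work on your system.
BACKEND_HOST="${BACKEND_HOST:-localhost}"

# proxy_stanza PATH PORT -> one <Location> block proxying PATH to PORT.
proxy_stanza() {
  local path="$1" port="$2"
  cat <<EOF
<Location ${path}>
order deny,allow
deny from all
allow from all
ProxyPass http://${BACKEND_HOST}:${port}${path}
ProxyPassReverse http://${BACKEND_HOST}:${port}${path}
</Location>
EOF
}

# proxy_conf PATH:PORT ... -> one stanza per pair, e.g. /sickbeard:8080
proxy_conf() {
  local pair
  for pair in "$@"; do
    proxy_stanza "${pair%%:*}" "${pair##*:}"
  done
}
```

For the two services above, `proxy_conf /sickbeard:8080 /cp:8082` prints both stanzas. Because the path is passed through unchanged, each service also needs to be serving under that same path (Sickbeard and CouchPotato each have a setting for their base URL).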

Once these configuration changes have been made, save the file and then restart the Apache server:

sudo service apache2 restart

In most instances, using localhost to refer to your server should work fine.  But in a couple of cases, I found this didn't work, and I still haven't worked out why it works in some instances and not others.  The simple solution is to just replace "localhost" with the IP address of your server (ie, like

Just out of interest, it took me a while to figure out that you modify the configuration file in the directory mods-available and not in the directory mods-enabled.  If I had looked a little closer, I would have seen that the files in mods-enabled are just symbolic links to mods-available.  Hopefully this makes a little bit more sense.

I believe this way of setting up Apache configurations is unique to Debian.  So for other distributions, you may need to google to find the best place to add these changes.  Typically, it will be in the httpd.conf file, which will be found in /etc/httpd/conf/ or somewhere similar.

EDIT: Read my new post on using HAProxy to achieve the same thing.

Wednesday, May 21, 2014

Backup Strategies

I spent a number of years living with my documents, music, photos and various files scattered across various computers and different external backup drives. So when I shifted everything to a central NAS it was something of a revelation. It's pretty darned convenient having access to all the same files, no matter which computer or mobile device you use. I also found that consolidating everything allowed me to save on hard drive space.
But there was a downside. Instead of having several copies of the same files (often, unintentionally), I now had a single copy. My haphazard approach to storing and managing things had the unintended consequence of working as a sort of backup strategy. And by consolidating everything, I no longer had this protection. What to do?

The 3-2-1 Rule

There is a commonly quoted "best practice" that says if your data is not stored in three different places, it isn't truly protected. In other words, you need more than just a backup... you need multiple backups. Some take this a step further -- not only do you need three copies of your data to be safe, but they should be stored on at least two different storage mediums. And at least one of the copies should be stored offsite. Hence the 3-2-1 rule.

In a commercial environment, where data is critical to business operations, following the 3-2-1 rule may be difficult but is almost certainly worthwhile. In a home environment, it becomes a bit more challenging. My NAS (as at the time of this post) has around 5-6TB of data. Regular backups of this volume of data, including offsite backup, are just not viable -- economically, technologically or logistically. I'm running SnapRAID with q-parity, which provides something roughly equivalent to RAID6. So that is some measure of protection, but it's not a backup.
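The 3-2-1 rule is mechanical enough to write down as a check. This is a minimal POSIX shell sketch; the `check_321` name and the `NAME:MEDIUM:LOCATION` encoding of each copy are made up for illustration. It counts copies, distinct mediums and offsite copies, prints the tally, and returns success only when there are at least 3 copies on at least 2 mediums with at least 1 offsite.

```shell
#!/bin/sh
# check_321 NAME:MEDIUM:LOCATION ... (hypothetical encoding, e.g. nas:hdd:onsite)
# Prints "copies=N media=N offsite=N" and returns 0 only if the 3-2-1 rule holds.
check_321() {
  local copies=$# offsite=0 media="" entry rest medium location n_media
  for entry in "$@"; do
    rest="${entry#*:}"        # drop NAME
    medium="${rest%%:*}"      # MEDIUM
    location="${rest#*:}"     # LOCATION
    media="${media}${medium}
"
    if [ "$location" = "offsite" ]; then
      offsite=$((offsite + 1))
    fi
  done
  # Count distinct mediums; $(( )) strips any padding wc adds.
  n_media=$(( $(printf '%s' "$media" | sort -u | wc -l) ))
  echo "copies=$copies media=$n_media offsite=$offsite"
  [ "$copies" -ge 3 ] && [ "$n_media" -ge 2 ] && [ "$offsite" -ge 1 ]
}
```

Treating a cloud copy as just another hard drive, `check_321 nas:hdd:onsite vps:hdd:offsite usb:hdd:onsite` prints `copies=3 media=1 offsite=1` and fails on mediums; counting the cloud as its own medium (`vps:cloud:offsite`) makes it pass. A NAS, a backup drive and the original discs (`discs:optical:onsite`) give `copies=3 media=2 offsite=0`, which fails on the offsite requirement.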

Why RAID is not a backup

Spend any time discussing RAID (and/or backup strategies) on an internet forum, and someone is bound to pipe up with the reminder that RAID is not a backup! With the exception of RAID0 (which some geeks don't consider to be real RAID), the various RAID levels are designed to provide some degree of data redundancy. The term redundancy tends to imply that a RAID array creates multiple copies of the data. In the case of RAID1, where the data is mirrored across drives, this is certainly true. In the case of RAID5, the data itself isn't mirrored; instead, parity information is calculated from the data and distributed across the drives. That parity information is sufficient to rebuild your array in the event that one of the drives fails. So in both instances, RAID1 and RAID5, it certainly sounds a lot like a backup.

So what is the difference?

The purpose of a backup is to help you restore your data in the event of a catastrophic failure. RAID5, for example, allows you to restore your data in the event of a drive failure. But if two drives fail at the same time, then your data is lost. If you don't mind forking over the cash, you can expand your array to RAID6, which provides parity protection like RAID5, but with double the parity. With RAID6, you can survive two drive failures. But what if a third drive fails at the same time?

Does that seem statistically unlikely? Maybe so. But drive "failure" can encompass a range of things. Ever had a power surge that damaged electrical equipment? Flooding? Fire? Theft? Any of these events could see all the drives in your array essentially "fail". And in these instances, no level of RAID is going to protect you. Hence the mantra that RAID is not a backup.
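To see why single parity can rebuild one lost drive but not two, here is a toy model in POSIX shell where each "drive" holds a single number (standing in for one byte of a stripe). The parity value is the XOR of all the data values, and XOR-ing the parity with the surviving values gives back the missing one. This models the single XOR parity used by RAID5; RAID6's second parity is computed differently and is not modelled here. `xor_parity` is a hypothetical name.

```shell
#!/bin/sh
# Toy model of single (RAID5-style) parity: each argument is the value
# stored on one drive at the same position in a stripe.

# xor_parity V1 V2 ... -> XOR of all the values.
xor_parity() {
  local p=0 v
  for v in "$@"; do
    p=$(( p ^ v ))
  done
  echo "$p"
}
```

With data drives holding 23, 200 and 77, `xor_parity 23 200 77` gives a parity value of 146. If the drive holding 200 dies, `xor_parity 146 23 77` reconstructs 200. Lose a second drive as well and there are two unknowns but only one parity value, which is why RAID5 cannot recover from a double failure.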

Developing a Backup Strategy

So here's the basic approach I took. I roughly classified my content into three broad categories:

  1. High priority -- really important stuff that I really, really don't want to lose. Mostly this includes personal and business documents, family photos, and a few other miscellaneous things (such as online purchases that I can't download again). If these get lost or destroyed, then they are essentially gone forever. Therefore, they should be afforded the strongest level of protection.
  2. Medium priority -- this is stuff that I don't want to lose, but it wouldn't be that devastating if it were destroyed. This is difficult to categorise. I have hundreds of movies and TV shows that have been ripped from DVD or Blu-ray. If they were accidentally lost, it would be truly painful to go through that process again. But the reality is that it could be done. Plus, I could go out and re-buy them if I really needed to (say, in the event of a fire... and assuming they were adequately covered by insurance). This is the sort of stuff that I make my best effort to back up.
  3. Low priority -- stuff where it really doesn't matter if it is lost. Mostly this is temporary files, downloads (e.g., the latest copy of Open Office, or Gimp), cloud purchases (which can be re-downloaded), etc. No backup plan for this stuff.
Now that everything has been classified, what next?

Implementing the Strategy

Personally, I've set up my own VPS. The main benefit of using a VPS for backups is that I can manage the backup process myself. For example, most backup services, like Skydrive, Dropbox, etc, require you to use a proprietary client. If you are lucky, the service also has an API, which means there may be third-party client tools available. But unless you are a developer, you are still going to be limited in the tools that are available to you. Whereas my VPS is just a server running Linux, so it is pretty generic. Why is this important?

My favoured approach is using rsync from my NAS to my VPS. But I could transfer files over FTP, NFS, or whatever. I could install owncloud if I wanted, or other similar services. I also like the fact that I can transfer from one VPS to another pretty easily. In fact, I can set this up to run in the background, offsite location to offsite location, without impacting on my own internet connection. Whereas moving from Dropbox to Skydrive means re-uploading all your data to your new service provider. Given my mediocre upload speeds, this is not something I'd relish.

So running my own VPS has its advantages. But if setting up a VPS seems like overkill, then pick your backup service provider and use whatever tools they have available. Consider price, speed, reliability, security... whatever is important to you.

All the computers in my house connect directly to the NAS, and draw documents from the NAS as required. So as long as I back up from the NAS on a regular basis, everything should be fine. There are a couple of exceptions, though. For example, I keep my music collection on my laptop, so it's available to me whenever I'm away from home. So I periodically sync my laptop with the NAS, usually using rsync. My laptop is a Macbook Pro, so rsync is available out-of-the-box. If you are using Windows.... erm... well, I'm sure there is something similar :)

OK.
So, assuming the NAS is more or less the definitive source of all my data, and assuming I've classified all my data according to priority, the following then applies;

  1. High priority stuff gets backed up every night to the VPS. I've written some really simple shell scripts to handle this (which I'll cover in another post), and scheduled them to run at 1am every day. Periodically, high priority stuff is also backed up to external hard drive/s connected to my NAS. This more or less makes me compliant with the 3-2-1 rule -- I have three copies of my data (NAS, cloud and backup drive) and at least one of those is offsite (cloud). Technically, it's not on two different mediums, unless you consider cloud storage to be a different medium to a hard drive. But it's pretty close. Maybe I should consider getting a blu-ray burner?
  2. Medium priority stuff is backed up periodically to external hard drives, but not stored on the cloud. Since most of this material is ripped DVDs and Blu-rays, you could argue that technically I have three copies of the data -- NAS, backup drive and the original discs. And I comply with the requirement for two different mediums (optical disc and hard drive), but I don't comply with the offsite requirement. If I had the means, I suppose I could box up the original discs and put them in storage. And in the event that something really serious happened, I could always re-buy all the discs. So again, that's pretty close to the 3-2-1 rule.
  3. Low priority stuff doesn't get backed up at all. But as with all the data, it is protected by SnapRAID's parity protection. On a few occasions I've accidentally deleted files, and been able to restore them using SnapRAID's functionality. So that's pretty handy, and about as much as I can expect for temporary files.
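The nightly scripts mentioned above aren't shown in this post. As a rough idea of what such a script might look like, here is a minimal POSIX shell sketch that pushes each high-priority folder from the NAS to a VPS with rsync over SSH. `SRC_ROOT`, `DEST`, the folder names and `backup_high_priority` are all hypothetical, it assumes key-based SSH login to the VPS so it can run unattended, and the `RSYNC` variable exists only so the command can be swapped out for testing.

```shell
#!/bin/sh
# Hypothetical settings: adjust to your own NAS share and VPS account.
SRC_ROOT="${SRC_ROOT:-/srv/nas}"
DEST="${DEST:-backup@vps.example.com:/home/backup}"
RSYNC="${RSYNC:-rsync}"

# backup_high_priority DIR ... -> copy each DIR under SRC_ROOT to DEST.
# -a preserves permissions and timestamps; -z compresses data in transit.
# There is no --delete, so files removed on the NAS stay in the VPS copy.
backup_high_priority() {
  local dir
  for dir in "$@"; do
    "$RSYNC" -az "$SRC_ROOT/$dir/" "$DEST/$dir/" || return 1
  done
}
```

To use it, add a final line such as `backup_high_priority documents photos` to the script and schedule it for 1am with a crontab entry like `0 1 * * * /usr/local/bin/backup-high.sh` (the path is hypothetical). Running rsync by hand once with `-n` (dry run) first is a cheap way to check the paths.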


So, do I comply with best practices? Not exactly. But for a home user, looking to manage a large volume of data in a cost-effective manner, I think I come pretty close. At the end of the day, it's all about tradeoffs. How important is your data? How much are you willing to spend? How much effort are you prepared to go to? At the very least, these are questions you should stop and think about. There will be a time when you lose your data, either through a careless accident, bad luck or malfeasant technology. So be prepared.

Monday, April 7, 2014

My Home Network Setup

A couple of people have asked me how my home network is basically set up and why I've chosen to do things a particular way. So rather than repeat myself several times, I thought I'd just try and write it all down in a blog post.

Although this is all for home use, I've tried to maintain a certain level of discipline around how I've set things up. But I don't work in IT infrastructure and I know nothing about "best practice". So a lot of it has been trial and error and a bit of guess work.

The NAS is the centre of everything

My HP N40L serves as my NAS and it is basically the centre of everything. It currently has 5 storage drives (3x4TB and 2x2TB) plus a small SSD for the operating system. I use SnapRAID for parity protection, with one of the 4TB drives storing parity and a second drive storing q-parity. For those not familiar with SnapRAID, this gives something roughly like RAID6 protection.

The OS on the NAS is Open Media Vault (OMV) which is basically Debian with a web management interface. I use it because it's free, it's open source and it's pretty easy to use. Truth be told, I could probably use vanilla Debian but in the early days I found configuring Samba to be a real pain and OMV made that a lot easier. OMV has also started to attract a growing community of plugin developers, although I generally don't use plugins much myself. One of the nice things about installing and configuring software from source is the level of control you have. Over time, I've got to the point where I can get software like Sickbeard, SABnzbd, Subsonic, et al up and running pretty quickly, and configured the way I like things.

The primary purpose of my NAS (as you would expect) is to serve content to the rest of the household. Mostly, this is done using Samba, although I also run Netatalk. Netatalk is an open source implementation of Apple's AFP (Apple Filing Protocol) file sharing, which is required if you want your NAS to serve as a Time Machine destination (and I do).

Playing Media

Initially, I tried to have my NAS also serve as a media player and there are certainly benefits to this arrangement. Having your media player and your media content close to each other means you don't have to worry about bandwidth issues and bottlenecks. And if you throw in a half-height graphics card, the HP N40L is definitely capable of serving as a media player. In fact, the N40L's onboard graphics are reportedly capable of playing 1080p content. But it only has a VGA output, so you'll want an add-on graphics card if you want to plug in to your TV via HDMI.

Unfortunately I encountered a few issues with this approach. Firstly, I wanted my media player to be as easy to use as possible. I played around with a few options and ultimately settled on OpenELEC which is a variation or flavour of XBMC. OpenELEC is designed to create an appliance-like experience. It runs on a stripped down, bare-bones implementation of Linux, with a customised XBMC install over the top. From the end-user perspective they barely know that the media player is a computer. It boots, it displays a splash screen and then XBMC starts. And if run from an SSD, the boot process is very, very fast.

The problem with OpenELEC is that it just doesn't pack a lot of the features that I wanted out of a NAS. True, out of the box it provides basic file sharing. And there are a number of 3rd-party plugins available. But I wanted to be able to tinker in a way that OpenELEC just wasn't intended for.

This led me to my second realisation. The design of a media player and the design of a NAS are (generally) two different things. The OS, as I pointed out, is fundamentally different. Yes, there is some shared functionality between OpenELEC and OMV. But they are designed with different goals in mind. The same goes for hardware. The HP N40L is a pretty nifty piece of gear and it doesn't look entirely out of place alongside a television. But my media player goes just one step further -- it uses a picoPSU which is super quiet, along with a Ninja Scythe CPU fan. It runs cool and silent, which is exactly what I want in the TV room. The HP, on the other hand, has a slight hum, along with the clicking sound of 5 hard drives. It's not super loud or annoying, but it belongs in the study rather than the TV room.

The third realisation is that I wanted my media player and NAS to be separate from a risk mitigation point of view. If the NAS goes down, I can still load content on to the media player and play it locally. Similarly, if the media player goes down, I can just connect my laptop to the TV and continue to source content from the NAS.

Getting the Geography Right

My NAS is located right next to the router, so it makes sense for it to be connected via wired Ethernet. However, the media player is located in the TV room, so I tried a few different options. Connecting by wireless is possible, but typically causes issues with high bit-rate content. Forget about streaming 1080p Blu-ray rips over wireless. In theory, the wireless network supports speeds of up to 300Mbps, which should be enough for streaming most 1080p content. But in practice, I pretty much never get speeds like that.

If your content is mostly DVD rips and TV recordings, this should be fine. I tend to leave 1080p ripping for movies that really justify the higher resolution -- big, loud action movies with lots of movement and special effects. So the majority of my content is ripped at 720p or lower (in the case of DVDs). This means that quite a lot of the content can actually play quite comfortably over the wireless connection if required. I have a copy of XBMC on my laptop, and more often than not, it runs content stored on the NAS (over the wireless) quite nicely. But for bigger, meatier content (particularly 3D stuff), and for added reliability, you'll want a hard-wired connection.

I tried Ethernet-over-Power, which generally works OK but proved to be a bit flaky and has a habit of dropping out at inopportune times. Resetting the EOP devices usually resolves everything pretty quickly, but you really don't want to have to do this in the middle of a movie. I still use EOP plugs in a few parts of the house, but have decided against using it for the media player. So in the end, I just wired the media player directly to the router, creating a 1Gbps link between media player and NAS. This involved some careful routing of cables along the skirting boards to keep everything looking neat. Most hardware stores stock a variety of capping products designed to hide cabling in a neat and unobtrusive way, so check them out. I also painted the cable in some sections so it would blend in with the wall. It took a bit of effort, but in the end I think it was worth it. Take a bit of advice, though: test your cables before going through the arduous process of neatly routing them around walls and behind furniture. Although the whole affair only took a couple of hours, I really wouldn't want to go through it all again.

Sunday, February 23, 2014

Tooling Around With Midnight Commander

If you followed my simple SSH guide, you'll be logging in to your OMV installation and probably coming across a BASH prompt.  For the uninitiated, BASH can be a bit intimidating.  And, even once you are fairly comfortable with it, a command line prompt can be a bit painful for some tasks.  So on most linux systems, I like to install midnight commander.  On Debian, this can be done just via apt:

sudo apt-get install mc

Then type mc to run midnight commander.  Basically, this is a text-based file manager that can be used from a linux shell, such as via a PuTTY session.  Most common commands like copying, renaming, etc. have been given shortcuts.  And the two-window approach allows you to navigate through directories in one window, while keeping your path in the other window fixed.  Switching from window to window is done via the tab key.  I find this particularly useful when re-organising files, where I may need to search through multiple directories to find various files before copying and consolidating them.

Midnight Commander vs Rsync

An occasional question in linux forums is what is the best and quickest way to copy files from one directory to another, across different hard drives.  And the most common answer is rsync. 

I'll put together a short tutorial on rsync a little bit later, but basically it is a brilliant little application for synchronising the contents of two directories, potentially across different file systems and across networks.  It really is one of the gems in the linux toolbox.  And the real power of rsync is the ability to work incrementally.  That is, it only copies files as required.  So you can set up rsync to copy the contents of one directory across to another directory, but it will only copy the files needed -- if a copy already exists on the destination, copying won't occur (although there are settings to override this).  Furthermore, rsync checks at the block level, so even if a file is present at the destination, if it is out of date it can be updated with just the changed blocks.  This is really useful if you have large files with small changes.  Most backup software will require you to back up the entire file, whereas rsync only requires you to back up the differential parts.
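Block-level comparison is easiest to see with a simplified version. This POSIX shell sketch splits two files into fixed-size blocks and prints the index of every block whose checksum differs; only those blocks would need to be sent. It is not rsync's algorithm: rsync uses a rolling checksum so it can also match blocks that have shifted position (for example after bytes are inserted near the start of a file), which a fixed-offset comparison like this cannot. `changed_blocks` is a hypothetical name.

```shell
#!/bin/sh
# Simplified fixed-offset block comparison (NOT rsync's rolling-checksum
# algorithm). Prints the 0-based index of each block that differs.

# changed_blocks FILE_A FILE_B [BLOCK_SIZE]
changed_blocks() {
  local a="$1" b="$2" bs="${3:-4096}" size_a size_b max n i
  size_a=$(( $(wc -c < "$a") ))
  size_b=$(( $(wc -c < "$b") ))
  max=$size_a
  if [ "$size_b" -gt "$max" ]; then
    max=$size_b
  fi
  n=$(( (max + bs - 1) / bs ))   # number of blocks in the longer file
  i=0
  while [ "$i" -lt "$n" ]; do
    # Checksum block i of each file; a block past end-of-file reads as empty.
    if [ "$(dd if="$a" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)" != \
         "$(dd if="$b" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)" ]; then
      echo "$i"
    fi
    i=$(( i + 1 ))
  done
}
```

For example, two 12-byte files that differ only in their fifth byte, compared with a 4-byte block size, report just block 1. One related detail from the rsync manual: when the source and destination are both local paths, rsync defaults to `--whole-file` and skips the delta transfer, so these block-level savings mainly apply to copies over a network.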

So rsync is pretty brilliant.  But my own experience is that it has a few limitations.  If you are transferring files from one directory to another, empty directory, then rsync doesn't have any differential functionality to perform.  And in these instances, I found rsync to actually be slower than a standard linux copy command. 

Recently I needed to consolidate all my movies, scattered across a couple of different directories on two different 2TB hard drives, into a new, single directory on a 4TB hard drive.  Most of these files are between 1GB and 7GB in size, but there are also hundreds of small resource files such as thumbnails and nfo files which are used by XBMC.  Anecdotally, I found the small files copied at a similar pace whether using copy or rsync.  But the larger media files were noticeably slower with rsync.

In the end, I ran a copy across each directory in a single pass (using midnight commander).  Then, after all the files were copied, I ran rsync just to make sure midnight commander hadn't missed anything.  Over 3TB-3.5TB of data, shaving a minute or two per file can make a reasonably significant difference in overall time.
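The same copy-then-check pattern can be scripted. This POSIX shell sketch swaps in plain `cp` for the copy and `diff -rq` for the check, rather than Midnight Commander and rsync; `copy_and_verify` is a hypothetical name. Note that `diff -rq` has to read the files on both sides, so on terabytes of data the check itself can take a long time.

```shell
#!/bin/sh
# copy_and_verify SRC DST -> copy the contents of SRC into DST, then list
# any file that is missing or different. Returns non-zero on any mismatch.
copy_and_verify() {
  local src="$1" dst="$2"
  mkdir -p "$dst" || return 1
  # "$src/." copies the directory's contents (including hidden files)
  # rather than the directory itself; -p keeps modes and timestamps.
  cp -Rp "$src/." "$dst/" || return 1
  # -r recurses into subdirectories, -q only names the files that differ.
  diff -rq "$src" "$dst"
}
```

If you would rather do the check with rsync, a dry run such as `rsync -ain SRC/ DST/` lists anything that would still be transferred without copying it.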

Wednesday, February 19, 2014

SnapRAID + Open Media Vault

Much as I have found Open Media Vault (OMV) to be an excellent choice for managing my NAS, it is not without limitations. At least, when compared with one of its main competitors...


For example, out of the box, OMV only supports fairly standard RAID options, with most NAS users using RAID5 or occasionally RAID6. With RAID5/6, you can't usefully mix and match drives of different sizes, because each drive only contributes as much space as the smallest drive in the array. This is generally not an issue in a corporate environment, where hard drives are bought in large batches. But in my experience, the typical home user cobbles together their first NAS using a mixture of discarded hard drives. As they need (and can afford), this is perhaps supplemented with whatever they can pick up on sale from their local computer shop. It's rarely an orderly process. unRAID's proprietary approach accommodates this well. You can mix and match drives of a variety of different sizes, adding new drives to the pool as required. With RAID5, parity information is striped across the entire pool of drives, whereas unRAID maintains the parity information on a single drive. The only requirement is that your parity drive is as large as (or larger than) the largest non-parity drive in the array.
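That sizing rule is easy to check before buying drives. A minimal POSIX shell sketch with hypothetical names, taking sizes as whole numbers in one unit (GB here): `parity_ok` fails if any data drive is bigger than the parity drive, and `usable_capacity` adds up the data drives, since with a dedicated parity drive the parity drive itself holds no data.

```shell
#!/bin/sh
# parity_ok PARITY_SIZE DATA_SIZE ... -> 0 if no data drive is larger
# than the parity drive. All sizes are whole numbers in the same unit.
parity_ok() {
  local parity="$1" size
  shift
  for size in "$@"; do
    if [ "$size" -gt "$parity" ]; then
      echo "data drive of ${size} is larger than parity drive of ${parity}" >&2
      return 1
    fi
  done
  return 0
}

# usable_capacity DATA_SIZE ... -> total space on the data drives.
usable_capacity() {
  local total=0 size
  for size in "$@"; do
    total=$(( total + size ))
  done
  echo "$total"
}
```

For example, a 4000 GB parity drive protecting data drives of 3000, 2000 and 1500 GB passes `parity_ok`, with 6500 GB usable; a 2000 GB parity drive in front of a 4000 GB data drive fails.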
One of the other advantages of unRaid's non-striped approach is that reading data from a drive only requires a single drive to be spinning (ie, the one with the data). With RAID5, since your data is striped across multiple drives, reading a file from the array will require all the drives in the array to be spinning. Again, in a corporate environment this is usually not a problem... file servers are expected to run 24/7, and are designed accordingly. They are usually stored in a server room, where noise, heat and power consumption can be dealt with. In a home situation, this may not be viable.
I'm pretty lucky that I have a "man cave", where my NAS can be tucked away on a shelf. The room is pretty cool most days, so hard drive heat is rarely an issue. And it is sufficiently isolated from the rest of the house that the mild hum of a spinning drive doesn't cause issues. Not everyone is so lucky. About 70% of the activity on my NAS would occur between 6pm-10pm each evening... ie, after I get home from work. The rest of the time, it can mostly sit idle... with the exception of logging, cron jobs, backups, etc. The RAID5 approach means that all my drives are spinning pretty much 24/7, when they really don't need to. Is this really a big deal? No idea. But I wanted to explore some alternative options. Of course, I could always just switch to unRaid. But as per a previous post, it has some limitations.


FlexRAID aims to provide some of the benefits of unRAID, but takes a different approach. Whereas unRAID is a fully-fledged NAS operating system, FlexRAID is an add-on to an existing operating system. Both Linux and Windows are supported. Like unRAID, FlexRAID doesn't stripe your data, which allows you to have a variety of different drives, of different sizes. And, since the drives operate independently, only a single drive will spin up when data is being read. FlexRAID also offers a really nifty benefit over unRAID in that you can add an existing drive to the array and still retain the data on the drive. FlexRAID is proprietary software, with a typical ticket price of US$59, and they offer a free 2-week trial. Unfortunately, after making an initial attempt to get set up, I wasn't able to really thoroughly test FlexRAID within my two weeks. So my knowledge kinda ends there. At the very least I can say it has a really natty interface :)


Like FlexRAID, SnapRAID provides parity-based redundancy. And, like FlexRAID, it is available for both Linux and Windows. In fact, being an open source product, it can potentially be compiled for a wide variety of operating systems. With SnapRAID, you nominate the drives to be included in your "array", and one (or more) drives to store the parity information. A sync command can then be used to update the parity data.

The requirement to update on command is an important distinction between SnapRAID and most other forms of RAID. If you have an error, you'll only be able to restore data from the last sync point. Typically, you would schedule syncs according to your needs -- maybe only once a day if your data doesn't change regularly. Of course, the more frequently your data changes, the more often you'll want to schedule syncs. But in most home situations, data on a NAS doesn't change that regularly, so this isn't a significant limitation.
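Those drive nominations end up in SnapRAID's configuration file. This POSIX shell sketch prints a minimal `snapraid.conf` for one or two parity drives and any number of data drives; the mount points and the `snapraid_conf` name are hypothetical. It keeps a content file (SnapRAID's record of files and checksums) on every data drive plus one in `/var/snapraid`, so losing one drive doesn't take out the only copy. Directive names have changed across SnapRAID releases (older versions used `q-parity` where this uses `2-parity`, and `disk` where this uses `data`), so check the manual for your version before using the output.

```shell
#!/bin/sh
# snapraid_conf PARITY_DIR PARITY2_DIR DATA_DIR ...
# Pass "" as PARITY2_DIR for single parity. Prints the config to stdout.
snapraid_conf() {
  local parity="$1" parity2="$2" dir n
  shift 2
  echo "parity ${parity}/snapraid.parity"
  if [ -n "$parity2" ]; then
    echo "2-parity ${parity2}/snapraid.2-parity"
  fi
  # Several copies of the content file, so one lost drive can't take it out.
  echo "content /var/snapraid/snapraid.content"
  for dir in "$@"; do
    echo "content ${dir}/snapraid.content"
  done
  # Data drives get short names (d1, d2, ...) that identify them in reports.
  n=1
  for dir in "$@"; do
    echo "data d${n} ${dir}/"
    n=$(( n + 1 ))
  done
}
```

For example, `snapraid_conf /mnt/parity1 /mnt/parity2 /mnt/disk1 /mnt/disk2` prints a two-parity config for two data drives, which can be reviewed and saved as `/etc/snapraid.conf`. The regular sync could then run from cron with an entry like `0 3 * * * snapraid sync` (the time is arbitrary).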
On the flip side, SnapRAID has a nice little advantage over typical RAID. You can restore files that were never corrupted or damaged at all, such as files you deleted by mistake. In other words, it serves as a handy little "undelete" function. Again, you can only undelete files that had parity information stored at the last sync. One of the really nice features of SnapRAID is that it works on existing drives and existing filesystems. This means that you can add parity protection to an existing drive without having to back up your data, reformat the drive and then restore your data. This is something unRAID can't do. Furthermore, if you ever choose to stop using SnapRAID, or want to move your drive to another system, you don't really have to do anything. A drive currently being protected by SnapRAID can be read by another system, assuming the other system can understand the filesystem. There are some limitations, though: you really need to use either NTFS (for Windows) or ext4 (for *nix).
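Undeleting goes through SnapRAID's `fix` command with a filter. Here is a small POSIX shell wrapper, assuming the `-f` (filter) option behaves as described in the SnapRAID manual; the `undelete` name is hypothetical, and `SNAPRAID` exists only so the command can be swapped out for testing. Only files that were present at the last `sync` can come back.

```shell
#!/bin/sh
SNAPRAID="${SNAPRAID:-snapraid}"

# undelete FILE ... -> ask SnapRAID to restore each FILE from parity.
# Check the SnapRAID manual for how filter paths are matched on your drives.
undelete() {
  local f
  for f in "$@"; do
    "$SNAPRAID" fix -f "$f" || return 1
  done
}
```

To bring back everything that has gone missing since the last sync in one go, the manual's `-m` (missing files only) filter can be used instead: `snapraid -m fix`.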
I've heard some criticisms of SnapRAID on the basis that it is a command-line driven application. This is true -- running SnapRAID, either through *nix or Windows, will typically require you to run some simple commands. None of them are overly complicated. That said, there is a GUI for Windows, called Elucidate. I can't comment on whether it is any good or not, since I'm not using SnapRAID on a Windows machine.


I followed the instructions found here. They required me to install some additional packages in order to compile SnapRAID. But if you want, these can be safely removed once SnapRAID is installed. The whole process probably took about 10 minutes in total.
Alternatively, Open Media Vault now has a SnapRAID plugin which integrates quite nicely with the OMV web interface. The plugin keeps the SnapRAID binary up to date without needing to compile from source. I've been using this plugin for several weeks now without any significant issues.