File storage strategy


Hi there!

Over the last few days, I have been digging up my army of hard disks, which hold all my data from the past 10 years. I was totally discouraged by the amount of information and by the huge quantity of duplicated files left behind by multiple backups.

As I store a lot of files each year, I looked for a good way to organize, sort and back up all my documents, videos and music in a logical way. The goal is also to reduce the risk of losing files (after a hard-disk failure, for example). If you are or were in the same situation, this little guide is for you! Throughout these lines, I will show you how to keep control of your files and your workspace, and how to juggle your personal and professional data.

Know your data

It may seem obvious to many of you, but we often don’t realize the quantity and variety of files we have. The first step is quite simple: take a pen and paper, and plug in your hard disks one by one. Note down the model, the capacity, the free space, and what each drive contains. For example:

Seagate 500 GB (IDE): 154 GB used

-Backup from 12 February 2012
-Some music given by Alex
-Photos
-First C programming project

Western Digital (SATA):
-Blah blah…

Often, you will not have enough space on a single drive for all your files, even after the final classification step. That is why it is useful to know the size of every hard disk.
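If the drives are plugged into a machine with Python installed, the pen-and-paper inventory can be complemented by a few lines of code. The following is only a minimal sketch: the function name `describe_drive`, the label and the mount point are placeholders of mine, not part of the original method.

```python
import shutil


def describe_drive(label, mount_point):
    """Return one inventory line for the drive mounted at mount_point."""
    usage = shutil.disk_usage(mount_point)  # total, used and free, in bytes
    gb = 1000 ** 3  # decimal gigabytes, as printed on drive labels
    return (f"{label}: {usage.total / gb:.0f} GB total, "
            f"{usage.used / gb:.0f} GB used, {usage.free / gb:.0f} GB free")


if __name__ == "__main__":
    # Hypothetical label and mount point: replace them with your own drives.
    drives = {"Current drive": "."}
    for label, mount_point in drives.items():
        print(describe_drive(label, mount_point))
```

The content of each drive still has to be noted by hand, since only you know what "Some music given by Alex" means.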

Categorize

As I said in the introduction, your PC may be shared between your professional life and your geek life (Oh! I beg your pardon, I should say ‘private life’ 😉 ). After reading a lot of advice on forums and articles on the web, I decided to put each folder and file into a specific root category:

  • confidential : all the files you want to keep secret. This category contains scans of your official papers (ID card, administrative forms, etc.), passwords, mails, and other stuff like that. In the final step, these files should be encrypted, because you need to keep control of them.
  • private : all the files that are not confidential but personal, because they are related to your life. For example, your programming projects, your holiday photos, your contact list, etc. You can split this category in two: ‘personal’ and ‘professional’.
  • public : files that “you don’t care about” but need to keep, and that can be retrieved from the web. Here, I put all my movies, wallpapers, music and software. When I put a file in this category, I know that I could share it with everyone.

Define your folder architecture

Before sorting all your files into these 3 folders, you should define a folder architecture, that is, a list of rules you’ll use to classify each file according to its file type or the sub-category it belongs to. In general, I strongly advise being careful about letter case. Especially if you plan to store all your files on a Linux system (e.g. a server), where file names are case-sensitive, you should always use lower-case for folder names. The exception is music and movie libraries, where lower-case can hurt readability (because there are many, many items).
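To check an existing tree against the lower-case rule, something like the following could help. It is a rough sketch: the function name and the default list of exempt top-level folders (`music`, `movies`) are my own assumptions, chosen to mirror the exception mentioned above.

```python
import os


def folders_with_uppercase(root, exempt=("music", "movies")):
    """Yield folders under root whose name is not entirely lower-case.

    Top-level folders named in `exempt` (hypothetical names) are skipped
    with everything below them, to allow readable library names.
    """
    for current, dirnames, _ in os.walk(root):
        if current == root:
            # Pruning dirnames in place stops os.walk from descending there.
            dirnames[:] = [d for d in dirnames if d not in exempt]
        for name in dirnames:
            if name != name.lower():
                yield os.path.join(current, name)
```

Calling `list(folders_with_uppercase("/path/to/storage"))` returns the folders to rename; the sketch only reports them and renames nothing.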

1. Define a naming scheme

You have to define a naming scheme for the different kinds of folders on your computer. For example, you will not sort your holiday pictures the same way as your music library. Here are some of the rules I chose for my folder architecture.

Music :

<music folder>/<Artist Name -or- Band Name>/<Album – [YEAR]>/01 – MytrackNoOne.mp3

Pictures :

<year>/<Month-Content of the folder>/filename.jpg

For pictures, this classification lets you quickly sort groups of pictures by date. Pictures are linked to events in our lives, so it makes sense to sort them by date.
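The two patterns above can be written down as small helper functions, which is handy if you later script the sorting. This is a sketch under assumptions: the function names are mine, the separator is a plain ASCII hyphen, and the month is written as a two-digit number so that folders sort chronologically.

```python
from pathlib import PurePosixPath


def music_path(library, artist, album, year, track_no, title, ext="mp3"):
    """Build <music folder>/<Artist>/<Album - [YEAR]>/NN - Title.ext."""
    return (PurePosixPath(library) / artist / f"{album} - [{year}]"
            / f"{track_no:02d} - {title}.{ext}")


def picture_path(year, month, content, filename):
    """Build <year>/<Month-Content of the folder>/filename."""
    return PurePosixPath(str(year)) / f"{month:02d}-{content}" / filename
```

For instance, `picture_path(2012, 2, "holidays", "img_001.jpg")` gives `2012/02-holidays/img_001.jpg`.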

2. Make categories

Now, it’s important to define some categories that will sit under our root categories. If you need some inspiration, you can look at the libraries defined in Windows. Here is how my categories look on my computer. You should be able to see all categories in your explorer window without scrolling. If you have to scroll, you probably have too many categories. Try to group some of them together.

3. Time to sort!

This is the longest and most boring part of the work: using the root categories and categories you just created, you need to sort your files. Be clever when you create new folders, and try to group folders by theme. Be aware that when you copy files, or even move them, you will probably lose the timestamps of your folders. That’s why I created a little program called RobAutocopy, which prevents this by copying timestamps as well.
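For readers who prefer scripting, Python's standard library can also keep modification times while copying. The sketch below is not how RobAutocopy works (I make no claim about its internals); it only illustrates the same idea with `shutil`, and the function name is a placeholder.

```python
import os
import shutil


def copy_with_timestamps(src, dst):
    """Copy a file or a folder to dst, keeping modification times.

    For a folder, dst must not exist yet. shutil.copytree copies files
    with copy2 and then applies the source folder's metadata to the copy.
    """
    if os.path.isdir(src):
        return shutil.copytree(src, dst)
    return shutil.copy2(src, dst)
```

Modification and access times are carried over where the platform allows it; creation time is not something this sketch can guarantee.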

4. Accessibility

Of course, it’s rarely possible to store all your files on the same hard drive or device. You have to split your data among different storage devices, which can cause conflicts and/or accessibility problems. That’s why you have to decide which files you’ll use often and which you won’t. This will help you define an archiving policy.
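One possible way to get a first idea of what could be archived is to look at how long ago each file was last modified. This is only a sketch of one heuristic: the one-year threshold and the use of modification time as a stand-in for "often used" are assumptions of mine, not a rule from this guide.

```python
import os
import time


def split_by_age(root, max_age_days=365, now=None):
    """Return (active, archive) lists of file paths found under root.

    A file is an archive candidate when its last modification is older
    than max_age_days (an arbitrary, hypothetical threshold).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    active, archive = [], []
    for current, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(current, name)
            if os.path.getmtime(path) >= cutoff:
                active.append(path)
            else:
                archive.append(path)
    return active, archive
```

The result is a suggestion list to review by hand, since an old file can still be one you open every week.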

Serving your files

This part is for people who own a private home server dedicated to file sharing. I will present some ways to share content depending on its nature.

1. For copying (all types + backups)

On a local network, the best way to share files is the SMB/CIFS protocol, proudly served by Samba on Linux. It takes advantage of your LAN speed (100 Mbit/s or 1 Gbit/s), as the protocol overhead is relatively light. (If you only have Linux systems, NFS could be a good alternative as well.)

Here again, many configurations are possible depending on the users you create, but this is how I designed mine:
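To get a feeling for what the LAN speed means in practice, here is a small back-of-the-envelope calculation. It gives an ideal lower bound only: it ignores protocol overhead and disk speed, and the function name is a placeholder.

```python
def ideal_transfer_seconds(size_gb, link_mbit_per_s):
    """Best-case time to move size_gb (decimal GB) over a link, in seconds."""
    bits = size_gb * 1000 ** 3 * 8          # gigabytes to bits
    return bits / (link_mbit_per_s * 1000 ** 2)  # Mbit/s to bit/s


# 154 GB, the used space of the Seagate drive in the inventory example:
slow = ideal_transfer_seconds(154, 100)    # 100 Mbit/s link
fast = ideal_transfer_seconds(154, 1000)   # 1 Gbit/s link
print(f"{slow / 3600:.1f} h at 100 Mbit/s, {fast / 60:.1f} min at 1 Gbit/s")
```

In this ideal case the same backup takes about 3.4 hours at 100 Mbit/s and about 20 minutes at 1 Gbit/s, so the network hardware matters at least as much as the protocol.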

Samba user   Confidential   Private   Public   Password   Description
louisbob     No             Yes       Yes      Yes        Access to everything (public, personal and backup folders). This is the main account, which you should use day to day. This user should be read-only on the “backup” folder.
backup       No             No        No       Yes        The backup user has read and write access to the backup folder. This user prevents the main user from accidentally deleting a backup folder.
guest        No             No        Yes      Optional   Mainly used when your friends want to steal some music or movies from your public folder. Everything should be read-only to avoid disasters.
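The user table can be turned into Samba share sections. The sketch below generates them from a small Python dictionary; it is not my actual configuration. The paths under `/srv/storage` are hypothetical, `guest` is treated as an ordinary Samba user as in the table, and giving louisbob write access to the private and public shares is an assumption. The confidential folder is deliberately not shared, since no user has access to it in the table. Only standard smb.conf share parameters are used (`path`, `valid users`, `read only`, `write list`).

```python
# Hypothetical share layout derived from the user table.
SHARES = {
    "private": {"path": "/srv/storage/private",
                "valid_users": ["louisbob"], "write_list": ["louisbob"]},
    "public": {"path": "/srv/storage/public",
               "valid_users": ["louisbob", "guest"], "write_list": ["louisbob"]},
    "backup": {"path": "/srv/storage/backup",
               "valid_users": ["louisbob", "backup"], "write_list": ["backup"]},
}


def render_share(name, share):
    """Render one smb.conf share section as text."""
    lines = [
        f"[{name}]",
        f"    path = {share['path']}",
        f"    valid users = {' '.join(share['valid_users'])}",
        "    read only = yes",  # default for everyone allowed in
    ]
    if share["write_list"]:
        # Users in the write list get write access despite read only = yes.
        lines.append(f"    write list = {' '.join(share['write_list'])}")
    return "\n".join(lines)


def render_config(shares):
    """Render all share sections, separated by blank lines."""
    return "\n\n".join(render_share(n, s) for n, s in shares.items())


if __name__ == "__main__":
    print(render_config(SHARES))
```

The output is meant to be reviewed and pasted below the `[global]` section of an existing smb.conf; the Samba users themselves still have to be created separately.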

The default Samba configuration is not really optimized for a small number of users. If you notice slow transfer rates, you can tune Samba to get better performance: https://wiki.amahi.org/index.php/Make_Samba_Go_Faster

2. Sharing movies

Movies should be accessible from everywhere. That’s why I strongly advise the combination of FTP + Plex Media Server. FTP lets you pick up a movie quickly when you are away from home, and Plex is extremely powerful if you need a plug-and-play server that fetches movie details from online databases. Plex is accessible through a dedicated application, Plex Home Theater, and you can also download the Android or iOS application to stream directly to your mobile phone. Otherwise, you can still access it via your web browser. Finally, re-encoding video on the fly is another advantage of Plex.

Guides for Plex installation :

A quick tour of Plex : http://laurendc.net/2014/04/07/set-up-plex-media-server-on-debian-wheezy-7-4/

Installation guide for Debian 7 (in French) https://www.kassianoff.fr/blog/fr/installation-de-plex-media-server-sur-debian-7

Resources :

http://www.hongkiat.com/blog/5-effective-ways-to-keep-your-files-under-control/
