This week I migrated from OpenNAS to TrueNAS SCALE.

The move went very smoothly and was much easier than expected.

I opted to go for SCALE rather than CORE as firstly I’d rather be on a Linux derivative than BSD, and secondly SCALE is the future of TrueNAS.

Motivation for change

Everyone raves about ZFS, the filesystem that underpins TrueNAS. It gives access to the snapshot feature, which is super cool: the system can cut a new snapshot on a schedule or manually, and these can be restored should accidental/malicious deletes occur. The snapshots also flow nicely into Windows, showing as previous versions of a given file. Finally, snapshots don’t require full copies of the dataset; they only store the blocks that changed. I.e. if I’ve deleted one file from my dataset, the snapshot is only the size of that one file! This is awesome.
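To make that space accounting concrete, here’s a minimal sketch of driving snapshots from Python on the NAS itself, wrapping the standard zfs commands. The dataset name tank/media is a placeholder, not my actual layout:

```python
# Minimal sketch: create a ZFS snapshot and inspect how much space each
# snapshot uniquely holds. "tank/media" is a placeholder dataset name.
import subprocess

def snapshot(dataset: str, name: str) -> None:
    """Create a named snapshot of a dataset."""
    subprocess.run(["zfs", "snapshot", f"{dataset}@{name}"], check=True)

def list_snapshots(dataset: str) -> str:
    """List snapshots with the space unique to each one (USED column)."""
    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-r", dataset,
         "-o", "name,used,referenced"],
        check=True, capture_output=True, text=True)
    return out.stdout

snapshot("tank/media", "before-cleanup")
# Delete a 1GB file from tank/media now and the snapshot's USED value grows
# to roughly 1GB -- only the blocks unique to the snapshot are retained.
print(list_snapshots("tank/media"))
```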

Redundancy was something I also realised I needed. I’ve spent a lot of effort on protecting my data, and I realised that having zero redundancy was too high a risk for me.

Selecting vdevs

Getting my head around zpools, vdevs and the corresponding array types was the most complicated aspect of TrueNAS.

My take on this is that vdev stands for virtual device. Think of it like its own drive. These combine to create your pool. The fundamental rule here is that if any vdev fails, your whole pool goes down. Redundancy exists inside the vdev, not the pool!

A note here on future-proofing: expanding a vdev is not possible. Instead, you can add more vdevs to the pool to increase capacity. And as mentioned above, if any vdev dies, the whole pool fails!

This is why I opted to start my array with larger drives, giving me more time before I need to expand.

I’m running 3x8TB drives in a RAIDZ1 array.
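To make those rules concrete, here’s a toy model of zpool mechanics. The Vdev class is purely illustrative; only the 3x8TB RAIDZ1 numbers are my real layout:

```python
# Toy model of zpool mechanics: capacity adds up across vdevs, but
# redundancy lives inside each vdev -- one dead vdev loses the whole pool.
from dataclasses import dataclass

@dataclass
class Vdev:
    disks: int
    disk_tb: float
    parity: int  # disks that can fail: 1 for raidz1/2-way mirror, 2 for raidz2

    @property
    def usable_tb(self) -> float:
        return (self.disks - self.parity) * self.disk_tb

    def survives(self, failed_disks: int) -> bool:
        return failed_disks <= self.parity

pool = [Vdev(disks=3, disk_tb=8, parity=1)]      # my 3x8TB RAIDZ1
print(sum(v.usable_tb for v in pool))            # ~16 TB before ZFS overhead
print(all(v.survives(1) for v in pool))          # True: one failure is fine
print(all(v.survives(2) for v in pool))          # False: pool is gone
```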

Issues with expansion

Hardware-wise, there are some constraints to take into account.

  • number of SATA ports on your motherboard. Remember the OS drive takes one SATA port.
  • number of physical bays to load in your drives
  • power consumption. My guess is a 20TB drive uses roughly the same power as an 8TB drive, so having 6x4TB drives must use more power than 3x8TB drives. I’d recommend factoring this into your calculations (a back-of-envelope sketch follows this list).
  • cooling. Hard drives don’t like being kept at high temperatures for sustained periods. I noticed in my case there was a lack of airflow, and so more fans were needed.
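For the power bullet above, here’s the back-of-envelope version. The idle wattage and electricity price are assumptions, not measurements; substitute figures from your drive’s datasheet and your bill:

```python
# Rough annual running cost of spinning drives at idle. The ~5 W idle draw
# per 3.5" drive and the price per kWh are assumptions -- substitute your own.
IDLE_WATTS_PER_DRIVE = 5.0
PRICE_PER_KWH = 0.30
HOURS_PER_YEAR = 24 * 365

def annual_cost(num_drives: int) -> float:
    kwh = num_drives * IDLE_WATTS_PER_DRIVE * HOURS_PER_YEAR / 1000
    return kwh * PRICE_PER_KWH

for label, drives in [("3x8TB", 3), ("6x4TB", 6)]:
    print(f"{label}: ~{annual_cost(drives):.2f} per year at idle")
```

Same 24TB raw either way, but the six-drive layout costs roughly twice as much to keep spinning.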

Types of vdev

There’s an article commonly linked on Reddit arguing that the best TrueNAS array is a mirror, with subsequent mirror vdevs added as expansion is needed. I didn’t like this approach as it means you need one disk of redundancy for every disk actively used, which is too costly. From my calculations, RAIDZ1 with 3 disks is an ‘acceptable’ risk. I also considered RAIDZ with 6 disks, but that sucks up more power, requires converting the 5.25-inch CD-ROM bays (more cost and time), and means I’d need to mount my OS drive via USB, which is another risk.

The concern that is widely discussed online is unrecoverable read errors (UREs): as drives get bigger and bigger, the chances of hitting a URE increase when doing a resilver, the rebuild that runs when a disk in the array is replaced. If UREs are encountered, this can either mean data loss (in RAIDZ1, where one disk is already gone) or potentially stop the array from being rebuilt at all. The other concern is that if a second drive fails while you’re running RAIDZ1, you’re straight out of luck and it’s bye-bye data! The rough odds are sketched below.
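Here’s the rough URE maths for my layout, assuming the 1-in-10^14-bits rate commonly quoted on consumer drive spec sheets. Real drives often do better than the spec, so treat this as a pessimistic illustration rather than a prediction:

```python
# Chance of hitting at least one URE while resilvering a 3x8TB RAIDZ1:
# the two surviving drives must be read in full. The URE rate is the
# spec-sheet figure of 1 error per 1e14 bits read -- an assumption.
URE_RATE = 1e-14                 # errors per bit read
BITS_TO_READ = 2 * 8e12 * 8      # two 8 TB drives, in bits

p_clean = (1 - URE_RATE) ** BITS_TO_READ
print(f"Chance of at least one URE during resilver: {1 - p_clean:.0%}")
# ~72% with these inputs -- which is why the spec-sheet numbers scare people.
```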

ECC Memory

[https://en.wikipedia.org/wiki/ECC_memory]

ECC stands for error-correcting code. It gives RAM the ability to check itself for data errors and prevents issues when the memory becomes faulty. I’ve heard stories where a scrub runs on bad RAM and starts adjusting the data on disk, corrupting everything. If I were building my system again, I’d use ECC. The fact of the matter is, you don’t need ECC RAM: many other NAS systems use non-ECC memory, as do our desktop machines, without issues. Utilities such as [https://www.memtest86.com/] can be used to check RAM for errors, and from what I hear, the failure rate of RAM is very low.

Stress testing of drives

In addition to stress testing RAM as mentioned above, we should really stress test hard drives. As part of my TrueNAS migration, I had another drive fail on me in my Synology. This was encountered after about 10 hours of copying data: my Synology started beeping to indicate the volume had crashed while rsync reported a failure, and SMART checks for the drive showed ‘Failing’. This drive was brand new. My concern is that had this drive seen only minimal use, the failure wouldn’t have been picked up until it was too late. As such, for important projects I’d highly recommend stress testing drives. This can be done with [https://wiki.archlinux.org/title/badblocks]; a sketch follows.
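This is the sort of burn-in I’d run, wrapping badblocks and smartctl. It is destructive (it wipes the drive) and needs root; /dev/sdX is a placeholder, so triple-check the device name before running anything like it:

```python
# Destructive drive burn-in sketch: a full write-and-verify pass with
# badblocks, then a SMART long self-test. Needs root. WIPES THE DRIVE.
import subprocess

DRIVE = "/dev/sdX"  # placeholder -- triple-check this!

# -w: destructive write-mode test, -s: progress, -v: verbose.
# -b 4096 uses 4 KiB blocks, which badblocks needs on large (8TB+) drives.
subprocess.run(["badblocks", "-b", "4096", "-wsv", DRIVE], check=True)

# Kick off a SMART long self-test; it runs in the drive's own background.
subprocess.run(["smartctl", "-t", "long", DRIVE], check=True)
# Once it finishes, review the results with: smartctl -a /dev/sdX
```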

Importance of alerting

I can’t recommend enough the importance of email alerts from TrueNAS. As soon as you start seeing SMART errors, you should look into replacing the faulty drive. Further, TrueNAS needs scrubbing: a process, often run at monthly intervals, that verifies the integrity of your data and repairs what it can. Scrubbing is an intensive task, so it’s often during a scrub that drive issues are picked up. Failing to run data scrubs is asking for problems.
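TrueNAS handles the scheduling and emailing for you in its UI, but the underlying idea is simple enough to sketch. The pool name tank is a placeholder:

```python
# Sketch of a scrub-and-check cycle: start a scrub, then ask zpool whether
# the pool is showing errors. Hook your email/alerting into the branch below.
import subprocess

POOL = "tank"  # placeholder pool name

subprocess.run(["zpool", "scrub", POOL], check=True)  # runs in the background

# "zpool status -x" reports only unhealthy pools; a healthy pool prints
# a one-line "is healthy" message instead.
status = subprocess.run(["zpool", "status", "-x", POOL],
                        check=True, capture_output=True, text=True).stdout
if "is healthy" not in status:
    print("Pool needs attention:\n" + status)  # send the email alert here
```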

Lessons learned

  • Plan for the future to avoid wasted time/money. Go BIG early (within reason!)
  • Get drives of different models/ages to avoid hitting common failure points
  • Remove dust from old units more frequently than you think. This kills drives
  • Scrubbing/SMART checks are your friend
  • Email alerts or it didn’t happen; without them, the first you’ll hear of a problem is when your RAID array breaks
  • Stress test drives where confidence is important

Final thoughts

I’d wholeheartedly recommend others try TrueNAS SCALE. Given that these operations take a reasonable amount of time to set up, I’d recommend trying to future-proof your NAS; finding out you’ve undersized your capacity and filled up your NAS in the first 6 months is frustrating! I’ve attempted to get different drive models/ages in my array: running a set of drives all of the same age and model is high risk, as the chances of all drives failing at the same time are significantly higher. It’s still unclear to me whether I should have gone with RAIDZ2 to increase the redundancy of my vdevs. Overall, I didn’t want to spend more on this project, and my TrueNAS will have a backup plus cloud storage for important data. I guess time will tell!