It has been a little while since I was at HPE Discover talking about their StoreVirtual product, and since then I have spoken to others about software-defined storage and their thoughts and experiences. Before I go into software-defined storage, though, it seems most people can’t stop talking about dedup, compression and tiering. One thing is prevalent: most of the people I have spoken to don’t really understand what these technologies do for them, apart from saving space, or how they affect them.
There are many storage vendors touting some very good savings and performance figures, and claiming that pure flash is a must-buy, even more so when looking at VDI (virtual desktops). In the SMB market I find it particularly interesting how these marketing departments come up with these statements and how they back up their numbers.
So let’s break these down:
- Thin provisioning - Metaphorically, your car has a 55 litre tank but you only fill it with 44 litres. What I mean by this is that you provision, say, a 100GB LUN to your server on a 5TB SAN but you only store 40GB of data. When thin provisioning is enabled the remaining 60GB of space in this provisioned LUN isn’t reserved, and you could use this space on another LUN.
	To represent this, I have 5 x 2TB LUNs adding up to 10TB on a 5TB SAN. Without thin provisioning this would not be possible, but with thin provisioning I have ~4.1TB of data actually in use, so I would still have ~900GB of data space left.

- Dedup is a little more complicated than thin provisioning, as vendors have slight twists that I won’t go into. To explain it simply, it looks for duplicates in the data, either while it is being written to disk or afterwards. This isn’t a file-level process working on, say, a Word doc; it works on the 1s and 0s, so some vendors look at 8kB blocks and others use other sizes. Overall it looks for data that is the same and only stores it once.
- Compression is, put simply, like zipping up your files. It takes the raw data and compresses it.
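To make the dedup and compression ideas above a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: fixed 8kB blocks, SHA-256 to spot identical blocks, and zlib standing in for whatever compression an array really uses. The function names are made up for this example, and real arrays do this very differently (and much faster) in their controllers.

```python
import hashlib
import zlib

BLOCK_SIZE = 8 * 1024  # 8kB blocks (assumption: some vendors use other sizes)


def dedup_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, layout): store maps a block's SHA-256 digest to its bytes,
    layout is the ordered list of digests needed to rebuild the original data.
    """
    store = {}
    layout = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks are stored only once
        layout.append(digest)
    return store, layout


def rebuild(store, layout) -> bytes:
    """Reassemble the original data from the unique blocks and the layout."""
    return b"".join(store[digest] for digest in layout)


def stored_size(store) -> int:
    """Bytes kept on disk after dedup, before compression."""
    return sum(len(block) for block in store.values())


def compressed_size(store) -> int:
    """Bytes kept on disk if every unique block is also zlib-compressed."""
    return sum(len(zlib.compress(block)) for block in store.values())


if __name__ == "__main__":
    # Three identical 8kB blocks plus one different block (made-up data).
    data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    store, layout = dedup_blocks(data)
    print("logical bytes:   ", len(data))
    print("after dedup:     ", stored_size(store))
    print("after compress:  ", compressed_size(store))
    print("rebuilds cleanly:", rebuild(store, layout) == data)
```

Notice that every block gets hashed and looked up on the way in. That is the CPU and RAM overhead that matters later on.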
Almost all enterprise SANs and even NASs support thin provisioning these days, and it is a very useful tech. Personally I don’t really see it as a major space-saving tech, as it isn’t reducing your data on the disk, merely making the techie’s administration easier by allowing LUNs much bigger than needed for the “WHAT IF”.
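The thin provisioning numbers from earlier can be checked with a few lines of Python. This is just a sketch of the arithmetic, assuming the five 2TB LUNs sit on the 5TB SAN mentioned above; the function and field names are my own.

```python
def thin_provisioning_summary(physical_tb, lun_sizes_tb, used_tb):
    """Compare what has been promised to servers with what is really on disk."""
    provisioned_tb = sum(lun_sizes_tb)
    return {
        "provisioned_tb": provisioned_tb,
        # How many times over the physical capacity has been promised.
        "overcommit_ratio": provisioned_tb / physical_tb,
        # Physical space still free, rounded to avoid float noise.
        "free_physical_tb": round(physical_tb - used_tb, 3),
        # Would the same LUNs fit if every GB had to be reserved up front?
        "fits_without_thin": provisioned_tb <= physical_tb,
    }


if __name__ == "__main__":
    summary = thin_provisioning_summary(
        physical_tb=5.0,           # assumption: the 5TB SAN from the example
        lun_sizes_tb=[2.0] * 5,    # 5 x 2TB LUNs
        used_tb=4.1,               # data actually written
    )
    for key, value in summary.items():
        print(f"{key}: {value}")
```

The data on disk is still 4.1TB either way, which is why I don’t count thin provisioning as a real space saving.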
Dedupe and compression require a large amount of processing power, due to the overhead of matching data sets or compressing, along with large amounts of RAM compared to traditional SAN controllers. The more I/O (data requested from or written to the storage array) and the bigger the disks, the more RAM and CPU are required.
Now that I have explained that for those who didn’t know, I come to the pure flash systems.
Flash is still more expensive than spinning disk per GB, by about 10X, but if you look at I/O then flash comes into its own, and per I/O it works out about 10X cheaper than spinning disk. Due to the cost of flash, and most SMBs not needing that much I/O, flash vendors try to make their product seem better value using the above technologies, for example by quoting the effective capacity. This is derived from some simple maths: X% saving via thin provisioning, X% via dedupe and X% via compression. Sounds cool, hey? Yes, but the key thing is the X%, as we are all individuals and so are our datasets and usage. The X% they are using could give a very different “effective” capacity from the one you see when you put your own dataset on. So in reality, yes, there are savings to be had, but without testing on your dataset you won’t know what space savings you will make. In most circumstances with normal SMB datasets I haven’t seen the savings they have touted, and this effectively increases the real cost per GB for flash.
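Here is a rough Python sketch of how that “effective” capacity maths can swing the cost per GB. Every number in it is made up for illustration, and the way the savings are combined (multiplying what is left after each X%) is my assumption, since vendors each have their own method.

```python
def effective_capacity_tb(usable_tb, savings):
    """Effective capacity if each saving removes that fraction of what is left.

    savings is a list of fractions, e.g. [0.5, 0.5] for 50% dedupe, 50% compression.
    """
    remaining = 1.0
    for saving in savings:
        if not 0 <= saving < 1:
            raise ValueError("each saving must be a fraction in [0, 1)")
        remaining *= 1.0 - saving
    return usable_tb / remaining


def cost_per_effective_gb(price, usable_tb, savings):
    """Price divided by effective capacity, using 1TB = 1000GB."""
    return price / (effective_capacity_tb(usable_tb, savings) * 1000)


if __name__ == "__main__":
    price = 50_000      # hypothetical array price
    usable_tb = 10      # hypothetical usable flash capacity

    quoted = [0.5, 0.5]     # hypothetical marketing X% values
    measured = [0.2, 0.1]   # hypothetical results on your own dataset

    for label, savings in (("quoted", quoted), ("measured", measured)):
        capacity = effective_capacity_tb(usable_tb, savings)
        cost = cost_per_effective_gb(price, usable_tb, savings)
        print(f"{label}: {capacity:.1f}TB effective, {cost:.2f} per GB")
```

It is the same box at the same price, but the cost per GB changes a lot depending on whose X% you plug in. That is why testing with your own data matters.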
Taking all of the above into account, I come back to software-defined storage, and more importantly what it enables on Hyper-Converged systems. Hyper-Converged systems are storage, compute and networking all in one box (saving rack space and power). The key element to think about with Hyper-Converged is making sure you don’t burn too many resources on the storage and networking, leaving little for the virtual machines you actually want to run. There is a vendor (I won’t name names) that offers dedupe and compression with its Hyper-Converged platform, which in principle sounds great, doesn’t it? Everything in a single box with all the toys. Well, yes and no.
The key elements to think about here are the resources for your actual uses, such as:
- Disk I/O
- CPU needs for VMs
- RAM
- Licensing (almost everyone I have spoken to doesn’t consider this)
As mentioned above, dedupe and compression take CPU cycles to work, and the more I/O, the more CPU is needed. As such you may be in a position where your storage is taking most of the CPU instead of the virtual machines you actually need to run, which is the whole point, right? I have seen situations where companies have had to purchase more Hyper-Converged nodes than they would have needed with a traditional SAN, which makes the “TIN” more expensive and increases the hypervisor licensing cost. This really outweighs the benefits, unless you like the management side and feel that is worth the additional cost.
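A back-of-the-envelope Python sketch shows how this can play out. The core counts and the share of CPU taken by storage are purely hypothetical, and the function name is my own; it only shows how the node count (and so the per-node hypervisor licensing) grows as storage features eat CPU.

```python
def nodes_needed(vm_cores_required, cores_per_node, storage_cpu_percent):
    """Nodes required once a share of each node's CPU is spent on storage.

    storage_cpu_percent is a whole-number percentage (0-99) of each node's
    cores consumed by storage services such as dedupe and compression.
    """
    if not 0 <= storage_cpu_percent < 100:
        raise ValueError("storage_cpu_percent must be between 0 and 99")
    # Integer ceiling division, scaled by 100 to avoid float rounding issues.
    numerator = vm_cores_required * 100
    denominator = cores_per_node * (100 - storage_cpu_percent)
    return -(-numerator // denominator)


if __name__ == "__main__":
    vm_cores = 96        # hypothetical cores the VMs need in total
    cores_per_node = 32  # hypothetical cores in each node

    for percent in (0, 25, 50):
        nodes = nodes_needed(vm_cores, cores_per_node, percent)
        print(f"storage using {percent}% of CPU -> {nodes} nodes to buy and license")
```

With these made-up figures the same workload goes from 3 nodes to 4, and then to 6, as the storage share rises.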
The key thing about Hyper-Converged is that simpler is better. It doesn’t need to be a Swiss army knife, as the cost savings and ease of management are already there. Whether you build your own or buy a certified one, think about what you actually need and your use case.
Overall, the key point is to look through the marketing babble and think about what you actually need. If you only have 3TB of data, the cost of dedupe isn’t going to provide any benefit in 95% of situations.