So to represent this, I have 5 x 2TB LUNs adding up to 10TB presented to the hosts. Without thin provisioning this would not be possible, but with thin provisioning only the ~4.1TB of data actually in use consumes real space, so I would still have ~900GB of capacity left.
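To put rough numbers on the over-subscription (this is only a sketch; the ~5TB physical pool behind the LUNs is my assumption to make the figures line up):

```python
# Illustrative thin provisioning maths (figures assumed, not from any real array).
physical_pool_tb = 5.0           # assumed usable capacity sitting behind the LUNs
lun_size_tb = 2.0
lun_count = 5

provisioned_tb = lun_size_tb * lun_count   # what the hosts think they have
used_tb = 4.1                              # data actually written

oversubscription = provisioned_tb / physical_pool_tb
free_tb = physical_pool_tb - used_tb

print(f"Provisioned to hosts : {provisioned_tb:.1f} TB")
print(f"Physical pool        : {physical_pool_tb:.1f} TB")
print(f"Over-subscription    : {oversubscription:.1f}x")
print(f"Actually used        : {used_tb:.1f} TB")
print(f"Physical free        : {free_tb:.1f} TB (~{free_tb * 1000:.0f} GB)")
```

The point of the sketch is simply that the hosts see 10TB while the array only has to back the data actually written.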
Almost all enterprise SANs, and even NASes, support thin provisioning these days, and it is a very useful technology. Personally, though, I don't see it as a major space-saving technology: it isn't reducing the data on the disk, it merely makes the techie's administration easier by allowing LUNs to be sized much bigger than currently needed, to cover the "what if".
Dedupe and compression, on the other hand, require a large amount of processing power, due to the overhead of matching data sets or compressing them, along with large amounts of RAM compared to traditional SAN controllers. The more I/O (data requested from or written to the storage array) and the larger the disks, the more RAM and CPU is required.
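As a rough feel for why the RAM requirement grows with capacity, here is a back-of-the-envelope sketch of dedupe fingerprint overhead. The chunk size and bytes-per-fingerprint are my own illustrative assumptions; real arrays vary enormously in how they implement this.

```python
# Back-of-the-envelope dedupe metadata sizing (all figures assumed for illustration).
block_size_kb = 4                 # assumed dedupe chunk size
bytes_per_fingerprint = 32        # assumed hash + pointer kept per tracked block

blocks_per_tb = (1024**3) / block_size_kb                        # 1 TB expressed in 4 KB blocks
metadata_gb_per_tb = blocks_per_tb * bytes_per_fingerprint / 1024**3

pool_tb = 10                      # assumed pool size
print(f"Fingerprint metadata : ~{metadata_gb_per_tb:.0f} GB per TB of data")
print(f"For a {pool_tb} TB pool   : ~{metadata_gb_per_tb * pool_tb:.0f} GB to keep lookups fast")
```

However the numbers shake out on a given array, the shape of the problem is the same: every block written has to be hashed, looked up and tracked, and that costs CPU and RAM in proportion to I/O and capacity.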
Now that I have explained that for those who don't already know, I'll come on to the pure flash systems.
Flash is still more expensive than spinning disk per GB, by roughly 10x, but if you look at I/O then flash comes into its own, and on a per-IOPS basis flash works out roughly 10x cheaper than spinning disk. Because of the cost of flash, and because most SMBs don't need that level of I/O, flash vendors try to make their products seem better value using the technologies above: they quote an "effective capacity". This is derived from some simple maths; X% saving via thin provisioning, X% via dedupe and X% via compression. Sounds cool, hey... Yes, but the key thing is the X%, because we are individuals and so are our datasets and usage patterns. The X% they are using could give a very different "effective" capacity from what you actually see when you put your own dataset on it. So the reality is that yes, there are savings to be had, but without testing against your own dataset you won't know what space savings you will make, and in most circumstances with normal SMB datasets I haven't seen the savings that get touted, which effectively increases the real cost per GB of flash.
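To show how a datasheet "effective capacity" number is built, here is the multiplication laid bare. All the ratios below are vendor-style assumptions I have made up for the example, not measurements from any real dataset:

```python
# How a vendor-style "effective capacity" figure is built (all ratios assumed).
raw_tb = 10.0                  # raw flash capacity you actually pay for
price_per_raw_gb = 1.50        # assumed flash price, $/GB

dedupe_ratio = 3.0             # vendor-assumed, e.g. "3:1"
compression_ratio = 2.0        # vendor-assumed, e.g. "2:1"
thin_provisioning_ratio = 1.5  # vendor-assumed over-subscription benefit

effective_tb = raw_tb * dedupe_ratio * compression_ratio * thin_provisioning_ratio
price_per_effective_gb = (raw_tb * 1000 * price_per_raw_gb) / (effective_tb * 1000)

print(f"Raw capacity       : {raw_tb:.0f} TB")
print(f"Effective capacity : {effective_tb:.0f} TB (the datasheet number)")
print(f"$/GB raw           : ${price_per_raw_gb:.2f}")
print(f"$/GB effective     : ${price_per_effective_gb:.2f}")
```

If your own data only achieves half of those ratios, the $/GB "effective" figure roughly doubles, which is exactly why testing against your own dataset matters.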
Taking all of the above into account, I come back to software-defined storage, and more importantly what it enables on hyper-converged systems. Hyper-converged systems put storage, compute and networking all in one box (saving rack space and power). The key thing to think about with hyper-converged is making sure you don't burn too many resources on the storage and networking, leaving little left for the virtual machines you actually want to run. There is a vendor (I won't name names) that offers dedupe and compression on its hyper-converged platform, which in principle sounds great, doesn't it... everything in a single box with all the toys. Well, yes and no.
The key thing to think about here is how much resource is left over for what you actually want to run.
As mentioned above, dedupe and compression take CPU cycles to work, and the more I/O, the more CPU is needed. You can therefore end up in a position where the storage is taking most of the CPU that you need to run your actual virtual machines, which is the whole point, right? I have seen situations where companies have had to purchase more nodes for a hyper-converged system than they would have needed with a traditional SAN, which makes the "tin" more expensive and also increases the hypervisor licensing. That can really outweigh the benefits, unless you value the management side enough to feel it is worth the additional cost; the sketch below shows how that happens.
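Here is a rough sizing sketch of that effect. The core counts and the storage-service overhead are assumptions for illustration only; the actual overhead depends entirely on the platform and the workload:

```python
import math

# Rough HCI vs SAN sizing sketch (all figures assumed for illustration).
cores_per_node = 32
storage_overhead_cores = 8        # assumed CPU reserved for dedupe/compression/storage services
vm_core_demand = 120              # total cores your VMs actually need

usable_cores_hci = cores_per_node - storage_overhead_cores
nodes_hci = math.ceil(vm_core_demand / usable_cores_hci)

# Same workload on traditional compute nodes + SAN (storage CPU lives in the array).
nodes_san = math.ceil(vm_core_demand / cores_per_node)

print(f"HCI nodes needed    : {nodes_hci}")   # 120 / 24 usable cores -> 5 nodes
print(f"Compute nodes + SAN : {nodes_san}")   # 120 / 32 cores       -> 4 nodes
```

The extra node is extra tin and, just as importantly, extra hypervisor licences.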
The key thing about hyper-converged is that simpler is better. It doesn't need to be a Swiss army knife; the cost savings and ease of management are already there. Whether you build your own or buy a certified one, think about what you actually need and your use case.
Overall, the key point is to look through the marketing babble and think about what you actually need. If you only have 3TB of data, the cost of dedupe in 95% of situations isn't going to provide any benefit.