Increasingly, object storage is being heralded as an inexpensive, scalable, self-healing, multi-tenant platform for storing the exabytes of unstructured information we generate every day. But what should end users look for in an object storage platform, and which key requirements must it meet to deliver on those promises?
Object storage is essentially just a different way of storing, organising and accessing data on disk, but one that is far more scalable and cost-effective. Like files, objects contain data; unlike files, objects are not organised in a hierarchy.
Every object exists at the same level in a flat address space called a storage pool and one object cannot be placed inside another. Objects are immutable, making this storage methodology perfect for long-term retention of data archives, analytics information, and service provider storage with SLAs associated with data delivery.
Both files and objects have metadata associated with the information they contain, but objects are characterised by their extended metadata. Each object is assigned a unique identifier that allows a server, or end user, to retrieve the object without needing to know the physical location of the data. This approach is useful for automating and streamlining data storage in cloud computing environments.
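The flat namespace, content-addressed retrieval and extended metadata described above can be illustrated with a minimal in-memory sketch. This is a hypothetical toy, not any vendor's implementation: every object sits at the same level in a single pool, and a unique identifier derived from the content is all a caller needs to retrieve it.

```python
import hashlib

class ObjectStore:
    """Minimal sketch of a flat object store: no directories,
    every object lives at the same level in one storage pool."""

    def __init__(self):
        self._pool = {}  # object_id -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        # Derive the unique identifier from the content itself, so
        # callers never need to know the data's physical location.
        object_id = hashlib.sha256(data).hexdigest()
        self._pool[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        data, _ = self._pool[object_id]
        return data

    def head(self, object_id: str) -> dict:
        # Extended metadata travels with the object.
        _, metadata = self._pool[object_id]
        return dict(metadata)

store = ObjectStore()
oid = store.put(b"sensor readings 2024-01",
                {"content-type": "text/plain", "retention": "7y"})
print(store.get(oid))                 # b'sensor readings 2024-01'
print(store.head(oid)["retention"])   # 7y
```

Because objects are immutable, a content-derived identifier like this is stable for the life of the object; real platforms may instead assign opaque IDs, but the access pattern is the same.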
Object storage has the potential to enable customers to build a highly reliable, infinitely scalable and efficient storage pool for all their unstructured data needs. However, for this to be realised, object storage platforms need to meet all of the essential requirements and provide a set of tunable parameters – yet few platforms available today live up to that promise. In essence, there are five key requirements customers will need in order to define the architecture of their scale-out storage infrastructures: efficiency, scalability, reliability, accessibility and performance.
Most object storage platforms claim to be the most cost-efficient solution on the market, and many will live up to that claim, but only for a very specific use case and a very limited range of parameters. What is more difficult is finding the platform that offers the best overall efficiency, including infrastructure and management costs as well as bandwidth consumption.
Object storage was purposefully designed for very large volumes of unstructured data, with unlimited scalability as the ultimate objective. Inherently, storage platforms can be scaled in three dimensions: the total volume of storage, the number of objects and the number of sites. The number of objects tends to be a particular challenge when objects are very small or the ratio of small vs large objects cannot be accurately predicted. Applications may not initially have high scalability requirements for all three dimensions, but over time those requirements can change due to external elements, so it's important to build in all of these three scalability dimensions from the start.
Central to any storage architecture is reliability. Whether you are building a low-cost archive or a high-performance storage cloud, reliability is key. But there is a lot more to data reliability than most storage providers like to admit. It has three dimensions: availability (the time your data is instantly available for access), durability (the guarantee that your data will not be corrupted) and integrity (the assurance that your data will remain unchanged and cannot be tampered with). Security is inherent to data integrity. Only when reliability is provided in all three dimensions can you achieve true data reliability; most platforms provide acceptable grades on one, sometimes two, of the above.
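The integrity dimension in particular is mechanical enough to sketch: store a digest alongside each object at write time and recompute it on every read, so corruption or tampering is detected rather than silently returned. This is an illustrative sketch using SHA-256, not a description of any specific platform's integrity scheme.

```python
import hashlib

def write_with_digest(pool: dict, object_id: str, data: bytes) -> None:
    # Record a SHA-256 digest alongside the payload at write time.
    pool[object_id] = (data, hashlib.sha256(data).hexdigest())

def read_verified(pool: dict, object_id: str) -> bytes:
    # Recompute the digest on every read; a mismatch means the
    # object was corrupted or tampered with after it was written.
    data, stored_digest = pool[object_id]
    if hashlib.sha256(data).hexdigest() != stored_digest:
        raise IOError(f"integrity check failed for {object_id}")
    return data

pool = {}
write_with_digest(pool, "report-2024", b"quarterly figures")
print(read_verified(pool, "report-2024"))  # b'quarterly figures'

# Simulate silent corruption of the stored payload:
_, digest = pool["report-2024"]
pool["report-2024"] = (b"tampered figures", digest)
try:
    read_verified(pool, "report-2024")
except IOError as err:
    print(err)  # integrity check failed for report-2024
```

Production systems layer durability on top of this by keeping redundant copies or erasure-coded fragments, so a failed integrity check can trigger self-healing from a good replica.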
One of the key benefits of true object storage is the absence of file systems. This also creates a challenge: the accessibility of the data. Typically, object storage is accessed through applications, which use application programming interfaces (APIs) to interact directly with the back-end storage pool. Several attempts have been made to have the industry agree on a standard object interface, with mixed success. The Amazon S3 and OpenStack Swift APIs are currently seeing the widest adoption, but it remains to be seen whether either will ever become a true standard and an open API. Therefore, it is important for object storage platforms to provide wide support for multiple protocols and applications, including file system gateways to integrate with legacy applications.
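The file system gateway idea mentioned above can be sketched in a few lines: hierarchical paths exist only as a naming convention inside flat object keys, and a "directory listing" is just a prefix scan. The class and method names here are hypothetical, chosen for illustration.

```python
import posixpath

class FileGateway:
    """Sketch of a file-system gateway over a flat object pool:
    '/' is just a character in the key, not a real directory."""

    def __init__(self):
        self._pool = {}  # flat namespace: key -> bytes

    def _key(self, path: str) -> str:
        # Normalise the legacy path into a single flat object key.
        return posixpath.normpath(path).lstrip("/")

    def write_file(self, path: str, data: bytes) -> None:
        self._pool[self._key(path)] = data

    def read_file(self, path: str) -> bytes:
        return self._pool[self._key(path)]

    def list_dir(self, path: str) -> list:
        # A "directory listing" is a prefix scan over the flat keys.
        prefix = self._key(path) + "/"
        return sorted(k for k in self._pool if k.startswith(prefix))

gw = FileGateway()
gw.write_file("/archive/2024/jan.csv", b"a,b\n1,2")
gw.write_file("/archive/2024/feb.csv", b"a,b\n3,4")
print(gw.list_dir("/archive/2024"))
# ['archive/2024/feb.csv', 'archive/2024/jan.csv']
```

Real gateways additionally translate these calls into S3- or Swift-style HTTP requests, but the core mapping from hierarchy to flat keys is the same.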
Storage performance is measured in throughput, IOPS and latency. However, most platforms are not optimised for small objects, so IOPS tends to be neglected in performance conversations; similarly, most object storage platforms do not allow for latency optimisation, and so avoid mentioning latency at all. It is therefore important to look for solutions that monitor all three measurements.
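The three measurements are easy to confuse, so a small sketch may help. This hypothetical benchmark times a batch of reads against an in-memory store and derives all three figures; against a real platform, `store_get` would be a network call and the numbers far lower, but the arithmetic is the same.

```python
import statistics
import time

def measure(store_get, object_ids, object_size):
    """Time a batch of reads and report throughput, IOPS and latency."""
    latencies = []
    start = time.perf_counter()
    for oid in object_ids:
        t0 = time.perf_counter()
        store_get(oid)                       # one read operation
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "iops": len(object_ids) / elapsed,
        "throughput_bytes_per_s": len(object_ids) * object_size / elapsed,
        "median_latency_s": statistics.median(latencies),
    }

# Hypothetical backing store: 1000 small 4 KiB objects in memory.
pool = {i: b"x" * 4096 for i in range(1000)}
stats = measure(pool.get, list(pool), 4096)
print(sorted(stats))
# ['iops', 'median_latency_s', 'throughput_bytes_per_s']
```

Note how small objects expose the difference: throughput can look poor even when IOPS is high, which is exactly why small-object workloads need all three figures monitored.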
Object-based storage is experiencing huge growth, as customers realise its benefits as an inexpensive, scalable solution. According to IDC, by 2017 this particular market will be worth $38 billion (£23 billion), having outgrown the overall enterprise disk storage systems market back in 2013.
For end users looking at object storage solutions for the first time, following these five key requirements will enable a successful, 360-degree approach to storing the exabytes of unstructured information generated every day.
Molly Rector is the chief marketing officer at DataDirect Networks