Despite the many changes in data storage over the decades, some fundamentals remain. One of them is that storage is accessed using one of three methods: block, file and object.
This article will define and expand on the characteristics of these three, while also looking at the typical on-prem and cloud products that use file, block and object storage.
What we’re seeing is that while block, file and object storage are still available as (typically off-the-shelf) hardware products, these types of storage access are also being offered in the cloud to serve the workloads that require them.
The rise of the cloud has also led to hybrid – data center and cloud – and distributed forms of file and object storage.
So while file, object, and block are long-standing foundations of storage, the ways they are deployed in the cloud era are changing.
File and block: whole and part
The file system has always been a mainstay of storage technology. File access and block access storage offer two ways to interact with the file system.
File access storage is when you access entire files through the file system. This is usually via network attached storage (NAS) or a connected network of scale-out NAS nodes. Such products come with their own file system on board and the storage is presented to applications and users in the drive letter format.
With block access, the storage product, typically deployed on-premises in storage area network (SAN) systems, addresses only the blocks of storage that make up files, databases and so on. In other words, the file system through which applications talk sits higher in the stack.
File systems provide all kinds of advantages. Among the most notable is that most enterprise applications are written to use them, and that is not going away anytime soon.
A key feature of file system-based access is that there are mechanisms, such as those defined by the POSIX standard, to lock files so that they cannot be overwritten simultaneously, at least not in ways that damage the file or the processes around it.
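As a minimal sketch of the kind of POSIX locking described above, the Python function below (the name `locked_append` is hypothetical) takes an exclusive lock on a whole file before appending to it. It uses the standard library’s `fcntl.lockf`, which wraps the POSIX `fcntl()` record-locking calls, and it assumes a Unix-like system. These locks are advisory: they only protect against other processes that also take the lock.

```python
import fcntl
import os


def locked_append(path: str, line: str) -> None:
    """Append one line to a file while holding an exclusive whole-file lock.

    Hypothetical helper for illustration. The lock is advisory: it only
    coordinates with other processes that also call lockf() on this file.
    """
    with open(path, "a", encoding="utf-8") as f:
        # With the default length of 0 and start of 0, lockf() locks from the
        # beginning of the file to end-of-file and beyond, i.e. the whole file.
        # LOCK_EX blocks until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to persist it to storage
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Two processes calling `locked_append` on the same file take turns rather than interleaving their writes.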
File storage provides access to entire files, so it is used for general file storage as well as for more specialized workloads that require file access, such as in media and entertainment. And in its scale-out NAS form, it is a mainstay of large-scale storage for analytics and high-performance computing (HPC) workloads.
Block storage provides application access to the blocks that contain the files. This could be database access where many users are working on the same file at the same time and from possibly the same application – email, enterprise applications such as enterprise resource planning (ERP), for example – but with subfile-level locking.
Block storage has the great advantage of high performance, because it does not have to deal with metadata, file system information and the like.
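To illustrate what addressing only storage blocks means, here is a minimal Python sketch that treats a regular file as a stand-in for a block device (a real device node would normally need elevated privileges to open). Data is read and written by logical block number alone; the 4,096-byte block size and the helper names are assumptions for illustration. On this data path there is no file name, directory or metadata lookup: the byte offset is just the block number multiplied by the block size, and anything above that, such as a file system or a database, is left to the layers higher in the stack.

```python
import os

BLOCK_SIZE = 4096  # assumed block size in bytes (512 and 4096 are common)


def open_fake_device(path: str, num_blocks: int) -> int:
    """Create a file of num_blocks blocks to stand in for a block device."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, num_blocks * BLOCK_SIZE)  # unwritten blocks read as zeros
    return fd


def write_block(fd: int, lba: int, data: bytes) -> None:
    """Write exactly one block at logical block address lba."""
    if len(data) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} bytes, got {len(data)}")
    # The offset is pure arithmetic: no names, paths or metadata involved.
    if os.pwrite(fd, data, lba * BLOCK_SIZE) != BLOCK_SIZE:
        raise OSError("short write")


def read_block(fd: int, lba: int) -> bytes:
    """Read exactly one block at logical block address lba."""
    data = os.pread(fd, BLOCK_SIZE, lba * BLOCK_SIZE)
    if len(data) != BLOCK_SIZE:
        raise OSError(f"block {lba} is beyond the end of the device")
    return data
```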
File and block: cloud and distributed
File storage still exists in standalone NAS form, especially at the entry level, and scale-out NAS designed for on-premises deployment is commonplace.
But the advent of the cloud and its tendency to globalize operations changed things and had a double effect.
On the one hand, there are a number of vendors that offer global file systems that combine a file system distributed across a public cloud and local network hardware, with all data in a single namespace. Vendors here include Ctera, Nasuni, Panzura, Hammerspace and Peer Software.
On the other hand, all the key cloud providers (Amazon Web Services, Google Cloud Platform and Microsoft Azure) offer their own file access storage services, and in AWS’s case so does NetApp. IBM also offers file storage through its cloud offering.
Block in the cloud
Some storage vendors, such as IBM and Pure, offer instances of their block storage in the cloud. All three big cloud providers offer block storage services aimed at applications that require the lowest latency, such as databases and analytics caching, as well as at running virtual machines (VMs).
Probably due to the nature of block storage and its performance requirements, distributed block storage doesn’t seem to have taken off the way it did with file.
Object storage: a world apart
Object storage is based on a “flat” structure, with objects accessed through unique identifiers, similar to the way the domain name system (DNS) is used to access websites.
This makes object storage quite different from the hierarchical, tree-like structure of a file system, and that can be an advantage when datasets grow very large: some NAS systems come under strain when they reach billions of files.
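A toy Python model can make the contrast concrete. In the sketch below (class and method names are hypothetical, and generating IDs with `uuid4` is an assumption rather than how any particular product works), every object lives in a single flat mapping from a unique identifier to its data. Retrieving an object is one lookup by ID, with no directory tree to walk, which is one reason a flat namespace can scale more gracefully than a deep hierarchy.

```python
from __future__ import annotations

import uuid


class FlatObjectStore:
    """Toy in-memory object store: one flat namespace, no directories."""

    def __init__(self) -> None:
        # The entire namespace is a single mapping: object ID -> bytes.
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return the unique ID needed to retrieve it later."""
        object_id = uuid.uuid4().hex  # 32 hex characters, effectively unique
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        """One lookup by ID; there is no path to resolve level by level."""
        return self._objects[object_id]

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]
```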
Object storage offers access at a level equivalent to whole files, but without file locking, and often more than one user can access an object at the same time. Many object stores are also not strongly consistent: instead, changes are eventually reconciled between replicas (although some services, such as Amazon S3 since late 2020, now provide strong read-after-write consistency).
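The consistency behaviour can be sketched with a toy model in Python. Here a write lands on one replica and is copied to the others only when a background reconciliation pass runs; until then, a read from another replica can return stale or missing data. There is no locking, so two writers can both succeed, and reconciliation picks a winner. The last-writer-wins rule based on a single version counter is an assumption chosen for simplicity (real systems use a variety of versioning and conflict-resolution schemes), and all names here are hypothetical.

```python
from __future__ import annotations

import itertools


class Replica:
    def __init__(self) -> None:
        # key -> (version, data)
        self.objects: dict[str, tuple[int, bytes]] = {}


class EventuallyConsistentStore:
    """Toy model: writes land on one replica and are copied to others later."""

    def __init__(self, n_replicas: int = 2) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        # A single global version counter: a simplifying assumption.
        self._clock = itertools.count(1)

    def put(self, key: str, data: bytes, replica: int = 0) -> None:
        """Write to one replica only; no lock is taken, so any writer succeeds."""
        self.replicas[replica].objects[key] = (next(self._clock), data)

    def get(self, key: str, replica: int) -> bytes | None:
        """Read from one replica; may be stale until reconcile() has run."""
        entry = self.replicas[replica].objects.get(key)
        return entry[1] if entry else None

    def reconcile(self) -> None:
        """Anti-entropy pass: every replica adopts the highest-version copy."""
        merged: dict[str, tuple[int, bytes]] = {}
        for r in self.replicas:
            for key, entry in r.objects.items():
                if key not in merged or entry[0] > merged[key][0]:
                    merged[key] = entry
        for r in self.replicas:
            r.objects = dict(merged)
```

Reading from replica 1 straight after writing to replica 0 returns `None`; after `reconcile()` both replicas agree.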
Most legacy applications are not written to use object storage. But that is not necessarily a disadvantage: object storage has become the preferred storage access method of the cloud era. That’s because the cloud as a whole is much more of a stateless proposition than the legacy enterprise environment, and object storage probably accounts for most of the storage offered by the big cloud providers.
In addition, objects in an object store can carry a richer set of metadata than files in a traditional file system, which also makes data in an object store well suited to analytics.
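As a hypothetical example of why metadata helps analytics, the Python sketch below filters and aggregates objects using only their metadata, without reading the object contents. The metadata keys (`camera`, `region`) and the record layout are invented for illustration. Services such as Amazon S3 do let clients attach user-defined key-value metadata to objects, but this code does not call any real object storage API.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    key: str
    size: int  # bytes
    metadata: dict[str, str] = field(default_factory=dict)


def select(objects: list[ObjectRecord], **criteria: str) -> list[str]:
    """Return the keys of objects whose metadata matches every criterion."""
    return [
        o.key
        for o in objects
        if all(o.metadata.get(name) == value for name, value in criteria.items())
    ]


def total_size_by(objects: list[ObjectRecord], name: str) -> dict[str, int]:
    """Sum object sizes grouped by one metadata field (missing -> 'unknown')."""
    totals: dict[str, int] = defaultdict(int)
    for o in objects:
        totals[o.metadata.get(name, "unknown")] += o.size
    return dict(totals)


# Hypothetical catalog: only metadata is inspected, never object contents.
catalog = [
    ObjectRecord("img-0001.jpg", 2_000, {"camera": "cam-07", "region": "north"}),
    ObjectRecord("img-0002.jpg", 3_000, {"camera": "cam-02", "region": "north"}),
    ObjectRecord("img-0003.jpg", 1_500, {"camera": "cam-07", "region": "south"}),
]
print(select(catalog, camera="cam-07"))   # ['img-0001.jpg', 'img-0003.jpg']
print(total_size_by(catalog, "region"))   # {'north': 5000, 'south': 1500}
```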
Object in the cloud – and on-premises with file
The cloud is the natural home for object storage. Most storage services offered by cloud providers are based on object storage, and this is where new de facto standards such as S3 have emerged.
With its easy access to data that can happily exist as largely stateless and eventually consistent, object storage is the bulk storage of the cloud era.
Object storage is also available for on-premises deployment, for example Dell EMC’s Elastic Cloud Storage, which is for data center deployment only. Meanwhile, Hitachi Vantara’s Hitachi Content Platform, IBM’s Cloud Object Storage and NetApp’s StorageGrid can work in hybrid and multi-cloud scenarios.
Some dedicated object storage providers, such as Cloudian and Scality, offer on-premises and hybrid deployments.
And in Scality’s case, along with Pure Storage (and NetApp, to some extent), converged file and object storage is possible. The rationale here is that customers increasingly want access to large amounts of unstructured data that may be held in file or object storage formats.