Hierarchical storage management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as solid state drive arrays, are more expensive (per byte stored) than slower devices, such as hard disk drives, optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.
HSM may also be used where more robust storage is available for long-term archiving, but this is slow to access. This may be as simple as an off-site backup, for protection against a building fire.
HSM is a long-established concept, dating back to the beginnings of commercial data processing. The techniques used though have changed significantly as new technology becomes available, for both storage and for long-distance communication of large data sets. The scale of measures such as 'size' and 'access time' have changed dramatically. Despite this, many of the underlying concepts keep returning to favour years later, although at much larger or faster scales.
In a typical HSM scenario,[i] data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely used files are on tape, most users will usually not notice any slowdown.
HSM is sometimes referred to as tiered storage.
Hierarchical Storage Manager (HSM, then DFHSM and finally DFSMShsm) was first implemented by IBM on March 31, 1978 for MVS to reduce the cost of data storage, and to simplify the retrieval of data from slower media. The user would not need to know where the data was stored and how to get it back; the computer would retrieve the data automatically. The only difference to the user was the speed at which data was returned. HSM could originally migrate datasets only to disk volumes and virtual volumes on a IBM 3850 Mass Storage Facility, but a latter release supported magnetic tape volumes for migration level 2 (ML2).
Later, IBM ported HSM to its AIX operating system, and then to other Unix-like operating systems such as Solaris, HP-UX and Linux.
CSIRO Australia's Division of Computing Research implemented an HSM in its DAD (Drums and Display) operating system with its Document Region in the 1960s, with copies of documents being written to 7-track tape and automatic retrieval upon access to the documents.
HSM was also implemented on the DEC VAX/VMS systems and the Alpha/VMS systems. The first implementation date should be readily determined from the VMS System Implementation Manuals or the VMS Product Description Brochures.
Recently, the development of Serial ATA (SATA) disks has created a significant market for three-stage HSM: files are migrated from high-performance Fibre Channel storage area network devices to somewhat slower but much cheaper SATA disk arrays totaling several terabytes or more, and then eventually from the SATA disks to tape.
The newest development in HSM is with hard disk drives and flash memory, with flash memory being over 30 times faster than disks, but disks being considerably cheaper.
Conceptually, HSM is analogous to the cache found in most computer CPUs, where small amounts of expensive SRAM memory running at very high speeds is used to store frequently used data, but the least recently used data is evicted to the slower but much larger main DRAM memory when new data has to be loaded.
In practice, HSM is typically performed by dedicated software, such as IBM Tivoli Storage Manager, Oracle's SAM-QFS, Versity Storage Manager, Quantum, Novell's Dynamic Storage Technology (DST) on Open Enterprise Server (OES) Linux Platform, HPE Data Management Framework (DMF, formerly SGI Data Migration Facility), StorNext, or EMC Legato OTG DiskXtender.
The deletion of files from a higher level of the hierarchy (e.g. magnetic disk) after they have been moved to a lower level (e.g. optical media) is sometimes called file grooming.
HSM is often used for deep archival storage of data to be held long term at low cost. Automated tape robots can silo large quantities of data efficiently with low power consumption.
Some HSM software products allow the user to place portions of data files on high-speed disk cache and the rest on tape. This is used in applications that stream video over the internet—the initial portion of a video is delivered immediately from disk while a robot finds, mounts and streams the rest of the file to the end user. Such a system greatly reduces disk cost for large content provision systems.
The key factor behind HSM is a Data migration policy that controls the file transfers in the system. More precisely, the policy decides which tier a file should be stored in, so that the entire storage system can be well-organized and have a shortest response time to requests. There are several algorithms realizing this process, such as Least Recently Used replacement(LRU), Size-Temperature Replacement(STP), Heuristic Threshold(STEP) etc. In research of recent years, there are also some intelligent policies coming up by using machine learning technologies.
Tiered storage is a computer data storage method in which categories of data is assigned to various data storage medium to reduce overall storage costs, I/O performance, reliability, availability and serviceability. Data is tiered into various medium on the basis of system critical and mission critical characteristics, thus system software and business applications are prioritized. Applications that require high I/O performance retain data in solid-state drives, while applications that require moderate performance and non-critical data are retained in nearline storage medium such as SAS 10k or 10K RPM, and 7.2 K RPM HHD drives respectively. The basic idea is, mission-critical and highly accesses or "hot" data is stored in expensive medium such as SSD to take advantage of high I/O performance, while nearline or rarely accessed or "cold" data is stored in nearline storage medium such as HHD and tapes which are inexpensive. Thus, the "data temperature" or activity levels determines the primary storage hierarchy.
Note: Storage Tiers are not delineated by differences in vendor, architecture, or geometry except where those differences result in clear changes to price, performance, capacity and function.