Custom filesystems as optimal solutions for huge media archives and storage systems


During the last two decades, the volume of industrial video and audio recordings has been growing exponentially. IDC, a leading market research firm, estimates that the total amount of stored digital information is around 2.8 exabytes (2.8 million terabytes), with tenfold growth expected every five years! This huge and growing volume poses difficult challenges for the media archive management (MAM) industry. The amount of data, both current and archived, stored by a single TV production company may soon reach thousands of terabytes (petabytes). This information needs to be stored securely, managed, and, most importantly, searched and accessed quickly and efficiently.

In this paper we will not discuss the pros and cons of different storage media. Instead, we will explore the problems of archive storage management and their solutions. The main issues of storage and archive management can be roughly classified into four categories.

  1. The unusually large and growing volume of storage: Storage requirements for video and audio grow constantly, mainly because of two factors: the increase in broadcast volume and the increase in definition. There is no doubt that the advent of High Definition TV will not end the "quality race": picture quality requirements and expectations will grow to match at least those of 3D OMNIMAX theaters.
  2. The heterogeneous and distributed nature of archives: While storage media themselves become less and less expensive, the cost of migrating an entire archive to a new type of media is prohibitive for any media company. Archivists and engineers routinely face the problem of managing data stored in legacy formats, such as tape, and integrating it with up-to-date archives on hard disks, solid-state media, etc.
  3. Need for fast search and retrieval based on metadata: Fast search and retrieval of information from distributed heterogeneous archives is a headache for archivists and company IT personnel. Data accumulated over decades needs to be found quickly and efficiently by brief descriptions, keywords, date of creation, etc. TV production engineers also benefit from short low-resolution extracts of footage, called proxies, which most outdated storage systems fail to provide.
  4. Reliability and security of storage: Last but not least is the problem of ensuring that all data is stored safely and securely. By safety, we mean protection against data loss due to system malfunctions: any archival system should be able to mitigate hardware failures and restore partially or completely lost data with minimal effort. Data security functions protect data against an intruder trying to obtain and use it illegitimately. Since investigative materials and footage can be confidential, or damaging to the company or a third party, safeguarding them is of utmost importance.

Several solutions on the market address all or most of the problems discussed above. Some approaches that claim to be universal are rather fragmentary, trying to reconcile legacy systems with current and future requirements.

Another possible solution to the major storage and archive problems is the use of proprietary filesystems. A filesystem can easily and comfortably accommodate huge data storage requirements. This approach provides:

  • A convenient way to place complex data into, and retrieve it from, a single virtual storage
  • A user-mode interface that presents data stored elsewhere as files and folders, just like an "ordinary" filesystem. Neither the user nor legacy third-party applications have to deal with how the data is actually stored and retrieved; this task is left to the developers who implement the required callback functions.
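The callback model described above can be illustrated with a short Python sketch. It is a hypothetical example, not the API of any real product: `VirtualFileSystem`, `archive_list_dir`, and `archive_read_file` are invented names, and an in-memory dictionary stands in for a real tape library or database. The point is that the caller only sees directory listings and file reads, while the callbacks decide where the bytes come from.

```python
from typing import Callable, Dict, List

# Hypothetical callback signatures; not the API of any real product.
ListDirCallback = Callable[[str], List[str]]
ReadFileCallback = Callable[[str, int, int], bytes]


class VirtualFileSystem:
    """Presents data from an arbitrary back end as files and folders.

    The filesystem stores nothing itself: every request is forwarded to
    developer-supplied callbacks that know where the data really lives.
    """

    def __init__(self, on_list_dir: ListDirCallback, on_read_file: ReadFileCallback) -> None:
        self._on_list_dir = on_list_dir
        self._on_read_file = on_read_file

    def listdir(self, path: str) -> List[str]:
        # The caller sees an ordinary, sorted directory listing.
        return sorted(self._on_list_dir(path))

    def read(self, path: str, offset: int = 0, length: int = -1) -> bytes:
        # length == -1 means "read to the end of the file".
        return self._on_read_file(path, offset, length)


# Stand-in back end: an in-memory dict playing the role of a tape library or database.
_ARCHIVE: Dict[str, bytes] = {
    "/news/1998-05-02.mxf": b"MXF more footage",
    "/news/1998-05-01.mxf": b"MXF footage",
}


def archive_list_dir(path: str) -> List[str]:
    # Names of the entries directly under `path` (a flat layout is assumed).
    prefix = path.rstrip("/") + "/"
    return [p[len(prefix):] for p in _ARCHIVE if p.startswith(prefix)]


def archive_read_file(path: str, offset: int, length: int) -> bytes:
    data = _ARCHIVE[path]
    end = len(data) if length < 0 else offset + length
    return data[offset:end]


vfs = VirtualFileSystem(archive_list_dir, archive_read_file)
print(vfs.listdir("/news"))                    # ['1998-05-01.mxf', '1998-05-02.mxf']
print(vfs.read("/news/1998-05-01.mxf", 0, 3))  # b'MXF'
```

Replacing the dictionary with calls to a tape robot, a database, or a remote service would change only the two callback functions; the interface seen by applications stays the same.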

The same approach, the use of a proprietary filesystem, also helps with heterogeneous legacy storage. There is no need to invest large sums of money in transferring the entire archive to new storage media once or twice a decade. A filesystem based on callbacks is easily extensible without sacrificing past development: when new storage formats or attractive media opportunities appear, only a few new functions need to be implemented, while the system as a whole continues to work with minimal, if any, service interruptions.
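To illustrate how a new storage format can be added without touching existing code, here is a hypothetical sketch in which each back end implements a single `read()` method and a mount table routes virtual paths to back ends by longest matching prefix. `Backend`, `DictBackend`, `CompressedBackend`, and `MountTable` are illustrative names; the zlib-compressed store merely plays the role of a "new format" introduced later.

```python
import zlib
from typing import Dict, Protocol


class Backend(Protocol):
    """Hypothetical minimal interface that every storage back end implements."""

    def read(self, key: str) -> bytes: ...


class DictBackend:
    """Stands in for an existing medium (tape library, disk array, ...)."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class CompressedBackend:
    """A 'new storage format' added later: only this read() had to be written."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs  # values are stored zlib-compressed

    def read(self, key: str) -> bytes:
        return zlib.decompress(self._blobs[key])


class MountTable:
    """Routes a virtual path to the back end mounted at its longest matching prefix."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Backend] = {}

    def mount(self, prefix: str, backend: Backend) -> None:
        self._mounts[prefix.rstrip("/")] = backend

    def read(self, path: str) -> bytes:
        # Try longer prefixes first so that a nested mount overrides its parent.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path.startswith(prefix + "/"):
                return self._mounts[prefix].read(path[len(prefix) + 1:])
        raise FileNotFoundError(path)


table = MountTable()
table.mount("/archive/tape", DictBackend({"1995/clip.mxf": b"tape bytes"}))
table.mount("/archive/new", CompressedBackend({"clip.mxf": zlib.compress(b"new bytes")}))
print(table.read("/archive/tape/1995/clip.mxf"))  # b'tape bytes'
print(table.read("/archive/new/clip.mxf"))        # b'new bytes'
```

Supporting yet another medium would mean writing one more class with a `read()` method and adding one more `mount()` call; the existing back ends and the applications reading through the mount table stay unchanged.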

Moreover, the metadata handling of widespread general-purpose filesystems is far from efficient. There is no easy way to search for a needed piece of information by user-defined, non-standard, multimodal annotations. An efficient custom, task-specific filesystem permits the creation and storage of any metadata associated with the original footage, from additional manual annotations to complex proxies, extracts, etc. Sorting and retrieval based on this metadata then require no additional technologies to be developed and implemented.
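A minimal sketch of metadata-based search, assuming a hypothetical in-memory index (`ClipMetadata`, `MetadataIndex`) that maps keywords to file paths, filters by creation date, and records the location of a proxy when one exists. A real filesystem would persist such an index alongside the footage; this example only shows the lookup logic.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class ClipMetadata:
    path: str
    created: date
    description: str = ""
    keywords: Set[str] = field(default_factory=set)
    proxy_path: Optional[str] = None  # low-resolution extract, if one exists


class MetadataIndex:
    """Hypothetical in-memory index: keyword -> set of file paths."""

    def __init__(self) -> None:
        self._by_path: Dict[str, ClipMetadata] = {}
        self._by_keyword: Dict[str, Set[str]] = {}

    def add(self, meta: ClipMetadata) -> None:
        # Sketch only: re-adding a path does not remove its old keywords.
        self._by_path[meta.path] = meta
        for kw in meta.keywords:
            self._by_keyword.setdefault(kw.lower(), set()).add(meta.path)

    def search(self, keywords: Iterable[str] = (), start: Optional[date] = None,
               end: Optional[date] = None) -> List[ClipMetadata]:
        # Intersect the keyword posting sets, then filter by creation date.
        paths = set(self._by_path)
        for kw in keywords:
            paths &= self._by_keyword.get(kw.lower(), set())
        hits = [self._by_path[p] for p in paths]
        if start is not None:
            hits = [m for m in hits if m.created >= start]
        if end is not None:
            hits = [m for m in hits if m.created <= end]
        return sorted(hits, key=lambda m: m.created)


idx = MetadataIndex()
idx.add(ClipMetadata("/news/a.mxf", date(1998, 5, 1), "River flood", {"flood", "river"}, "/proxies/a.mp4"))
idx.add(ClipMetadata("/news/b.mxf", date(2003, 8, 14), "City blackout", {"power", "city"}))
idx.add(ClipMetadata("/news/c.mxf", date(2010, 4, 2), "Coastal flood", {"flood", "coast"}))
print([m.path for m in idx.search(["Flood"])])                          # ['/news/a.mxf', '/news/c.mxf']
print([m.path for m in idx.search(["flood"], start=date(2000, 1, 1))])  # ['/news/c.mxf']
```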

Data integrity and reliable storage can be ensured by the journaling function provided by custom filesystems. Constant monitoring and recording of all data-related operations makes failure recovery feasible. Journaling can be performed on the fly, without human involvement and with modest resource usage, which makes continuous data protection possible.
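The journaling idea can be sketched as a write-ahead log: each change is appended to a journal file and flushed to disk before it is applied, so the state can be rebuilt by replaying the journal after a crash. `JournaledStore` is a hypothetical, deliberately simplified example that journals a map from virtual paths to physical locations rather than file contents; real filesystem journals add transactions, checkpoints, and log truncation.

```python
import json
import os
import tempfile
from typing import Dict


class JournaledStore:
    """Toy write-ahead journal: every change is logged durably before it is applied.

    Replaying the journal on start-up rebuilds the state after a crash.
    Hypothetical and simplified: no checkpoints, no transactions.
    """

    def __init__(self, journal_path: str) -> None:
        self.journal_path = journal_path
        self.data: Dict[str, str] = {}  # e.g. virtual path -> physical location
        self._replay()

    def _replay(self) -> None:
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, "r+b") as f:
            good_end = 0
            for line in f.readlines():
                if not line.endswith(b"\n"):
                    break  # torn final record: that write never completed
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break
                self._apply(rec)
                good_end += len(line)
            f.truncate(good_end)  # drop the torn tail so new records stay readable

    def _apply(self, rec: dict) -> None:
        if rec["op"] == "put":
            self.data[rec["key"]] = rec["value"]
        elif rec["op"] == "delete":
            self.data.pop(rec["key"], None)

    def _log(self, rec: dict) -> None:
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(rec) + "\n")
            f.flush()
            os.fsync(f.fileno())  # record is on disk before the change takes effect

    def put(self, key: str, value: str) -> None:
        rec = {"op": "put", "key": key, "value": value}
        self._log(rec)
        self._apply(rec)

    def delete(self, key: str) -> None:
        rec = {"op": "delete", "key": key}
        self._log(rec)
        self._apply(rec)


journal = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(journal)
store.put("/news/a.mxf", "tape-07:block-1200")
store.put("/news/b.mxf", "disk-02:/vol/b.mxf")
store.delete("/news/a.mxf")
recovered = JournaledStore(journal)  # simulates a restart after a crash
print(recovered.data)  # {'/news/b.mxf': 'disk-02:/vol/b.mxf'}
```

Because every acknowledged change is already on disk when `put()` or `delete()` returns, recovery needs no human involvement: opening the store replays the log automatically.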

Data security is ensured by modern encryption and decryption algorithms. Many symmetric-key tools, included as a standard option in custom filesystems, can quickly and easily encrypt and decrypt files of any type during upload or download. This efficiently protects data against many attack vectors and reduces the risks associated with the human factor.
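Transparent encryption can be pictured as a pair of hooks: one encrypts data just before it reaches storage, the other decrypts it when an application reads the file. The sketch below shows only that structure. To stay within Python's standard library it uses a toy construction (a SHA-256 counter-mode keystream with HMAC-SHA-256 authentication, encrypt-then-MAC); it is not a vetted cipher, and a real system should use an established algorithm such as AES-GCM from a proven cryptographic library. The function names `encrypt_on_write` and `decrypt_on_read` are hypothetical.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32


def _subkeys(key: bytes):
    # Derive separate keys for encryption and authentication from one master key.
    enc = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream: block i = SHA-256(enc_key || nonce || i).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_on_write(key: bytes, plaintext: bytes) -> bytes:
    """Hook a (hypothetical) filesystem would call before data reaches storage."""
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(NONCE_LEN)  # fresh for every write; never reuse with the same key
    ks = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return nonce + ct + tag


def decrypt_on_read(key: bytes, blob: bytes) -> bytes:
    """Hook the filesystem would call when an application reads the file."""
    enc_key, mac_key = _subkeys(key)
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data corrupted or tampered with")
    ks = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))


key = os.urandom(32)
blob = encrypt_on_write(key, b"confidential footage")
print(decrypt_on_read(key, blob))  # b'confidential footage'
```

Since both hooks run inside the filesystem layer, users and applications never handle plaintext on the storage side and cannot forget to encrypt a file, which is where the reduced reliance on the human factor comes from.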

Temporary data integration solutions designed to patch current deficiencies are not extensible and may be cost-effective only in the very short run. Modern developments in filesystem software significantly simplify the development, implementation, and use of large archival systems with legacy issues, such as media data storage. These technologies help bridge the legacy chasm in the most efficient way.

We appreciate your feedback. If you have any questions, comments, or suggestions about this article, please contact our support team at support@callback.com.