Capacity optimization comprises distinct yet often complementary methods of storing data and reducing storage needs when making backups. Businesses and individual enterprises often make multiple backups of their work, and the need to store, index, and retrieve that data calls for optimization to reduce the amount of hardware and resulting overhead required to handle it all. Successive backups typically contain redundancies and only small changes from one copy to the next. By exploiting these redundancies, capacity optimization strategies can reduce storage costs and shrink backups to as much as 95 percent below their original size. Capacity optimization is sometimes known as bandwidth optimization when used in wide area network (WAN) applications to enable greater throughput when transmitting and receiving data on a network.
Data compression generally uses encoding techniques to reduce the size of data being stored or transmitted. Depending on whether some data is discarded in the process, it is characterized as lossy, which loses data, or lossless. Scanning the data for redundancies or repetition and replacing them with cross-referenced, indexed tokens allows large reductions in the amount of storage space needed. In WAN acceleration, data suppression codebooks keep communicating accelerators synchronized; each writes compression histories to memory or to a hard disk in a storage repository, while a Transmission Control Protocol (TCP) proxy buffers packets or sessions so that transmission speeds are not reduced. Another method of data compression reduces the size of data in real time as it goes to its first backup, so that further optimization yields larger savings in both space and time.
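As a rough illustration of lossless encoding, the Python sketch below compresses a highly redundant byte string with the standard library's zlib module; the sample data and compression level are arbitrary choices for the example rather than details of any particular backup product.

```python
import zlib

# Backup data with heavy repetition compresses well because the encoder
# replaces repeated byte sequences with short back-references.
original = b"customer_record: name=Jane Doe; balance=100.00\n" * 1000

compressed = zlib.compress(original, 9)   # lossless DEFLATE encoding
print(len(original), "->", len(compressed), "bytes")

restored = zlib.decompress(compressed)
assert restored == original               # lossless: nothing is discarded
```

Because every byte can be recovered, this kind of encoding is lossless; lossy methods, such as those used for images or audio, achieve greater reductions by discarding detail that cannot be restored.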
Traditional compression can reduce the size of stored data at a ratio of about 2:1; capacity optimization can raise that reduction to as much as 20:1, which corresponds to the roughly 95 percent savings noted above. Deduplication algorithms look for redundant byte sequences across comparison windows, segment the data stream, and apply cryptographic hash functions to produce a unique identifier for each segment. The segments are then indexed by those identifiers for retrieval. In this way, only new data is stored, and it can be compressed further using standard compression algorithms. Some deduplication methods are hardware-based; combining them with traditional software compression lets the two techniques work together to produce substantial savings in both space and time.
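The Python sketch below illustrates the general deduplication idea under simplifying assumptions: it uses fixed-size 4 KB segments and SHA-256 fingerprints, stores segments in an in-memory dictionary, and compresses each unique segment with zlib. Real systems often use variable-size chunking, on-disk indexes, and hardware assistance; the names used here are invented for the example.

```python
import hashlib
import zlib

SEGMENT_SIZE = 4096   # fixed-size segmentation, assumed for this sketch
store = {}            # fingerprint -> compressed segment (the dedupe index)

def backup(stream: bytes) -> list:
    """Deduplicate a byte stream and return its list of segment fingerprints."""
    recipe = []
    for i in range(0, len(stream), SEGMENT_SIZE):
        segment = stream[i:i + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()  # unique identifier
        if fingerprint not in store:
            store[fingerprint] = zlib.compress(segment)    # only new data is stored
        recipe.append(fingerprint)
    return recipe

def restore(recipe: list) -> bytes:
    """Rebuild the original stream from its fingerprints and the store."""
    return b"".join(zlib.decompress(store[f]) for f in recipe)

# Two backups that differ in a single segment share everything else.
first = b"A" * 8192 + b"B" * 4096
second = b"A" * 8192 + b"C" * 4096
r1, r2 = backup(first), backup(second)
print(len(store), "unique segments stored for 6 segments of input")
assert restore(r1) == first and restore(r2) == second
```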
Many approaches focus on reducing the cost and footprint of storage capacity, and thereby the expense of storage infrastructure; similar considerations arise in WAN scenarios. A transport layer must sit between applications and the underlying network during transmission so that data can be sent and received efficiently and quickly, yet that layer is still essentially the TCP defined in 1981, when links ran at speeds as low as 300 baud. Accelerators therefore use TCP proxies, which reduce losses during transmission, handle acknowledgments locally, and use advanced data compression to pack more data into each packet, delivering more data per unit of time. Working together, these techniques overcome transmission obstacles, improve application performance, and reduce bandwidth consumption.
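As a concrete illustration, the sketch below shows a greatly simplified, one-directional TCP proxy in Python that compresses traffic before it crosses the WAN. The listening and peer addresses are hypothetical, and a real accelerator would also maintain shared compression histories with its peer, acknowledge data locally, and relay the return direction of each session.

```python
import socket
import threading
import zlib

LISTEN_ADDR = ("0.0.0.0", 9000)       # local side, accepts application traffic
PEER_ADDR = ("peer.example", 9001)    # hypothetical accelerator across the WAN

def accelerate(client: socket.socket) -> None:
    """Terminate the client's TCP session locally and forward a compressed
    copy of its data to the peer accelerator (one direction only, for brevity)."""
    with socket.create_connection(PEER_ADDR) as wan, client:
        compressor = zlib.compressobj()
        while True:
            data = client.recv(65536)
            if not data:
                break
            # Compress before the bytes cross the WAN so each packet carries
            # more application payload; the peer decompresses on arrival.
            wan.sendall(compressor.compress(data) +
                        compressor.flush(zlib.Z_SYNC_FLUSH))

def main() -> None:
    with socket.create_server(LISTEN_ADDR) as listener:
        while True:
            client, _ = listener.accept()
            # One thread per session: the proxy answers the client on the LAN
            # instead of waiting for acknowledgments from across the WAN.
            threading.Thread(target=accelerate, args=(client,), daemon=True).start()

if __name__ == "__main__":
    main()
```

Placing a proxy at each end of the link is what lets acknowledgments stay local while only compressed traffic traverses the slow segment.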