Compression is the process of removing redundant information to reduce file size. The compression algorithm is the math behind a compression program.
Types of compression
There are two types of compression: lossy and lossless. Lossy compression discards information that is unlikely to be missed. When you take a photo, for example, you usually don’t need a perfect 1:1 recreation of what the camera saw. You’ll often see lossy compression used for media files.
Lossless compression only removes data if it can be perfectly recreated at decompression. For example, audiophiles use FLAC instead of MP3 because it compresses audio without throwing away data. If you’re compressing backups, log files, configurations, or anything else you need a perfect copy of, lossless compression is your only option.
Lossy compression examples: JPEG, MP3, AAC, WebP, MPEG
Lossless compression examples: ZIP, gzip, bzip2, xz, FLAC, PNG
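To make the "perfect copy" property concrete, here is a minimal sketch of a lossless round trip with gzip. It assumes a POSIX shell with gzip, cmp, and mktemp available; the file names (sample.txt, restored.txt) are made up for the illustration.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a highly redundant sample file: the same line repeated 1000 times.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$workdir/sample.txt"

# Compress to a new file; -c writes to STDOUT so the original is kept.
gzip -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"

# Decompress into a second file.
gzip -dc "$workdir/sample.txt.gz" > "$workdir/restored.txt"

# Compare sizes in bytes.
original_size=$(wc -c < "$workdir/sample.txt")
compressed_size=$(wc -c < "$workdir/sample.txt.gz")
echo "original: $original_size bytes, compressed: $compressed_size bytes"

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s "$workdir/sample.txt" "$workdir/restored.txt"; then
    echo "round trip is lossless"
fi
```

The compressed file should come out much smaller because the input is so repetitive, yet the restored file matches the original exactly. A lossy format cannot pass the cmp check by design.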
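Algorithms: LZ (Lempel-Ziv), BW (Burrows-Wheeler), LZMA
| Algorithm (tool) | Characteristics |
|---|---|
| LZ (gzip) | Good speed, good compression, very broad support |
| BW (bzip2) | Slower speed, better compression, not as widely supported |
| LZMA (xz) | Slowest compression, usually the best compression ratio, not as widely supported |
| PKZIP (ZIP) | PKWARE’s archiver and the origin of the ZIP format, which usually uses DEFLATE (the same LZ-based algorithm as gzip); ZIP support is built into Windows |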
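
The trade-offs in the table are easy to check on your own data. The sketch below compresses one sample file with each tool that happens to be installed and prints the resulting sizes. It assumes a POSIX shell; the sample text and file names are invented for the example, and results on real files will differ.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Sample input: 20000 numbered lines of repetitive text.
i=0
while [ "$i" -lt 20000 ]; do
    echo "line $i: the same words appear on every line"
    i=$((i + 1))
done > "$workdir/sample.txt"

echo "uncompressed: $(wc -c < "$workdir/sample.txt") bytes"

# Try each compressor that is installed.
# -c writes the compressed data to STDOUT and leaves the input file alone.
for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" -c "$workdir/sample.txt" > "$workdir/sample.$tool"
        echo "$tool: $(wc -c < "$workdir/sample.$tool") bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

To compare speed as well as size, you could run each compressor under the shell's time keyword on a larger file of your own.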
tar (short for tape archive) is used to combine multiple files into one file, which originally made it practical to write them to storage tapes. The tar process itself doesn’t do compression. It just concatenates files together, putting one after another, with some headers thrown in. However, tar does support command options that run gzip, bzip2, or xz to compress the .tar file and add a second extension. This is why you might see .tar.gz or just .tgz. Remember that Linux doesn’t rely on file extensions the way Windows does, so these extensions are a convention for humans rather than something tar requires.
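As a rough illustration of the point that tar and the compressor are separate steps, the sketch below builds the same archive twice: once with tar followed by gzip, and once with tar's z option. It assumes a POSIX shell with tar and gzip; the directory and file names are hypothetical.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A tiny directory to archive.
mkdir "$workdir/project"
echo "alpha" > "$workdir/project/a.txt"
echo "beta" > "$workdir/project/b.txt"

# Two steps: build an uncompressed archive, then compress it.
# gzip replaces two-step.tar with two-step.tar.gz.
tar -cf "$workdir/two-step.tar" -C "$workdir" project
gzip "$workdir/two-step.tar"

# One step: the z option makes tar run the data through gzip itself.
tar -czf "$workdir/one-step.tgz" -C "$workdir" project

# Both archives list the same contents, whatever the extension says.
tar -tzf "$workdir/two-step.tar.gz"
tar -tzf "$workdir/one-step.tgz"
```

The .tar.gz and .tgz names are interchangeable here; tar only cares about the data inside the file.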
tar options
| tar option | Description |
|---|---|
| c or -c | Create a tape archive |
| t or -t | List the files in an archive |
| x or -x | Extract the files in an archive |
| v or -v | Verbose mode; combined with c, t, or x, prints the name of every file processed to the terminal |
| f or -f | Uses the archive file named right after -f (written in create mode, read in list or extract mode) |
| z or -z | Compresses the archive using gzip |
| j or -j | Compresses the archive using bzip2 |
| J or -J | Compresses the archive using xz |
| -C | Changes to the folder given after -C before operating; with x, files are extracted there |
For example, the following command sets tar to Create mode, prints the name of every file it puts into the archive, collects all the files in /home/student, and saves them in an archive named backup.tar.

    tar -cvf backup.tar /home/student

Notice that you have to put the archive filename right after -f.
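
The options from the table can be combined into a full create, list, and extract cycle. The following is a sketch only: it uses a temporary directory standing in for /home/student so that it is safe to run anywhere, and the restore folder name is made up.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for /home/student, plus an empty folder to restore into.
mkdir -p "$workdir/home/student" "$workdir/restore"
echo "notes" > "$workdir/home/student/notes.txt"

# c = create, v = verbose, f = archive filename comes next.
# -C switches to $workdir first, so the stored paths start with home/student.
tar -cvf "$workdir/backup.tar" -C "$workdir" home/student

# t = list the contents without extracting anything.
tar -tf "$workdir/backup.tar"

# x = extract; -C sends the files into the restore folder.
tar -xvf "$workdir/backup.tar" -C "$workdir/restore"

# The restored copy has the same content as the original.
cat "$workdir/restore/home/student/notes.txt"
```

Listing an archive with t before extracting it is a cheap way to see which paths it will create.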
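tar can also read an archive from STDIN and write one to STDOUT. This means that if you run cat archive.tar and pipe it to tar -xf -, all the files in archive.tar will be extracted. The - after -f tells tar to use the pipe instead of a file; on some systems tar -x alone already defaults to STDIN. This allows you to do very hacky file copying over any connection that can carry a stream of bytes, such as SSH.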
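Here is a minimal sketch of copying a folder through a pipe, with one tar writing the archive to STDOUT and a second tar reading it from STDIN. The src and dest folders are hypothetical, and the SSH line in the comment uses a placeholder host name.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/src" "$workdir/dest"
echo "hello" > "$workdir/src/file.txt"

# "-f -" means STDOUT when creating and STDIN when extracting,
# so no archive file is ever written to disk.
tar -cf - -C "$workdir/src" . | tar -xf - -C "$workdir/dest"

cat "$workdir/dest/file.txt"

# The same idea across machines (placeholder host and paths, not run here):
#   tar -cf - mydir | ssh user@remote.example 'tar -xf - -C /tmp'
```

Because everything travels as one stream, this approach preserves the directory layout without needing a separate file-transfer tool on either side.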