Compression is the process of shrinking a file by removing redundant information or storing it more efficiently. A compression algorithm is the method a compression program uses to do this.

Types of compression

There are two types of compression: lossy and lossless. Lossy compression discards information that you are unlikely to miss. When you take a photo, for example, you usually don’t need a perfect 1:1 recreation of what the camera saw. That is why you’ll often see lossy compression used for media files.

Lossless compression only removes data that can be perfectly recreated at decompression. For example, audiophiles use FLAC instead of MP3 because FLAC compresses audio without throwing away data. If you’re compressing backups, log files, configurations, or anything else you need a perfect copy of, lossless compression is your only option.
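
The "perfect copy" property can be checked directly. The following is a minimal sketch, assuming gzip and cmp are installed; the file names (sample.log, restored.log) and the log line are made up for illustration.

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a repetitive sample file (hypothetical content).
i=0
while [ "$i" -lt 1000 ]; do
    echo "2024-01-01 INFO service started"
    i=$((i + 1))
done > sample.log

# Compress to a new file, then decompress into a third file.
gzip -c sample.log > sample.log.gz
gunzip -c sample.log.gz > restored.log

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s sample.log restored.log; then
    roundtrip="identical"
else
    roundtrip="different"
fi

echo "round trip: $roundtrip"
echo "original bytes:   $(wc -c < sample.log)"
echo "compressed bytes: $(wc -c < sample.log.gz)"
```

A lossy format could not pass this comparison, because the discarded data is gone for good.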

Lossy compression examples: JPEG, MP3, AAC, WebP, MPEG

Lossless compression examples: ZIP, gzip, bzip2, xz, FLAC, PNG

Algorithms: LZ, BW, LZMA

LZ (gzip)      Good speed, good compression, very broad support
BW (bzip2)     Slower speed, better compression, not as widely supported
LZMA (xz)      Usually the slowest to compress, best compression, not as widely supported
PKZIP (ZIP)    PKWARE’s archive format (not Microsoft’s), usually compressed with the LZ-based DEFLATE method, widely used in Windows
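
To see the trade-off on your own data, you can run each tool on the same input and compare the results. This is only a sketch: the input file is generated and highly repetitive, so the numbers will not match real-world files, and any tool that is not installed is skipped.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate a sample text file (sample.txt is a made-up name).
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "line", i }' > sample.txt

echo "original: $(wc -c < sample.txt) bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes the compressed stream to STDOUT and leaves the input alone.
        "$tool" -c sample.txt > "sample.txt.$tool"
        echo "${tool}: $(wc -c < "sample.txt.$tool") bytes"
    else
        echo "${tool}: not installed, skipped"
    fi
done
```

Putting time in front of each compression command would show the speed side of the comparison as well.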

tar (short for tape archive) combines multiple files into one file, originally so they could be written to storage tapes. tar itself doesn’t do compression. It just concatenates files together, putting one after another, with some headers thrown in. However, tar does support command options that run the archive through gzip, bzip2, or xz and add a second extension to the .tar file. This is why you might see .tar.gz or just .tgz. Remember that Linux doesn’t rely on filename extensions the way Windows does, so these extensions are only a convention.
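
The difference between archiving and compressing shows up in the file sizes. Here is a small sketch, assuming tar and gzip are installed; the project directory and its files are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two identical, repetitive files in a hypothetical project directory.
mkdir project
awk 'BEGIN { for (i = 1; i <= 5000; i++) print "setting_" i " = true" }' > project/app.conf
cp project/app.conf project/app.conf.bak

# Plain tar: files stored one after another with headers, no compression.
tar -cf project.tar project

# Same archive, compressed with gzip by tar itself (-z).
tar -czf project.tar.gz project

# Equivalent two-step form: pipe the uncompressed archive through gzip.
tar -cf - project | gzip -c > project.tgz

raw_size=$(wc -c < project/app.conf)
tar_size=$(wc -c < project.tar)
gz_size=$(wc -c < project.tar.gz)

echo "one input file: $raw_size bytes"
echo "project.tar:    $tar_size bytes"
echo "project.tar.gz: $gz_size bytes"
```

The plain .tar comes out larger than the two input files combined because of the added headers and padding, while the .tar.gz is much smaller.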

tar options

tar option    Description
c or -c       Create a tape archive
t or -t       List the files in an archive
x or -x       Extract the files in an archive
v or -v       Verbose mode, has to be combined with c/t/x, prints the name of every file to the terminal
f or -f       Uses the archive file named after -f (written in create mode, read in list and extract modes)
z or -z       Compresses the archive using gzip
j or -j       Compresses the archive using bzip2
-C            Extracts to a target folder given after -C

For example, the following command sets tar to create mode, prints the name of every file it puts into the archive, collects all the files in /home/student, and saves them in an archive named backup.tar.
tar -cvf backup.tar /home/student

Notice that you have to put the archive filename right after -f. When options are bundled, as in -cvf, f has to come last so the filename follows it directly.
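
The create, list, and extract modes fit together as in the sketch below. It uses a small student directory inside a temporary folder as a stand-in for /home/student; the file names and the restore directory are assumptions made for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for /home/student, plus an empty directory to restore into.
mkdir -p student/notes restore
echo "hello" > student/notes/todo.txt
echo "world" > student/readme.txt

# c = create, v = verbose, f = the archive name follows immediately.
tar -cvf backup.tar student

# t = list the contents without extracting anything.
listing=$(tar -tf backup.tar)
echo "$listing"

# x = extract; -C sends the files into the restore directory.
tar -xf backup.tar -C restore

restored=$(cat restore/student/notes/todo.txt)
echo "restored file contains: $restored"
```

Listing with -t before extracting is a cheap way to check what an archive will write and where.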

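tar can also read an archive from STDIN and write one to STDOUT. This means that if you run cat archive.tar and pipe it to tar -x, all the files in archive.tar will be extracted. If your tar doesn’t default to STDIN, name it explicitly with tar -xf - (a lone - after -f stands for STDIN or STDOUT). This allows you to do very hacky file copying over text-only connections.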
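
A local version of this pipe trick might look like the following sketch. The directory names are invented, and the remote variant in the final comment uses placeholder values (user@host, /some/target) that are not taken from the text.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p source/docs destination second_copy
echo "copied through a pipe" > source/docs/note.txt

# "-f -" means STDOUT in create mode and STDIN in extract mode,
# so no archive file is ever written to disk here.
tar -cf - source | tar -xf - -C destination

# The same idea with cat feeding a saved archive into tar.
tar -cf saved.tar source
cat saved.tar | tar -xf - -C second_copy

copied=$(cat destination/source/docs/note.txt)
echo "$copied"

# Across machines the pipe would run through a remote shell, for example:
#   tar -cf - source | ssh user@host 'tar -xf - -C /some/target'
```

Because the archive only ever exists as a stream, this also works when the destination has no room for a temporary .tar file.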