How to write a 'tar' command

sebastiancarlos@lemmy.sdf.org to Linux@lemmy.ml – 840 points –
155

easy

I remember it like this:
tar -extract ze file
and
tar -compress ze file

And also tar -the fuck is in this file

z is for gz files only though; there are plenty of other formats. xf autodetects the compression and works with all of them (with GNU tar at least).
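A quick way to convince yourself, as a minimal sketch assuming GNU tar and gzip are installed (the demo/ and out/ names are made up for illustration):

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/demo" "$work/out"
echo "hello" > "$work/demo/a.txt"

# Creating: z asks for gzip explicitly.
tar -C "$work" -czf "$work/demo.tar.gz" demo

# Extracting: plain xf, no z. GNU tar works out the compression itself.
tar -C "$work/out" -xf "$work/demo.tar.gz"

restored=$(cat "$work/out/demo/a.txt")
echo "$restored"
```

The z only matters on the creating side here; other tar implementations may behave differently.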

I hope whoever thought -l should mean "check links" instead of list has a special place in Hell set aside for them.

I have no idea what "print a message if not all links are dumped" even means.

Was gonna say this. Why TF is list not -l as...everywhere else?

No no it's this:

  1. Decide you've gotta use tar.

  2. man tar

  3. Guess-and-check the flags until it seems to work.

  4. Immediately forget the flags.

That was my case until I discovered that GNU tar has a pretty decent online manual - it's way better written than the manpage. I rarely forget the options nowadays even though I don't use tar that frequently.

As much as I also do step 4, to be honest I don't see people use man anywhere near as much as they should. Whenever faced with the question "what are the arguments for doing xyz", I immediately man it and just tell them - Practically everywhere you can execute a given command, you can also read full and comprehensive documentation, just look!

Ah yes, that's the linux community as I know it. There is one thing someone wants to achieve and dozens of ways to do it. ;)

Those are straightforward; it's the remaining 900 options that are confusing. I always need to look up --excludes and always get --directory wrong, somehow.
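For those two in particular, here is a rough sketch of how I understand them, assuming GNU tar (the project/ layout is invented for the example). The part that trips people up is that both options are position-sensitive: they should come before the names they apply to.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/project/src" "$work/project/cache" "$work/out"
echo "code" > "$work/project/src/main.c"
echo "junk" > "$work/project/cache/tmp.bin"

# --directory (-C) changes directory before the following names are read,
# so the archive stores "project/..." instead of the full temp path.
# --exclude takes a pattern; here it skips the cache directory and its contents.
tar --create --file "$work/project.tar" \
    --exclude='project/cache' \
    --directory "$work" project

# On extraction, --directory decides where the files land.
tar --extract --file "$work/project.tar" --directory "$work/out"

listing=$(tar --list --file "$work/project.tar")
echo "$listing"
```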

You also don't need the dash for the short options.

Also, if you're compressing with bzip2 and have archives bigger than a few megabytes I'll like you a lot more if you do it with --use-compress-prog=pbzip2

You also don’t need the dash for the short options.

True, but I refuse to entertain such a non-standard option format. It's already enough to tolerate find's.

Technically the notation with dashes is the non-standard one - the dash form is a GNU addition. A traditional tar on something like Solaris or HP-UX will throw an error if you try the dash notation.

It's also traditional to eat raw meat, but we discovered fire at some point.

Don't try to take my raw ground pork away from me.

my raw ground pork away from me.

Who are you, the Mett demon?
Matt Damon made out of mett

(It works great with beef, too. Bonus points for the raw yolk over it. If not homemade though there's literally one bar that I trust with this, salmonella is not fun.)

Not enough onions. Your average mettigel has better mett/onion ratio.

You also don't need the dash for the short options.

You know when you meet someone and you're just like "oh boy, yeah, they're evil. No humanity at all"

I think the -j also compresses with bzip2 but I'm not sure if this is defined behavior or just a shortcut

There's nothing technically wrong with using xjf rather than xzf, but it'll bite you if you ever use a non-linux platform as it's a GNU extension. I'm not even sure busybox tar supports it.

Yes, but I'm asking you to use pbzip. bzip at best utilizes one core, both for packing and unpacking. pbzip uses as many cores as IO bandwidth allows - with standard SATA SSDs that's typically around 30.

pbzip can only utilize multiple cores if the archive was created with it as well.

Does something similar happen using xz?

I looked it up: xz doesn't use multithreading by default either. You can change the program tar uses to compress by passing the -I option. For xz using all available CPU threads:

tar -cv -I 'xz -6 -T0' -f archive.tar.xz [list of directories]

The number is the compression preset (0 to 9): the higher the number, the smaller the archive tends to be, but it costs more memory and processing time.

Why, when explaining or giving examples of shell commands, do people so often provide shortened arguments? It makes it all seem like some random letters you have to remember by heart. Instead of -x just write --extract. If in the end they end up using the tool so often that they need to write it fast, they'll check the shortcuts.

I don't even mind the shortened arguments too much, though it doesn't help. It's more that every example seems to smush them together into a string of letters.

I would have found

tar -x -f pics.tar ./pics

to be clearer when I was learning. There's plenty of commands which allow combining flags but every tar tutorial seems to do it from the beginning.

Does every Linux command have options as words instead of single letters?

Tar is as old as IT, that's why its syntax is a bit special.

tar -xf is not really special; combining short options isn't uncommon.

Where tar is nonstandard is that you can leave out the -, tar xf is actually how POSIX specifies it. And we've kinda come full circle on that one with many modern utilities using a command syntax, you can read tar xf as "tar extract file" just as you can read git pull as, well, "git pull".

If you want to see a standard command with truly non-standard syntax have a look at dd.

Nono, dash-parameters are new in fancy GNU tar. And POSIX is not old.

Many do as it's considered good practice, but it's not guaranteed, it just depends on the individual command (program). Usually you can use the --help option to see all the options, so for instance tar --help.

Most commands will have expanded arguments started with 2 dashes that usually look like '--verbose-name-of-option', they're usually listed in the man page/documentation along with the abbreviated letter version

Or just use long-forms like

tar --create --file pics.tar ./pics

instead of

tar -cf pics.tar ./pics

or

tar --extract --file pics.tar
instead of

tar -xf pics.tar


which is honestly way easier to remember... ^^

I don't think tar is actually hard; we are just in a time where we externalize more information into resources such as Google. It's the same reason why younger people don't remember routes by name or cardinal direction as much anymore.

side note: $ tldr is much better than man for just getting common stuff done.

Yes, but still tar options are kinda janky.

The “-“ is often not necessary. I use it as a guide to see how long the person running tar has been using it.

Example:

tar -xf file.tar == tar xf file.tar

They are functionally flags though and uniletter flags should be preceded by a '-', so I would still prefer to have the '-' written, because it conforms with the standard.

yeah, you can also ditch that f

tar c /etc/passwd > fu.tar

tar t < fu.tar

tar x < fu.tar

I have to Google for this every time. What I can never remember is how to check whether I should put my tar.gz into the subfolder first or risk getting a thousand files sprayed into my homedir.

If you have a somewhat decent shell, just smack tab twice after the filename, it'll list the directories present.
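If tab completion isn't set up for archives, listing the archive gets you the same answer. A minimal sketch, assuming GNU tar; top_level_count is a made-up helper name and the sample archives are built just for the demo:

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/src"
touch "$work/src/a" "$work/src/b" "$work/src/c"

# A "tarbomb": the members sit at the top level of the archive.
tar -C "$work/src" -cf "$work/bomb.tar" a b c
# A polite archive: everything lives under a single directory.
tar -C "$work" -cf "$work/polite.tar" src

# Count the distinct first path components of the member names.
top_level_count() {
    tar -tf "$1" | cut -d/ -f1 | sort -u | wc -l | tr -d ' '
}

bomb_tops=$(top_level_count "$work/bomb.tar")
polite_tops=$(top_level_count "$work/polite.tar")
echo "bomb.tar: $bomb_tops top-level entries, polite.tar: $polite_tops"
```

Anything above 1 means you probably want to mkdir first and extract with -C into that directory.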

I know the basics off by heart. Not the hardest command syntax to learn all things considered.

The most annoying would be the growing collection of "uber commands" which are much more of a pain in the ass - aws, systemctl, docker, kubectl, npm, cargo, etc. - the executable has potentially dozens of subcommands, each of which has dozens of parameters.

These "uber commands" tend to be much better since they are more explorable with --help explanations and readable flags.

Much better than the random jumble of characters you're expected to have memorised for awk, sed, find et al.

PowerShell is so much worse.

Powershell is horrible all right. What annoys me is they alias ls, dir and other common commands onto commands which don't act or behave in the same way at all. I just run bash or command prompt rather than deal with the bs of powershell.

I very much disagree, what are you referring to?

Yes, that's all very well, but you'll still need to find that image the next time you want to use it.

just now realizing that .tar files aren't compressed by default, and that that's the reason why it's always .tar.gz

tar was originally for tape archiving, so it's just a stream of headers and files which ends up directed to a file or a device. It's not well ordered; whatever file happens to be found next is the next in the stream. When you compress the tar, this stream is just piped through gzip or bzip2 on its way.

The tradeoff for compressing this way is if you want to list the contents of the tar then you essentially have to decompress and stream through the whole thing to see what's in it unlike a .zip or .7z where there would be a separate index at the end which can be read far more easily.
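To see the "tar first, compress second" split for yourself, here is a small sketch assuming GNU tar and gzip (the logs/ directory and its contents are generated just for the comparison):

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/logs"
# 2000 identical lines: deliberately very compressible sample data.
yes "the same line over and over" | head -n 2000 > "$work/logs/app.log"

tar -C "$work" -cf  "$work/logs.tar"    logs   # plain tar: no compression at all
tar -C "$work" -czf "$work/logs.tar.gz" logs   # same stream, piped through gzip

plain_size=$(wc -c < "$work/logs.tar" | tr -d ' ')
gz_size=$(wc -c < "$work/logs.tar.gz" | tr -d ' ')
echo "logs.tar: $plain_size bytes, logs.tar.gz: $gz_size bytes"
```

The plain .tar comes out slightly larger than the file it contains (headers plus padding), and only the .tar.gz shrinks.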

Simple:

tar -(whatever options you want here, my go to is xvzf or cvzf) archive-name.tar file/folder-to-compress

tar can do things other than this?

Oh I'm aware, I'm just saying this is what I normally do with it

I'm sorry, I was trying to be silly and poke fun at how most of us just use the one or two tar commands and it totally didn't translate in text like it did in my head. Have a wonderful day good internet stranger.

As a mnemonic I usually read the "f" as "fucking":

  • tar, compress fucking pics.tar.gz with junk from ./pics
  • tar, extract fucking pics.tar.gz

That's only for scripting though. Most of the time I simply right-click the directory or archive, and let Engrampa deal with it.

dtrx is the way to do it. It's short for "do the right extraction", and it just works.

Also, all you have to remember for tar is "-xtract -zee -vucking -files" (extract the fucking files, but first letters only)

You can drop the awkward one and just -xtract -zee -files without -verbose output

Man page for dummies. Nice! I like it!

That would be tldr

I like this summary much more, it's a great visual explanation and doesn't clutter the poor dummy's mind with ALL the info tar has to offer.

I've used Linux for years and still Google every time I have to use it!

Thank you, I still don't understand.

I just have pack and extract functions in my shell RC files that look at file extensions and use the proper tool with proper arguments.

Wrote them 10 years ago and they've worked flawlessly ever since!
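For anyone who wants to try the idea, something along these lines would do. This is a hypothetical sketch, not the commenter's actual functions: the names pack and extract, the list of extensions, and the choice of tools are all assumptions, and the zip branches need zip/unzip installed.

```shell
# extract ARCHIVE: pick the tool from the file extension.
extract() {
    case "$1" in
        *.tar|*.tar.gz|*.tgz|*.tar.bz2|*.tar.xz) tar -xf "$1" ;;
        *.zip) unzip -q "$1" ;;
        *.gz)  gzip -dc "$1" > "${1%.gz}" ;;   # must come after *.tar.gz
        *) echo "extract: don't know how to handle '$1'" >&2; return 1 ;;
    esac
}

# pack ARCHIVE PATH...: same idea in the other direction.
pack() {
    archive=$1
    shift
    case "$archive" in
        *.tar.gz|*.tgz) tar -czf "$archive" "$@" ;;
        *.tar)          tar -cf  "$archive" "$@" ;;
        *.zip)          zip -qr  "$archive" "$@" ;;
        *) echo "pack: don't know how to create '$archive'" >&2; return 1 ;;
    esac
}

# Round trip in a scratch directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/pics"
echo "cat" > "$work/pics/cat.txt"
( cd "$work" && pack pics.tar.gz pics && rm -r pics && extract pics.tar.gz )
cat "$work/pics/cat.txt"
```

Dropped into a shell RC file (minus the round-trip demo at the bottom), this covers the common cases; anything unusual still needs the real tool.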

Brilliant! As an apple engineer, I think I will do the same thing with image previews in iMessage! What can go wrong?

Who could have guessed that an ancient and forgotten image format suddenly gets that big of a revival.

My tar command is tldr tar then ctrl + c / ctrl + v

I just use atool (archive tool) instead. It works the same for any common compression format (tar, gzip, zip, 7zip, rar, etc) and comes with handy aliases like apack and aunpack obsoleting the need to memorize options.

There's ouch too.

ouch stands for Obvious Unified Compression Helper.

great name

Ouch doesn't do 7z though

It seems like it supports LZMA files which I believe is what 7z files are?

Lzma is a compression algorithm, not (just) a file format

.7z files support lzma compression, but do not use it exclusively

I've written a CLI tool in Rust as a front end to tar with gzip called Targez.

It can definitely just be done with an alias instead, but you can give it a try if you prefer something installable.

I would also recommend -v for verbose and -z when compressing for gzip

What does --auto-compress do?

Auto compress will use gzip if the file ends with .gz, bzip2 if it ends with .bz2, and so on, without you having to mention -z
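A small sketch of the difference, assuming GNU tar and gzip (the archive names with-a and without-a, and the is_gzip helper, are made up for the demo):

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/pics"
echo "pixels" > "$work/pics/cat.txt"

# With --auto-compress (-a) the .gz suffix selects gzip.
tar -C "$work" --create --auto-compress --file "$work/with-a.tar.gz" pics
# Without -a (or -z) the suffix is only a name: the result is a plain tar.
tar -C "$work" --create --file "$work/without-a.tar.gz" pics

# gzip -t exits 0 only for a valid gzip stream.
is_gzip() { gzip -t "$1" 2>/dev/null && echo yes || echo no; }

with_a=$(is_gzip "$work/with-a.tar.gz")
without_a=$(is_gzip "$work/without-a.tar.gz")
echo "with -a: gzip? $with_a / without -a: gzip? $without_a"
```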

tar, please eXtract the Vucking File!

tar -xvf tarbomb.tar.

daily-standup.png eh... :)

Who is taking pics of the standup.. :)

OMG always assumed that -c always stands for "compress" and I always placed .gz at the end to remember to place -x when extracting

I always use tldr for these things, super handy to have.

You should link TealDeer, which is the same but it's compiled in rust instead of node so it takes less memory, also, the name is cooler :)

You are absolutely right, this is way better. Thanks!

That looks really cool. And finally a guide that knows -z is not necessary all the time.

Nowadays I have ChatGPT spew me the command. I usually do a quick validation before running it. Nevertheless, most simple operations are correct so I don't need to.

I then note the command in my personal gist cheatsheet. Next time, since the command is "cached", I'll be able to be productive quicker.

So much better than googling.

So a serious question from someone who can't remember console commands ever despite using them constantly.

Why are so many linux CLI commands set up with defaults that no one ever uses? Like if you pretty much always need -f, -v is often used, and --auto-compress is needed to recognize type by extension. Why aren't those the defaults to just using tar?

A lot of applications I find are like this too, they don't come with defaults that work or that anyone would ever use.

One reason to keep in mind is backwards compatibility and the expectancy that every Linux system has the same basic tools that work the same.

Imagine you have a script running on your server that uses a command with or without specific arguments. If the command (say tar) changes its default parameters this could lead to a lot of nasty side effects from crashes to lost or mangled data. Besides the headache of debugging that, even if you knew about the change beforehand it's still a lot of effort to track down every piece of code that makes use of that command and rewrite it.

That's why programs and interfaces usually add new options over time but are mostly hesitant to remove old ones. And if they do, they'll usually warn users beforehand that a feature will be deprecated while allowing for a transitional period.

One way to solve this conundrum is to simply introduce new commands that offer new features and a more streamlined approach that can replace the older ones in time. Yet a distribution can still ship the older ones alongside the newer ones just in case they are needed.

Looking at pagers (programs that break up long streams of text into multiple pages that you can read one at a time) as a simple example you'll find that more is an older pager program while the newer less offers an even better experience ("less is more", get the joke?). Both come pre-installed as core tools on many distributions. Finally an even more modern alternative is most, another pager with even better functionality, but you'll need to install that one yourself.

Damn, I've been using the "tape archiver" (this is what tar means) since I installed HPUX8 in the 90s, from tape, yes...

great, now how do I use it together with the 'feather' command?

Don't you have to specify the compression algorithm when extracting? I always use tar -xzf for gzip files and if I remove -z it just fails.

I've been using only xf for a long time now. Don't remember ever getting an error from it in the last years. Maybe tar can now check the magic number or something to figure out what the format is?
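You can poke at that theory yourself. A minimal sketch, assuming GNU tar and gzip (the file name mystery is made up): copy a .tar.gz to a name with no extension, so only the contents can give the format away, and see whether tar still reads it without -z.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/demo"
echo "hello" > "$work/demo/a.txt"
tar -C "$work" -czf "$work/demo.tar.gz" demo

# Strip the telltale extension.
cp "$work/demo.tar.gz" "$work/mystery"

# gzip streams start with the two bytes 1f 8b.
magic=$(od -An -tx1 -N2 "$work/mystery" | tr -d ' \n')
echo "first two bytes: $magic"

# Listing works without -z and without a .gz suffix.
members=$(tar -tf "$work/mystery")
echo "$members"
```

So with GNU tar at least, the file name is not what it goes by when reading.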

tar is just the worst shell command in existence. Why do people still bother with it?

Because it is faster to transport one big ass tar than 10k individual files, and compression is a waste of time.

What do you use instead?

I avoid it and use zip or 7z if I can. But for some crazy reason some people still insist on using that garbage tool and I have no idea why.

Are zip and 7z really that much easier?

tar cJf foo.tar.xz wherever/
zip -r foo.zip wherever/
7z a foo.7z wherever/

I get that tar needs an f for no-longer-relevant reasons whereas other tools don't, but I never understood the meme about it beyond that. Is c for "create" really that much worse than a for "add"?

If you want to do more than just "pack this directory up just as it is" you'll pretty quickly get to the limits of zip. tar is way more flexible about selecting partial contents and transformation on packing or extraction.
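A few examples of what that flexibility looks like in practice, as a sketch assuming GNU tar (--strip-components and --transform are GNU options; the proj-1.0 tree is invented for the demo):

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/proj-1.0/docs" "$work/proj-1.0/src" "$work/one" "$work/flat"
echo "readme" > "$work/proj-1.0/docs/README"
echo "code"   > "$work/proj-1.0/src/main.c"
tar -C "$work" -cf "$work/proj.tar" proj-1.0

# Partial extraction: name the member you want after the archive.
tar -C "$work/one" -xf "$work/proj.tar" proj-1.0/docs/README

# Drop the leading "proj-1.0/" path component while extracting.
tar -C "$work/flat" -xf "$work/proj.tar" --strip-components=1

# Rename while packing: store the tree under "release/" instead.
tar -C "$work" -cf "$work/renamed.tar" --transform='s|^proj-1.0|release|' proj-1.0

renamed_listing=$(tar -tf "$work/renamed.tar")
echo "$renamed_listing"
```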

100% of tarballs that I had to deal with were instances of "pack this directory up just as it is" because it is usually people distributing source code who insist on using tarballs.

Because everyone else does, and if everyone else does, then I must, and if I do, then everyone else must, and then everyone else does.

Repeat loop.

For all I care it goes on the same garbage dump as LaTeX.

I think that's pretty mean towards the free software developers spending their spare time on Latex and the GNU utils.
I and many academics use Latex, and I personally am very happy to be able to use something which is plain text and FLOSS.
I also don't see your problems with tar; it does one thing and it does it good enough.

I also don’t see your problems with tar; it does one thing and it does it good enough.

The problem is the usage of the tool which people invent different mnemonics for because its UX is stuck in 1986 and the only people who remember the parameters are those who use it daily.

Similar thing for LaTeX: it's so absurdly crusty and painful to work with it's only used by people who have no alternative.

//ETA
Also, I don't want to be mean towards the maintainers of LaTeX. I'm sorry if I made any LaTeX maintainer reading this upset or feel inferior. Working on the LaTeX code is surely no easy endeavour and people who still do that in 2023 deserve a good amount of respect.

But every time I had to work with LaTeX or any of its wrappers it was just pure frustration at the usage and the whole project. The absolute chaos of different distributions, templates, classes and whatnot is something I never want to experience again.

speaking of which, you might want to check out typst if you haven't heard of it - I really hope this replaces most uses of LaTeX in the next years.

Thanks I'll keep an eye on that project. I did try pandoc and LyX in the past to ease the pain but typst appears to have the courage to finally let LaTeX be and not build a new wrapper around it.

You do you. Compression is a waste of time; storage is cheap in that you can get more, but time? Time, you never get back.

Yes, and I'd rather not waste my time waiting for thousands of small files to transfer when I could compress them and transfer a single, much smaller file.

as in time wasted transferring a highly compressible file that you didn't bother compressing first?

it's only a waste of time when the file format is already compressed.

Unless you measure your baud in dial up modem, it often can take longer to compress / transport / uncompress than just transfer directly.

unless you're picking a slow compressor that's not true at all

Original size | 100 GB
Compressed size | 47.8 GB (2.091 ratio)
Transfer speed | 1 Gbps (125 MB/s)
Original transfer time | 100 GB / 125 MB/s = 800 seconds
Compressed transfer time | 47.8 GB / 125 MB/s = 382.4 seconds

Compressor | Snappy
Compression ratio | 2.091 ratio
Compression speed | 565 MB/s
Decompression speed | 1950 MB/s
Compression time | 100 GB / 565 MB/s = 177 seconds
Decompression time | 47.8 GB / 1950 MB/s = 24.5 seconds

Transfer time w/o compression | 800 seconds
Transfer time with compression | 177 + 382.4 + 24.5 = 583.9 seconds
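If you want to plug in your own numbers, the same arithmetic fits in a few lines of awk. This is just a sketch of the calculation above: the figures are the ones from the table, the variable names are made up, and it assumes decimal units (1 GB = 1000 MB) and that compress, transfer and decompress run one after another rather than overlapping.

```shell
timing=$(awk 'BEGIN {
    size_mb   = 100  * 1000   # original size in MB
    packed_mb = 47.8 * 1000   # compressed size in MB
    link      = 125           # transfer speed, MB/s (1 Gbps)
    comp      = 565           # compression speed, MB/s
    decomp    = 1950          # decompression speed, MB/s

    plain = size_mb / link
    total = size_mb / comp + packed_mb / link + packed_mb / decomp

    printf "without compression: %.1f s\n", plain
    printf "with compression:    %.1f s\n", total
}')
printf '%s\n' "$timing"
```

Swap in a slower compressor or a faster link and you can see where the break-even point lands.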

You've never used find have you? Let's not even get started on the config file syntax for sendmail either.