File extensions are fake

Honestly this is one of the things that consistently blows my mind: file extensions aren’t real, they’re just. Hints. To tell a system what this file ‘might’ be.

Like, famously, a jar is a zip with a hat? A .wav file is a RIFF file. A .json file is just plain text bytes that are supposed to follow the JSON specification; you can put a poem in a .json file and break everything that expects to load a JSON blob, but it remains a .json file. Here, watch:

Broken PNG file

^^ That’s a PNG file! Nginx serves it as image/png! But its contents are not actually PNG data; they’re a quine in Scheme. We can just do that!
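To make the gap between "what the name says" and "what the bytes are" concrete, here's a minimal sketch that ignores the filename entirely and only looks at the leading "magic" bytes, roughly the kind of check file(1) does (its real magic database is vastly bigger). The signature table, the sniff and make_wav helpers, and the Scheme bytes (a classic quine, standing in for the post's actual file) are all hypothetical. It also shows the .wav-is-RIFF thing: Python's standard wave module writes a file that starts with RIFF.

```python
import io
import wave

# Hypothetical, tiny signature table; real tools like file(1) know thousands of these.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def sniff(data: bytes) -> str:
    """Guess a format from the leading bytes only; no filename is involved."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # .zip, .jar, .docx, ... all begin like this
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav (RIFF container)"
    return "unknown"

def make_wav() -> bytes:
    """Write one second of 8 kHz mono silence with the stdlib wave module, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 8000)
    return buf.getvalue()

# Pretend these bytes live in quine.png: the extension says PNG, the content says Scheme.
fake_png = b"((lambda (x) (list x (list (quote quote) x))) (quote (lambda (x) (list x (list (quote quote) x)))))"

print(sniff(fake_png))    # unknown
print(sniff(make_wav()))  # wav (RIFF container)
```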


Actually, it’s kind of weird that we treat these as “corrupted”. Cast aside the notion that a file should have a certain kind of data if it has a certain extension!

People have made[citation needed][can’t be assed to find a source] files that open as one image if renamed to .png, open as another if renamed to .pdf, reveal certain inner files if opened as a ZIP, etc. The thing is, the file (the raw data underlying all this) is all three! No ontology forces a file to be “of a single type”, despite how file extensions work. Many file formats have room to skip over, or even embed, arbitrary data, and that data can be of a different type. How fucked up is that.
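One well-known way this works (an illustration, not a reconstruction of any specific file): ZIP readers locate an archive via the end-of-central-directory record at the end of the file, so arbitrary bytes can be glued on in front. The sketch below prepends a PNG signature to a ZIP built in memory. A magic-byte check at offset 0 says "PNG", and Python's zipfile still opens it as an archive. The member name and payload are made up, and the 8-byte prefix is just a signature, not a valid image.

```python
import io
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def build_zip() -> bytes:
    """Create a small ZIP archive in memory with one hypothetical member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("secret.txt", "surprise, I'm also a zip")
    return buf.getvalue()

# Same bytes, two readings: the start looks like PNG, the end looks like ZIP.
polyglot = PNG_SIGNATURE + build_zip()

looks_like_png = polyglot.startswith(PNG_SIGNATURE)
# zipfile finds the directory record at the end and corrects for the prepended bytes.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    payload = zf.read("secret.txt").decode()

print(looks_like_png)  # True
print(names)           # ['secret.txt']
print(payload)         # surprise, I'm also a zip
```

This prefix-tolerance is the same property self-extracting archives rely on.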


Fizzbuzz3k of #vazkii mentioned: “didn’t know nginx took it seriously as well.” Sorta??

So, HTTP expects the server to tell the client the MIME type of what it’s sending, via the Content-Type header (basically a “filetype metadata” string), for reasons that likely boil down to “the client probably didn’t care to implement the entire machinery of UNIX file(1).”

That means the browser will (mostly) trust the MIME type sent by the server, which means the server has to come up with a reasonable one. As it turns out, file(1)-style inspection means actually reading the file, while looking at the extension is free, so Nginx opts for the latter: it maps extensions to types through the types table in the mime.types file it ships with.
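Here's a sketch of that cheap, extension-only lookup: take the suffix, look it up in a table, and fall back to a default. It mimics the general shape of an extension-to-type table plus a fallback type, but the table is a hypothetical, trimmed-down one written for illustration (not Nginx's actual mime.types), and content_type_for is a made-up name. Note that the file is never opened.

```python
from pathlib import PurePosixPath

# Hypothetical, trimmed-down extension table (not Nginx's actual mime.types).
TYPES = {
    "png": "image/png",
    "json": "application/json",
    "jar": "application/java-archive",
    "zip": "application/zip",
}
DEFAULT_TYPE = "application/octet-stream"  # assumed fallback for unknown suffixes

def content_type_for(path: str) -> str:
    """Pick a Content-Type from the filename alone; the file is never read."""
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return TYPES.get(suffix, DEFAULT_TYPE)

print(content_type_for("/srv/www/quine.png"))  # image/png, even though it's Scheme
print(content_type_for("poem.json"))           # application/json
print(content_type_for("mystery"))             # application/octet-stream
```

The lookup costs a dictionary access no matter how big the file is, which is exactly why it wins over sniffing, and exactly why it believes whatever the name claims.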

It is kind of a weird edge case that an HTTP server is expected, by spec, to say “this file is of a particular kind” in the first place. You heard it here first, folks: HTTP is upholding the tyrannical social order of one-file-one-type! Get they ass!