Tensor Considered Harmful

Tensor Considered Harmful, by Alexander Rush

TL;DR: Despite its ubiquity in deep learning, Tensor is broken. It forces bad habits such as exposing private dimensions, broadcasting based on absolute position, and keeping type information in documentation. This post presents a proof-of-concept of an alternative approach, named tensors, with named dimensions. This change eliminates the need for indexing, dim arguments, einsum-style unpacking, and documentation-based coding. The prototype PyTorch library accompanying this blog post is available as namedtensor.
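To make the "no dim arguments" point concrete, here is a minimal pure-Python sketch of the idea. It is not the namedtensor API: the helper `sum_over` and the dimension names `batch` and `feature` are hypothetical, chosen only for illustration. The caller names the dimension to reduce, and the code looks up its position.

```python
# Hypothetical sketch: a 2-D "tensor" is a nested list plus a tuple of dimension names.
def sum_over(data, names, dim):
    """Sum a 2-D nested list over the dimension called `dim`."""
    axis = names.index(dim)  # translate the name into a position
    if axis == 0:
        reduced = [sum(col) for col in zip(*data)]  # collapse the outer dimension
    else:
        reduced = [sum(row) for row in data]        # collapse the inner dimension
    remaining = tuple(n for n in names if n != dim)
    return reduced, remaining

scores = [[1, 2, 3],
          [4, 5, 6]]  # laid out as (batch, feature)

by_name, names_a = sum_over(scores, ("batch", "feature"), "feature")

# The same data stored transposed, i.e. laid out as (feature, batch).
transposed = [list(col) for col in zip(*scores)]
by_name_t, names_b = sum_over(transposed, ("feature", "batch"), "feature")
```

Both calls reduce over `feature` and leave `batch`, even though the storage order differs. With a positional argument the caller would have had to pass a different integer in each case.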

Thanks to Edward Z. Yang for pointing me to this "Considered Harmful" position paper.

dimensions and units

My impression of this business of named tensors is that it's another form of the units of measure feature that comes up from time to time in PL design, either in statically or dynamically typed form. Though that in itself may say something about the nature of thought versus computation.

Truthfully, I don't grok tensors. I consider it a major gap in my mathematical toolkit. From what I've heard, I'm not alone in not grokking them, either. I've been stuck partway through Penrose's Road to Reality for several years, now, because I decided his overview of tensors just wasn't enough for me to move on past, and I was going to need to go off and get a feel for them elsewhere and come back (by now, I s'pose I'd have to start over from the beginning). So far, for all the explanations of tensors I've read, I haven't got a handle on them. Though I've got a new appreciation of the difference between reading a textbook and taking a well-taught class based on a textbook.

Synchronicity w/ The Morning Paper

Interesting that The Morning Paper picked a paper that highlights a similar issue:

Machine Learning Systems are stuck in a rut

For example:

Named dimensions improve readability by making it easier to determine how dimensions in the code correspond to the semantic dimensions described in, e.g., a research paper.

abstraction

The lines following that are also worth quoting as they indicate a benefit beyond just readability.

We believe their impact could be even greater in improving code modularity, as named dimensions would enable a language to move away from fixing an order on the dimensions of a given tensor, which in turn would make function lifting more convenient…

This is clearer in the context of quantum computing where you want to work with named qubits and not worry about which sequence in the state tensor product you need your operator to affect. It also makes simulator implementation independent of the order in which the state outer product has been taken.
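A rough sketch of the qubit case, assuming a toy sparse representation that is not taken from any real simulator: a state is a dict from bit tuples to amplitudes, together with a tuple giving the order in which the named qubits appear. The helpers `apply_x` and `as_named` are hypothetical.

```python
def apply_x(state, order, qubit):
    """Apply a Pauli-X (bit flip) to the qubit called `qubit`."""
    pos = order.index(qubit)  # where this qubit sits in the storage order
    new_state = {}
    for bits, amp in state.items():
        flipped = bits[:pos] + (1 - bits[pos],) + bits[pos + 1:]
        new_state[flipped] = amp
    return new_state

def as_named(state, order):
    """Re-key a state by {(qubit name, bit)} so the storage order drops out."""
    return {frozenset(zip(order, bits)): amp for bits, amp in state.items()}

# The basis state a=0, b=1, stored in two different tensor-product orders.
state_ab = {(0, 1): 1.0}  # order ("a", "b")
state_ba = {(1, 0): 1.0}  # order ("b", "a")

out_ab = apply_x(state_ab, ("a", "b"), "a")
out_ba = apply_x(state_ba, ("b", "a"), "a")

same = as_named(out_ab, ("a", "b")) == as_named(out_ba, ("b", "a"))
```

The caller only ever says "flip qubit a". Which slot of the product that qubit occupies is a storage detail resolved inside `apply_x`, so both layouts yield the same named result.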

Misnomer?

This is a good idea. When I read "named tensor", my mind interprets it as "tensor that's been given a name" (as in "named baby") and not as "tensor with named dimensions" (perhaps "baby with named arms and legs"?). I can't seem to think of a better name though.

On the other hand, do we need a new name for this? I think it would be better if we just called them "tensors" and augmented the old API to support string-named dimensions as well as integer-named dimensions. Would that work?
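One way such an augmented API could look, as a sketch only: a single helper accepts either an integer position or a string name and returns the integer axis, so existing positional calls keep working. The name `resolve_dim` and the example dimension names are hypothetical, not part of PyTorch or namedtensor.

```python
def resolve_dim(names, dim):
    """Return the integer axis for `dim`, given as a position or a name."""
    if isinstance(dim, int):
        if not -len(names) <= dim < len(names):
            raise IndexError(f"dimension {dim} out of range for {len(names)} dims")
        return dim % len(names)  # normalise negative positions
    try:
        return names.index(dim)
    except ValueError:
        raise KeyError(f"unknown dimension {dim!r}") from None

names = ("batch", "time", "feature")

axis_by_name = resolve_dim(names, "time")
axis_by_pos = resolve_dim(names, 1)
axis_negative = resolve_dim(names, -1)
```

Every function that takes a `dim` argument could route it through a helper like this. The cost is that integers and strings then share one namespace, which is the ambiguity the "named tensor" label sidesteps.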

Is not this a

Is not this a relation/table?

This is exactly why I'm working on my own little language (http://tablam.org). I have explored using an NDArray as the core internal type, and each relation has a schema (so everything is named).

I hope to provide things like:


city ?where .population > 100_000 ?select .name, .country
points ?sort .x

For numerical indexes:


city ?where #3 > 100_000 ?select #1, #2
points ?sort .#1
city # 1
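For readers who don't know the language, the same two styles of query can be approximated in Python. This is a sketch, not TablaM itself: the `where` and `select` helpers, the sample rows, and the assumption that positional columns are 1-based are all made up for illustration.

```python
from collections import namedtuple

# A relation: rows that share one schema of named columns.
City = namedtuple("City", ["name", "country", "population"])

cities = [  # made-up sample rows
    City("A", "X", 250_000),
    City("B", "Y", 40_000),
    City("C", "X", 1_200_000),
]

def where(rows, predicate):
    """Keep the rows for which `predicate` is true."""
    return [row for row in rows if predicate(row)]

def select(rows, *cols):
    """Project columns given by name (str) or by 1-based position (int)."""
    def get(row, col):
        return getattr(row, col) if isinstance(col, str) else row[col - 1]
    return [tuple(get(row, col) for col in cols) for row in rows]

# Named access, in the spirit of: ?where .population > 100_000 ?select .name, .country
by_name = select(where(cities, lambda r: r.population > 100_000), "name", "country")

# Positional access, in the spirit of: ?where #3 > 100_000 ?select #1, #2
by_pos = select(where(cities, lambda r: r[2] > 100_000), 1, 2)
```

Both queries return the same rows, but only the named version survives a reordering of the columns in the schema.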

Improv > Excel, D > SQL

One of the things that made Lotus Improv much superior to Lotus 1-2-3 and its descendants is that instead of giant 2D sheets with numbered rows and columns, it had arbitrary-dimensionality cubes with named dimensions and named entries.

Similarly, the database language D beats SQL, which in turn beats Codd's original relational algebra (RA): D gives all attributes of a relation unique names, SQL gives them non-unique names, and Codd's model has only positional labels.

While Improv is defunct, Quantrix Modeler is not, but unfortunately it costs $$$$$.