How to use PyTorch to train a model to compute cross products
There are many tutorials about training neural networks with PyTorch. For this reason, this article is about training a model that is not a neural network, with an emphasis on exploring the "pythonic" aspects of PyTorch.
The task here is to train a model that computes the cross product of two three-dimensional vectors. While the dot product and, more generally, matrix multiplication are used extensively in deep learning, the cross product is not very common. Below is a picture demonstrating the concept.
I prefer to define the cross product in tensor form:

(a × b)_i = ϵ_ijk a_j b_k

Here ϵ is the Levi-Civita tensor. We are using index notation, or simplified Einstein notation, meaning that summation is implied over repeated indexes.
This is the foundation of our model: the tensor ϵ is its single trainable parameter, and we expect it to converge to the Levi-Civita tensor.
First we will use a supervised learning approach. We will run the model on two random vectors and then compare the prediction with the expected cross product.
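The code block itself was lost in this republication; below is a minimal sketch of what the supervised training loop might look like. The parameter name `ep` follows the discussion below, while the hyperparameters (learning rate, number of steps) are my assumptions, not the author's:

```python
import torch

torch.manual_seed(0)

# The whole model: a single trainable (3, 3, 3) tensor
# that we expect to converge to the Levi-Civita tensor.
ep = torch.randn(3, 3, 3, requires_grad=True)
optimizer = torch.optim.Adam([ep], lr=0.01)

for step in range(5000):
    # Two random 3-D vectors; note there is no batch dimension
    a = torch.randn(3)
    b = torch.randn(3)
    # Prediction: c_i = ep_ijk a_j b_k
    pred = torch.einsum('ijk,j,k->i', ep, a, b)
    target = torch.linalg.cross(a, b)
    # Squared-error loss against the true cross product
    loss = ((pred - target) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(ep)
```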
tensor([[[ 1.0955e-05,  2.9630e-05, -3.7728e-05],
         [ 1.4276e-05,  8.8082e-05,  9.9998e-01],
         [-9.9078e-06, -1.0000e+00,  1.8971e-05]],

        [[ 8.3094e-06,  1.7124e-06, -9.9999e-01],
         [ 2.7287e-05, -1.4629e-05, -1.5764e-05],
         [ 9.9996e-01, -3.1703e-05, -3.2365e-06]],

        [[ 2.5212e-05,  9.9996e-01,  2.0671e-05],
         [-9.9997e-01, -3.3963e-05,  1.0797e-05],
         [-3.0542e-05,  3.7616e-05, -6.0207e-05]]], requires_grad=True)
Let’s discuss the code above:
- The model in our case is a single parameter `ep` with the shape (3, 3, 3). It is initialized to random values, but we expect it to converge to the Levi-Civita tensor.
- We do not batch the vectors, hence we iterate only over steps, not epochs. This also means that none of the tensors has a batch dimension, which certainly makes the code easier to understand.
- The function `torch.einsum()` is one of my favorite tools in deep learning. I already mentioned index, or Einstein, notation. `einsum` stands for Einstein summation. It shows symbolically how the tensor multiplication works: over which indexes the summation occurs and what the expected dimensions of the result are. I like it because when I work with multidimensional tensors I don't have to worry about transpose or permute; I can clearly see how the summation is done.
- The loss here is the difference between the predicted value and the actual cross product. As you can see, training converges rapidly, and after training the tensor `ep` is really close to the Levi-Civita symbol.
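To make the notation concrete, here is a small self-contained example of `torch.einsum()` computing c_i = ϵ_ijk a_j b_k with the exact Levi-Civita tensor (the vectors `a` and `b` are arbitrary values chosen for illustration):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# The exact Levi-Civita tensor: +1 for even permutations of (0, 1, 2),
# -1 for odd permutations, 0 everywhere else.
eps = torch.zeros(3, 3, 3)
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

# c_i = eps_ijk a_j b_k: sum over j and k, index i remains
c = torch.einsum('ijk,j,k->i', eps, a, b)
print(c)  # [-3., 6., -3.], same as torch.linalg.cross(a, b)
```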
In the previous section, we used supervised learning to learn the Levi-Civita tensor and the cross product of three-dimensional vectors. To me it felt like cheating. Can we instead derive the cross product from some general principles, rather than by comparing predicted values with expected ones?
We will train our model to satisfy these criteria:
1. The cross product operation is antisymmetric, i.e. a × b = −(b × a).
2. It is normalized, that is, the lengths of e1 × e2, e1 × e3, and e2 × e3 are all equal to one.
3. The cross product is orthogonal to both of its arguments, i.e. the dot products are zero: (a × b) · a = (a × b) · b = 0.
These criteria are still not sufficient to guarantee that we learn the cross-product function correctly, because it is possible that we instead learn the negative cross product. In other words, the model will produce either ϵ or −ϵ.
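This code block was also lost in republication; here is a hedged sketch of how the three criteria could be turned into a loss. The loop structure mirrors the supervised version, and the equal weighting of the three terms is my assumption:

```python
import torch

torch.manual_seed(0)
ep = torch.randn(3, 3, 3, requires_grad=True)
optimizer = torch.optim.Adam([ep], lr=0.01)
eye = torch.eye(3)  # rows are the unit vectors e1, e2, e3

def cross(a, b):
    # The learned bilinear operation: c_i = ep_ijk a_j b_k
    return torch.einsum('ijk,j,k->i', ep, a, b)

for step in range(5000):
    a = torch.randn(3)
    b = torch.randn(3)
    ab = cross(a, b)
    # 1. Antisymmetry: penalize a x b + b x a being non-zero
    loss = ((ab + cross(b, a)) ** 2).sum()
    # 2. Normalization: |ei x ej| should be 1 for each pair i < j
    for i in range(3):
        for j in range(i + 1, 3):
            loss = loss + (cross(eye[i], eye[j]).norm() - 1.0) ** 2
    # 3. Orthogonality: (a x b) . a and (a x b) . b should be 0
    loss = loss + (ab @ a) ** 2 + (ab @ b) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(ep)
```

Note that nothing in this loss pins down the sign, so the result may be ϵ or −ϵ.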
tensor([[[-1.5880e-05,  4.7683e-05,  5.3528e-05],
         [-4.6085e-07,  4.7272e-05, -9.9767e-01],
         [-2.9965e-05,  9.9688e-01, -4.0021e-05]],

        [[-4.0523e-05,  1.7235e-05,  9.9770e-01],
         [-3.0135e-07, -1.1027e-05, -6.7566e-05],
         [-9.9676e-01,  1.5644e-05,  5.9592e-05]],

        [[-7.9570e-06, -9.9768e-01, -1.1900e-05],
         [ 9.9689e-01, -4.3567e-05, -3.2372e-05],
         [-7.1624e-06,  3.8180e-05,  6.8536e-05]]], requires_grad=True)
The code block is almost identical, but the loss is computed differently. The first part of the loss penalizes cases where the learned operation is not antisymmetric.
To compute the second part of the loss, we apply the learned tensor operation to the unit vectors. For a true cross product, the cross product of two distinct unit vectors has length 1. We compute the tensor operation for the pairs (e1, e2), (e1, e3), and (e2, e3) and push their lengths toward 1, as they should be for cross products.
For the third part of the loss, we take the dot products of the prediction with the vectors a and b and penalize non-zero results. Let's review the results:
tensor([ -5.2354, 88.7160, -26.9144], grad_fn=)
tensor([ 5.2500, -89.0000, 27.0000])
We can see that we indeed learned the negative cross product. The sign of the cross product is a matter of convention: the cross product of e1 and e2 is e3, not −e3. I invite the reader to fix the loss function so that it prefers the right sign of the cross product.
You probably noticed how "pythonic" this code looks. We did not need to create a function that computes the loss (even though it is probably a good idea) or a custom layer (which we should do for more complex models). We did not even have to create a model class, since the only thing we are training is a single parameter.
While this tutorial is very simple, it leaves a lot of room for improvement. For example, we could consider cross products in vector spaces with more than three dimensions, or create a Module or even a Layer class that encapsulates the functionality. All code used here can be found in my GitHub repo.
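As a sketch of one such improvement, the trainable tensor could be wrapped in an `nn.Module`; the class name `CrossProduct` and its interface here are my invention, not code from the repo:

```python
import torch
from torch import nn

class CrossProduct(nn.Module):
    """Learnable bilinear map intended to converge to the cross product."""

    def __init__(self):
        super().__init__()
        # The single trainable parameter, registered with the module
        self.ep = nn.Parameter(torch.randn(3, 3, 3))

    def forward(self, a, b):
        # c_i = ep_ijk a_j b_k
        return torch.einsum('ijk,j,k->i', self.ep, a, b)

model = CrossProduct()
c = model(torch.randn(3), torch.randn(3))
print(c.shape)  # torch.Size([3])
```

The training loops above then become `optimizer = torch.optim.Adam(model.parameters(), lr=0.01)` with `pred = model(a, b)`, and the parameter travels with the module for saving and loading.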
Republished from https://towardsdatascience.com/training-a-model-to-compute-cross-product-8c9390541fc9