I am trying to build a small-scale deep learning framework. I defined my tensor class as follows:
```
class Tensor:
    __slots__ = (
        "_backend",
        "_data",
        "_requires_grad",
        "_node",
        "_grad",
    )

    def __init__(
        self,
        data: BackendArray,
        backend: Backend | None = None,
        requires_grad: bool = False,
    ) -> None:
        # Data
        self._backend = backend if backend else get_backend()
        self._data = self._backend.as_array(data)

        # Data for autograd
        self._requires_grad = requires_grad
        self._node: Node | None = None      # For function (autograd graph) nodes
        self._grad: "Tensor | None" = None  # Leaf gradient accumulator

    def backward(self):
        self._grad = backward(self)

    def _apply(self, f: Type[Function], *args, **kwargs):
        return _apply(self, f, *args, **kwargs)
```
The functions `backward` and `_apply` are supposed to call into my actual autograd engine: `_apply` constructs the autograd graph eagerly during the forward pass, and `backward` orchestrates the backward pass. I structured it this way to keep `Tensor` a thin class that just holds data.
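For context, an individual operation would be wired through `_apply` roughly like this. This is only a sketch: `Add` is an illustrative `Function` subclass (not shown above), and the exact `Function.backward` signature is an assumption.

```
class Add(Function):
    @staticmethod
    def forward(ctx: Context, a: BackendArray, b: BackendArray) -> BackendArray:
        # Raw backend arrays arrive here because _apply unwraps Tensor args
        return a + b

    @staticmethod
    def backward(ctx: Context, grad: BackendArray):
        # d(a + b)/da = d(a + b)/db = 1, so the incoming gradient passes through
        return grad, grad


# On Tensor, the operator would then just delegate to _apply:
#     def __add__(self, other):
#         return self._apply(Add, self, other)
```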
Later on in the same file, I provide the implementations of `backward` and `_apply` as free functions, not class methods:
```
def _apply(t: Tensor, f: Type[Function], *args, **kwargs):
    """
    Apply a Function to a tensor.
    Build the computation graph eagerly.
    """
    ctx = Context(t._backend)

    # We have to unwrap the args to get the raw data, otherwise we create an
    # infinite recursion due to the way that functions are currently implemented
    raw_args = [a._data if isinstance(a, Tensor) else a for a in args]
    out_data: BackendArray = f.forward(ctx, *raw_args, **kwargs)

    # Determine requires_grad: True if any tensor arg requires grad
    requires_grad = any(isinstance(a, Tensor) and a._requires_grad for a in args)
    out = Tensor(out_data, requires_grad=requires_grad)

    # If requires_grad, add to the graph for backprop
    if requires_grad:
        parents: list[Edge] = []
        for idx, a in enumerate(args):
            if isinstance(a, Tensor) and a._requires_grad:
                parents.append(Edge(a, idx))
        out._node = Node(
            grad_buffer=None,
            op=f,
            ctx=ctx,
            parents=parents,
            # Placeholder - will be computed just before doing backward,
            # as some nodes may not participate
            in_degree=None,
        )
    return out
```
```
def backward(t: Tensor) -> Tensor:
    if t._node is None:
        raise RuntimeError("Tensor not attached to graph.")

    # Initial gradient
    root_grad = F.ones(t.shape, requires_grad=True, backend=t._backend).data
    t._node.grad_buffer = root_grad

    _backward(t._node)

    # Return a Tensor wrapping the original root grad
    return Tensor(root_grad, backend=t._backend, requires_grad=False)
```
There is now a natural circular dependency in my code: the `Tensor` class needs to know about `backward` and `_apply`, and the `backward` and `_apply` implementations need to know about `Tensor`.

I want to move the actual autograd engine logic into a separate file rather than having it all in one place, but this circular dependency is stopping me from doing it.
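To make the cycle concrete, here is a sketch of the split I have in mind (the file names `tensor.py` and `autograd.py` are just placeholders):

```
# tensor.py
from autograd import _apply, backward    # Tensor's methods need the engine

class Tensor:
    def backward(self):
        self._grad = backward(self)

    def _apply(self, f, *args, **kwargs):
        return _apply(self, f, *args, **kwargs)


# autograd.py
from tensor import Tensor                # the engine needs Tensor to wrap outputs

def _apply(t: Tensor, f, *args, **kwargs): ...
def backward(t: Tensor) -> Tensor: ...
```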
The reason I want the `Tensor` class to have these as methods is that I want natural syntax like `x.backward()` when `x` is a tensor, instead of `backward(x)`. If I were willing to give up that syntax, I wouldn't have the circular dependency at all, because `Tensor` would no longer need handles to the functions that contain the autograd logic.
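For comparison, if I dropped the methods, the dependency would only point one way and the split would be trivial (again just a sketch):

```
# tensor.py -- knows nothing about the engine
class Tensor:
    __slots__ = ("_backend", "_data", "_requires_grad", "_node", "_grad")


# autograd.py -- one-way dependency, no cycle
from tensor import Tensor

def _apply(t: Tensor, f, *args, **kwargs): ...
def backward(t: Tensor) -> Tensor: ...

# but callers now have to write backward(x) instead of x.backward()
```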
So what is a Pythonic way to solve this problem? My end goal is to separate the tensor (a thin wrapper class around some data) from the core autograd logic, while still keeping syntax like `x.backward()` when `x` is a tensor.
It sounds like the real engine pieces are the `_backward` function and the function parameter passed to `_apply`. To me it looks like the two intermediate functions you show are so closely tied to the `Tensor` object that they should just be class methods, but if you can replace `_backward` and that function parameter then you can still use different backends. Is that thinking consistent with what you've built, and does it help you move code around?