The first approach works because Spark uses a special serialization strategy (a cloudpickle-based serializer) for the closures required by transformations. It is significantly slower but more powerful than the standard pickle; without it we couldn't use .map(lambda x: ...) at all.
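To see the difference you can compare the standard pickle with cloudpickle, which PySpark bundles as pyspark.cloudpickle (a minimal sketch; it assumes the standalone cloudpickle package is installed locally):

import pickle
import cloudpickle   # standalone package; PySpark ships its own copy as pyspark.cloudpickle

f = lambda x: x ** 2

try:
    pickle.dumps(f)              # plain pickle stores functions by name only,
except pickle.PicklingError:     # so an anonymous lambda cannot be serialized
    print("plain pickle failed")

payload = cloudpickle.dumps(f)   # cloudpickle ships the code object itself
print(pickle.loads(payload)(3))
## 9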
The last approach works because there is no need to serialize the function code at all. The pickle payload just references sqrt from the numpy module, so as long as NumPy is importable on each worker there is no problem at all.
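You can check this locally (a sketch using math.sqrt; the same idea applies to any function that lives in a module importable on the workers, numpy.sqrt included):

import pickle
from math import sqrt

payload = pickle.dumps(sqrt)
# The payload stores only a reference ("module math, attribute sqrt");
# unpickling re-imports the module, so it works anywhere math is available.
pickle.loads(payload) is sqrt
## True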
The second approach doesn't work because pickling doesn't serialize the code itself:
import pickle
pickle.dumps(my_sqrt)
## b'\x80\x03c__main__\nmy_sqrt\nq\x00.'
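If you disassemble those bytes with the standard pickletools module you can see there is nothing in there but a by-name lookup (output shown approximately):

import pickletools

pickletools.dis(b'\x80\x03c__main__\nmy_sqrt\nq\x00.')
##     0: \x80 PROTO      3
##     2: c    GLOBAL     '__main__ my_sqrt'
##    20: q    BINPUT     0
##    22: .    STOP
## highest protocol among opcodes = 2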
All it does is state: please give me the object assigned to my_sqrt (my_sqrt.__name__) in the top-level script environment (a.k.a. __main__). When this is executed on the workers they don't use the same environment, and there is no such object in scope anymore, hence the exception. To be clear, it is neither a bug nor something specific to Spark. You can easily reproduce the same behavior locally as follows:
In [1]: import pickle
In [2]: def foo(): ...
In [3]: foo_ = pickle.dumps(foo)
In [4]: pickle.loads(foo_)
Out[4]: <function __main__.foo>
In [5]: del foo
In [6]: pickle.loads(foo_)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
...
AttributeError: Can't get attribute 'foo' on <module '__main__'>
Since pickle doesn't concern itself with the actual value, you can even reassign the name like this:
In [7]: foo = "foo"
In [8]: pickle.loads(foo_)
Out[8]: 'foo'
The takeaway here is this: if you want to use a function this way, put it in a separate module and distribute that module to the workers the same way you distribute your other dependencies, including custom class definitions.
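A minimal sketch of what that can look like (the file name utils.py and the use of SparkContext.addPyFile are just one option; passing the module with --py-files at submit time works as well):

# utils.py - lives next to the driver script and is shipped to the workers
def my_sqrt(x):
    return x ** 0.5

# driver script
from pyspark import SparkContext

sc = SparkContext(appName="example")
sc.addPyFile("utils.py")     # distribute the module to every worker

from utils import my_sqrt    # pickling my_sqrt now stores just "utils.my_sqrt",
                             # which the workers can resolve by importing utils

print(sc.parallelize([1, 4, 9]).map(my_sqrt).collect())
## [1.0, 2.0, 3.0]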