Close Enough
You can get a good-enough answer to some questions without all the work.
A Gradient Approximation Story with Zygote and Julia
Some days, you just have too much data, you know? Big heaps of data, piling up, each just a little different than the last. And the function you’re supposed to run on all that data is kinda complicated, and a little slow, and you’re just not feeling it. What’s a poor CPU to do?
Well, don’t just throw it out! Give each data point the individual attention it deserves, while giving yourself grace to approximate the function with a gradient approximation instead of sweating every little step. Here, I’ll show you:
- An example problem- Black-Scholes
- The idea- Gradient Approximation
- The surprisingly easy work- building the approximator
- The digression- some data
- The lie- nothing in the computer is real
- The results- things are good
- The lesson- make tradeoffs
- Footnotes
An example problem- Black-Scholes
Let’s work through an example, in Julia for today. Gradient approximation works on anything with a gradient, but we’ll pick an example that’s pure math. Let’s say you’re supposed to figure out exactly how much a bucket of stock options is worth, and how risky it is, at different times in a day. Luckily, there are formulas for that, which I’ll call Black-Scholes and the greeks. Luckier still, lots of people use those formulas, so you don’t even need to write them; you can borrow someone else’s work. They’ve got a handful of arguments that change throughout the day- how much the stock is worth, how wiggly the stock is, how long the option lives, even how much other people are selling Treasury bonds for.
So, what you’re supposed to do is, for every minute of the day (say), look up all these arguments, run Black-Scholes and its siblings on the data, and come back with a vector of results.
The idea- Gradient Approximation
But that sucks. Let’s do something easier. All those arguments? They don’t really change much every minute, or even throughout the day. So let’s ignore the nasty exponents and distributions and such that make Black-Scholes slow, and pretend it’s a friendly linear function. After all, when you zoom in enough, most functions look pretty linear. So the idea is:
Convert a complex function into a linear function, by taking its gradient at a nearby point
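In symbols, this is just a first-order Taylor expansion (standard calculus, nothing special to this post): for a point x near a chosen center c,

f(x) ≈ f(c) + ∇f(c) · (x − c)

and the further x wanders from c, the worse the approximation gets.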
We’ll see how useful this idea is by making a “function approximation measurement function”:
function do_approx_check(func, center, delta_size, time = false)
This will take in any function, along with a representative point for arguments to that function and a guess about how much “real” values vary from that point. Then we’ll try approximating it, optionally timing the “real” and approximate versions.
The surprisingly easy work- building the approximator
Building a gradient approximation system of arbitrary functions in Julia is surprisingly easy, thanks to systems like Zygote to differentiate arbitrary code. We first calculate the “real” result and the gradient of the function at the center of our mess of points. Then for each point in our collection, we update the result by the change in that point from the center, times the gradient.
using Zygote
using Statistics       # for mean
using BenchmarkTools   # for the @btime timings used later

compute_approx_from_deltas(deltas, center_grad, center_val) = (deltas * center_grad) .+ center_val

function compute_approx(collection, func)
    center = mean(collection, dims=2)                # representative point: the mean of the collection
    deltas = transpose(collection .- center[:,1])    # each point's offset from that center
    center_grad = gradient(func, center)[1]          # gradient at the center, courtesy of Zygote
    center_val = func(center)                        # "real" value at the center
    return compute_approx_from_deltas(deltas, center_grad, center_val)
end
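One thing to note: the checking code below also calls a compute_exact function that isn’t shown here. A minimal sketch of what it might look like- assuming each column of collection is one sample point; the actual implementation may differ- is just running the function on every column:

# Hypothetical stand-in for compute_exact (not shown in the post): run `func`
# on every column of `collection`, one column per sample point.
compute_exact(collection, func) = [func(view(collection, :, i)) for i in 1:size(collection, 2)]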
The digression- some data
Real market data is expensive, so for this example, we’ll just pretend. First, let’s pick a specific function from the Julia Financial Toolbox. We’ll go with FinancialToolbox.blslambda¹, which is documented as:
Black & Scholes Lambda for European Options
Λ=blslambda(S0,K,r,T,σ,d=0.0,FlagIsCall=true)
Where:
S0 = Value of the Underlying.
K = Strike Price of the Option.
r = Zero Rate.
T = Time to Maturity of the Option.
σ = Implied Volatility.
d = Implied Dividend of the Underlying.
FlagIsCall = true for Call Options, false for Put Options.
Λ = lambda of the European Option.
Example
julia> blslambda(10.0,10.0,0.01,2.0,0.2,0.01)
4.945909973725978
Look, it comes with an example set of arguments- that’s nice. We’ll use that as our center point.
For ease of code, it would be nice to have our computation logic take a vector/1-d matrix instead of distinct arguments. Easy enough, we’ll just make a wrapper function:
using FinancialToolbox

blsvec(v) = blslambda(v[1], v[2], v[3], v[4], v[5], v[6]);
#blsvec(v) = blslambda(v...) # This works equivalently, but is super slow, maybe because it triggers complex unboxing logic?
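As a quick sanity check, packing the documented example arguments into a vector should reproduce the value from the docs above:

julia> blsvec([10.0, 10.0, 0.01, 2.0, 0.2, 0.01])
4.945909973725978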
Next, we’ll have our do_approx_check function build a collection of data around that representative point:
function do_approx_check(func, center, delta_size, time = false)
# Set up samples
num_samples = 1_000_000
deltas = transpose((
rand(Float32, length(center), num_samples)
.* (delta_size * 2) .- delta_size) .* center)
collection = center .+ transpose(deltas)
So, we’ll pick 1,000,000 random points, where each dimension of each point is the center value perturbed by up to ±delta_size of itself (so delta_size = 0.05 means each argument moves by up to 5%).
And now we’re ready to compare running the “real” function with our approximation, and optionally time it too:
# Compute results correctly and approximately, compare them
results = compute_exact(collection, func)
results_approx = compute_approx(collection, func)
error = sum(abs.(results - results_approx)) / sum(abs.(results))
center_val = func(center)
error_from_no_change = sum(abs.(results .- center_val)) / sum(abs.(results))
println("Gradient approximation error for delta size ", delta_size * 100, "%: ", error*100, "%, vs error assuming no change of ", error_from_no_change * 100, "%" )
Note that for reference, we compare to an even faster, more approximate method: Ignore the specifics of each point, and just pretend they all have the same result as the center point.
if time
center_grad = gradient(func, center)[1]
# Neat ways to look closer:
#@code_llvm compute_approx(collection, func)
#@code_native syntax=:intel debuginfo=:none compute_approx(deltas, center_grad, center_val)
@btime compute_exact($collection, $func)
@btime compute_approx($collection, $func)
@btime compute_approx_from_deltas($deltas, $center_grad, $center_val)
end
end
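As an aside, that “pretend nothing changes” baseline could be spelled out as its own tiny function. This is a hypothetical helper for illustration, not something the post defines:

# Hypothetical baseline: give every point the value at the center of the
# collection, ignoring per-point differences entirely.
compute_center_only(collection, func) = fill(func(mean(collection, dims=2)), size(collection, 2))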
The lie- nothing in the computer is real
Before we see how “wrong” the approximation is, ask yourself, “compared to what?”. The intuitive thing is to treat the blslambda() function as the absolute ground truth- after all, it’s commented, has an official-looking GitHub repo, and doesn’t say anything about approximation. But the real Black-Scholes formula uses real-valued numbers and exponents and Gaussian error functions, and you’re asking a computer to evaluate it using only discrete physical states. Every line of code the blslambda() function relies on is at best an approximation of the true mathematical function- sometimes a surprisingly bad approximation.
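To make that concrete with the classic example (plain Float64 arithmetic, nothing specific to FinancialToolbox):

julia> 0.1 + 0.2 == 0.3
false

julia> 0.1 + 0.2
0.30000000000000004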
If you’re worried that using a gradient approximation means you’re at best computing a rough sketch of the real function, take comfort: your hope was lost before you even began. Rejoice in the freedom that failure gives you to make your own choices about a speed/accuracy tradeoff.
The results- things are good
So, how close is the approximation? When the points don’t move much, very close:
do_approx_check.(blsvec, Ref([10.0,10.0,0.01,2.0,0.2,0.01]), [0.01, 0.05, 0.1, 0.2, 0.4]);
Gradient approximation error for delta size 1.000%: 0.013%, vs error assuming no change of 1.062%
Gradient approximation error for delta size 5.000%: 0.315%, vs error assuming no change of 5.315%
Gradient approximation error for delta size 10.000%: 1.259%, vs error assuming no change of 10.639%
Gradient approximation error for delta size 20.000%: 5.062%, vs error assuming no change of 21.437%
Gradient approximation error for delta size 40.000%: 20.430%, vs error assuming no change of 43.637%
For the smallest deltas, that’s roughly 1/100th the error of just taking the center value for every point. Error increases substantially as the deltas grow, which makes sense- the more you move, the less exponents look linear.
Great, so we’re close, but did we save any time?
do_approx_check(blsvec, [10.0,10.0,0.01,2.0,0.2,0.01], 0.05, true);
75.240 ms (6 allocations: 45.78 MiB)
16.934 ms (416 allocations: 61.05 MiB)
5.786 ms (4 allocations: 15.26 MiB)
The approximations are a lot faster- roughly a 4x speedup (75.2 ms vs 16.9 ms) even if we include the time to compute the average and the gradient. If we take those as given and look only at the time for a marginal point (the last line above), we see a 13x speedup (75.2 ms vs 5.8 ms).
The lesson- make tradeoffs
Performance vs complexity is a classic software engineering tradeoff. Others include bandwidth, latency, space, and downtime. But a system falls somewhere on an accuracy spectrum too, and consciously deciding on an application’s accuracy requirement can offer overwhelming gains in other areas.
Footnotes
¹ In my testing, I actually remove an argument bounds check from the Financial Toolbox- I’m really only interested in comparing the actual computation. ↩