As usual, let $C : f(x,y) = 0$ be an algebraic curve and $X$ be the Riemann surface resulting from its desingularization and compactification. My final (known) issue is in implementing the Abel map

for when $P$ is a “discriminant place” of $X$ which, by my definition, is a place such that the $x$-projection is a discriminant point of the curve $C$. ($\partial_y(f,x,y)$ vanishes.) The trouble here is that each holomorphic differential is given in the form

and, though holomorphic at $P$ by definition, this expression has a vanishing denominator. Therefore, the “standard” approach of evaluating this differential along a path $\gamma(s) = (\gamma_x(s), \gamma_y(s))$, where $s \in [0,1]$, on the Riemann surface by computing

is insufficient because we would approach a $0/0$ situation.

# Localizing $\omega$ at a Place $P$

Due to the vanishing of the denominator at $P$ we need to localize $\omega$ the place $P$. Locally, a place $P$ is given in term of a Puiseux series expansion

centered at $t=0$ where $t$ is a local parameter. (Just as in defining an atlas for a Riemann surface.) Therefore, localizing $\omega$ at this place gives the expression

By holomorphicity, we know that $r(0) \neq 0$. Therefore, we can completely factor out all powers of $t$ from the denominator and there are enough powers of $t$ in the numerator to cancel them all out. That is, we can assume that $\omega |_P = q(t)dt/r(t)$ where $r(0) \neq 0$.

## Example

We start by computing some holomorphic differentials on, in this case, a genus 2 Riemann surface.

x*y/(2*x**3 + 3*y**2)
x**3/(2*x**3 + 3*y**2)


Let’s take the first differential as an example. We need an example place, as well. We pick one with x-projection of zero, a discriminant point of the curve.

x(t) = (1)t**1
y(t) =  + (1/2)t**4 + (-1/16)t**9 + (3/128)t**14 + O(t**15)

x(t) = (-1/2)t**2
y(t) =  + (-1/2)t**3 + (-1/64)t**8 + (3/4096)t**13 + O(t**14)


Let’s chose the second place with ramification index / branching number of two. Now, we evaluate the differential at this place. The notation in the code is very much subject to change since this part is still in production. Regardless, this is how we computationally localize at $P$:

              16    11    6
3*t     t     t
----- - --- - --
8192   128   4
--------------------------------------
26       21       16      11    6
27*t      9*t      3*t     3*t     t
-------- - ------ - ----- + ----- + --
16777216   131072    2048     64    2


Observe that $t^6$ factors out of the denominator and there is at least a factor of $t^6$ in the numerator. Canceling this factor (and eliminating coefficient denominators) out we get.

                 10           5
6144*t   - 131072*t  - 4194304
---------------------------------------------------
20         15          10           5
27*t   - 1152*t   - 24576*t   + 786432*t  + 8388608


Therefore, we computationally verify that the differential is quite regular at this place, $P$.

# Theory vs. Practice

So now we just need to evaluate this for the appropriate $t$ approaching $t=0$, right?

One of the biggest takeaways from my Ph.D. is that there is a huge difference between theory and practice — a difference not brought up often enough in computational papers, at least in my field. For example, mathematics doesn’t have to worry about floating point approximations. In the above example, mathematics doesn’t care that for various values of $t$ we will be computing the ratio of two large numbers. But $|\omega(t)|$ is typically small.

My current solution to this problem is to compute the Taylor series expansion of the localization of $\omega(t)$ centered at $t=0$:

CPU times: user 792 ms, sys: 9.14 ms, total: 801 ms
Wall time: 1.06 s
-1/2 + t**5/32 - 15*t**10/4096 + O(t**14)


Computationally, this is preferred: we expect the value of $\omega(t)$ near $t=0$ to be small. The coefficients of terms in this expansion is small. Therefore, this should be at least more numerically stable than the rational representation.

However, the compute time for the Taylor series approximation is already on the order of seconds. In order to accurately compute more terms of this Taylor series we need to repeat the above process using a Puiseux series expansion containing more terms. For example, using an $O(t^{24})$ accurate Puiseux series we obtain:

CPU times: user 4.6 s, sys: 81.4 ms, total: 4.68 s
Wall time: 5.93 s
-1/2 + t**5/32 - 15*t**10/4096 + t**15/2048 - 1155*t**20/16777216 + 21*t**25/2097152 + O(t**29)


The compute time increases even more when the Puiseux series coefficients contain any symbolic root or imaginary terms. (For example, at a place over the discriminant point $x = -(2/3)3^{2/5}$ this calculation takes over a minute with an $O(t^{10})$ accurate Puiseux series.)

# Speeding Things Up

The entry point of Sympy’s series is implemented in the Expr.series method. When determining the series expansion of expressions of the form $1/r(t)$ Sympy redirects to the Pow._eval_nseries() method. From what I could glean, the geometric series formula to expand denominators:

One of the computational issues with this technique is the operation count. Suppose $r$ is degree $N$. Then $r(t)^2$ takes $O(N^2)$ multiplications to evaluate modulo $O(t^N)$. We need to compute up to $r(t)^N$ making total compute cost of the Taylor series of $1/r(t)$ approximately $O(N^3)$ via Sympy’s method.

We can instead try the following: write $1/r(t) = s(t) = \sum_{n=0}^N s_n t^n$ where the coefficients $s_n$ are to be determined. Then

This produces a system of equations for the unknowns $s_n$

which we can efficiently solve in $O(N^2)$ multiplications using the standard forward-substitution method. In fact, we can easily modify this approach to find the Taylor series of $q(t)/r(t)$ where $q(t) = \sum_n q_n t^n$:

No deep mathematics is being used here. Just a little trick.

## Implementation

Again, the implementation syntax is subject to change but the basic steps are still the same. Here are the two localization algorithms side-by-side: the baseline using Sympy’s nseries() and the convolution / forward-substitution based one:

Now we test using some example differentials and places.

omega:
x*y/(2*x**3 + 3*y**2)

place:
x(t) = (-1/2)t**2
y(t) =  + (-1/2)t**3 + (-1/64)t**8 + (3/4096)t**13 + O(t**14)


Before we perform our timings, it is very important to reset the Python kernel before each test. The reason why (which has caused many headaches in past attempts to optimize Sympy code) is that Sympy caches most (all?) symbolic arithmetic. The end result is that even though we are testing two separate functions similar intermediate values have been cached.

The following timings are done with the Sympy cache cleared before each function evaluation.

CPU times: user 945 ms, sys: 18.6 ms, total: 963 ms
Wall time: 1.19 s
5       10
1   t    15*t        / 14\
- - + -- - ------ + O\t  /
2   32    4096

CPU times: user 192 ms, sys: 2.19 ms, total: 194 ms
Wall time: 214 ms
10    5
15*t     t    1
- ------ + -- - -
4096    32   2


That’s a factor of $1.19/.214 = 5.5$ speedup. Not bad. Here are some timings where the place is taken to higher order

CPU times: user 4.74 s, sys: 98.4 ms, total: 4.84 s
Wall time: 5.09 s
5       10    15          20        25
1   t    15*t     t      1155*t      21*t      / 29\
- - + -- - ------ + ---- - -------- + ------- + O\t  /
2   32    4096    2048   16777216   2097152

CPU times: user 298 ms, sys: 3.67 ms, total: 302 ms
Wall time: 359 ms
25         20    15        10    5
21*t     1155*t     t      15*t     t    1
------- - -------- + ---- - ------ + -- - -
2097152   16777216   2048    4096    32   2


That’s a factor of $5.09/.359 = 14.1$! This implies that whatever Sympy is doing in the naive implementation it’s doing much more than raw arithmetic with the coefficients of the numerator and denominator. I performed some additional tests where the coefficients contain symbolic square roots or imaginary components and the speedups are similar.

# Concluding Remarks

Trying to squeeze as much performance out of my code as possible is both a strength and a weakness of mine. As much as I enjoy optimizing algorithms and implementations I do need to graduate and produce a working product. And as frustrating as Sympy can be sometimes (looking forward to CSympy!) I probably wouldn’t catch these kinds of inefficiencies if the Python overhead wasn’t around to slow things down. Or perhaps this is simply the nature of symbolic programming.

In ye olden days these millisecond delays may have been second or minute delays. Of course, if this problem were the kind of operation performed thousands of times per session (it’s not, thankfully) then the same delay extension would apply. For the past five years I’ve been caught between completing a Ph.D. and trying to write the most effective software for computing on Riemann surfaces. Trying to stay in the middle can be challenging.

On that note, I came across a paper [1] suggesting the use of continued fractions to approximate rational functions. I’d like to investigate, though, I’m not sure if it will be more efficient at function evaluation: that is, since divisions are more costly than multiplications and, in the Taylor case, Horner’s rule log-time polynomial evaluations.

[1] M. Rosenblum, “A Fast Algorithm for Rational Function Approximations”, 1999.