## Semiprime factorization and custom proof system

Suppose we have $N=pq$, with $p$ and $q$ are unknown odd primes. We can encode factorization problem in the system of polynomial equations. For instance, $p= 1+ \sum_{k=1}^n 2^k x_k$, $q= 1+ \sum_{k=1}^n 2^k y_k$, where $n = \lfloor \log_2 N \rfloor$ is one less the number of bits required to represent $N$, and $x_k, y_k$ are binary indeterminates. By abuse of notation, we can write the following system of equations (here $x, y \in \mathbb{C}^n$)

$\left\{ \begin{array}{lll} f_1= &x_k^2-x_k & =0 \\ g_1= &y_k^2-y_k &=0 \\ h_{1,1}= &\left( 1+ \sum_{k=1}^n 2^k x_k \right) \left( 1+ \sum_{k=1}^n 2^k y_k \right) -N & = 0 \end{array} \right.$

This system has only two solutions for semiprimes – $x$ encoding $p$, $y$ encoding $q$, and vice versa. Therefore, the Groebner basis will consist of $2n$ linear equations encoding line passing through $(x,y)= (p,q)$ and $(x,y)=(q,p)$, and one quadratic equation selecting these two points along this line.

Therefore, we want to represent the linear part of Groebner basis using a linear combination of our equations with polynomial coefficients – $c+ \sum_k a_k x_k +b_k y_k = P_1 h+ \sum_k P_{2,k} f_k + P_{3,k} g_k$ with $P_{...} \in \mathbb{C}[x_1,..., x_n, y_1, ..., y_n]$, Here, the coefficients in the polynomials $P$ are treated as indeterminates. For example, for $n=1$ the correspondin system reads

$\left\{ \begin{array}{lll} f_k= &x_1^2-x_1 & =0 \\ g_k= &y_1^2-y_1 &=0 \\ h_{1,1}= &\left( 1+ 2 x_1 \right) \left( 1+ 2 y_1 \right) -N & = 0 \end{array} \right.$

The linear bases is expressed by

$\begin{array}{l} c+ a_1 x_1 +b_1 y_1 = \\ \left( r_{1,1} x_1 +r_{1,2} y_1 +r_{1,3} \right) (x_1^2-x_1)+ \\ \left( r_{2,1} x_1 +r_{2,2} y_1 +r_{2,3} \right) (y_1^2-y_1)+ \\ \left( r_{3,1} x_1 +r_{3,2} y_1 +r_{3,3} \right) (1+2 x_1+2 y_1+4 x_1 y_1-N) \end{array}$.

and we have the following system for coefficients of different monomials

$\left\{ \begin{array}{rrl} x_1^3: & r_{1,1}&=0 \\ x_1^2y_1: &r_{1,2} +4r_{3,1} &=0 \\ x_1^2:& r_{1,3} - r_{1,1} +2 r_{3,1} &=0\\ x_1y_1^2:& r_{2,1}+4r_{3,2} &=0\\ x_1y_1:& -r_{1,2}-r_{2,1} +2r_{3,1}+2r_{3,2}+4r_{3,3}&=0\\ x_1:& -a_1 -r_{1,3}-r_{3,1}(N-1)+2r_{3,3} &=0\\ y_1^3:& r_{2,2} &=0\\ y_1^2:& r_{2,3}-r_{2,2} +2r_{3,2}&=0\\ y_1:& -b_1 -r_{2,3} -r_{3,2} (N-1) +2r_{3,3} &=0\\ 1:& c+ r_{3,3} (N-1) &=0\\ \end{array} \right.$

$\left\{ \begin{array}{rrl} x_1^2y_1: &-r_{1,2} &=4r_{3,1} \\ x_1y_1^2:& -r_{2,1}&=4r_{3,2} \\ x_1^2:& -r_{1,3} &=2 r_{3,1}\\ y_1^2:& -r_{2,3} &=2r_{3,2}\\ x_1y_1:& 2r_{3,3}&=-3(r_{3,1}+r_{3,2})\\ x_1:& r_{3,1}N+3r_{3,2} &=a_1\\ y_1:& r_{3,2} N+3r_{3,1} &=b_1\\ 1:& -3(r_{3,1}+r_{3,2})(N-1) &=2c\\ \end{array} \right.$

Therefore,
$(N r_{3,1}+3r_{3,2}) x_1+ ( N r_{3,2} +3r_{3,1} ) y_1 - \frac{ 3}{2}(r_{3,1}+r_{3,2})(N-1)= \\ \left( -4r_{3,1} y_1 -2r_{3,1} \right) (x_1^2-x_1)+ \\ \left( -4r_{3,2} x_1 -2r_{3,2} \right) (y_1^2-y_1)+ \\ \left( r_{3,1} x_1 +r_{3,2} y_1 - \frac{ 3}{2}(r_{3,1}+r_{3,2}) \right) (1+2 x_1+2 y_1+4 x_1 y_1-N)$

So the left-hand side of the equation can be written as

$[r_{3,1}, r_{3,2}] \left[ \begin{array}{ccc} N &3 & \frac{ 3}{2}(1-N) \\ 3 &N &\frac{ 3}{2}(1-N) \end{array} \right] \left[ \begin{array}{c} x_1\\y_1\\1 \end{array} \right]$

What we can see here is that the matrix in the above expression has rank 2, unless $N=3$, where the rank of the matrix is 1. When $N=3$ the nullspace of the matrix is

$\left[ \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \right], \left[ \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right]$

which correspond to two solutions – $x_1= 1, y_1= 0 \rightarrow 3= 3*1$, and $x_1= 0, y_1= 1 \rightarrow 3= 1*3$. Otherwise the null space is expressed as

$\left[ \begin{array}{c} \frac{3(N-1)}{2(N+3)} \\ \frac{3(N-1)}{2(N+3)} \\ 1 \end{array} \right]$

So, for $N=1$ $x_1=y_1= 0 \rightarrow 1= 1*1$, and for $N=9$ $x_1= 1, y_1= 1 \rightarrow 9= 3*3$.

Overall, despite we started with 9+3 indeterminates there are only 2 are relevant to the problem. The question is now how does it scales with $n$. Or another question, is it at all possible to find an algorithm that will select only relevant monomials in $P_{...}$.

Let’s try to extend analysis to the case $n=2$.
To be continued.