Last week I created a ray-marching 3D engine that renders the Mandelbulb, and a couple of days later I translated it into pure Javascript. After the translation I decided to optimize the code a little for speed, so I made some improvements to the Javascript version. The main optimization was using an array for the 3D vector instead of a class/function.
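For illustration, here is a minimal Java sketch of that array-based vector idea (the `Vec3Demo` class and `add` helper are invented for this example, not the engine's actual code):

```java
public class Vec3Demo {
    // Plain-array vector: each vector is a double[3] instead of an object
    // with x/y/z fields; this is the same trick the Javascript version
    // uses with plain arrays instead of objects.
    static double[] add(double[] a, double[] b) {
        return new double[] { a[0] + b[0], a[1] + b[1], a[2] + b[2] };
    }

    public static void main(String[] args) {
        double[] v = add(new double[] { 1, 2, 3 }, new double[] { 4, 5, 6 });
        System.out.println(v[0] + ", " + v[1] + ", " + v[2]); // 5.0, 7.0, 9.0
    }
}
```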

Rendering the Mandelbulb on a 400×400 canvas now took just 1850 ms in Javascript (Chrome, V8), which is very fast! Even faster than my Java implementation (running on Java 1.6.0.33 with -server, which was faster than Java 7). But the Java code didn’t have some of these speed optimizations, so I re-translated the Javascript code back to Java. It produced the following numbers (lower is better):

Comparing Javascript and Java

What happened here? The output is the same, so why is Java so much slower than Javascript? I would have expected the opposite…

I fired up the profiler to see what was making the Java code so slow, and it turned out the method it spent most time in was Math.pow(). Other slow methods were Math.acos(), cos(), sin(), etc. It turns out that the Math library isn’t very fast, but there is an alternative: FastMath, a faster math library that Apache Commons has implemented for commons-math. Let’s see what changing Math.* to FastMath.* does to the performance:

This is already much better, but the method causing the most delay is still FastMath.pow(). Why is Javascript so much faster? The method is written so you can raise a double to the power of another double, not just to integer values. But I’m only using integer powers (7 and 8, to be precise), so I decided to implement my own method:

private double fasterPow(double d, int exp) {
	// Note: assumes exp >= 1
	double r = d;
	for (int i = 1; i < exp; i++) {
		r *= d;
	}
	return r;
}

Warning: This isn’t the same as Math.pow/FastMath.pow!

The speed with this new method is much better and seems comparable to Javascript. Maybe this is an optimization the V8 engine does by default? Who knows.

The slowest method in the program is now FastMath.acos(). From high school I know that acos(x) can also be calculated as atan(sqrt(1-x*x)/x) (for positive x). So I created my own version of acos(). Benchmarking the three methods, Math.acos(), FastMath.acos() and FastMath.atan(FastMath.sqrt(1-x*x)/x), the result is again surprising:

The custom acos() function is a bit faster than FastMath.acos() and a lot faster than Math.acos(). Using this function in the Mandelbulb renderer gives the following result:
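For reference, a minimal sketch of that identity using plain java.lang.Math (so it runs without commons-math); the atan form only works directly for positive x, so zero and negative x need handling, as a commenter below also points out:

```java
public class AcosDemo {
    // acos(x) via atan: the identity acos(x) = atan(sqrt(1-x*x)/x)
    // holds for x > 0; for x < 0 you must add pi, and x == 0 needs
    // a special case because the atan form divides by x.
    static double fastAcos(double x) {
        if (x == 0) return Math.PI / 2;
        double r = Math.atan(Math.sqrt(1 - x * x) / x);
        return x > 0 ? r : r + Math.PI;
    }

    public static void main(String[] args) {
        System.out.println(fastAcos(0.5));  // ~1.0472, same as Math.acos(0.5)
        System.out.println(fastAcos(-0.5)); // ~2.0944
    }
}
```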

So it turns out that with a bit of tweaking we can get the Java version faster than Javascript, but I would never have imagined that Java would be slower in the first place. The Chrome V8 team really did an amazing job improving the speed of their Javascript VM. Mozilla isn’t far behind; they get around 2200 ms in the benchmark, which is also faster than java.lang.Math and FastMath! It seems that V8’s math implementation has some optimizations that Java could really use. The tricks used above don’t make any difference in the Javascript version.

Edit 1: Is Javascript faster than Java?

Well, surprisingly, in this case it is. With the code 100% the same, using arrays as vectors and Math.*, the code actually runs faster in my browser!

Edit 2: People have been asking me: what could have been done to make it faster in Java, and why is it slow?

Well the answer is twofold:

1) The Math libraries in Java are made for ‘double’. Having a pow() method work with doubles is much harder than working with just integers. The only way to optimize this would be to overload the methods with int variants, which would allow much greater speed. I think Java should add Math.pow(float, int), Math.pow(int, int), etc.

2) All the Math libraries have to work in all situations: negative numbers, small numbers, large numbers, zero, NaN, etc. They tend to have a lot of checks to cope with all those scenarios. But most of the time you know more about the numbers you put in… For example, my fasterPow method only works with integer exponents of one or greater. Maybe you know the exponent will always be even…? All this means the implementation can be improved; the problem is that this can’t easily be done in a generic (math) library.
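As an illustration of point 1, an int-exponent overload could use exponentiation by squaring; this is just a sketch (several commenters below give similar versions), and it assumes a non-negative exponent:

```java
public class IntPow {
    // pow(double, int): exponentiation by squaring, O(log n) multiplications.
    // Sketch only: assumes exp >= 0, unlike the general Math.pow.
    static double pow(double base, int exp) {
        double result = 1.0;
        double mul = base;
        while (exp > 0) {
            if ((exp & 1) != 0) result *= mul; // multiply in the set bits
            mul *= mul;                        // square for the next bit
            exp >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pow(2.0, 8)); // 256.0
        System.out.println(pow(3.0, 7)); // 2187.0
    }
}
```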

 
  • Mike McNally

Your “faster” power function is doing too many multiplies. Raising a number to the 8th power requires only three multiplication operations (and the 7th only four). That should make it around twice as fast.

  • Leo

    Note that your use of atan is poorly behaved near x=0. The builtin acos and FastMath version of acos should be written to work well for all input in [-1,1].

    You could probably speed up pow by implementing exponentiation by squaring. If you don’t need arbitrary integer powers, but only exponents of 7 and 8, you could speed things up even more by getting rid of all the branching in exponentiation and just writing a pow7 and pow8 function which require 4 and 3 multiplications, respectively.
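    A sketch of the branch-free helpers Leo describes (the names pow7 and pow8 are invented here), using 4 and 3 multiplications respectively:

    ```java
    public class PowHelpers {
        // x^7 in 4 multiplications: x2, x4, then x4 * x2 * x
        static double pow7(double x) {
            double x2 = x * x;
            double x4 = x2 * x2;
            return x4 * x2 * x;
        }

        // x^8 in 3 multiplications: square three times
        static double pow8(double x) {
            double x2 = x * x;
            double x4 = x2 * x2;
            return x4 * x4;
        }

        public static void main(String[] args) {
            System.out.println(pow7(2.0)); // 128.0
            System.out.println(pow8(2.0)); // 256.0
        }
    }
    ```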

  • Alfie275

Your “fasterPow” function only supports integer exponents.

    • http://www.redcode.nl royvanrijn

      @Alfie275: I’m aware, read the article and see the warning.

About the pow: it can be done in ‘log n’ multiplications instead of the current linear loop, which indeed makes it even faster. But not by much… it isn’t the bottleneck anymore, so I decided to leave it this way.

  • Semmel

I also knew about the problem with the pow function and had implemented one for convenience and the “most common” situations of low integer values. I wasn’t aware that there is a “fast math” version though, thanks for that!

Maybe there is even a generic procedural version in O(log(n)), for n being the exponent. I don’t have it here at the moment, but Knuth’s “The Art of Computer Programming” might have an answer on how to calculate the power function most effectively.

    
    /**
     * The power function a^n, as implemented in {@link java.lang.Math}, is very time
     * consuming in the general case. If it is likely that n is small, this function
     * provides a shortcut calculation: if n is smaller than or equal to 20, simple
     * multiplication operations are used instead of the more expensive function in
     * {@link java.lang.Math}. However, if n is larger than 20, {@link java.lang.Math}
     * is used.
     *
     * @param   a   the base.
     * @param   n   the exponent.
     * @return  the value a^n.
     *
     * @see java.lang.Math#pow(double, double)
     */
    public static double pow(double a, int n)
    {
        double b = 1.0d, c = 1.0d;

        if (n < 0) return 1.0d / MyMath.pow(a, -n);

        switch (n)
        {
            case 0: return 1.0d;
            case 1: return a;
            case 2: return a*a;
            case 3: return a*a*a;
            case 4: b=a*a; return b*b;
            case 5: b=a*a; return b*b*a;
            case 6: b=a*a; return b*b*b;
            case 7: b=a*a; return b*b*b*a;
            case 8: b=a*a; b=b*b; return b*b;
            case 9: b=a*a*a; return b*b*b;
            case 10: b=a*a; c=b*b; return c*c*b;
            case 11: b=a*a; c=b*b; return c*c*b*a;
            case 12: b=a*a; b=b*b*b; return b*b;
            case 13: b=a*a; b=b*b*b; return b*b*a;
            case 14: b=a*a; c=b*b*b; return c*c*b;
            case 15: b=a*a*a; c=b*b; return c*c*b;
            case 16: b=a*a; b=b*b; b=b*b; return b*b;
            case 17: b=a*a; b=b*b; b=b*b; return b*b*a;
            case 18: b=a*a*a; b=b*b; return b*b*b;
            case 19: b=a*a*a; b=b*b; return b*b*b*a;
            case 20: b=a*a; c=b*b; c=c*c*b; return c*c;
            default: return Math.pow(a, n);
        }
    }
    
  • Zjarek

I’ve seen the switch-based approach before, but stopping at smaller numbers. Check whether, for example, pow(a, 19) is actually faster than Math.pow(a, 19) on your target architecture; such high exponents are rare compared to small ones. Also, fast pow is really simple: the basic premise is to use a smaller number of multiplications for any integer exponent. For example, when your exponent is 19 (0b10011), there are 3 bits set (the 4th, 1st and 0th), so you can calculate x^19 as x^16 * x^2 * x, which is x^(2^4) * x^(2^1) * x^(2^0). Making it into a simple loop is not hard (bit operators are your friend here); there is only one step per bit of the exponent, consisting of one or two multiplications. It is more commonly used in arbitrary-precision arithmetic, because for small exponents it can be slower than a simple loop.

  • SciK

    There is, indeed, a generic method. Here is one in Python:

    def fast_pow(x, n):
        k = 1

        while n != 0:
            if n % 2 == 1:
                k *= x

            x *= x
            n //= 2

        return k
    
  • Andrew

    What about negative integer powers in fasterPow?

    • http://www.redcode.nl royvanrijn

@Andrew: Like I said before, fasterPow isn’t meant to be a replacement. It is just an example of how you can speed up an inner loop by a third. It doesn’t do the same as Math.pow.

  • Ramses

Is there any way to determine from your results whether FastMath.pow() is actually slower than the JS implementation? I understand that the Java code using FastMath is slower overall than the JS version, but that’s not the only difference between the two. It doesn’t seem like a direct comparison, unless I’m misreading your results.

    • http://www.redcode.nl royvanrijn

@Ramses: No, I don’t think you can compare the libraries directly, because they are written in different languages and run in different VMs. You could make a simple program with just Math calls and benchmark both the Java and JS versions, but even then you can’t really tell whether it is the library or the VM implementation.

  • Yehordyl

    Did you try to port your version of acos() to JS?

    • http://www.redcode.nl royvanrijn

@Yehordyl: No, I haven’t. The changes to pow() were ported back to Javascript but didn’t change the runtime. I haven’t profiled the Javascript version, so I don’t know where its bottleneck is.

  • http://maettig.com Maettig

I’ve seen people use Math.pow(x, 2), which is horribly, horribly slow compared to a simple x * x. The same goes for Math.pow(2, x), which can be replaced with 1 << x (for integer x). I collected some of my results in a short blog post (German). Rule of thumb: try to avoid the Math package altogether.

  • http://www.redcode.nl royvanrijn

@Maettig: Sure, for int operations. That obviously doesn’t work on doubles, which is what the Math library is for. However, a good addition to the Math library would be int variants of these functions!

  • janus

    SciK: congratulations on your white-space sensitive language.

    Seriously, why doesn’t this blog support ?

  • http://www.redcode.nl royvanrijn

    @janus: Just default WordPress… :-) I’ll see what I can do!

    Edit: I’ve fixed the comments, for some reason the <code>-tag doesn’t do <pre>…

  • http://maettig.com Maettig

    Replacing Math.pow(x, 2) with x * x is not limited to integers. ;-)

  • http://www.pingtimeout.fr Pierre Laporte

I’m also curious about the default behaviour of Java. Did you wait for the JVM to complete its optimisations of your code?

Could you post the Java source of your benchmark on GitHub? I would like to test it a bit too ;)

    Thanks for this article !

  • Andrey

Could you try my version of power? It is O(log(n)) and my tests show that it is 1.5× faster for powers from 0 to 100.

        private static double binPow(double d, int exp) {
            double res = 1;
            double mul = d;
            int mask = 1;
            while (mask <= exp)
            {
                if ((mask & exp) != 0)
                {
                    res *= mul;
                }
                mul *= mul;
                mask <<= 1;
            }
            return res;
        }
    
    • http://www.redcode.nl royvanrijn

@Andrey: I’ve tested it, but on small exponents (say 8, which is what I need) it isn’t faster on my laptop; it is actually 100 times slower (?!):

      int times = 1000000000;
      long t = System.currentTimeMillis();
      for (int i = 0; i < times; i++) {
          rayMarching.fasterPow(1.0, 8);
      }
      System.out.println(System.currentTimeMillis() - t);

      t = System.currentTimeMillis();
      for (int i = 0; i < times; i++) {
          rayMarching.binPow(1.0, 8);
      }
      System.out.println(System.currentTimeMillis() - t);

      // One more time, just to make sure we are warmed up:
      t = System.currentTimeMillis();
      for (int i = 0; i < times; i++) {
          rayMarching.fasterPow(1.0, 8);
      }
      System.out.println(System.currentTimeMillis() - t);

      t = System.currentTimeMillis();
      for (int i = 0; i < times; i++) {
          rayMarching.binPow(1.0, 8);
      }
      System.out.println(System.currentTimeMillis() - t);
      

      This results in:

      4
      2187
      2
      2344
      
      

      Can you verify this?

  • http://www.pingtimeout.fr Pierre Laporte

Thanks, but I may have been mistaken =).

If you look at the algorithm, your code is indeed more optimized for your needs, so it will be faster than Math.pow(). But my point is that the JVM optimizes “hot” code (branches or methods). For instance, using the JVM option “-server” and the default parameters, a method will be compiled into native code by the JVM after 10,000 invocations.

So in your case, you may see that the first 9,999 calls of the Java implementation take ~5.2 seconds. This behaviour is normal, since it is warmup time. But after the 10,000th invocation it may take less than 2 seconds, once the JVM has optimized it.

You can use the flag -XX:+PrintCompilation if you want the JVM to print a message when it compiles a code block. Until you see no more compilation messages, you cannot measure the duration of your code precisely. See this page for a complete listing of JVM options: http://jvm-options.tech.xebia.fr
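    Along the lines of Pierre’s point, a minimal warmup-aware micro-benchmark could look like the sketch below (the iteration counts are illustrative, and the results are accumulated into a `sink` variable so the JIT cannot eliminate the loop entirely):

    ```java
    public class WarmupBench {
        static double fasterPow(double d, int exp) {
            double r = d;
            for (int i = 1; i < exp; i++) r *= d;
            return r;
        }

        public static void main(String[] args) {
            double sink = 0;
            // warm up well past the default server-compiler threshold (~10,000 calls)
            for (int i = 0; i < 100000; i++) sink += fasterPow(1.0001, 8);

            long t = System.nanoTime();
            for (int i = 0; i < 10000000; i++) sink += fasterPow(1.0001, 8);
            long ms = (System.nanoTime() - t) / 1000000;

            System.out.println(ms + " ms (sink = " + sink + ")");
        }
    }
    ```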

  • http://pomax.nihongoresources.com Pomax

Using atan in JS, compared to acos: http://jsperf.com/acos-vs-atan-flavour — in Chrome, the difference is massively in favour of atan (acos 13.3M ops/sec vs. atan 24M ops/sec), but Chrome has interesting behaviour for trigonometric functions. In Firefox, the difference is marginal, borderline non-existent (11.8 vs. 11). In Opera, there is a big difference in favour of acos (7.5 vs. 6), and in IE9 there is a big difference in favour of atan (8.6 vs. 14.2).

So it would speed things up significantly, not at all, or even negatively, depending on the browser =)

  • http://blog.toadhead.net/ Mike Heath

    It’s no secret that Java’s trig functions are slow. The biggest reason for this is that Java favors cross platform compatibility over performance. It would be really really nice if Oracle would add a JVM flag indicating whether or not performance was desired over strict compatibility for all floating point operations.

  • Andrey

    Roy,

Yeah, it sucks on small powers, but it is much faster with higher ones.
Try this one, it is 10-20% faster (3 multiplications instead of 7):

    private static double casePow(double d, int exp) {
        double d2, d4;

        switch (exp)
        {
            case 0: return 1;
            case 1: return d;
            case 2: return d * d;
            case 3: return d * d * d;
            case 4: d2 = d * d; return d2 * d2;
            case 5: d2 = d * d; return d2 * d2 * d;
            case 6: d2 = d * d; return d2 * d2 * d * d;
            case 7: d2 = d * d; d4 = d2 * d2; return d4 * d2 * d;
            case 8: d2 = d * d; d4 = d2 * d2; return d4 * d4;
            case 9: d2 = d * d; d4 = d2 * d2; return d4 * d4 * d;
            default: return 1; // use a different implementation
        }
    }

  • Olivier

    For performance testing, use Caliper: http://code.google.com/p/caliper/

    Regarding the implementation itself, switch+case is slow. Use if/else instead. See implementation here: http://jafama.svn.sourceforge.net/viewvc/jafama/src/odk/lang/FastMath.java?view=markup
    Look for the method named powFast.

  • someone

Just for your information, there is a MUCH better way to implement exponentiation with integer exponents, commonly called fast exponentiation. I’m curious how much your code would improve if you substituted it in. I’ve written the code below (making some assumptions about Java behaving like C; I’m not a Java guy). This is a logarithmic-time algorithm as opposed to a linear one, although it only works for integer powers.

    private double fastIntExp(double d, int exp) {
      // check the base cases first; in particular exp == 0 must not fall
      // into the even branch, or the recursion would never terminate
      if (exp < 0) {
        // invalid input, can't do a thing with that
        return -1;
      } else if (exp == 0) {
        return 1;
      } else if (exp == 1) {
        return d;
      } else if (exp % 2 == 0) {
        // even power: take the base to half the power and square the result
        double result = fastIntExp(d, exp / 2);
        return result * result;
      } else {
        // odd power: same thing, but multiply by the base once more at the end
        double result = fastIntExp(d, (exp - 1) / 2);
        return result * result * d;
      }
    }
    
  • Michael Tiller

One point that wasn’t addressed: did you actually make sure that these different approaches give the same answer? I didn’t see this discussed, but perhaps I missed it or it was otherwise implied. If you don’t get the same answers, it’s hard to make such comparisons.

  • imma

Liked the article, thank you. It inspired me to tinker a bit with making a fast pow in Javascript (for integer exponents), and it seems *nearly* as fast as Math.pow in my browsers :-)
In case it’s vaguely helpful, relevant or of interest in terms of algorithm:

    function fpow3(x, exp) {
        if (exp == 0) return 1;
        if (exp == 1) return x;

        var r = 1, v = x, e = exp;
        while (1) {
            if (e & 1) r = r * v;

            e = e >> 1;
            if (!e) return r; // return as soon as possible
            v = v * v; // each exponent bit squares the multiplier: x, x*x, (x*x)*(x*x), etc.
        }
    }

  • Alexander Ewering

You should try replacing your Math.pow() calls directly inline with x*x*x*x*x*x*x if the exponents are only ever 7 or 8… I bet that will again double the performance, without all the function call and loop overhead…

  • Friso

    Not sure if you’re reading comments on this old article but anyway, here goes.

I just happened to come across this blog, just as I happened to come across this one, which states a possible cause for the difference in speed:

    And there is additional restrictions about when numbers can be considered to be integers. V8 has a faster version of Math.pow because the specification that it is implementing allows for a faster version.

    Groeten,

    Friso

    • http://www.royvanrijn.com royvanrijn

I do read comments on old posts, and yes, I’ve also come across that point! They are allowed to take some shortcuts in their math libraries that Java cannot. This could very well explain the speed difference, but even so, it is still impressive!

  • Friso

    On the speed of java vs javascript: http://developer-blog.cloudbees.com/2013/12/about-paypal-node-vs-java-fight.html

Notice that the JavaScript specification allows for an “implementation-dependent approximation” (of unspecified accuracy), while the JVM version adds the requirement that:

    The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic.

    And there is additional restrictions about when numbers can be considered to be integers. V8 has a faster version of Math.pow because the specification that it is implementing allows for a faster version.