<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>in progress...</title>
	<atom:link href="http://markfalco.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://markfalco.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Thu, 11 Jun 2009 20:07:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='markfalco.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>in progress...</title>
		<link>http://markfalco.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://markfalco.wordpress.com/osd.xml" title="in progress..." />
	<atom:link rel='hub' href='http://markfalco.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Measuring performance via probability</title>
		<link>http://markfalco.wordpress.com/2009/06/09/measuring-performance-via-probability/</link>
		<comments>http://markfalco.wordpress.com/2009/06/09/measuring-performance-via-probability/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 02:20:52 +0000</pubDate>
		<dc:creator>Mark Falco</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://markfalco.wordpress.com/?p=29</guid>
		<description><![CDATA[Consider you have some arbitrary operation which you wish to measure the performance of. For instance: The content of the method is not important, in fact we should not assume to know the implementation, all we know is that it is likely to complete in well under the system&#8217;s clock resolution. Typically we&#8217;d micro benchmark [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you have some arbitrary operation whose performance you wish to measure.  For instance:</p>
<p><pre class="brush: java;">
    public static double testMethod(double dfl)
        {       
        for (int i = 0; i &lt; 100; ++i)
            {
            dfl = Math.log(dfl);
            }
        return dfl;
        }
</pre></p>
<p>The content of the method is not important; in fact we should not assume we know the implementation.  All we know is that it is likely to complete in well under the system&#8217;s clock resolution.  Typically we&#8217;d micro benchmark this at development time by simply calling it a large number of times in a tight loop and computing the average execution time:</p>
<p><pre class="brush: java;">
    public static Result testClassic(int cIters)
        {
        long   ldtStart  = System.currentTimeMillis();
        double dflResult = 123.456;
        for (int i = 0; i &lt; cIters; ++i)
            {
            dflResult = testMethod(dflResult);
            }
        long   cMillis        = System.currentTimeMillis() - ldtStart;
        double dflMillisPerOp = ((double) cMillis) / cIters;
        return new Result(cIters, cMillis, dflMillisPerOp, dflResult);
        }
</pre></p>
<p>One of the primary reasons we use the loop is that we expect testMethod to return in well under the clock resolution, so without the loop we&#8217;d be left with an arguably meaningless value of &#8220;0&#8221;.  So instead we forcefully run it long enough that we are sure the clock will have &#8220;ticked&#8221; many times.  This is all well and good, but what if we weren&#8217;t in a position to call it in a tight loop?  What if our interest was in measuring the cost of periodic calls to this function inside a working application, where we can&#8217;t simply time a loop over thousands of calls?</p>
<p>At this point you may be asking why you would want to have the application measure the cost; isn&#8217;t that what profilers are for?  Well yes, and this isn&#8217;t meant to replace a profiler, but it can be useful for collecting runtime statistics.  For instance in <a href="http://www.oracle.com/technology/products/coherence/index.html">Oracle Coherence</a> we&#8217;ve considered measuring the average serialization time for user data objects.  These are not classes we have control over, or the ability to easily profile, but it would be nice to track their serialization cost as it is a crucial part of our overall performance.  As Coherence still supports Java 1.4 we are without the higher resolution System.nanoTime(), and so an alternative is needed.</p>
<p>So what do we do with a millisecond resolution clock, when we want to measure sub millisecond operations?  Well we just do the obvious.</p>
<p><pre class="brush: java;">
        // ... other application code ...

        long  ldtStart = System.currentTimeMillis();
        testMethod(dflInput);
        cMillisTotal += System.currentTimeMillis() - ldtStart;
        ++cIters;

        // ... other application code ...
        // at some point compute the average cost
        double dflMillisPerOp = ((double) cMillisTotal) / cIters;    
</pre></p>
<p>That is it; all we need to do is cross over this code a few thousand times and we&#8217;re good.  I&#8217;d imagine some might be wondering what the use of this is, as each per-operation measurement should yield zero.  Well, let&#8217;s give it a try and see what we get.  To simulate the application crossing over the above code we&#8217;ll introduce a loop which, between calls into the above code, burns a random amount of CPU, i.e. our fake application logic.  We&#8217;ll call this the &#8220;discrete&#8221; approach, as compared to the &#8220;classic&#8221; approach shown above.</p>
<p><pre class="brush: java;">
    public static Result testDiscrete(int cIters)
        {
        int    nFactor   = rnd.nextInt(10);
        long   ldtStart  = System.currentTimeMillis();
        long   cMillis   = 0;
        double dflResult = 123.456;
        for (int i = 0; i &lt; cIters; ++i)
            {
            long ldtIter = System.currentTimeMillis();
            dflResult = testMethod(dflResult);
            cMillis  += System.currentTimeMillis() - ldtIter;

            burnCPU(nFactor); // simulate lots of other code running
            }
        long   cMillisOuter   = System.currentTimeMillis() - ldtStart;
        double dflMillisPerOp = ((double) cMillis) / cIters;
        return new Result(cIters, cMillisOuter, dflMillisPerOp, dflResult);
        }
</pre></p>
<p>The results for testing multiple runs with 100,000 iterations each are as follows:</p>
<p>Classic:  Average execution in 0.0092 ms; after measuring 100000 over 920 ms<br />
Discrete: Average execution in 0.00938 ms; after measuring 100000 over 17192 ms</p>
<p>Classic:  Average execution in 0.00883 ms; after measuring 100000 over 883 ms<br />
Discrete: Average execution in 0.00898 ms; after measuring 100000 over 908 ms</p>
<p>Classic:  Average execution in 0.0089 ms; after measuring 100000 over 890 ms<br />
Discrete: Average execution in 0.00905 ms; after measuring 100000 over 19128 ms</p>
<p>Classic:  Average execution in 0.00885 ms; after measuring 100000 over 885 ms<br />
Discrete: Average execution in 0.0089 ms; after measuring 100000 over 5435 ms</p>
<p>Classic:  Average execution in 0.00884 ms; after measuring 100000 over 884 ms<br />
Discrete: Average execution in 0.0094 ms; after measuring 100000 over 12287 ms</p>
<p>Classic:  Average execution in 0.00887 ms; after measuring 100000 over 887 ms<br />
Discrete: Average execution in 0.00899 ms; after measuring 100000 over 3173 ms</p>
<p>As you can see, both approaches yield surprisingly similar results: the cost of our testMethod is around 0.01 ms.  The variability in the total duration of the discrete test is intentional and is caused by the burnCPU() method, which is configured to burn a random amount of CPU per test run, simulating different frequencies of calls to testMethod().</p>
<p>But how does the discrete approach yield apparently accurate results if it always measures 0ms per call?  Well, I suppose the obvious answer is that it doesn&#8217;t always measure 0ms; sometimes it measures 1ms.  The fact that the method always completes in well under a millisecond doesn&#8217;t mean we don&#8217;t occasionally cross a clock edge during the measurement and get an even more incorrect 1ms result.  Remember that we aren&#8217;t in control of the clock, so we can&#8217;t assume that we are at the beginning of a clock cycle when we record the start time; we could end up recording the start time when we are just a few nanoseconds from the next tick.  Ok, but how does this yield such good results?  The answer lies in the question, &#8220;What is the probability of us crossing a clock edge?&#8221;  Using the result from the classic method we can see that the probability of crossing a one ms clock boundary is around 1%.  This comes straight from our measurement of ~0.01ms: in the classic case with back to back method calls, the clock would tick once for every 100 or so 0.01ms calls.  The cost of testMethod() doesn&#8217;t change with the discrete approach, and thus the probability of us encountering a clock tick is still 1%.  So 99% of our samples result in 0ms, and 1% result in 1ms, meaning that the accumulated milliseconds for our 100,000 samples will be around one second, which pretty much lines up with what we get in the classic case.  So this is how we come to the title: we are measuring by relying upon the small probability of crossing a clock edge, and on summing the results from a large number of samples.</p>
<p>I find it really interesting to watch this work.  Even though the math makes sense it is still surprising to see the results come out with such accuracy.  The results are not the product of a carefully crafted test: change the iterations, take out or change the &#8220;burn&#8221;, change the test function, and it all still holds up quite well.  It also isn&#8217;t reliant on this particular clock resolution; for instance the same thing will work on Windows, where the precision of System.currentTimeMillis() is actually ~16ms (the resolution is still 1ms).  In such a case the probability of seeing a clock edge becomes 1 in 1600, but the cost of that one is now 16ms, so the totals and averages will still hold.</p>
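<p>The clock-edge argument can also be checked without touching a real clock.  What follows is only a sketch under stated assumptions: the class ClockEdgeSimulation, its method and its parameters are made up for illustration, the operation is assumed to always cost exactly the same amount, and each call is assumed to start at a uniformly random point within a tick.</p>

```java
import java.util.Random;

/**
 * Simulates timing a short operation with a coarse clock. The class, method
 * and parameter names are hypothetical; nothing here reads the system clock.
 */
class ClockEdgeSimulation
    {
    /**
     * @param dflOpMillis    assumed true cost of the operation, in ms
     * @param dflTickMillis  interval at which the simulated clock advances, in ms
     * @param cSamples       number of individually timed calls
     * @param rnd            source of the random start phase
     *
     * @return the estimated cost per call, in ms
     */
    static double estimateMillisPerOp(double dflOpMillis, double dflTickMillis,
            int cSamples, Random rnd)
        {
        double dflTotal = 0.0;
        for (int i = 0; i < cSamples; ++i)
            {
            // assumption: the call starts at a random point within a tick
            double dflStart = rnd.nextDouble() * dflTickMillis;
            double dflEnd   = dflStart + dflOpMillis;

            // a coarse clock only ever reports whole ticks
            double dflClockStart = Math.floor(dflStart / dflTickMillis) * dflTickMillis;
            double dflClockEnd   = Math.floor(dflEnd / dflTickMillis) * dflTickMillis;

            // 0 for most samples, one full tick when an edge was crossed
            dflTotal += dflClockEnd - dflClockStart;
            }
        return dflTotal / cSamples;
        }

    public static void main(String[] asArg)
        {
        Random rnd = new Random(42L);
        System.out.println("1 ms ticks:  " + estimateMillisPerOp(0.01, 1.0, 1000000, rnd));
        System.out.println("16 ms ticks: " + estimateMillisPerOp(0.01, 16.0, 1000000, rnd));
        }
    }
```

<p>With a 0.01ms operation both tick sizes should give an estimate close to 0.01ms.  The 16ms case is noisier for the same number of samples, since far fewer samples cross an edge and each one that does contributes a full 16ms.</p>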
<p>Now there are a few small caveats:</p>
<ul>
<li>Lots of samples are required, though they are required in the classic case as well.  In both cases the more samples the more accurate the result.</li>
<li>A certain amount of non-determinism is required to ensure that you don&#8217;t predictably start at the same point in a clock cycle every time.  Except for the case of running in a non-multitasking OS, or with a very simple single threaded application, I wouldn&#8217;t expect this to be an issue.</li>
<li>It is not useful for computing stats other than totals and averages.  For instance the standard deviation would not be accurate, as each discrete sample is by itself hopelessly inaccurate.</li>
<li>The CPU overhead of the periodic calls to obtain the time may not be acceptable.  The nice thing is that you can choose to only measure a small percentage of the calls and still get meaningful results.</li>
</ul>
<p>Of course the biggest caveat of all is that with higher resolution clocks such as Java 1.5&#8217;s System.nanoTime() this technique isn&#8217;t as necessary, though that doesn&#8217;t change the fact that it is still quite interesting.</p>
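<p>The caveat about clock-read overhead suggests timing only a fraction of the calls.  One possible shape for that is sketched below; SampledTimer and its methods are hypothetical names, not part of Coherence or any other library.  It times every Nth call rather than a random subset, which assumes the application itself supplies the non-determinism mentioned in the caveats, and it is not thread safe.</p>

```java
/**
 * Accumulates coarse-clock timings for every Nth call only. The class and
 * its method names are hypothetical and exist purely for illustration.
 */
class SampledTimer
    {
    SampledTimer(int cSampleEvery)
        {
        if (cSampleEvery < 1)
            {
            throw new IllegalArgumentException("cSampleEvery must be >= 1");
            }
        m_cSampleEvery = cSampleEvery;
        }

    /**
     * Run the operation, timing it only if this call is due to be sampled.
     */
    void run(Runnable op)
        {
        if (++m_cCalls % m_cSampleEvery != 0)
            {
            op.run(); // unsampled call: no clock reads at all
            return;
            }

        long ldtStart = System.currentTimeMillis();
        op.run();
        m_cMillisTotal += System.currentTimeMillis() - ldtStart;
        ++m_cSamples;
        }

    long getSampleCount()
        {
        return m_cSamples;
        }

    /**
     * @return the average cost of the sampled calls in ms, or 0 if none yet
     */
    double getMillisPerOp()
        {
        return m_cSamples == 0 ? 0.0 : ((double) m_cMillisTotal) / m_cSamples;
        }

    private final int m_cSampleEvery;
    private long      m_cCalls;
    private long      m_cSamples;
    private long      m_cMillisTotal;
    }
```

<p>A timer created with new SampledTimer(100) reads the clock on only one call in a hundred, so it needs a hundred times as many calls to collect the same number of samples.</p>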
<p>And for those that are interested here is the full source:</p>
<p><pre class="brush: java;">
import java.util.Random;

/**
* Scratch test
*/
public class Scratch
    {
    public static void main(String[] asArg)
            throws Exception
        {
        int cRuns  = asArg.length &gt; 0 ? Integer.parseInt(asArg[0]) : 100;
        int cIters = asArg.length &gt; 1 ? Integer.parseInt(asArg[1]) : 100000;

        for (int i = 0; i &lt; cRuns; ++i)
            {
            System.out.println(&quot;Classic:  &quot; + testClassic(cIters));
            System.out.println(&quot;Discrete: &quot; + testDiscrete(cIters));
            System.out.println();
            }
        }

    /**
    * Some non-trivial method which is still very short in duration.
    */
    public static double testMethod(double dfl)
        {
        for (int i = 0; i &lt; 100; ++i)
            {
            dfl = Math.log(dfl);
            }
        return dfl;
        }

    /**
    * Perform &quot;classic&quot; performance test measuring how long it takes to run
    * the operation many times, and then computing an average.
    */
    public static Result testClassic(int cIters)
        {
        long   ldtStart  = System.currentTimeMillis();
        double dflResult = 123.456;
        for (int i = 0; i &lt; cIters; ++i)
            {
            dflResult = testMethod(dflResult);
            }
        long   cMillis        = System.currentTimeMillis() - ldtStart;
        double dflMillisPerOp = ((double) cMillis) / cIters;
        return new Result(cIters, cMillis, dflMillisPerOp, dflResult);
        }

    /**
    * Perform &quot;non-classic&quot; performance measurement where timings are recorded
    * for individual executions, and then computing an average.
    */
    public static Result testDiscrete(int cIters)
        {
        int    nFactor   = rnd.nextInt(10);

        long   ldtStart  = System.currentTimeMillis();
        long   cMillis   = 0;
        double dflResult = 123.456;
        for (int i = 0; i &lt; cIters; ++i)
            {
            long ldtIter = System.currentTimeMillis();
            dflResult = testMethod(dflResult);
            cMillis  += System.currentTimeMillis() - ldtIter;

            burnCPU(nFactor); // simulate lots of other code running
            }
        long   cMillisOuter   = System.currentTimeMillis() - ldtStart;
        double dflMillisPerOp = ((double) cMillis) / cIters;
        return new Result(cIters, cMillisOuter, dflMillisPerOp, dflResult);
        }

    /**
    * Data structure for recording results.
    */
    static class Result
        {
        public Result(int cIters, long cMillisTotal, double dflMillisPerOp,
                double dflResult)
            {
            m_cIters         = cIters;
            m_cMillisTotal   = cMillisTotal;
            m_dflMillisPerOp = dflMillisPerOp;
            m_dflResult      = dflResult;
            }

        public String toString()
            {
            return &quot;Average execution in &quot; + m_dflMillisPerOp +
                   &quot; ms; after measuring &quot; + m_cIters + &quot; over &quot; +
                   m_cMillisTotal + &quot; ms&quot;;
            }

        private int    m_cIters;
        private long   m_cMillisTotal;
        private double m_dflMillisPerOp;
        private double m_dflResult;
        }

    /**
    * Helper method to burn a random amount of CPU in order to simulate
    * spreading out the individual measurements.
    */
    public static int burnCPU(int nFactor)
        {
        int r = 0;
        for (int j = 0; j &lt; nFactor * 1000; ++j)
            {
            r = 1 + rnd.nextInt();
            }
        return r;
        }

    static final Random rnd = new Random();
    }
</pre></p>
<p>I hope you&#8217;ve found this as interesting as I have.</p>
]]></content:encoded>
			<wfw:commentRss>http://markfalco.wordpress.com/2009/06/09/measuring-performance-via-probability/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b29fc5a9bcc7a7e2513b5b6a7eadd0fd?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mark</media:title>
		</media:content>
	</item>
		<item>
		<title>Fun with micro benchmarks and optimizers</title>
		<link>http://markfalco.wordpress.com/2009/05/07/fun-with-micro-benchmarks-and-optimizers/</link>
		<comments>http://markfalco.wordpress.com/2009/05/07/fun-with-micro-benchmarks-and-optimizers/#comments</comments>
		<pubDate>Thu, 07 May 2009 02:11:41 +0000</pubDate>
		<dc:creator>Mark Falco</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://markfalco.wordpress.com/?p=11</guid>
		<description><![CDATA[I&#8217;ve been doing some micro benchmarks comparing the apparent cost of Java vs C++ virtual function calls. While the comparison is interesting, what really threw me for a bit was the results I was getting in Java 1.6. Have a look at the first version of my test: And running 10 iterations of 1 billion [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some micro benchmarks comparing the apparent cost of Java vs C++ virtual function calls. While the comparison is interesting, what really threw me for a bit were the results I was getting in Java 1.6. Have a look at the first version of my test:</p>
<p><pre class="brush: java;">
public class Main
    {
    public static class Test
        {
        public int getVirtual()
            {
            return m_i;
            }

        public int m_i = 1;
        }

    public static void main(String[] asArg)
        {
        int cIters = Integer.parseInt(asArg[0]);
        int cOps   = Integer.parseInt(asArg[1]);

        Test test = new Test();
        for (int i = 0; i &lt; cIters; ++i)
            {
            long ldtStart = System.currentTimeMillis();
            for (int j = 0; j &lt; cOps; ++j)
                {
                test.getVirtual();
                }

            long cMillis = System.currentTimeMillis() - ldtStart;
            System.out.println(cMillis);
            }
        }
    }
</pre></p>
<p>And running 10 iterations of 1 billion virtual calls yields the following millisecond timings:</p>
<p>java14 Main 10 1000000000<br />
1646<br />
1636<br />
1650<br />
1677<br />
1645<br />
1669<br />
1689<br />
1668<br />
1676<br />
1724</p>
<p>java15 Main 10 1000000000<br />
1775<br />
1752<br />
1698<br />
1688<br />
1759<br />
1732<br />
1758<br />
1709<br />
1670<br />
1733</p>
<p>java16 Main 10 1000000000<br />
<strong>8</strong> &lt;-- WTF?<br />
1409<br />
1370<br />
1347<br />
1357<br />
1341<br />
1346<br />
1349<br />
1350<br />
1354</p>
<p>Ok, so on 1.6 the first iteration of a billion calls took 8ms and each subsequent iteration took nearly 200 times as long.  How could that be?</p>
<p><strong>Q&gt;</strong> Did I screw up my measurements?</p>
<p><strong>A&gt;</strong> Nope, looks ok; double checked, still looks ok.</p>
<p><strong>Q&gt;</strong> Was it the garbage collector?</p>
<p><strong>A&gt;</strong> Shouldn&#8217;t be; other than printing the results the test doesn&#8217;t generate any garbage.  I reran with GC logging just to be sure, and no GCs were logged (not surprising).</p>
<p><strong>Q&gt;</strong> Did I need to let the test run longer so HotSpot could do its thing?</p>
<p><strong>A&gt;</strong> Nope, ran for much longer, results held steady at ~1350ms</p>
<p><strong>Q&gt;</strong> Is the optimizer broken or deoptimizing after the first iteration?</p>
<p><strong>A&gt;</strong> Sure looks like it</p>
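<p>As an aside on the garbage collector question: besides GC logging, the collection counts can be read from inside the test on Java 5 and later.  This is an alternative sketch, not what was done for the results in this post, and the GcCheck class name is made up for illustration.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical helper which reports how many collections ran during a test.
 */
class GcCheck
    {
    /**
     * @return the total number of collections reported by all collectors
     */
    static long countCollections()
        {
        long cTotal = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans())
            {
            long c = bean.getCollectionCount(); // -1 if the collector does not report it
            if (c > 0)
                {
                cTotal += c;
                }
            }
        return cTotal;
        }

    public static void main(String[] asArg)
        {
        long cBefore = countCollections();
        // ... run the timed loop here ...
        long cAfter = countCollections();
        System.out.println("collections during test: " + (cAfter - cBefore));
        }
    }
```

<p>A difference of zero between the two counts would back up the claim that the timed loop itself triggers no collections.</p>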
<p>Ok, so time to start thinking about how the optimizer is going to change my test.  The first thing to notice is that it could identify that there is no reason to actually call my getVirtual() method: it can see there are no side effects from calling it, and can see that the result is discarded.  So let&#8217;s modify the test to do something with the result, just looking at the test loop now.</p>
<p><pre class="brush: java;">
for (int i = 0; i &lt; cIters; ++i)
    {
    long ldtStart = System.currentTimeMillis();
    int  cTotal   = 0;
    for (int j = 0; j &lt; cOps; ++j)
        {
        cTotal += test.getVirtual();
        }

    long cMillis = System.currentTimeMillis() - ldtStart;
    System.out.println(cMillis + &quot;, &quot; + cTotal);
    }
</pre></p>
<p>And the results:</p>
<p>java16 Main 10 1000000000<br />
46, 1000000000<br />
1934, 1000000000<br />
1875, 1000000000<br />
1897, 1000000000<br />
1924, 1000000000<br />
1938, 1000000000<br />
1884, 1000000000<br />
1890, 1000000000<br />
1895, 1000000000<br />
1943, 1000000000</p>
<p>A few observations:</p>
<ul>
<li>We can see the getVirtual() method (or at least its logic) is getting run</li>
<li>the cost of both the fast iteration and the slow iterations increased</li>
<li>the performance difference between fast and slow runs is down to ~40x</li>
</ul>
<p>Still, what&#8217;s the deal?  Running the test for longer doesn&#8217;t yield any additional fast iterations.  So at this point I get a few other people involved, and they work through many of the same suggestions and assumptions I&#8217;d listed above.  We also try the following:</p>
<ul>
<li>pull the logic out of main() and put it in a non-static method</li>
<li>try running the test loop in parallel on multiple threads</li>
<li>try first warming up the JVM by running some random but heavy code prior to running the test</li>
<li>try recording the results into an array rather than printing during the test</li>
</ul>
<p>All of these yield essentially the same results as above.  We do randomly trigger the loss of the fast iteration, but never trigger multiple fast iterations.  So yippee, we figured out how to make things go slower.  As a side note, it was interesting to see what would trigger the loss of the fast iteration; one trigger was recording the results into an array as follows:</p>
<p><pre class="brush: java;">
long[] acMillis = new long[cIters];
int[] acTotal  = new int[cIters];
for (int i = 0; i &lt; cIters; ++i)
    {
    long ldtStart = System.currentTimeMillis();
    int  cTotal   = 0;
    for (int j = 0; j &lt; cOps; ++j)
        {
        cTotal += test.getVirtual();
        }

    acMillis[i] = System.currentTimeMillis() - ldtStart;
    acTotal [i] = cTotal;
    }

for (int i = 0; i &lt; cIters; ++i)
    {
    System.out.println(acMillis[i] + &quot;, &quot; + acTotal[i]);
    }
</pre></p>
<p>java16 Main 10 1000000000<br />
1328, 1000000000<br />
1346, 1000000000<br />
1338, 1000000000<br />
1308, 1000000000<br />
1343, 1000000000<br />
1335, 1000000000<br />
1324, 1000000000<br />
1354, 1000000000<br />
1336, 1000000000<br />
1344, 1000000000</p>
<p>And even odder, we could &#8220;fix&#8221; this and get an initial fast result by recording the time measurements into an Object array rather than a long array.  No kidding, dynamic object allocation improved results.</p>
<p><pre class="brush: java;">
Object[] acMillis = new Object[cIters];
int[] acTotal  = new int[cIters];
for (int i = 0; i &lt; cIters; ++i)
    {
    long ldtStart = System.currentTimeMillis();
    int  cTotal   = 0;
    for (int j = 0; j &lt; cOps; ++j)
        {
        cTotal += test.getVirtual();
        }

    acMillis[i] = new Long(System.currentTimeMillis() - ldtStart);
    acTotal [i] = cTotal;
    }

for (int i = 0; i &lt; cIters; ++i)
    {
    System.out.println(acMillis[i] + &quot;, &quot; + acTotal[i]);
    }
</pre></p>
<p>java16 Main 10 1000000000<br />
46, 1000000000<br />
1806, 1000000000<br />
1791, 1000000000<br />
1831, 1000000000<br />
1775, 1000000000<br />
1779, 1000000000<br />
1806, 1000000000<br />
1811, 1000000000<br />
1789, 1000000000<br />
1783, 1000000000</p>
<p>Ok, so clearly this is nuts.  But then things start to come into focus.  Let&#8217;s go back to a version where the optimizer can optimize out the getVirtual() call; in fact we&#8217;ll do it for the optimizer.</p>
<p><pre class="brush: java;">
for (int i = 0; i &lt; cIters; ++i)
    {
    long ldtStart = System.currentTimeMillis();
    for (int j = 0; j &lt; cOps; ++j)
        {
        // cTotal += test.getVirtual();
        }
    long cMillis = System.currentTimeMillis() - ldtStart;
    System.out.println(cMillis);
    }
</pre></p>
<p>java16 Main 10 1000000000<br />
6<br />
865<br />
875<br />
859<br />
856<br />
969<br />
875<br />
872<br />
861<br />
877</p>
<p>Ok, so we&#8217;ve removed the thing we were originally trying to test, and performance improves across the board.  We get a 40% savings on the slow runs.  So we see, not too surprisingly, that of the time we were measuring nearly half was in the inner &#8216;for&#8217; loop, not in the thing we&#8217;d intended to test.  And now we have the aha moment: how does the optimizer look at optimizing the test &#8220;infrastructure&#8221;, i.e. our main method?  So we make this change.</p>
<p><pre class="brush: java;">
public static void main(String[] asArg)
    {
    int cIters = Integer.parseInt(asArg[0]);
    int cOps   = Integer.parseInt(asArg[1]);

    for (int i = 0; i &lt; 10; ++i)
        {
        runTest(cIters, cOps);
        System.out.println(&quot;pass &quot; + i + &quot; complete&quot;);
        }
    }

public static void runTest(int cIters, int cOps)
    {
    Test test = new Test();
    for (int i = 0; i &lt; cIters; ++i)
        {
        long ldtStart = System.currentTimeMillis();
        int  cTotal   = 0;
        for (int j = 0; j &lt; cOps; ++j)
            {
            cTotal += test.getVirtual();
            }
        long cMillis = System.currentTimeMillis() - ldtStart;
        System.out.println(cMillis + &quot;, &quot; + cTotal);
        }
    }
</pre></p>
<p>java16 Main 10 1000000000<br />
46, 1000000000<br />
1886, 1000000000<br />
1894, 1000000000<br />
1902, 1000000000<br />
1877, 1000000000<br />
1870, 1000000000<br />
1862, 1000000000<br />
1910, 1000000000<br />
1861, 1000000000<br />
1878, 1000000000<br />
pass 0 complete<br />
1869, 1000000000<br />
1899, 1000000000<br />
1897, 1000000000<br />
1866, 1000000000<br />
1882, 1000000000<br />
1855, 1000000000<br />
1895, 1000000000<br />
1890, 1000000000<br />
1903, 1000000000<br />
1880, 1000000000<br />
pass 1 complete<br />
38, 1000000000<br />
41, 1000000000<br />
40, 1000000000<br />
43, 1000000000<br />
39, 1000000000<br />
42, 1000000000<br />
39, 1000000000<br />
42, 1000000000<br />
61, 1000000000<br />
44, 1000000000<br />
pass 2 complete<br />
40, 1000000000<br />
41, 1000000000<br />
41, 1000000000<br />
42, 1000000000<br />
39, 1000000000<br />
39, 1000000000<br />
42, 1000000000<br />
&#8230;.</p>
<p>Eureka, finally we can get lots of sustained fast runs.  So it appears that we need to pop out of our test code in order for the optimizer to undo the apparent deoptimization it had made during the second iteration.  Note that it was not simply that we&#8217;d run more total iterations, because we&#8217;d tried that earlier; it is in fact popping out of the test function which apparently allows an optimized version to be slipped in.  This ends up being an important point to remember and account for when doing these types of micro benchmarks.</p>
<p>At this point I&#8217;d assumed the &#8220;fast&#8221; runs had inlined the body of the getVirtual() method into the &#8216;for&#8217; loop, and that the &#8220;slow&#8221; runs did call the function.  Changing the code to manually inline, however, doesn&#8217;t back this up, leaving me with more questions.  At least the mystery of the broken optimizer is solved though; that is enough for now.</p>
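<p>The observation about popping out of the test function suggests a general shape for this kind of benchmark: keep the timed work in its own method, call it a few times untimed, and only then start measuring.  The sketch below is one way to do that.  WarmedBenchmark, runPass, timePasses and s_nSink are hypothetical names, the number of warm-up passes is a guess, and the assumed cause (HotSpot compiling a long-running loop in place via on-stack replacement, then using the fully compiled method from the next call onwards) is a likely explanation rather than something confirmed here.</p>

```java
/**
 * Hypothetical benchmark harness which re-enters the timed method on every
 * pass and runs a few untimed warm-up passes first.
 */
class WarmedBenchmark
    {
    static class Test
        {
        public int getVirtual()
            {
            return m_i;
            }

        public int m_i = 1;
        }

    /**
     * One unit of work, kept in its own method so each pass re-enters it.
     */
    static int runPass(Test test, int cOps)
        {
        int cTotal = 0;
        for (int j = 0; j < cOps; ++j)
            {
            cTotal += test.getVirtual();
            }
        return cTotal;
        }

    /**
     * @return elapsed ms for each timed pass, after cWarmup untimed passes
     */
    static long[] timePasses(int cWarmup, int cTimed, int cOps)
        {
        Test test      = new Test();
        int  nChecksum = 0;
        for (int i = 0; i < cWarmup; ++i)
            {
            nChecksum += runPass(test, cOps); // untimed warm-up pass
            }

        long[] acMillis = new long[cTimed];
        for (int i = 0; i < cTimed; ++i)
            {
            long ldtStart = System.currentTimeMillis();
            nChecksum  += runPass(test, cOps);
            acMillis[i] = System.currentTimeMillis() - ldtStart;
            }

        // publish the result so the calls cannot be discarded as dead code
        s_nSink = nChecksum;
        return acMillis;
        }

    public static void main(String[] asArg)
        {
        long[] acMillis = timePasses(5, 10, 1000000000);
        for (int i = 0; i < acMillis.length; ++i)
            {
            System.out.println(acMillis[i]);
            }
        }

    static volatile int s_nSink;
    }
```

<p>Whether five warm-up passes are enough will depend on the JVM and its settings, so it is worth printing the warm-up timings as well when trying this out.</p>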
]]></content:encoded>
			<wfw:commentRss>http://markfalco.wordpress.com/2009/05/07/fun-with-micro-benchmarks-and-optimizers/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b29fc5a9bcc7a7e2513b5b6a7eadd0fd?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mark</media:title>
		</media:content>
	</item>
		<item>
		<title>hello</title>
		<link>http://markfalco.wordpress.com/2009/05/05/hello-world/</link>
		<comments>http://markfalco.wordpress.com/2009/05/05/hello-world/#comments</comments>
		<pubDate>Tue, 05 May 2009 01:41:34 +0000</pubDate>
		<dc:creator>Mark Falco</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[I figured since everyone else on the planet has gone to Twitter, I might as well start blogging.]]></description>
			<content:encoded><![CDATA[<p>I figured since everyone else on the planet has gone to Twitter, I might as well start blogging.</p>
]]></content:encoded>
			<wfw:commentRss>http://markfalco.wordpress.com/2009/05/05/hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b29fc5a9bcc7a7e2513b5b6a7eadd0fd?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mark</media:title>
		</media:content>
	</item>
	</channel>
</rss>
