In the timing experiments of the previous
section, you may have noticed that the running time of the block
increases approximately by a factor of four when the size of the list
is doubled; also, the running time increases approximately by a factor
of 100 when the size of the list is increased by a factor of ten.
In general, as we scale the size of the list by a certain amount, we
also scale the running time of the block by the square of that
amount. If we were to plot the running time of the block against
the size of the list, we would get a parabola, the graph of a quadratic
function: along a parabola, scaling the x-coordinate by some factor
scales the y-coordinate by the square of that factor. Algorithms similar
to the one used by this block are thus called quadratic-time algorithms.
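The original block is visual, but the same idea can be written as a short Python sketch; the function name `has_duplicates`, the test sizes, and the timing loop below are illustrative assumptions, not the block from the previous section.

```python
import random
import time

def has_duplicates(numbers):
    """Return True if any value appears more than once in the list."""
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] == numbers[j]:
                return True
    return False

for size in [1000, 2000, 4000]:
    # Distinct values force the worst case: every pair gets compared.
    data = random.sample(range(10 * size), size)
    start = time.perf_counter()
    has_duplicates(data)
    print(size, round(time.perf_counter() - start, 4))
# Doubling the list size roughly quadruples the elapsed time.
```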
Why does the algorithm that checks whether the elements of a list are
distinct run in quadratic time? Let's count how many comparisons we
have to do in the worst case for a list of length N. We compare the
first number against (N - 1) other numbers, the second number against
(N - 2) numbers, the third number against (N - 3) numbers, and so on,
until we reach the last number (if we're unlucky), which has no
remaining numbers to compare against. In total, we have to do
(N - 1) + (N - 2) + (N - 3) + ... + 2 + 1 comparisons, which, according
to Gauss, is [N * (N - 1)]/2, or (N² - N)/2, comparisons. For large
values of N, the N² term begins to eclipse the value of N:
Size of input | N | N² |
---|---|---|
N = 10 | 10 | 100 |
N = 100 | 100 | 10,000 |
N = 1,000 | 1,000 | 1,000,000 |
N = 10,000 | 10,000 | 100,000,000 |
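As a quick check on the arithmetic, a hypothetical counting helper (not the original block) can tally the worst-case comparisons made by the nested check and compare the total against Gauss's formula:

```python
def count_comparisons(n):
    """Count comparisons made by the nested-loop check on n distinct values."""
    comparisons = 0
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
    return comparisons

for n in [10, 100, 1000]:
    print(n, count_comparisons(n), n * (n - 1) // 2)
# Both columns agree: 45, 4950, and 499500 comparisons.
```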
The difference between N² and N is even more pronounced for larger
values of N. Since we only care about the runtime of algorithms for
really large inputs, we say that the contribution of the N term is
essentially negligible, and that the runtime of the algorithm varies as N².
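To see how quickly the N term fades next to N², we can compare the exact comparison count (N² - N)/2 with the simpler N²/2; the numbers below are plain arithmetic, not part of the original text.

```python
for n in [10, 1000, 100000]:
    exact = (n * n - n) / 2   # exact worst-case comparison count
    approx = (n * n) / 2      # approximation that drops the N term
    print(n, exact, approx, round(exact / approx, 5))
# For N = 100,000 the two differ by only about 0.001%.
```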
You may find the approximation of (N² - N)/2 as N² a bit shocking
(mathematicians definitely do); however, in computer science we make
many such approximations, both because we don't want to drown in
detail and because this runtime analysis is rough and approximate to
begin with. The overarching theme of abstraction is evident here: we
want to abstract away from the details of the particular computer the
algorithm will eventually run on, and instead get a sense of how the
runtime of an algorithm behaves for larger and larger inputs.
We would like to answer questions such as: does one algorithm slow down more than another as its input grows larger? If so, can we make the algorithm faster or better? Answering these questions can sometimes lead to surprising results, as you saw when comparing two different algorithms for summing the numbers in a list.