Monday, December 12, 2011

Question about binary search algorithm in theta notation.?

Hello,





I've a small question about the performance of the Binary Search Algorithm. I've written some pseudo-code that does the following:





First let me define A as the array in which we are looking for the value v, A is sorted. Next some syntaxis issues, when I write A[1], I mean the first element in the array. The array does NOT start at 0 like in some programming languages, but it starts at 1.





First my pseudo-code will be looking for the value v and tries to match the center of the array.


If A[center] = v then it will the value center.





If A[center] != v then it will check wheter v is bigger or smaller than the center.





If v %26gt; A[center], it will copy from A[center + 1] untill A[length[A]] into array B. Here array B consist of length[A] - center elements. And it will do a binary search in array B.





If v %26lt; A[center], it will copy from A[1] untill A[center - 1] into array B. Here array B consist of center - 1 elements. And it will do a binary search in array B.





Now my problem is as follows, at http://en.wikipedia.org/wiki/Binary_sear鈥?/a> we see in the "That it is fast" part that it takes p = ln( N + 1 ) / ln( 2 ) iterations.





However in this case, every time when I run my algorithm, the array will be (N - 1) / (2^i) at iteration i, because it leaves the A[center] away, as we already know that A[center] != v at the start.





Note: you can safely assume that I used i "as" p because it seemed more logical that I use the variable i for the times of iteration.





If we evaluate my equation:





1 = (N - 1) / (2^i)


2^i = ( N - 1 )


i = ln( N - 1 ) / ln( 2 )





It's equal to 1 because Binsearch ends when the array is 1 large, cause the "center" of 1 is 1 itself.





I know that for the Big O notation and Theta notation and Big Omega notation it doesn't really matter cause both will result in O(log n). But I wonder if my "equation" is correct.





Can someone varify my equation and tell me what I did wrong if it's wrong?








If needed I will post my pseudo-code, but I do not think it is needed at this point.





Thank you for your help!|||Your equation seems correct. The reason for the difference is because in a typical binary search, the center will be put into either A or B. Whether you directly short-circuit it does not change the _maximum_ O (however it does change some constants, which is ultimately ignored; and also notice that the O for a regular binary chop algorithm is NOT amortized O(log(n)) but a predictable _always_ O(log(n)) ). (EDIT: at earlier edit I said a binary search that does middle short-circuiting have an _average_ O(log(n)); that statement is wrong, it should be _maximum_/amortized O(log(n)) )





Nevertheless the more serious mistake you made:





%26gt; If v %26gt; A[center], it will copy from A[center + 1]...





DON'T COPY. However you do it, the copying will be the one that takes most of the time and will make your algorithm clearly NOT log(n).





The O of YOUR pseudocode goes like this:


First pass: 1 (is it middle?) + 1 (less or greater?) + n/2 (copying)


Second pass: 1 + 1 + n/4


Third pass: 1 + 1 + n/8


...


Kth (Last) pass: 2 +n/2^K





Now what is K? A binary search on average does log2(n) iteration, the O of your copying binary search pseudocode would become


K * sum(2+n/2 + 2+n/4 ... + 2+n/2^K)


= log2(n)*sum(2+2+..+2 + n/2+n/4+...+n/2^K)


= log2(n)*sum(2K + n*(1/2+1/4+...+1/2^K)) // factor out n


= log2(n)*(2K+n*1) // as K goes to infinity, 1/2+1/4+..+1/2^K goes to 1


= log2(n)*(2*log2(n)+n) // K = log2(n)





= log2(n)*(log2(n)+n)


= log(n)^2 + n*log(n)





which is completely different O from a regular binary search that does not copy

No comments:

Post a Comment