You've heard of Binary search before and may be wondering what all the hype is. Binary is the most widely used searching algorithm in sorted lists, arrays, sets, etc. Why? Because of it's time complexity and divide and conquer approach.
You'd usually choose Binary search over Naive search because it is so much faster to do so. What do I mean by Naive searching? Using this algorithm basically checks each element one by one for a match. It's the most obvious solution when one is asked a problem. Sure, it will get the job done, but is it the best solution?
For example, let's say you have a sorted list and you are wanting to check if the number 10 is in the list.
By using a naive search algorithm, you would iterate through each and every item and compare that to your target. In our case, the number 10 is our target. Our function would look something like below. If the item matches our target, we will return the index of our target element. If our target is not found in the list, we will return -1.
Pretty simple, right?
But what if we have a list with 3000 elements? Do we really want to sit through comparing each element to our target? One by one? I know I wouldn't. There has to be a better way, a faster way, that can get the job done.
With Binary search, we can use the divide and conquer approach on our sorted list. Let's explore what that means. The first thing we want to do is find the midpoint of our list. This is basically the length of our list divided by two, then rounded down. We find our midpoint, ask if it is equal to our target, and if it is, we return that index like we did before.
If the midpoint is less than our target, we know it has to be on the left side of the list, right? Because remember- we are working with a sorted list, so the numbers in our list are sorted from least to greatest. We can disregard everything else on the right.
If the midpoint is greater than our element, then we know it has to be on the right side of the list and we can disregard all that is on the left. This is basically just chopping off half of the list and giving us a new section of it to work with. We know now that if our target is in the list, it will be in that section! Super convenient.
In each section of our list, we will have a high and a low along with our midpoint. We can add our high and our low together, divide by 2 and round down to get our midpoint as well. The high and low variables are the highest and lowest indexes we can check in each section. We want to start off with them being 'None', because when we first look at our list, we haven't determined what part of the list is the section we will be using, or what exactly is in our list.
So, what if our target is not in our list? Hypothetically, our high should never be less than our low, right? So if that is ever the case, we know our target is not in our list, so we can just go ahead and return -1.
This is the divide and conquer approach that makes using a binary search algorithm so beneficial. Once we figure out the midpoint and what section of the list our target is in, we repeat this approach with that section. We find another midpoint, compare it to our target, and just repeat the process until we either have found our element, or we find that our element does not exist in our list. How cool is that.
If we print out the results of both our binary and naive function, we should see that each one returns index 3. Here is what the code should look like:
When we print the results of both functions, we will see that they both produce the same result, and both indexes show up basically at the same time in our terminal. But this doesn't really show us that Binary Search is faster, right? Let's time both types of searches.
So here we have set the length to 10000 and made our sorted list a set to avoid duplicates. We have a while loop going that basically says if the length of our list is less than the length we provided, while True, keep executing everything in the code block. We are basically lopping through -30000 to 30000. That's a lot of items to check. Can you imaging checking each item one by one?
Notice that we imported the time module to keep track of the total time it takes to perform both types of searches. This will give us the current time when we start and end:
We will be looking for the target in our sorted list, and performing a naive and binary search. You can see we are passing in our list and our target to our functions. We will get the total time it takes to search by substracting our start time from our end time. We can see the total time it takes per iteration by taking the total time and dividing it by it's length.
We will do the exact same thing for our binary search function, so once we add that in, the code should look like so:
Now when we run our program, we will see something similar to this:
Naive Search Time: 0.00032645535469055174 seconds.
Binary Search Time: 5.886459350585937e-06 seconds.
Can you see the difference and why Binary search is a much more effective way to search through sorted lists? I hope this helps some! Feel free to explore more on your own and tweak this around to get a good feel of it.
Have fun and happy coding!