Introduction to Data Structures and Algorithms
Have you ever wondered what happens behind the scenes when you create an array? Or what happens when you search google.com in your browser? How do we design scalable systems that can handle requests from 1 million concurrent users? How do we ensure that our systems are highly available?
These are questions that I pondered for a while, and I’ve finally decided to dig deep and search for the answers.
In the next few articles, I will be covering core computer science fundamentals such as data structures, replication, load balancing, rate limiting, UI/UX and much more.
Before I dive deep into how we can scale a system, guarantee high availability, or ensure that our systems are resilient, fault tolerant and cost effective, it’s only right that we define what data structures and algorithms are, and the reasons why they are (in my opinion) a prerequisite for designing optimal systems.
Ok, so what are data structures?
So, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
Great, so what does that actually mean?
Don’t worry, I asked the same question.
There are different types of data structures, e.g. Arrays, Linked Lists, Queues, Stacks, Graphs and Trees, each of which has its own use case. I will dive deep into each one in separate articles.
We can use data structures to store things such as:
- Numbers, names, packets, pictures, people, stations, locations and much more
And the relationships among them, e.g.:
- Routes, friendships, connections, order and much more.
Wondering where algorithms fit into all of this?
An algorithm is defined as a process for solving a computational problem. In simple terms, it is a set of instructions that describes how something should be done.
Earlier we defined data structures as:
a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
Algorithms would probably fit into the latter part of this definition.
Cool, let’s take a step back for a second and completely forget about computer science and mathematics.
We can think of an algorithm as a recipe — when we cook food, we tend to follow a set of steps to put some ingredients together. Once we carry out the process — we produce an output aka our delicious meal.
When designing algorithms there are some core components we need to specify:
- inputs — ingredients
- process — recipe or method
- output — what happens at the end? what do we produce?
Algorithms in computing follow the exact same process — we define a set of inputs, operations we can apply to these inputs and an output we should expect as a result of this applied function.
It’s important to note that algorithms are used in all areas of computing e.g. Data Science, Artificial Intelligence, DevOps and API Development.
Ok sounds cool, but can you give some actual examples of where we’d use algorithms?
In the real world, you may store a company’s monthly revenue or growth as a list of numbers. And you may be tasked with finding the best quarter within a 10 year period. How do you do this? Would you brute force an answer or would you perhaps come up with an efficient process for carrying out this search?
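One efficient process for this kind of search is a sliding window: instead of re-summing every possible quarter from scratch, we reuse the previous sum as the window moves. The sketch below assumes "best quarter" means the three consecutive months with the highest total; the revenue figures are made up for illustration:

```python
def best_quarter(monthly_revenue):
    """Return (start_index, total) of the best 3-month window."""
    window = sum(monthly_revenue[:3])
    best_total, best_start = window, 0
    for i in range(3, len(monthly_revenue)):
        # slide the window forward: add the new month, drop the oldest
        window += monthly_revenue[i] - monthly_revenue[i - 3]
        if window > best_total:
            best_total, best_start = window, i - 2
    return best_start, best_total

revenue = [10, 12, 9, 20, 25, 30, 5]  # hypothetical monthly figures
print(best_quarter(revenue))  # prints (3, 75)
```

Each month is visited once, so this runs in linear time, whereas brute-forcing every possible window re-sums the same months over and over.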
In the world of image processing, images are typically stored as a sequence of numbers that a computer can understand. You may be tasked with finding some information about an image, or even editing it, e.g. in Photoshop. Imagine you were tasked with finding the brightest area in an astronomical image.
In order to find the brightest region, you can either:
- Manually search through the whole array (a collection of values stored at consecutive indices, starting at index 0)
- Or you can use an algorithm like Kadane’s to identify the maximum subarray within the array to find the brightest area in the image.
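For reference, here is a minimal sketch of Kadane’s algorithm on a one-dimensional list of values. A real image would be two-dimensional, and you would typically centre the brightness values around the mean first, so treat this as the core idea rather than a full image-processing pipeline:

```python
def max_subarray(values):
    """Kadane's algorithm: largest sum over any contiguous run."""
    best = current = values[0]
    for v in values[1:]:
        # either extend the current run or start a fresh run at v
        current = max(v, current + v)
        best = max(best, current)
    return best

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # prints 6
```

Kadane’s makes a single pass over the data, so it finds the answer in linear time where a naive search over every possible subarray would be quadratic.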
I’ll give you another example: imagine you had to search for a specific page number or chapter in a 20,000-page document (assuming chapters are titled in alphabetical order and page numbers are in ascending order).
You could try to brute force it and search for your chapter page by page, which could take a lot of time.
Alternatively, you could go to page 10000, check for your chapter. If it is not there and your title comes after the ones listed on page 10000 in the alphabet, then you could check page 15000. If your title comes before those on page 10000 in the alphabet, then you could search page 5000.
Continue this process and you will find your title a lot faster than you would have by brute force. We’ve just demonstrated that we can use two different approaches to reach the same result.
These were simple examples, but they highlight how we can capitalise on the strengths of data structures and algorithms to come up with optimal solutions.
Btw, congratulations, you have used a binary search algorithm 😉.
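Here is that halving process as a short Python sketch, searching a sorted list of page numbers (the list itself is illustrative):

```python
def binary_search(pages, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(pages) - 1
    while lo <= hi:
        mid = (lo + hi) // 2       # check the middle page
        if pages[mid] == target:
            return mid
        elif pages[mid] < target:
            lo = mid + 1           # target is in the upper half
        else:
            hi = mid - 1           # target is in the lower half
    return -1

pages = [5, 12, 30, 44, 57, 80, 93]
print(binary_search(pages, 57))  # prints 4
```

Note that binary search only works because the list is sorted, which is exactly why the page-number assumption in the example above matters.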
Okay, but how can we measure the difference in performance between different algorithms?
Right, so we’re now on the same page. You understand the fundamental importance of what data structures and algorithms are and why we need them. However, you’re probably wondering how we can determine if an algorithm is more efficient than another.
We use a process called complexity analysis.
Firstly, complexity analysis in computing refers to the process of determining how efficient an algorithm is. We use two key metrics: time and space.
Time complexity measures the speed of an algorithm with respect to the size of its input. In mathematics, this kind of reasoning is known as asymptotic analysis.
Space complexity measures how much auxiliary space an algorithm uses. By auxiliary we mean extra space beyond the input itself.
Both are typically expressed in Big O notation.
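To make the difference tangible, here is a hypothetical comparison that counts how many comparisons a linear scan (O(n)) and a binary search (O(log n)) perform on the same sorted data:

```python
def linear_steps(items, target):
    """Count comparisons a linear scan makes before finding target."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return len(items)

def binary_steps(items, target):
    """Count comparisons a binary search makes (items must be sorted)."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return steps
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

data = list(range(1, 1_000_001))
print(linear_steps(data, 999_999))  # prints 999999: one comparison per item
print(binary_steps(data, 999_999))  # prints roughly 20: halving each time
```

A million comparisons versus about twenty is exactly the gap that Big O notation captures, and it is the kind of trade-off complexity analysis lets us reason about before writing a single benchmark.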
Data structures and algorithms help us understand the nature of problems at a deeper level, enabling us to come up with better solutions that make the world a better place. From searching algorithms to compression techniques, we can use our knowledge of data structures and algorithms to arrive at optimal solutions.
If we strengthen our understanding of memory, time complexity and space complexity, we will be able to discuss trade-offs between different solutions. This is a transferable skill that allows us to design highly scalable, fault tolerant and cost effective systems.
Ultimately, this ensures that we can deliver real value to our customers and users whilst enabling them to meet their goals far more efficiently.