Search

Introduction

In this lesson we will focus on using JavaScript tools to solve the problem of handling search in a web application. Although it may seem like search is a straightforward problem, there are many different ways to approach and implement this feature. We will look at some of the most common approaches and how to implement them using JavaScript.

Fundamentals of Searching

A perfect search feature always returns the correct information for every question, quickly and efficiently. As a developer, balancing these concerns can be very challenging and a poor search experience can be a major source of frustration for users.

A search bar is also the first place that a user will “speak” to the website owner. You may not be aware, but all search queries are logged and often used for SEO refinement or even new product development. If users enjoy their search experience, i.e. they found what they were looking for quickly, they are more likely to use it again.

Search Protocol

In order to implement a client-side search feature, there are some steps that will be common to any approach:

  1. Investigating the “shape” of the data being searched. For example, if you are searching a list of books, you may choose to search the title, tags but not the contents of the book itself. These decisions will be different in one project to another depending on the purpose of the application.
  2. Setting up a user interface. This is most commonly done using an HTML search input element inside a form, with an event listener to handle the user’s input. This could be a submission handler or an input handler depending on the UX approach desired.
  3. Implementing the search algorithm. This is the most complex part of the process and will be the focus of this lesson. Search algorithms can be quite simple, as we will see, but they can also grow in complexity depending on the requirements of the application.
  4. Displaying the results. This is no different to displaying data from a remote source, but this does form part of the search experience. It doesn’t matter how effective the algorithm is, if the results are not displayed in a way that is easy to understand.

Search Algorithms

There are many different algorithms that can be used to implement search. The most common are:

  • Linear Search: This is the simplest search algorithm. It is also the least efficient. It is used to search through a list of items one by one until the desired item is found. This is the most common approach for small lists of data.
  • Binary Search: This is a more complex algorithm that is used to search through a sorted list of items. It is much more efficient than linear search, but it requires the data to be sorted first. This is the most common approach for large lists of data.
  • Fuzzy Search: This is a more complex algorithm that is used to search through a list of items for a partial match. It is used to find items that are similar to the search query. This is the most common approach for large lists of data where the user may not know the exact name of the item they are searching for.
  • Regular Expression Search: This is a more complex algorithm that is used to search through a list of items for a partial match. It is used to find items that are similar to the search query. This is the most common approach for large lists of data where the user may not know the exact name of the item they are searching for.

We will cover these in more detail in the second JavaScript course.

Linear Search

When we use array methods such as find or filter, we are using a linear search algorithm. This makes it almost effortless to implement in JavaScript:

javascript
	console.time("linear-search");
	const items = await getMovies();
	const query = "The Matrix";
	const result = items.find((item) => item.title === query);
	console.timeEnd("linear-search");

In this example, we have employed a linear search algorithm to search through an array of objects and return the results that match the query. This is a very simple approach, but it is not very efficient. With a small list, the delay is unlikely to be noticeable, but with a large list, the delay could be significant.

Typically we work with small arrays in the front-end, making this approach sufficient in many cases. When the front-end is required to handle many thousands, tens of thousands or even millions of items, this would be a clear sign that responsibility for search should be moved to the back-end.

If a search function takes more than 100ms to complete, it is likely that the user will notice a delay, and this will vary based on the hardware that they are using. It’s best to avoid these situations entirely than implement advanced search algorithms from scratch.

Fuzziness

When we are building a search feature we can choose to implement a degree of fuzziness, the opposite of which would be exactness. For example, a search for “man u” would not return results for “Manchester United” without some degree of fuzziness. This is because the string "man u" does not exactly match the string "Manchester United".

We can use methods such as includes to check if one string exists inside another. This is a simple approach to introducing some degree of fuzziness into our search algorithm.

javascript
	items.filter(item => {
	  return item.title.includes(query);
	})

However, this will not return a match for the example above either! That is because “Man U” is not literally contained in “Manchester United”, even though every letter is present.

In order to solve this, we would need to complicate the process by searching each word one by one:

javascript
	items.filter(item => {
	  return query.split(" ").every(word => {
	    return item.title.includes(word);
	  });
	});

Unfortunately, this will not return our intended result either - although we are one step closer. This is because includes is case sensitive, so the uppercase versions of each character are considered a different symbol.

Once again, we can fix this problem with additional complexity:

javascript
	items.filter(item => {
	  return query.split(" ").every(word => {
	    return item.title.toLowerCase().includes(word.toLowerCase());
	  });
	});

By forcing both the search terms and the item title to lowercase, we can be sure that there will not be issues with case.

The more complex a search procedure gets, the more difficult it is to predict its behaviour. Although users may appreciate an advanced search, if it is not correctly implemented it may detract from their experience.

What to Search

When we are searching a dataset, there are many ways to investigate each item to determine if it should be a match. For example, if we are searching movies - it may not make sense to use all of the properties of the movie to determine if it is a match.

Likewise, some properties may not be so easy to search as a string - and may require some transformation before it is suitable to be searched. For example, movie tags may be stored in an array. In order to include this in our search procedure, we must first convert it to a string.

javascript
	items.filter(item => {
	  return item.tags.join(", ").includes(query);
	})

We could achieve this another way, by using array methods on the tag property itself:

javascript
	items.filter(item => {
	  return item.tags.some(tag => tag.includes(query));
	})

In many cases we need to search more than one property at the same time. For example, searching a blog site may use categories, author information, tags and the content of the article itself. In this case, we may need to combine multiple search results into a single result.

javascript
	const results = items.filter(item => {
	  const content = [
	    item.title,
	    item.author,
	    item.content,
	    item.tags.join(", ")
	  ];
	
	  return content.map(value => value.toLocaleLowerCase()).some(value => {
	    return value.includes(query.toLocaleLowerCase());
	  });
	})

In this example, we have combined all of the searchable properties into a single array. We have then converted each value to lowercase to ensure that the search is case insensitive. Finally, we have used some to determine if any of the values in the array contain the search query.

Search Score

So far, we have looked at how to determine if an item is a match or not. However, we have not considered how to determine how good of a match it is. For example, a blog post that gets 4 matches on the search query is very likely to be what the user is looking for. However, a blog post that only gets 1 match may not be what the user is looking for and may be a less relevant result.

In order to solve this problem, we can assign a score to each item that is a match. We can then sort the results by their score, so that the most relevant results are displayed first.

javascript
	const results = items.map(item => {
	  const content = [
	    item.title,
	    item.author,
	    item.content,
	    item.tags.join(", ")
	  ];
	
	  const score = content.map(value => value.toLocaleLowerCase()).reduce(
	    (score, value) => {
	    if (value.includes(query.toLocaleLowerCase())) {
	      return score + 1;
	    }
	
	    return score;
	  }, 0);
	
	  return {
	    ...item,
	    score
	  };
	}).sort((a, b) => b.score - a.score);

In this example, we have added a reducer to loop through each property value and determine if it is a match. If it is a match, we add 1 to the score. If it is not a match, we do not add anything to the score. Finally, we sort the results by their score so that the best matches are at the beginning.

How the search score is calculated will depend on the priorities of the developer, for example, one type of content may be more important and therefore worth more points.

javascript
	const content = [
	  [item.title, 1],
	  [item.author, 0.5],
	  [item.content, 0.25],
	  [item.tags.join(", "), 1]
	];
	
	const score = content.map(value => {
	  return value.toLocaleLowerCase()
	}).reduce((score, [value, weight]) => {
	  if (value.includes(query.toLocaleLowerCase())) {
	    return score + weight;
	  }
	
	  return score;
	}, 0);

Now, the more important properties are worth more points. This means that a match in the title is worth more than a match in the content.

Refactoring

We have covered a lot of complexity in this lesson, but we have not considered how to refactor this code to make it more readable and maintainable. Let’s take a look at how we can refactor this code to make it more readable.

Extracting Functions

The first thing we can do is extract the search logic into a function. This will make it easier to read and understand the code.

javascript
	// Checks a single movie object for a match
	export function movieMatches(query, movie) {
	  const content = [
	    movie.title,
	    movie.author,
	    movie.content,
	    movie.tags.join(", ")
	  ];
	
	  return content.map(value => value.toLocaleLowerCase()).some(value => {
	    return value.includes(query.toLocaleLowerCase());
	  });
	}
	
	// Filters a movie array for matches
	export function searchMovies(query, movies) {
	  return movies.filter(movie => movieMatches(query, movie));
	}

By isolating and exporting this function, we can use it wherever we need to without needing to rewrite this complex logic.

Now we can easily implement this in our application when needed:

javascript
	import { searchMovies } from "./search.js";
	
	async function setup() {
	  const movies = await getMovies();
	  const results = searchMovies("The Matrix", movies);
	  // renderResults(results);
	}

By reusing these functions, we ensure consistency in the behaviour of search across our application. It also makes the process of adding event listeners more straightforward.

Input Handling

When we are collecting the user’s input - there are different approaches depending on which kind of experience is intended. For example, if the user is expected to type a full query before the search is performed, we would use a submission handler. However, if the user is expected to type a partial query and see results as they type, we would use an input handler.

Many modern sites employ an input handler to give instant feedback to the user, however this does not make it the correct choice in all circumstances. Additionally, there are considerations such as debouncing that are introduced with this approach that can make the implementation more complex.

Let’s take a look at both examples side by side:

html
	<form name="search">
	  <input type="search" name="query" />
	  <button type="submit">Search</button>
	</form>
javascript
	const form = document.forms.search;
	
	form.addEventListener("submit", (event) => {
	  event.preventDefault();
	  const query = form.query.value;
	  const results = searchMovies(query, movies);
	  renderResults(results);
	});

In the example above, the user only sees the results of their query when they submit the form. Let’s look at the alternative approach:

javascript
	const form = document.forms.search;
	
	form.addEventListener("input", (event) => {
	  const query = form.query.value;
	  const results = searchMovies(query, movies);
	  renderResults(results);
	});

In this example, the user sees the results of their query as they type. This is a more complex approach, but it can be more user friendly. With a large data set or a complex search process, this can slow the application down.

Mitigation

In order to mitigate the performance issues that can be introduced by an input handler, we can use several techniques to delay or slow the rate that this function will be running. This is known as debouncing and throttling.

Firstly, we can make a simple change to prevent the function from running until the user has entered enough characters to make a meaningful query:

javascript
	const form = document.forms.search;
	
	form.addEventListener("input", (event) => {
	  const query = form.query.value;
	
	  if (query.length < 3) {
	    return;
	  }
	
	  const results = searchMovies(query, movies);
	  renderResults(results);
	});

This may or may not suit the needs of the application, but it is a simple way to prevent the function from running too often.

Debouncing

Another more complex approach is to use a debounce function to delay the execution of the function until the user has stopped typing. This is a more complex approach, but it can be more user friendly. With a large data set or a complex search process, this can slow the application down.

javascript
	export function debounce(func, delay) {
	  let timer;
	  return function() {
	    const context = this;
	    const args = arguments;
	    clearTimeout(timer);
	    timer = setTimeout(() => {
	      func.apply(context, args);
	    }, delay);
	  };
	}
javascript
	import { debounce } from "./debounce.js";
	
	const form = document.forms.search;
	
	form.addEventListener("input", debounce((event) => {
	  const query = form.query.value;
	  const results = searchMovies(query, movies);
	  renderResults(results);
	}, 300));

In this example, we have added a debounce function that will delay the function from being executed by 300ms and ensure that it is only executed once.

Later in the program we will look at dedicated libraries designed to handle this kind of functionality in more detail.

Lesson Task

Using data from the API below, implement a dynamic search feature that shows a list of results based on the user’s query. The search should be case insensitive and should search the title, author and genre of each book.

Documentation - https://docs.noroff.dev/docs/v2/basic/books API - https://v2.api.noroff.dev/books

Additional Goals

Allow the user to search each Author’s books by the description and year it was published.