(Updated: )
/ #javascript #es6 #data structures 

JavaScript remove duplicates, get unique/distinct values from Array with ES6+ Set and spread

With ES6+, more developers should be leveraging built-ins than are using lodash functions.

This post will go through how to remove duplicates ie. get distinct/unique values from an Array using ES6 Set. This gives you a one-line implementation of lodash/underscore’s uniq function:

const dedupe = list => [...new Set(list)];

This rest of the post goes through the how and why this works. It explores Sets, iterables and a bit of complexity theory with Big-O notation.

Table of Contents

What’s a set anyways?

a set is an abstract data type that can store unique values, without any particular order.

Wikipedia - Set (abstract data type)

Let’s look at that definition and draw some conclusions:

  • “store unique values” → Sets don’t contain duplicates
  • “without any particular order”, we should stick to lists if we care about ordering

In practice, and within the lense of ES6 JavaScript (of which Sets are a part of):

The Set object lets you store unique values of any type, whether primitive values or object references.

MDN Set Documentation

new Set() accepts an iterable as an argument (see MDN Set Syntax section), a JavaScript Array is iterable (see the iterable protocol on MDN).

Sets are also iterable themselves, which means they can be spread, spread is the ... inside the square brackets [] (see MDN Docs - javascript Spread).

Set applied to a dedupe function

So if we break down the dedupe function call:

const dedupe = list => [...new Set(list)];
  1. create a new Set using the passed list argument
  2. spread the newly created Set into an Array
  3. return the array created in step 2 with an implicit arrow function return

Performance of a Set-based deduplication/unique function

Remy Sharp has a great post called “Reduce spread and the path to unique” on the performance implications of different approaches to creating unique arrays.

His post doesn’t actually go through our dedupe function but his benchmarks include it.

The ES6 Set-based solution is somewhere in the middle in terms of performance, why use it?

Maybe you shouldn’t, especially if you’ve already loaded Lodash or some other utility library.

The advantages of the Set approach are twofold.

First of all you should be able to write the code inline (ie. const userIds = [...new Set(maybeDupeIds)];) and it’s still legible, most JavaScript engineers will understand what’s happening there (we’re creating a new de-duped Array from maybeDupeIds).

The second reason is that the V8 team (V8 is the JavaScript engine that powers Chrome and Node.js) optimises for real-world usage, that means spread and Set’s performance is likely to keep improving as when/if they can drop more optimisations in.

In-depth ES6 Set use-cases

Keeping track of unique values

A Set by definition stores unique values. This is great if you might try to add the same entry multiple times.

const list = [1, 2, 2];
console.log(new Set(list)); // Set(2) {1, 2}
console.log([...new Set(list)]); // [1,2]


const values = new Set(list); // Set(2) {1, 2}
values.add(1); // this is essentially a no-op,
values.add(2); // so is this
console.log(values); // Set(2) {1, 2}

This is the property of sets we leveraged when re-implementing uniq or dedupe earlier.

Existence Checks

A Set is generally used for existence checks eg.


const existingUsers = new Set(['id1', 'id2'])

if (existingUsers.has(newUserId)) {
  // update user
}

The computational complexity of that is O(1) (it’s constant time), that means no matter how large existingUsers gets, it will always take the same amount of time to look up whether or not an id is in the Set.

You could do the same with an Array (list), with a higher performance cost:

const existingUsers = ['id1', 'id2']
if (existingUsers.includes(newUserId)) {
  // update user
}

The complexity would now be O(n) (n being the length of the list), which means as the size of the list grows, the code would linearly get slower.

In the case where the newUserId doesn’t exist, you would have to look through the full list before you can say it’s not there.

Best case, the newUserId is the first element of the array so you can return false with only one lookup.

Existence check with objects: sets under the hood

The following actually shows a way one can visualise what a Set is doing under the hood:

const existingUsers = { 'id1': true, 'id2': true }
if (existingUsers[newUserId]) {
  // update user
}

Duplication is can be dealt with using an Object approach eg.

const existingUsers = {}
['id1', 'id2', 'id2'].forEach(id => {
  existingUsers[id] = true
})
// existingUsers is now: { 'id1': true, 'id2': true }
if (existingUsers[newUserId]) {
  // update user
}

The existingUsers object wouldn’t have duplicate keys either.

Sets vs objects

The main difference between the object (map, associative array), is that the Set is directly iterable.

Which means it has a .forEach method, you can spread it into an Array (amongst other things).

Set is easier to convert from/to a list because they’re both iterable.

An object isn’t iterable: it doesn’t have a .forEach, to iterate through an object, the first step is to convert it to a list first using Object.keys, Object.values or Object.entries.

unsplash-logoNathalie Gouzée

Author

Hugo Di Francesco

Co-author of "Professional JavaScript", "Front-End Development Projects with Vue.js" with Packt, "The Jest Handbook" (self-published). Hugo runs the Code with Hugo website helping over 100,000 developers every month and holds an MEng in Mathematical Computation from University College London (UCL). He has used JavaScript extensively to create scalable and performant platforms at companies such as Canon, Elsevier and (currently) Eurostar.

Get The Jest Handbook (100 pages)

Take your JavaScript testing to the next level by learning the ins and outs of Jest, the top JavaScript testing library.