JavaScript remove duplicates, get unique/distinct values from Array with ES6+ Set and spread
With ES6+, more developers should be leveraging built-ins than are using lodash functions.
This post will go through how to remove duplicates ie. get distinct/unique values from an Array using ES6 Set. This gives you a one-line implementation of lodash/underscore’s uniq
function:
const dedupe = list => [...new Set(list)];
This rest of the post goes through the how and why this works. It explores Sets, iterables and a bit of complexity theory with Big-O notation.
Table of Contents
What’s a set anyways?
a set is an abstract data type that can store unique values, without any particular order.
Let’s look at that definition and draw some conclusions:
- “store unique values” → Sets don’t contain duplicates
- “without any particular order”, we should stick to lists if we care about ordering
In practice, and within the lense of ES6 JavaScript (of which Sets are a part of):
The
Set
object lets you store unique values of any type, whether primitive values or object references.
new Set()
accepts an iterable as an argument (see MDN Set Syntax section),
a JavaScript Array is iterable (see the iterable protocol on MDN).
Sets are also iterable themselves, which means they can be spread, spread is the ...
inside the square brackets []
(see MDN Docs - javascript Spread).
Set applied to a dedupe function
So if we break down the dedupe function call:
const dedupe = list => [...new Set(list)];
- create a new Set using the passed
list
argument - spread the newly created Set into an Array
- return the array created in step 2 with an implicit arrow function return
Performance of a Set-based deduplication/unique function
Remy Sharp has a great post called “Reduce spread and the path to unique” on the performance implications of different approaches to creating unique arrays.
His post doesn’t actually go through our dedupe
function but his benchmarks include it.
The ES6 Set-based solution is somewhere in the middle in terms of performance, why use it?
Maybe you shouldn’t, especially if you’ve already loaded Lodash or some other utility library.
The advantages of the Set approach are twofold.
First of all you should be able to write the code inline (ie.
const userIds = [...new Set(maybeDupeIds)];
) and it’s still legible,
most JavaScript engineers will understand what’s happening there
(we’re creating a new de-duped Array from maybeDupeIds
).
The second reason is that the V8 team (V8 is the JavaScript engine that powers Chrome and Node.js) optimises for real-world usage, that means spread and Set’s performance is likely to keep improving as when/if they can drop more optimisations in.
In-depth ES6 Set use-cases
Keeping track of unique values
A Set by definition stores unique values. This is great if you might try to add the same entry multiple times.
const list = [1, 2, 2];
console.log(new Set(list)); // Set(2) {1, 2}
console.log([...new Set(list)]); // [1,2]
const values = new Set(list); // Set(2) {1, 2}
values.add(1); // this is essentially a no-op,
values.add(2); // so is this
console.log(values); // Set(2) {1, 2}
This is the property of sets we leveraged when re-implementing uniq
or dedupe
earlier.
Existence Checks
A Set is generally used for existence checks eg.
const existingUsers = new Set(['id1', 'id2'])
if (existingUsers.has(newUserId)) {
// update user
}
The computational complexity of that is O(1) (it’s constant time), that means no
matter how large existingUsers
gets, it will always take the same amount of time to
look up whether or not an id is in the Set.
You could do the same with an Array (list), with a higher performance cost:
const existingUsers = ['id1', 'id2']
if (existingUsers.includes(newUserId)) {
// update user
}
The complexity would now be O(n) (n being the length of the list), which means as the size of the list grows, the code would linearly get slower.
In the case where the newUserId
doesn’t exist, you would have to look through the full list before you can say it’s not there.
Best case, the newUserId
is the first element of the array so you can return false with only one lookup.
Existence check with objects: sets under the hood
The following actually shows a way one can visualise what a Set is doing under the hood:
const existingUsers = { 'id1': true, 'id2': true }
if (existingUsers[newUserId]) {
// update user
}
Duplication is can be dealt with using an Object approach eg.
const existingUsers = {}
['id1', 'id2', 'id2'].forEach(id => {
existingUsers[id] = true
})
// existingUsers is now: { 'id1': true, 'id2': true }
if (existingUsers[newUserId]) {
// update user
}
The existingUsers object wouldn’t have duplicate keys either.
Sets vs objects
The main difference between the object (map, associative array), is that the Set is directly iterable.
Which means it has a .forEach
method, you can spread it into an Array (amongst other things).
Set is easier to convert from/to a list because they’re both iterable.
An object isn’t iterable: it doesn’t have a .forEach
, to iterate through an object,
the first step is to convert it to a list first using Object.keys
,
Object.values
or Object.entries
.
Get The Jest Handbook (100 pages)
Take your JavaScript testing to the next level by learning the ins and outs of Jest, the top JavaScript testing library.
orJoin 1000s of developers learning about Enterprise-grade Node.js & JavaScript