How to Remove Duplicates Based on One Column in Excel

Remove duplicates in Excel

Excel is very well known and very widely used for data filtering. It facilitates many aspects of data cleaning and filtering that can help you save big on your time.

A common way of data filtering in Excel is eliminating duplicate data. And at times, removing duplicate data is not that simple. This is particularly the case when you want to delete rows that contain duplicate data in the same column.

But no worries! This article should enlighten you on three common and easy ways of filtering out duplicates from your data based on one column. Continue reading to learn them all.

1. The ‘Remove Duplicate’ Function

This is the easiest way to sweep your worksheets clear of duplicate values based on one column. Take a look at the data in the example below.

Award data for different employees

The above data represents the details of different awards won by employees in different areas.

Now, to filter out a distinct list of employees who won an award in any department, we need to filter duplicates from the column containing employee names i.e., Column A.

To do so, follow the steps given below.

Step 1:

Select the data to be filtered out along with the headers.

Step 2:

Go to Data Tab > Remove Duplicates as shown below.

Accessing the removing duplicates option from the Data Tab

This opens the ‘Remove Duplicates’ Dialogues box as follows.

The ‘Remove Duplicates’ Dialogues box

Step 3:

From the ‘Remove Duplicates’ Dialogues box, check the option ‘My data has headers’. This is because we selected the data including the headers.

Checking the option ‘My data has headers’

Step 4:

Unselect all the column headers appearing under the ‘Remove Duplicates’ Dialogues box. Only select the header for the column where the values to be filtered appear.

Unchecking all the columns except for one

Step 5:

Click ‘Ok’. Excel will remove the duplicate rows from the selected column.

Excel removes the duplicate rows from the selected column.

The results are as follows.

Duplicate rows deleted based on one column

Must note how Column A now only consists of a single instance for each employee name.

All duplicate or triplicate instances have been deleted from Column A and their corresponding entries from Columns B and C have also been deleted.

Pro Tip:
Unlike in this example, if you want to remove duplicates from more than one column, you can still use this method. The only difference to be made is of checking the boxes for other column heads.

2. Conditional Formatting

If you have a small dataset and you just don’t want to spend quite some time applying complex functions to your data, you can choose to go manual all the way.

An easy way to filter out duplicates based on one column is to conditionally format the duplicate values, filter them out, and delete them. Check out the example below.

Step 1:

Considering the same example as above. To identify the duplicates in Column A, select the said column and take the following route.

Home Tab > Styles > Conditional Formatting > Highlight Cells Rules > Duplicate Values

Accessing the conditional formatting rules

Step 2:

Set up conditional formatting options in the window that then opens up and click ‘Ok’. This would highlight the duplicate rows from Column A as follows.

Excel highlights the duplicate values from Column A

Step 3:

Filter out the highlighted values by applying data filters to your data as follows.

Select the header of Column A > Go to Data Tab > Sort & filter > Filter

Applying data filters to the data

Step 4:

This would add a drop-down menu icon to the header of Column A. Click on it and select ‘Filter by Color’.

Filtering out highlighted values

This would filter out highlighted values from Column A as shown below.

Highlighted values

From these rows, select duplicate rows and delete them to keep back only one row for each value.

Deleting duplicate values

Now remove filters and conditional formatting from your data to have the final look of your data without any duplicate values.

Data free of any duplicate values

3. Running a VBA Code

Both the above methods may seem easily accessible and convenient. However, if you have regular data-filtering to do, these methods may prove to be hectic and time-consuming. To counter the said problem, you may want to run a VBA Code.

You can set up a VBA code and add it to your Quick Access Toolbar to be able to readily access it only with a single click.

But you’re not a coder? No need to worry. You only need to copy-paste the following code, and you are done.

Sub Delete_duplicate_rows()

Dim Rng As Range

Set Rng = Selection

Rng.RemoveDuplicates Columns:=Array(1), Header:=xlYes

End Sub

Note: The above code is specifically built for the instance above, whereby in the code we have specified that the data contains headers and that data filtration is to be done based on Column 1 i.e. Column A. If you want to filter your data based on any other column, say column C, you may alter the second last line of the above code as Columns:=Array (3).

To run the code in your Excel, here are some easy steps to follow.

Step 1:

Go to the Developer tab from your Ribbon.

Pro Tip: If you find it nowhere on the ribbon, there’s nothing to fret about. 

Go to File > Options > Customize Ribbon > Check Developer’s Tab as shown below
Adding Developer’s tab to the Ribbon

Step 2:

From the developer’s tab, go to Visual Basic, and a VBA Window will launch as follows.

Launching a VBA Window

From the above VBA window, go to Insert > New Module. In the new module that opens up, write the code specified above and close the VBA window.

Writing the code into VBA

Step 3:

This creates a macro that is now ready to use. To access it, take the following route.

Developer tab > Macros

Accessing the Macros in Excel

From the list of Macros that then opens, select the ‘Delete_Duplicate_Rows’ macro and click Run. Excel would run the code, and the duplicates would be removed, as shown below.

Duplicate rows deleted by running the VBA Code

Pro Tip: Deleting duplicate rows by running a VBA code has no turn backs or undo options. Be very careful before you run this code and preferably keep a backup of your data to save yourself of any undue data loss.

Conclusion

Big Data is a common thing in the revolutionizing tech world. With more and more data coming in, we need more sophisticated and better techniques to sort and analyze data.

Filtering data of any duplicate values is one of those most-wanted techniques, and doing it in Excel can save you much time. The article talks about three ways how you can engineer your data in Excel – practice them all and learn them better.

Author: Zeynep

I am a data scientist and Excel expert with a passion for turning complex data into meaningful insights. I have a deep understanding of statistical analysis, data modeling, and machine learning techniques. I am skilled at presenting data in a way that is easy to understand for both technical and non-technical audiences.

Leave a Reply

Your email address will not be published. Required fields are marked *