Duplicate rows:

SQL tables can contain duplicate rows, which often leads to data inconsistencies. A primary key or unique constraint prevents this. However, when those rules are not followed, or an exception is made, duplicates can slip in and become a serious problem. Best practice is to define appropriate keys and constraints so that duplicate rows cannot occur in the first place. Cleaning up duplicates that already exist requires some special procedures.
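The preventive approach can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server, so error types and messages differ from SQL Server's. The stud table mirrors the one used later in this article, with Regno assumed to be the key.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Declaring Regno as the primary key lets the database itself reject duplicate keys.
conn.execute("CREATE TABLE stud (Regno INTEGER PRIMARY KEY, Name TEXT, Marks INTEGER)")
conn.execute("INSERT INTO stud VALUES (1, 'Tom', 77)")

try:
    # A second row with the same Regno violates the primary key constraint.
    conn.execute("INSERT INTO stud VALUES (1, 'Tom', 77)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Duplicate rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM stud").fetchone()[0]
print("Rows in stud:", count)
```

In SQL Server, the equivalent is declaring PRIMARY KEY or UNIQUE in the CREATE TABLE statement. The failing INSERT then raises a constraint-violation error instead of storing a second copy.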

Delete duplicate rows in SQL

In a SQL Server table, duplicate records can be a serious problem. Duplicate data can cause orders to be processed multiple times, produce incorrect reporting results, and more. SQL Server offers several ways to deal with duplicate entries, and the right one depends on the circumstances. There are two main cases:

  • The table has a unique index
  • The table has no unique index

The table has a unique index

If the table has a unique index (such as an identity column), you can use that index to identify the duplicate data and then delete the duplicate records. The duplicates can be identified with a self-join, by ordering the data by the maximum value, with the RANK function, or with NOT IN logic.
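Here is a minimal sketch of the NOT IN approach. It again uses Python's sqlite3 module in place of SQL Server. The table stud_u and its rows are hypothetical. The id column plays the role of the unique index, and the copy with the highest id in each group of duplicates is the one kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is a unique identity-style column; (name, marks) may repeat.
conn.execute("CREATE TABLE stud_u (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO stud_u (name, marks) VALUES (?, ?)",
    [("Tom", 77), ("Frank", 89), ("Frank", 89), ("Jane", 98), ("Jane", 98), ("Jane", 98)],
)

# Keep the row with the highest id in each (name, marks) group; delete all others.
conn.execute("""
    DELETE FROM stud_u
    WHERE id NOT IN (SELECT MAX(id) FROM stud_u GROUP BY name, marks)
""")

remaining = conn.execute("SELECT id, name, marks FROM stud_u ORDER BY id").fetchall()
print(remaining)
```

NOT IN is safe here because MAX(id) can never be NULL. If the subquery could return NULL, NOT IN would match no rows at all, and NOT EXISTS is the safer choice.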

The table has no unique index

Deleting duplicate rows is somewhat harder for a table without a unique index. In that case, use the ROW_NUMBER() function with a common table expression (CTE). ROW_NUMBER() numbers the rows within each group of duplicates, and every row numbered higher than 1 can then be removed.

Delete duplicate rows using the GROUP BY and HAVING clauses

To find duplicate rows, use the SQL GROUP BY clause. GROUP BY groups the data by the given columns, and the COUNT function then shows how many times each combination of values occurs. Any count greater than 1 indicates a duplicate.

code:

Creating and inserting data into the table:

First, you need to create a table as follows:

/* varchar rather than text: SQL Server cannot GROUP BY a text column */
CREATE TABLE stud (Regno integer, Name varchar(50), Marks integer);

/* Create multiple records in this table, including duplicates */
INSERT INTO stud VALUES (1, 'Tom', 77);
INSERT INTO stud VALUES (2, 'Lucy', 78);
INSERT INTO stud VALUES (3, 'Frank', 89);
INSERT INTO stud VALUES (4, 'Jane', 98);
INSERT INTO stud VALUES (5, 'Robert', 78);
INSERT INTO stud VALUES (3, 'Frank', 89);
INSERT INTO stud VALUES (5, 'Robert', 78);
INSERT INTO stud VALUES (4, 'Jane', 98);


Extract and identify duplicate rows in SQL

/* Show all records from the table */
SELECT * FROM stud;


The table above contains duplicate rows. These duplicates can be identified with the GROUP BY clause.

Find duplicate rows in the table using the GROUP BY and HAVING clauses

In SQL, duplicate rows are identified with the GROUP BY and HAVING clauses as follows:

code:

SELECT Name, Marks, COUNT(*) AS cnt
FROM stud
GROUP BY Name, Marks
HAVING COUNT(*) > 1;
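The sketch below runs the GROUP BY / HAVING query against the stud data in an in-memory SQLite database, used here as a stand-in for SQL Server. It then deletes the extra copies. The deletion step is an assumption beyond the article's query: it relies on SQLite's hidden rowid as the unique key the table lacks. SQL Server has no rowid, so there you would add a temporary identity column or use ROW_NUMBER, as the following sections do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stud (Regno INTEGER, Name TEXT, Marks INTEGER)")
conn.executemany("INSERT INTO stud VALUES (?, ?, ?)", [
    (1, "Tom", 77), (2, "Lucy", 78), (3, "Frank", 89), (4, "Jane", 98),
    (5, "Robert", 78), (3, "Frank", 89), (5, "Robert", 78), (4, "Jane", 98),
])

# Step 1: find duplicated (Name, Marks) combinations with GROUP BY and HAVING.
dupes = conn.execute("""
    SELECT Name, Marks, COUNT(*) AS cnt
    FROM stud
    GROUP BY Name, Marks
    HAVING COUNT(*) > 1
    ORDER BY Name
""").fetchall()
print(dupes)

# Step 2: keep the first stored copy (lowest rowid) of every distinct row, delete the rest.
conn.execute("""
    DELETE FROM stud
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM stud GROUP BY Regno, Name, Marks)
""")
remaining = conn.execute("SELECT * FROM stud ORDER BY Regno").fetchall()
print(remaining)
```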


Delete duplicate rows using a common table expression (CTE)

Common table expression

CTE stands for “common table expression”. A CTE is a named temporary result set. It is defined by a simple query and scoped to a single SELECT, INSERT, UPDATE, or DELETE statement. CTEs can also be used to write complex recursive queries. Because a CTE exists only for the duration of one statement, it is often a lighter-weight alternative to a temporary table.

Syntax

WITH [CTEName] AS
(SELECT col1, col2, col3 FROM [tablename] WHERE [condition])
SELECT col1, col2, col3 FROM [CTEName];
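The syntax can be tried out with a small sketch in Python's sqlite3 module, which accepts the same WITH ... AS form. The top_students name and the Marks > 80 condition are hypothetical. The second query shows that the CTE name is no longer visible once its statement has finished.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stud (Regno INTEGER, Name TEXT, Marks INTEGER)")
conn.executemany(
    "INSERT INTO stud VALUES (?, ?, ?)",
    [(1, "Tom", 77), (3, "Frank", 89), (4, "Jane", 98)],
)

# WITH <name> AS (<query>) <statement>: top_students exists only for this statement.
rows = conn.execute("""
    WITH top_students AS (
        SELECT Regno, Name, Marks FROM stud WHERE Marks > 80
    )
    SELECT Name, Marks FROM top_students ORDER BY Marks DESC
""").fetchall()
print(rows)

# Outside that statement the CTE name is unknown, so this query fails.
try:
    conn.execute("SELECT * FROM top_students")
    cte_still_visible = True
except sqlite3.OperationalError:
    cte_still_visible = False
print("CTE visible afterwards:", cte_still_visible)
```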

Procedure for removing duplicate rows using CTE

First, create the Employ_DB table in SQL Server by running the following script in the required database.

Creating the table

CREATE TABLE Employ_DB (emp_no int, emp_name varchar(20), emp_address varchar(25), emp_eoj date);


Insert data into the table

Once the table is created, you now need to insert some entries in it, including some duplicates.

code:

INSERT INTO Employ_DB VALUES (11, 'Mohith', 'tokya', '2000-05-12');
/* Further INSERT statements (not reproduced here) add the remaining rows, including duplicates */

Running the full insert script reports that 8 rows were inserted.

Fetch data from the table

Next, run the script above and use the following query to look up the data in the table.

To list the Employ_DB records sorted by employee name and address, use:

code:

Select * from Employ_DB order by emp_name, emp_address;


Delete duplicate rows in SQL using CTE

The table contains several duplicate records. To number the duplicates within each emp_no, use the ROW_NUMBER() function inside a CTE. The CTE produces a temporary result set, which you can then use to remove the redundant records from the underlying table with a single query.

code:

WITH CTE AS (
    SELECT emp_no, emp_name,
           ROW_NUMBER() OVER (PARTITION BY emp_no ORDER BY emp_no) AS row_num
    FROM Employ_DB
)
SELECT * FROM CTE ORDER BY emp_name, emp_no;


Next, use the same common table expression to list only the duplicate records:

code:

WITH CTE AS (
    SELECT emp_no, emp_name,
           ROW_NUMBER() OVER (PARTITION BY emp_no ORDER BY emp_no) AS row_num
    FROM Employ_DB
)
SELECT * FROM CTE WHERE row_num > 1 ORDER BY emp_no;


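In this example, only two of the records are duplicates based on emp_no. Now delete these duplicate records from the Employ_DB table using the following code:

code:

WITH CTE AS (
    SELECT emp_no, emp_name,
           ROW_NUMBER() OVER (PARTITION BY emp_no ORDER BY emp_no) AS row_num
    FROM Employ_DB
)
DELETE FROM CTE WHERE row_num > 1;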
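A runnable sketch of the same idea follows, using Python's sqlite3 module. The Employ_DB rows are hypothetical: only Mohith's row comes from the article. SQL Server lets you DELETE through a CTE that reads a single table, because such a CTE is updatable. SQLite does not allow this, so in this sketch the CTE collects the rowids of every copy after the first in each emp_no partition, and DELETE removes those rowids. This requires SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employ_DB (emp_no INTEGER, emp_name TEXT, emp_address TEXT, emp_eoj TEXT)"
)
# Hypothetical rows: emp_no 11 and 12 each appear twice.
conn.executemany("INSERT INTO Employ_DB VALUES (?, ?, ?, ?)", [
    (11, "Mohith", "tokya", "2000-05-12"),
    (11, "Mohith", "tokya", "2000-05-12"),
    (12, "Asha", "pune", "2001-03-01"),
    (12, "Asha", "pune", "2001-03-01"),
    (13, "Ravi", "delhi", "2002-07-15"),
])

# Number the rows inside each emp_no partition, then delete every row numbered above 1.
conn.execute("""
    WITH CTE AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY emp_no ORDER BY emp_no) AS row_num
        FROM Employ_DB
    )
    DELETE FROM Employ_DB
    WHERE rowid IN (SELECT rid FROM CTE WHERE row_num > 1)
""")

remaining = conn.execute("SELECT emp_no, emp_name FROM Employ_DB ORDER BY emp_no").fetchall()
print(remaining)
```

Because the rows are ordered by the partitioning column itself, which copy survives is arbitrary. That does not matter for exact duplicates, but if the copies differ in other columns, order by a column that picks the row you want to keep.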


Delete duplicate rows using the SQL RANK function

You can also use the SQL RANK function to get rid of duplicate rows. RANK assigns a rank to each row within its partition. When the ORDER BY column is unique, as with the temporary sno column used below, each copy within a group of duplicates gets a distinct rank. Every copy after the first can then be deleted.

Aggregate functions such as MAX, MIN and AVG perform calculations on a set of rows and return a single output row. Ranking functions in SQL Server instead return a value for every participating row, ranking the rows within partitions defined by the columns you choose. Ranking functions in SQL are often called window functions. SQL Server provides four of them:

  • ROW_NUMBER()
  • RANK()
  • DENSE_RANK()
  • NTILE()
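To see how the four functions differ, the sketch below applies them to the distinct rows of the stud table. Lucy and Robert are tied on 78 marks. It uses Python's sqlite3 module, whose window functions (SQLite 3.25 or later) follow the same OVER syntax. NTILE(2), which splits the rows into two buckets, is an arbitrary choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Tom", 77), ("Lucy", 78), ("Robert", 78), ("Frank", 89), ("Jane", 98)],
)

rows = conn.execute("""
    SELECT name, marks,
           ROW_NUMBER() OVER (ORDER BY marks, name) AS row_num,   -- always 1, 2, 3, ...
           RANK()       OVER (ORDER BY marks)       AS rnk,       -- ties share a rank, then a gap
           DENSE_RANK() OVER (ORDER BY marks)       AS dense_rnk, -- ties share a rank, no gap
           NTILE(2)     OVER (ORDER BY marks, name) AS half       -- bucket number 1 or 2
    FROM scores
    ORDER BY marks, name
""").fetchall()
for row in rows:
    print(row)
```

Only ROW_NUMBER guarantees a distinct value for tied rows. This is why the RANK-based deletion that follows orders by a unique column.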

The following query uses the PARTITION BY clause with the RANK function. PARTITION BY divides the data into subsets based on the listed columns, and the ranking restarts at 1 within each partition.

code:

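/* Assumes a temporary unique column sno was added first, e.g.:
   ALTER TABLE Animals ADD sno INT IDENTITY(1,1); */
DELETE A
FROM Animals A
INNER JOIN
(SELECT *,
        RANK() OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) rnk
 FROM Animals) T ON A.sno = T.sno
WHERE T.rnk > 1;

/* Remove the temporary column and check the result */
ALTER TABLE Animals DROP COLUMN sno;
SELECT * FROM Animals;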
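The sketch below reproduces the RANK-based deletion with Python's sqlite3 module. The Animals rows are hypothetical; the article names only the columns Animal_id and Animal_name. SQLite's DELETE has no JOIN form. Instead, the ranked subquery feeds a WHERE ... IN filter, and SQLite's rowid stands in for the temporary sno column, so no column needs to be added or dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animals (Animal_id INTEGER, Animal_name TEXT)")
conn.executemany("INSERT INTO Animals VALUES (?, ?)", [
    (1, "Cat"), (2, "Dog"), (2, "Dog"), (3, "Cow"), (3, "Cow"), (3, "Cow"),
])

# Within each (Animal_id, Animal_name) partition, the row with the highest rowid gets
# rank 1; every row ranked above 1 is an extra copy and is deleted.
conn.execute("""
    DELETE FROM Animals
    WHERE rowid IN (
        SELECT sno FROM (
            SELECT rowid AS sno,
                   RANK() OVER (PARTITION BY Animal_id, Animal_name ORDER BY rowid DESC) AS rnk
            FROM Animals
        )
        WHERE rnk > 1
    )
""")

remaining = conn.execute("SELECT * FROM Animals ORDER BY Animal_id").fetchall()
print(remaining)
```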

Rows with a rank greater than 1 are the extra copies in each group. The query above therefore deletes exactly those rows and keeps one row per group: the copy with the highest sno, which has rank 1.


Delete duplicate rows using an SSIS package

SQL Server Integration Services (SSIS) includes a number of transformations and operators that help administrators and developers reduce manual work and streamline tasks. An SSIS package can also be used to delete duplicate rows from an SQL table.

Use the SSIS Sort transformation to remove duplicate rows

The Sort transformation sorts the values coming from an SQL table. You may be wondering how sorting data helps you get rid of duplicate rows. The answer is that the Sort transformation has a “Remove rows with duplicate sort values” option. With it enabled, rows that share the same values in the sort columns are passed through only once.


To demonstrate this, create an SSIS package.

First, create a new Integration Services project in SQL Server Data Tools (SSDT). Then add an OLE DB Source to the new package that reads from the table containing the duplicates.


Conclusion

In this article, you looked at several ways to remove duplicate rows in SQL, including T-SQL queries, CTEs, and SSIS packages. Use whichever approach you are most comfortable with. However, do not apply these procedures or packages directly to production data; test them first in a non-production environment.


https://www.simplilearn.com/tutorials/sql-tutorial/delete-duplicate-rows-in-sql
