a mindfire for the new millenium

● deduping in MS Access 2000 posted: Wed 2011-08-24 19:51:04 tags: databases

For lots of list-cleaning purposes, a spreadsheet is preferable to a database. Up to 2000, MS Excel was limited to around 65,500 rows, and even now OpenOffice Calc is limited to about 1 million rows; so if you need to operate on lists that exceed your spreadsheet limits, then you need to work in a true database manager application, such as MS Access.

In Excel, you might sort your list by the column you were deduping, and use a formula to increment a count or simply show a flag word if the current record matched the previous record.

=if(a2=a1,"dupe","")

Then copy/paste the formula results column as values, re-sort the list by the count or flag, and delete all flagged rows (or rows with a dupe-count value greater than one).

Access versions after 2000 included dedupe wizards, but some of us are still getting value from our old OFC2K licenses. So in Access 2000, you can use the Group By qualifier to dedupe on a given column, and the First qualifier on other columns to tell Access which values it can discard when grouping. The Access-SQL of such a query (implemented as a make-table query, as evidenced by the INTO clause) would look something like this:

SELECT First(contacts.id) AS FirstOfid, contacts.email, First(contacts.name1)
AS FirstOfname1, First(contacts.name2) AS FirstOfname2
INTO deduped_contacts
FROM contacts
GROUP BY contacts.email
ORDER BY First(contacts.id);

Using the TOP clause and subquery techniques to select only chunks from the list is left as an exercise to the reader.