Data Analysis & SQL
Deduplicate Records With Confidence Rules
Identifies and collapses duplicate or near-duplicate rows using deterministic and fuzzy rules in SQL.
Role-Based · Step-by-Step · Structured-Output
Prompt
ROLE: You are a data engineer who deduplicates records while preserving the right surviving row.
CONTEXT: Table [TABLE_NAME] (schema [SCHEMA]) contains duplicates. A true duplicate is defined by [DUP_KEY] (exact match) and/or a near-match on [FUZZY_FIELDS] (e.g., normalized name + email). The surviving record should be the [SURVIVOR_RULE] (e.g., most recently updated, most complete). Engine: [DATABASE_ENGINE].
TASK:
1. Separate exact duplicates from near-duplicates and state the matching rule for each.
2. For exact duplicates, write SQL using ROW_NUMBER() partitioned by [DUP_KEY], ordered by the survivor rule, keeping rn = 1.
3. For fuzzy duplicates, normalize fields (lower, trim, strip punctuation) and group on the normalized key; note where similarity functions (LEVENSHTEIN/SOUNDEX/JACCARD) are needed and whether the engine supports them.
4. Produce both a 'rows to keep' query and a 'rows flagged as duplicates' query for review before deletion.
5. Recommend a safe delete/merge procedure (audit table first).
OUTPUT FORMAT: Matching rules -> Exact-dedup ```sql``` -> Fuzzy-dedup ```sql``` -> Keep vs flag queries -> Safe deletion procedure.
CONSTRAINTS: Never hard-delete before producing a reviewable flagged set. Make the survivor rule deterministic (add a tiebreaker so rn = 1 is unique). Normalize before comparing. State the false-merge risk of fuzzy matching.
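As a rough illustration of the kind of output this prompt is asking for, the sketch below runs the exact and normalized-key passes against an in-memory SQLite database from Python. Everything specific here is an assumption made for the example: the `customers` table, its columns, the sample rows, the audit table name `customers_dedup_audit`, the survivor rule (most recently updated, lowest `id` as tiebreaker), and SQLite standing in for [DATABASE_ENGINE]. It needs SQLite 3.25 or later for window functions. Similarity functions are left out on purpose, since core SQLite has no built-in edit-distance function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    email TEXT,
    updated_at TEXT  -- ISO dates, so text ordering matches date ordering
);
INSERT INTO customers (id, name, email, updated_at) VALUES
    (1, 'Ann Lee',   'ann@example.com',  '2024-01-01'),
    (2, 'Ann Lee',   'ann@example.com',  '2024-03-01'),
    (3, 'ann  lee.', ' ANN@example.com', '2024-03-01'),
    (4, 'Bob Ray',   'bob@example.com',  '2024-02-01');
""")

# Hypothetical matching rules: exact on raw columns, fuzzy on normalized ones
# (lowercase, trim, strip periods and spaces).
EXACT_KEY = "name, email"
FUZZY_KEY = (
    "lower(trim(email)), "
    "replace(replace(lower(trim(name)), '.', ''), ' ', '')"
)

def split_rows(conn, partition_sql):
    """Return (keep_ids, flagged_ids) for one matching rule.

    The survivor is the most recently updated row; id ASC is the
    tiebreaker, so rn = 1 is unique within every partition.
    """
    query = f"""
        WITH ranked AS (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY {partition_sql}
                       ORDER BY updated_at DESC, id ASC
                   ) AS rn
            FROM customers
        )
        SELECT id, rn FROM ranked ORDER BY id
    """
    rows = conn.execute(query).fetchall()
    keep = [row_id for row_id, rn in rows if rn == 1]
    flagged = [row_id for row_id, rn in rows if rn > 1]
    return keep, flagged

def archive_and_delete(conn, ids):
    """Copy flagged rows to an audit table, then delete them."""
    marks = ",".join("?" * len(ids))
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers_dedup_audit AS "
            "SELECT * FROM customers WHERE 0"
        )
        conn.execute(
            f"INSERT INTO customers_dedup_audit "
            f"SELECT * FROM customers WHERE id IN ({marks})", ids
        )
        conn.execute(f"DELETE FROM customers WHERE id IN ({marks})", ids)

exact_keep, exact_flagged = split_rows(conn, EXACT_KEY)
fuzzy_keep, fuzzy_flagged = split_rows(conn, FUZZY_KEY)

# In practice the flagged set would be reviewed by a person before this step.
archive_and_delete(conn, fuzzy_flagged)
remaining = [r[0] for r in conn.execute("SELECT id FROM customers ORDER BY id")]
```

The exact pass only catches rows 1 and 2, while the normalized key also pulls in row 3. That wider net is where the false-merge risk comes from: two different people who share a normalized name and email pattern would be collapsed too, which is why the flagged set is produced separately from the delete step.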
Recommended models
claude, gpt-4o, gemini