Data Analysis & SQL
Generate Realistic Synthetic Test Data
Designs schema-aware synthetic data with realistic distributions and referential integrity for testing analytics.
Role-Based · Step-by-Step · Structured-Output
Prompt
ROLE: You are a data engineer who generates realistic synthetic datasets for testing pipelines and dashboards.

CONTEXT: I need synthetic data for these tables: [SCHEMA_DDL].
Relationships and cardinalities: [RELATIONSHIPS] (e.g., each customer has 0-N orders).
Realism requirements: [DISTRIBUTIONS] (e.g., revenue is right-skewed, 5% refunds, weekly seasonality).
Volume: [ROW_COUNTS].
Tooling: [SQL generator / Python].

TASK:
1. Plan the generation order so foreign keys always reference existing parents (parents before children).
2. For each column, specify the distribution and constraints (ranges, enums, NULL rate, uniqueness) that make the data realistic, not uniform-random.
3. Provide runnable code ([SQL] using generate_series/recursive CTE or [Python] with a seeded RNG) to produce each table.
4. Embed at least 3 deliberate edge cases (orphan-prevention, a heavy-tail outlier, seasonal pattern) so tests are meaningful.
5. Include a verification query proving referential integrity and the intended distributions.

OUTPUT FORMAT: Generation order -> Per-column spec table -> Generation code -> Embedded edge cases -> Verification queries.

CONSTRAINTS:
Use a fixed random seed for reproducibility.
Respect all foreign keys and uniqueness constraints.
Make distributions realistic (skew, seasonality), not flat uniform.
Never include real PII; everything must be fabricated.
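To give a sense of what a response on the [Python] path might look like, here is a minimal sketch using only the standard library. It is illustrative, not the output the prompt guarantees: the `customers` and `orders` tables, the column names, the log-normal parameters, the weekend weighting, and the 50,000 outlier are all assumptions standing in for whatever you supply in [SCHEMA_DDL], [RELATIONSHIPS], and [DISTRIBUTIONS].

```python
import random
import statistics
from datetime import date, timedelta

SEED = 42  # fixed seed so every run produces identical rows


def generate(n_customers=2000, seed=SEED):
    """Build hypothetical customers and orders tables, parents first."""
    rng = random.Random(seed)

    # Parents first: customers exist before any order references them.
    customers = [{"customer_id": i} for i in range(1, n_customers + 1)]

    # Weekly seasonality (assumed): weekend days are twice as likely.
    start = date(2024, 1, 1)
    days = [start + timedelta(days=d) for d in range(364)]
    weights = [2.0 if d.weekday() >= 5 else 1.0 for d in days]

    orders = []
    for c in customers:
        # 0-N orders per customer; some customers legitimately have none.
        for _ in range(rng.choice([0, 0, 1, 1, 2, 3, 5])):
            orders.append({
                "order_id": len(orders) + 1,
                "customer_id": c["customer_id"],  # always an existing parent
                "order_date": rng.choices(days, weights=weights, k=1)[0],
                # Log-normal gives a right-skewed revenue distribution.
                "revenue": round(rng.lognormvariate(3.5, 0.8), 2),
                "is_refund": rng.random() < 0.05,  # roughly 5% refunds
            })

    # Deliberate edge case: one heavy-tail outlier order.
    orders.append({
        "order_id": len(orders) + 1,
        "customer_id": customers[0]["customer_id"],
        "order_date": start,
        "revenue": 50000.0,
        "is_refund": False,
    })
    return customers, orders


def verify(customers, orders):
    """Check referential integrity and the intended distributions."""
    parent_ids = {c["customer_id"] for c in customers}
    revenues = [o["revenue"] for o in orders]
    return {
        "orphans": sum(o["customer_id"] not in parent_ids for o in orders),
        "mean_revenue": statistics.mean(revenues),
        "median_revenue": statistics.median(revenues),
        "refund_rate": sum(o["is_refund"] for o in orders) / len(orders),
        "weekend_share": sum(o["order_date"].weekday() >= 5 for o in orders)
        / len(orders),
    }


if __name__ == "__main__":
    customers, orders = generate()
    print(verify(customers, orders))
```

In the report, `orphans` should be 0, the mean revenue should sit above the median (a sign of right skew), and the weekend share should exceed the 2/7 you would expect from uniform dates. A real response would replace these checks with verification queries against your actual schema.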
Recommended models
claude · gpt-4o · gemini
More in Data Analysis & SQL
Translate Business Questions Into SQL
Turns a plain-English stakeholder question into a correct, well-commented SQL query against a known schema.
Optimize A Slow SQL Query
Diagnoses why a query is slow and rewrites it with targeted, explained optimizations and an index plan.
Debug A SQL Query That Returns Wrong Results
Systematically finds the logic error producing incorrect numbers and delivers a corrected, verified query.
Explain An Unfamiliar SQL Query In Plain English
Reverse-engineers a complex inherited query into a clear narrative, business meaning, and risk list.