Correlation And Driver Analysis On A Table

Plans a sound correlation/driver analysis, computes it in SQL or pandas, and guards against spurious conclusions.

Role-Based · Chain-of-Thought · Self-Critique

Prompt

ROLE: You are a quantitative analyst examining what drives [TARGET_METRIC].

CONTEXT: Dataset [DATASET] with candidate driver columns [CANDIDATE_FEATURES] and target [TARGET_METRIC]. Tooling: [SQL / pandas]. Grain: [GRAIN]. Engine/env: [ENVIRONMENT].

TASK:
1. State the analytical question, say whether you want an association (correlation) or an estimated effect (regression), and explain why each candidate driver is plausible.
2. Compute pairwise correlations (Pearson for linear, Spearman for monotonic) between each driver and the target, with sample sizes.
3. Provide the code ([SQL] using CORR()/aggregates or [pandas] with .corr()) to produce a ranked driver table.
4. Caution against common traps: confounding, reverse causality, Simpson's paradox, multicollinearity among drivers, and outlier-driven correlations -- and suggest a check for each.
5. Recommend the next step (controlled regression, segmentation, or an experiment) to move from correlation toward cause.
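
For illustration, steps 2 and 3 could be sketched in pandas roughly as follows. The function name `rank_drivers`, the `min_n` cutoff, and the demo column names are hypothetical and not part of the prompt. Spearman is computed here as Pearson on average ranks, which is an equivalent formulation that avoids a SciPy dependency.

```python
import numpy as np
import pandas as pd


def rank_drivers(df, target, features, min_n=30):
    """Return one row per feature: pairwise n, Pearson r, Spearman rho.

    min_n is an assumed cutoff; below it the coefficients are left as NaN
    so that tiny samples are reported but never ranked.
    """
    rows = []
    for col in features:
        pair = df[[col, target]].dropna()  # pairwise-complete rows only
        n = len(pair)
        if n < min_n:
            pearson = spearman = np.nan
        else:
            pearson = pair[col].corr(pair[target])
            # Spearman = Pearson correlation of the (average) ranks
            spearman = pair[col].rank().corr(pair[target].rank())
        rows.append(
            {"feature": col, "n": n, "pearson": pearson, "spearman": spearman}
        )
    table = pd.DataFrame(rows)
    table["abs_spearman"] = table["spearman"].abs()
    # Strongest monotonic association first; NaN rows sink to the bottom
    return table.sort_values("abs_spearman", ascending=False).reset_index(drop=True)


# Hypothetical demo data: 'spend' is monotonic but non-linear in 'revenue'
demo = pd.DataFrame(
    {
        "spend": [1, 2, 3, 4, 5],
        "visits": [2, 1, 4, 3, 5],
        "revenue": [1, 8, 27, 64, 125],
    }
)
print(rank_drivers(demo, "revenue", ["spend", "visits"], min_n=3))
```

A gap between the Pearson and Spearman columns for the same feature is one cheap signal that the relationship is monotonic but not linear.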

OUTPUT FORMAT: Question framing -> Ranked driver/correlation table -> Code -> Trap checks -> Recommended next step.

CONSTRAINTS: Report sample sizes alongside coefficients, and exclude correlations computed on very small n from the ranking. Never claim causation from correlation. Check for Simpson's paradox by re-running the analysis within key segments. Note when a non-linear relationship makes Pearson misleading.
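
The Simpson's paradox check named in the constraints might look like the sketch below: compute the correlation once on all rows and once per segment, then look for segments whose sign disagrees with the overall one. The names `corr_by_segment` and `has_sign_flip`, the `min_n` cutoff, and the demo columns are assumptions made for illustration.

```python
import pandas as pd


def corr_by_segment(df, feature, target, segment, min_n=30):
    """Return (overall Pearson r, per-segment table with n and Pearson r)."""
    data = df[[feature, target, segment]].dropna()
    overall = data[feature].corr(data[target])
    rows = []
    for name, grp in data.groupby(segment):
        # Segments below the assumed min_n keep their n but get no coefficient
        r = grp[feature].corr(grp[target]) if len(grp) >= min_n else float("nan")
        rows.append({"segment": name, "n": len(grp), "pearson": r})
    return overall, pd.DataFrame(rows)


def has_sign_flip(overall, per_segment):
    """True if any segment's correlation has the opposite sign to the overall one."""
    return bool((per_segment["pearson"] * overall < 0).any())


# Hypothetical demo: within each plan, higher price goes with lower usage,
# but the pooled data shows the opposite direction.
demo = pd.DataFrame(
    {
        "price": [1, 2, 3, 6, 7, 8],
        "usage": [10, 9, 8, 20, 19, 18],
        "plan": ["basic", "basic", "basic", "pro", "pro", "pro"],
    }
)
overall_r, by_plan = corr_by_segment(demo, "price", "usage", "plan", min_n=3)
print(round(overall_r, 3))
print(by_plan)
print(has_sign_flip(overall_r, by_plan))
```

A sign flip does not say which view is right; it only shows that the segment variable matters and should be controlled for in the next step.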

Recommended models

claude, gpt-4o, gemini
