Data Analysis & SQL

Exploratory Data Analysis Plan And Code

Produces a structured EDA plan plus pandas/SQL code to profile a new dataset before modeling.

Role-Based · Step-by-Step · Structured-Output

Prompt

ROLE: You are a data scientist running first-pass EDA on an unfamiliar dataset.

CONTEXT: Dataset description: [DATASET_DESCRIPTION]. Columns and dtypes (if known): [COLUMNS]. Analysis goal: [GOAL]. Toolset: [pandas / SQL / both]. Approx size: [ROW_COUNT].

TASK:
1. Propose an EDA checklist tailored to this dataset (shape, missingness, dtypes, cardinality, distributions, outliers, duplicates, target balance, leakage suspects, correlations).
2. For each checklist item, give runnable code in the chosen toolset (pandas and/or SQL) to compute it.
3. Specify what "normal" vs "investigate further" looks like for each output.
4. Flag the 3-5 things most likely to bite this analysis (e.g., silent dupes inflating counts, mixed units, look-ahead leakage).
5. End with a prioritized list of follow-up questions to answer before modeling.
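
As a rough illustration of the kind of code step 2 asks for, here is a minimal pandas sketch covering shape, dtypes, missingness, cardinality, duplicates, outliers, and target balance. The toy DataFrame, its column names (`user_id`, `amount`, `churned`), and the helper names `profile` and `iqr_outlier_count` are hypothetical, chosen only for this example. The 1.5 x IQR fence is one common outlier convention, not a threshold prescribed by this prompt.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with basic, read-only profiling stats."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean().mul(100).round(1),
        "n_unique": df.nunique(dropna=True),
    })


def iqr_outlier_count(s: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return int(((s < q1 - k * iqr) | (s > q3 + k * iqr)).sum())


# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4, 5],
    "amount": [10.0, 12.0, 12.0, 11.0, np.nan, 500.0],
    "churned": [0, 0, 0, 1, 0, 0],
})

print("shape:", df.shape)
print(profile(df))
print("fully duplicated rows:", int(df.duplicated().sum()))
# dropna() returns a copy, so the original frame is left untouched.
print("amount outliers:", iqr_outlier_count(df["amount"].dropna()))
print(df["churned"].value_counts(normalize=True))
```

Every call here only reads from `df`, which keeps the sketch in line with the "profile, do not transform destructively" constraint.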

OUTPUT FORMAT: Checklist table -> Code blocks per item -> Interpretation thresholds -> Top risks -> Follow-up questions.

CONSTRAINTS: Code must be copy-paste runnable and depend only on widely available tools (pandas and NumPy for Python, standard SQL otherwise). Profile, do not transform destructively. Always check row counts before and after any join/filter. State assumptions about the dataset explicitly.
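
The row-count constraint can be sketched on the SQL side as follows, using Python's built-in `sqlite3` module so the example runs without a database server. The `orders` and `customers` tables, their columns, and the `scalar` helper are hypothetical. The data is deliberately built so that one join key is duplicated, which is the "silent dupes inflating counts" case named in the task list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 11), (3, 12);
    -- customer 11 appears twice: a silent duplicate on the join key
    INSERT INTO customers VALUES (10, 'EU'), (11, 'US'), (11, 'US'), (12, 'APAC');
""")


def scalar(sql: str) -> int:
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


rows_before = scalar("SELECT COUNT(*) FROM orders")
rows_after = scalar("""
    SELECT COUNT(*)
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
""")

# Keys that occur more than once on the lookup side explain any fan-out.
dup_keys = conn.execute("""
    SELECT customer_id, COUNT(*) AS n
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
""").fetchall()

print("rows before join:", rows_before)
print("rows after join:", rows_after)
print("duplicated join keys:", dup_keys)
```

If the count after a LEFT JOIN exceeds the count before it, the lookup table is not unique on the join key, and any downstream sums or counts would be inflated until that is resolved.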

Recommended models

claude · gpt-4o · gemini
