CatalystOps analyzes your PySpark and Databricks code inline as you type — 30+ anti-pattern detectors, dry-run plan analysis on your cluster, and actionable fixes. No context switching.
Use df.isEmpty() instead; it short-circuits on the first partition.
if not df.isEmpty(): ...
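To show why the early exit matters, here is a toy pure-Python analogy. It is not Spark or CatalystOps code. Partitions are modeled as iterators, and the hypothetical helpers count_rows and is_empty mimic the access patterns of count() and isEmpty(). CountingPartition records how many rows each approach actually reads.

```python
from typing import Iterable, Iterator, List

# Toy model: each "partition" is an iterator of rows. This only illustrates
# the difference in work done; it is not how Spark schedules jobs.


def count_rows(partitions: Iterable[Iterator[int]]) -> int:
    """Like a count() == 0 check: must consume every row of every partition."""
    return sum(1 for part in partitions for _ in part)


def is_empty(partitions: Iterable[Iterator[int]]) -> bool:
    """Like isEmpty(): returns as soon as any single row is found."""
    for part in partitions:
        for _ in part:
            return False
    return True


class CountingPartition:
    """Iterator wrapper that records how many rows were actually read."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = iter(rows)
        self.read = 0

    def __iter__(self) -> "CountingPartition":
        return self

    def __next__(self) -> int:
        row = next(self._rows)  # raises StopIteration when exhausted
        self.read += 1
        return row


parts = [CountingPartition(list(range(1000))) for _ in range(4)]
print(is_empty(parts), sum(p.read for p in parts))  # False 1
```

With four partitions of 1,000 rows each, the emptiness check reads one row. Counting would read all 4,000.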
From static code analysis to live Databricks plan inspection — without leaving the editor.
Detects 30+ PySpark and Databricks anti-patterns inline as you type. No cluster required. Catches collect(), UDFs, cross joins, unsafe writes, SQL injection, schema drift, and more.
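Cluster-free detection of this kind is typically done by walking the syntax tree. The following is a minimal sketch using Python's standard ast module. It assumes a rule is keyed only on the called method name, and it covers two of the listed cases (collect() and cross joins). RULES, Finding, and find_antipatterns are hypothetical names, not CatalystOps' rule engine.

```python
import ast
from dataclasses import dataclass
from typing import List

# Hypothetical rule table: method name -> message shown to the user.
RULES = {
    "collect": "collect() pulls the whole DataFrame onto the driver.",
    "crossJoin": "crossJoin() produces a Cartesian product.",
}


@dataclass
class Finding:
    line: int
    col: int
    method: str
    message: str


def find_antipatterns(source: str) -> List[Finding]:
    """Flag every call of the form <expr>.<method>(...) whose method is in RULES."""
    findings: List[Finding] = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in RULES
        ):
            findings.append(
                Finding(node.lineno, node.col_offset, node.func.attr, RULES[node.func.attr])
            )
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(findings, key=lambda f: (f.line, f.col))


code = "rows = df.filter(df.x > 1).collect()\nbig = a.crossJoin(b)\n"
for f in find_antipatterns(code):
    print(f.line, f.method, "-", f.message)
```

Matching on the method name alone is purely syntactic. It would also flag an unrelated object that happens to have a collect() method, so a production analyzer would likely need some type or data-flow information.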
Submits a neutralized version of your script to Databricks (cluster or Serverless) and returns the physical Catalyst plan — with sort-merge join detection, broadcast thresholds, shuffle analysis, and cost estimation.
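Join strategies and shuffles can be read off the physical plan text that Spark prints, for example via df.explain(). The following is a minimal sketch, assuming purely textual matching on Spark's operator names (SortMergeJoin, BroadcastHashJoin, Exchange hashpartitioning). summarize_plan is a hypothetical helper, not CatalystOps' parser, and SAMPLE_PLAN is hand-written and abbreviated for illustration only.

```python
import re
from typing import Dict

# Operator names as they appear in Spark physical plan text.
PATTERNS = {
    "sort_merge_join": re.compile(r"\bSortMergeJoin\b"),
    "broadcast_hash_join": re.compile(r"\bBroadcastHashJoin\b"),
    "shuffle_exchange": re.compile(r"\bExchange hashpartitioning\b"),
}


def summarize_plan(plan_text: str) -> Dict[str, int]:
    """Count how often each operator pattern occurs in a physical plan string."""
    return {name: len(rx.findall(plan_text)) for name, rx in PATTERNS.items()}


# Hand-written, abbreviated plan used only for illustration.
SAMPLE_PLAN = """\
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- SortMergeJoin [id#1L], [id#5L], Inner
   :- Sort [id#1L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#1L, 200), ENSURE_REQUIREMENTS
   :     +- FileScan parquet [id#1L]
   +- Sort [id#5L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(id#5L, 200), ENSURE_REQUIREMENTS
         +- FileScan parquet [id#5L]
"""

summary = summarize_plan(SAMPLE_PLAN)
if summary["sort_merge_join"]:
    print("Sort-merge join found; consider broadcasting the smaller side if it fits.")
print(summary)  # {'sort_merge_join': 1, 'broadcast_hash_join': 0, 'shuffle_exchange': 2}
```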
Analyze any past Databricks job run from the Jobs sidebar — no re-execution needed. CatalystOps reads the Spark event log from DBFS, extracts physical plans, and opens an interactive DAG showing operator trees, filter conditions, and issue badges.
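Spark event logs are newline-delimited JSON, one event per line. The following is a minimal sketch of plan extraction that relies on one assumption: each SQL query is recorded as a SparkListenerSQLExecutionStart event carrying executionId and physicalPlanDescription fields. extract_physical_plans is a hypothetical helper. The sketch does not handle compressed or rolled event-log files, or reading from DBFS.

```python
import json
from typing import Dict, Iterable

# Assumed "Event" value for SQL query start events in the JSON event log.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def extract_physical_plans(lines: Iterable[str]) -> Dict[int, str]:
    """Map executionId -> physicalPlanDescription from event-log lines."""
    plans: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("Event") == SQL_START and "physicalPlanDescription" in event:
            plans[event["executionId"]] = event["physicalPlanDescription"]
    return plans


# Usage with a local, uncompressed copy of an event log (path is hypothetical):
# with open("eventlog.json", encoding="utf-8") as fh:
#     for exec_id, plan in extract_physical_plans(fh).items():
#         print(exec_id, plan.splitlines()[0])
```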
The Clusters panel lists every interactive cluster in your workspace. Click SSH on any cluster — CatalystOps starts it if stopped, runs setup automatically, and opens VS Code Remote SSH directly on the driver. No terminal commands needed.
Tracks DBU and dollar spend per period directly from system.billing.usage with a 1-hour cache. After each serverless run, it can optionally fetch actual DBU consumption.
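A 1-hour cache means repeated views of the same billing summary do not re-query the system table within that window. The following is a minimal time-to-live cache sketch with an injectable clock so the expiry is testable. TTLCache is hypothetical. DAILY_DBU_SQL is an illustrative query that assumes the usage_date, usage_quantity, and usage_unit columns of system.billing.usage. It is not the query CatalystOps runs, and it produces DBUs only, not dollars.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Illustrative query (assumed column names); not CatalystOps' actual query.
DAILY_DBU_SQL = """
SELECT usage_date, SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
GROUP BY usage_date
ORDER BY usage_date
"""


class TTLCache:
    """Memoize fetch results for ttl_seconds (3600 s = the 1-hour window)."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]        # still fresh: no query issued
        value = fetch()          # missing or stale: run the query again
        self._store[key] = (now, value)
        return value


# Usage (run_sql is a hypothetical function that executes SQL on Databricks):
# cache = TTLCache()
# rows = cache.get(DAILY_DBU_SQL, lambda: run_sql(DAILY_DBU_SQL))
```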
Exposes a Streamable HTTP MCP server auto-discovered by VS Code 1.99+. Lets Claude and other AI tools analyze your PySpark code, fetch billing summaries, and run dry runs through natural language.
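Under MCP, an AI client invokes a server tool by sending a JSON-RPC 2.0 tools/call request. With the Streamable HTTP transport, that request is sent as an HTTP POST. The following is a minimal sketch of building such a message. The tool name analyze_code and its arguments are hypothetical, since CatalystOps' actual tool names and endpoint URL are not given here. A real client also performs the initialize handshake first and may need to carry a session ID between requests.

```python
import json
from typing import Any, Dict

# Headers a Streamable HTTP MCP client sends with each POST.
MCP_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def tools_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


body = tools_call(1, "analyze_code", {"code": "rows = df.collect()"})
print(body)
```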
Install from the VS Code Marketplace. The moment you open a .py file, local analysis kicks in, with no configuration needed.
30+ rules light up immediately for any PySpark anti-patterns.
Add your workspace URL and personal access token via the CatalystOps: Configure Databricks Connection command.
Pick cluster or Serverless execution mode. CatalystOps reads your ~/.databrickscfg automatically if it exists.
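~/.databrickscfg is an INI-style file in which each profile section holds a host and a token. The following is a minimal sketch of reading a profile with Python's configparser. It assumes the [DEFAULT] profile is used when none is named. parse_profile and load_profile are hypothetical helpers, not CatalystOps' loader. Interpolation is disabled so a literal % in a value is read verbatim.

```python
import configparser
import os
from typing import Optional, Tuple


def parse_profile(text: str, profile: str = "DEFAULT") -> Optional[Tuple[str, str]]:
    """Return (host, token) for a profile, or None if it is absent or incomplete."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if profile == "DEFAULT":
        section = parser.defaults()      # configparser keeps DEFAULT separately
    elif parser.has_section(profile):
        section = parser[profile]
    else:
        return None
    host, token = section.get("host"), section.get("token")
    if not host or not token:
        return None
    return host.rstrip("/"), token       # normalize a trailing slash on the URL


def load_profile(profile: str = "DEFAULT",
                 path: str = "~/.databrickscfg") -> Optional[Tuple[str, str]]:
    """Read the config file if it exists; otherwise fall back to manual setup."""
    full = os.path.expanduser(path)
    if not os.path.exists(full):
        return None
    with open(full, encoding="utf-8") as fh:
        return parse_profile(fh.read(), profile)
```

Note that configparser lets named sections inherit keys from [DEFAULT]. A named profile that is missing a token would therefore pick up the default one.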
Press ⌘⇧K to submit the current file. CatalystOps neutralizes side-effects, executes the Catalyst planner on your cluster, and returns the physical plan with cost annotations, join strategies, and actionable fixes — all in the sidebar.
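Side-effect neutralization could work roughly as follows. This minimal sketch assumes that writes go through PySpark's df.write or df.writeStream accessors. It uses an ast.NodeTransformer to replace statement-level writer calls with pass, so the transformations that build the plan stay intact. NeutralizeWrites and neutralize are hypothetical names, and CatalystOps' actual rewrite rules are not documented here. The sketch requires Python 3.9+ for ast.unparse.

```python
import ast

# Accessors that return a DataFrameWriter / DataStreamWriter in PySpark.
WRITE_ROOTS = {"write", "writeStream"}


def _touches_writer(expr: ast.AST) -> bool:
    """True if a call chain such as df.write.mode(...).save(...) passes through .write."""
    node = expr
    while isinstance(node, (ast.Attribute, ast.Call)):
        if isinstance(node, ast.Attribute):
            if node.attr in WRITE_ROOTS:
                return True
            node = node.value
        else:
            node = node.func
    return False


class NeutralizeWrites(ast.NodeTransformer):
    """Replace bare writer-call statements with `pass`; leave everything else."""

    def visit_Expr(self, node: ast.Expr) -> ast.AST:
        if isinstance(node.value, ast.Call) and _touches_writer(node.value):
            return ast.copy_location(ast.Pass(), node)
        return node


def neutralize(source: str) -> str:
    tree = NeutralizeWrites().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


src = (
    'df = spark.read.table("src").filter("x > 1")\n'
    'df.write.mode("overwrite").saveAsTable("dst")\n'
)
print(neutralize(src))
```

This sketch only catches writes that appear as standalone statements. An assignment such as query = df.writeStream.start(), or a spark.sql("INSERT ...") call, would need additional rules.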
CatalystOps ships a built-in MCP server auto-discovered by VS Code 1.99+. Claude and other AI clients can call CatalystOps tools directly — analyze code, fetch billing data, run dry-runs, and read plan results through natural language.
Free, open-source, and available for any Databricks or PySpark project.
Also available on Open VSX for Cursor, Theia, and other editors.