Crawl
Open-source pre-migration intelligence for enterprise data infrastructure.
They catalog your data. Crawl tells you what breaks when you migrate.
Extract business logic from stored procedures, ETL jobs, and warehouse views — the undocumented rules buried in your data stack that block every migration project. Open-source, vendor-neutral, works with any LLM provider.
What Crawl does
Input: a 200-line stored procedure that nobody on the team wrote.
sp_calculate_customer_churn (confidence: HIGH)
├── Rule 1: Customers inactive >90 days flagged as at-risk
├── Rule 2: Churn score weighted by lifetime value (dim_customer)
├── Rule 3: ⚠️ References dim_product_v2 — TABLE DROPPED 2022-06-14
├── Rule 4: Monthly aggregation via vendor-specific DATEADD syntax
└── Triage: CRITICAL (12 downstream deps) | MEDIUM migration risk

Contradictions found:
└── Rule 2 conflicts with sp_calculate_ltv line 47 (different LTV formula)
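The dropped-table flag on Rule 3 comes down to comparing the tables a procedure references against the tables that still exist. A minimal sketch of that check — names and structure are illustrative, not Crawl's actual implementation:

```python
def find_dead_references(referenced_tables, catalog_tables):
    """Return referenced tables that no longer exist in the live catalog."""
    return sorted(set(referenced_tables) - set(catalog_tables))

# Tables the extracted rules mention vs. tables the catalog still has.
referenced = ["dim_customer", "dim_product_v2", "fact_orders"]
live = ["dim_customer", "fact_orders", "dim_product_v3"]

print(find_dead_references(referenced, live))  # ['dim_product_v2']
```

A set difference is all the core check needs; the hard part in practice is producing the `referenced` list reliably, which is what the extraction step is for.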
CLI commands
crawl scan      Discover stored procs, views, functions
crawl extract   Extract business rules via hybrid AST + LLM analysis
crawl triage    Score by criticality, complexity, and migration risk
crawl diff      Compare logic between environments or time periods
crawl export    Output to dbt-docs YAML, JSON, or Markdown
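To give a flavor of the deterministic half of `crawl extract`'s hybrid analysis, here is a deliberately simplified stand-in that pulls table references out of a procedure body with a regex. Crawl's real analysis is AST-based; this sketch and its names are assumptions for illustration only:

```python
import re

def extract_table_refs(sql: str) -> set:
    """Rough stand-in for AST-based reference extraction:
    collect identifiers that follow FROM or JOIN keywords."""
    pattern = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
    return set(pattern.findall(sql))

proc_body = """
    SELECT c.customer_id, DATEADD(month, -3, GETDATE())
    FROM dim_customer c
    JOIN fact_orders o ON o.customer_id = c.customer_id
"""
print(sorted(extract_table_refs(proc_body)))  # ['dim_customer', 'fact_orders']
```

A regex breaks down on subqueries, CTEs, and dynamic SQL, which is exactly why a real parser plus LLM interpretation is needed for the vendor dialects Crawl targets.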
The problem
Every cloud migration hits the same wall: thousands of stored procedures and ETL jobs encoding business rules in vendor-specific dialects that nobody documented. Migration tools can translate your SQL, but they can't tell you what it means — or whether it's even still relevant.
Crawl is Step 0: the pre-migration intelligence layer that runs before you use Datafold, Lakebridge, dbt, or SnowConvert.
Questions Crawl answers
—What do we have? Inventory with auto-generated business-rule summaries
—What does it do? Human-readable logic, not just column lineage
—Is it still alive? Dead code detection, contradiction flagging
—What should we migrate first? Triage by criticality, complexity, risk
—What breaks if we move? Vendor-specific logic that won't survive a platform change
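A triage score like the one in the example output can be built from a few measurable signals. The weighting below is a hypothetical heuristic to show the shape of the idea — it is not Crawl's actual formula:

```python
from dataclasses import dataclass

@dataclass
class ProcStats:
    downstream_deps: int    # objects that read this proc's output (criticality)
    loc: int                # lines of code, a rough complexity proxy
    vendor_constructs: int  # e.g. DATEADD, CONNECT BY (migration risk)

def triage_score(s: ProcStats) -> float:
    """Illustrative weighting only; each signal is capped at 1.0."""
    criticality = min(s.downstream_deps / 10, 1.0)
    complexity = min(s.loc / 500, 1.0)
    migration_risk = min(s.vendor_constructs / 5, 1.0)
    return round(0.5 * criticality + 0.2 * complexity + 0.3 * migration_risk, 2)

# The 200-line churn proc from the example: 12 downstream deps,
# a handful of vendor-specific constructs.
churn_proc = ProcStats(downstream_deps=12, loc=200, vendor_constructs=3)
print(triage_score(churn_proc))  # 0.76
```

Sorting an inventory by a score like this is what turns "thousands of procedures" into an ordered migration backlog.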
How it works
YOUR LEGACY DATA STACK
(stored procs, ETL, views)
│
▼
┌──────────────┐
│ CRAWL │ ← Step 0: Understand & Triage
│ (open-source) │
└──────┬───────┘
│
│ outputs: business rules, triage scores,
│ migration risk, dbt-compatible docs
│
┌─────┴─────┬──────────────┬─────────────────┐
▼ ▼ ▼ ▼
┌──────┐ ┌────────┐ ┌───────────┐ ┌──────────────┐
│ dbt │ │Datafold│ │Lakebridge │ │ SnowConvert │
└──────┘ └────────┘ └───────────┘ └──────────────┘
 Step 1     Step 1       Step 1         Step 1

Design principles
Enterprise safety
Crawl is designed to connect to enterprise databases safely.
—Read-only, always. No writes, no DDL, no DML. Read-only transaction mode enforced.
—Catalog-only access. Reads stored procedure source code from system catalogs. Never queries user table contents.
—Query allowlisting. Every SQL query is hardcoded and auditable. No dynamic SQL.
—Non-production recommended. Stored procedure source code is identical in staging — there's no reason to connect to prod.
Supported sources
| Source | Status |
|---|---|
| Oracle Data Integrator (ODI) | Supported |
| Informatica PowerCenter / IICS | In Development |
| Snowflake (views, UDFs, procs, tasks) | Planned |
| SQL Server stored procedures | Planned |
| Oracle PL/SQL | Planned |
| PostgreSQL stored procedures | Planned |
| dbt models | Planned |
Built by Digital Rain Technologies. Founded by Augustin Chan, former Development Architect at Informatica (12 years, Fortune 500 data integration across APAC/MENA/Europe).