← Digital Rain Technologies

Crawl

open-source

Pre-migration intelligence for enterprise data infrastructure.

They catalog your data. Crawl tells you what breaks when you migrate.

Extract business logic from stored procedures, ETL jobs, and warehouse views — the undocumented rules buried in your data stack that block every migration project. Open-source, vendor-neutral, works with any LLM provider.

What Crawl does

Input: a 200-line stored procedure that nobody on the team wrote.

sp_calculate_customer_churn (confidence: HIGH)
├── Rule 1: Customers inactive >90 days flagged as at-risk
├── Rule 2: Churn score weighted by lifetime value (dim_customer)
├── Rule 3: ⚠️  References dim_product_v2 — TABLE DROPPED 2022-06-14
├── Rule 4: Monthly aggregation via vendor-specific DATEADD syntax
└── Triage: CRITICAL (12 downstream deps) | MEDIUM migration risk

Contradictions found:
  └── Rule 2 conflicts with sp_calculate_ltv line 47 (different LTV formula)

CLI commands

crawl scan      Discover stored procs, views, functions
crawl extract   Extract business rules via hybrid AST + LLM analysis
crawl triage    Score by criticality, complexity, and migration risk
crawl diff      Compare logic between environments or time periods
crawl export    Output to dbt-docs YAML, JSON, or Markdown

The problem

Every cloud migration hits the same wall: thousands of stored procedures and ETL jobs encoding business rules in vendor-specific dialects that nobody documented. Migration tools can translate your SQL, but they can't tell you what it means — or whether it's even still relevant.

Crawl is Step 0: the pre-migration intelligence layer that runs before you use Datafold, Lakebridge, dbt, or SnowConvert.

Questions Crawl answers

What do we have? Inventory with auto-generated business-rule summaries

What does it do? Human-readable logic, not just column lineage

Is it still alive? Dead code detection, contradiction flagging

What should we migrate first? Triage by criticality, complexity, risk

What breaks if we move? Vendor-specific logic that won't survive a platform change

How it works

 YOUR LEGACY DATA STACK
 (stored procs, ETL, views)
           │
           ▼
    ┌──────────────┐
    │    CRAWL      │  ← Step 0: Understand & Triage
    │  (open-source) │
    └──────┬───────┘
           │
           │ outputs: business rules, triage scores,
           │          migration risk, dbt-compatible docs
           │
     ┌─────┴─────┬──────────────┬─────────────────┐
     ▼           ▼              ▼                 ▼
  ┌──────┐  ┌────────┐  ┌───────────┐  ┌──────────────┐
  │ dbt  │  │Datafold│  │Lakebridge │  │ SnowConvert  │
  └──────┘  └────────┘  └───────────┘  └──────────────┘
   Step 1     Step 1       Step 1         Step 1

Design principles

Step 0, not Step 1. Crawl doesn't migrate your code — it tells you what you have so migration tools can do their job.
Vendor-neutral. Works with any source database, any target platform. No lock-in.
Any LLM provider. OpenRouter by default, or any OpenAI-compatible API. Point it at a local model if enterprise code can't leave your environment.
Open-source (Apache 2.0). Your understanding of your data belongs to you, not a vendor.

Enterprise safety

Crawl is designed to connect to enterprise databases safely.

Read-only, always. No writes, no DDL, no DML. Read-only transaction mode enforced.

Catalog-only access. Reads stored procedure source code from system catalogs. Never queries user table contents.

Query allowlisting. Every SQL query is hardcoded and auditable. No dynamic SQL.

Non-production recommended. Stored procedure source code is identical in staging — there's no reason to connect to prod.

Supported sources

SourceStatus
Oracle Data Integrator (ODI)Supported
Informatica PowerCenter / IICSIn Development
Snowflake (views, UDFs, procs, tasks)Planned
SQL Server stored proceduresPlanned
Oracle PL/SQLPlanned
PostgreSQL stored proceduresPlanned
dbt modelsPlanned

Built by Digital Rain Technologies. Founded by Augustin Chan, former Development Architect at Informatica (12 years, Fortune 500 data integration across APAC/MENA/Europe).