Step 3
Beginner
8 min

Build Your Data Context

Import your database schema and enrich it with AI-generated descriptions, quality checks, and tags — so the agent actually understands your data.

Tools: Bruin CLI, AI
Learning paths: Data Analyst, Data Engineer

What you'll do

  1. Import your database tables as Bruin asset files
  2. Enhance those assets with AI-generated metadata

Why this step matters

An AI agent with access to a database but no context about what's in it will produce mediocre results. It'll guess that cust_id is a customer ID (probably right) but won't know that gmv stands for Gross Merchandise Value, that status = 3 means "refunded," or that timestamps are stored in UTC but your business runs on EST.

This step fixes that. You'll import your database schema into Bruin — turning each table into a local asset file with column names and types — and then use AI to enrich those files with meaningful descriptions, data quality checks, and tags. The result is a structured knowledge base that any AI agent can read to understand your data before writing a single query.

This is what separates a useful AI analyst from one that constantly gets things wrong.

Instructions

1. Discover available schemas (optional)

If you already know the name of the schema (or dataset) you want to import, skip ahead to step 2. Otherwise, you can list the available schemas in your database:

BigQuery:

bruin query --connection gcp-default --query "SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA"

(If BigQuery reports that a region qualifier is required, prefix the view with your dataset's region, e.g. `region-us`.INFORMATION_SCHEMA.SCHEMATA.)

Postgres / Redshift:

bruin query --connection postgres-default --query "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('pg_catalog', 'information_schema')"

ClickHouse:

bruin query --connection clickhouse-default --query "SHOW DATABASES"

Pick the schema that contains the tables you want to analyze (e.g., stock_market, ecomm, public).

2. Import your database schema

Run the import command, pointing at your pipeline folder (ai-analyst) and the connection you set up in the previous step. Replace <connection-name> with the name you used (e.g. gcp-default, redshift-default, postgres-default, or clickhouse-default), and <schema> with the schema you identified above:

bruin import database --connection <connection-name> --schema <schema> ai-analyst

Note: The last argument is the path to your pipeline folder (ai-analyst). This is the folder containing pipeline.yml, not the project root where .bruin.yml lives.

Importing multiple schemas: If your data spans multiple schemas, you can import them all at once:

bruin import database --connection <connection-name> --schemas schema1,schema2,schema3 ai-analyst

This creates a .asset.yml file for each table under ai-analyst/assets/<schema>/. Each file contains the table name, type, and column metadata pulled directly from your database.
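A freshly imported asset file is minimal. Here's a hypothetical example for a Postgres orders table (the table and column names are placeholders, and the exact fields Bruin emits may differ slightly by platform):

```yaml
# Hypothetical raw import: schema and types only, no descriptions yet
type: pg.source
columns:
  - name: order_id
    type: INTEGER
  - name: status
    type: VARCHAR
```

The next step fills in everything this file is missing: descriptions, checks, and tags.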

3. Enhance with AI

Now let the AI fill in what the raw schema can't tell you — descriptions, quality checks, and tags:

bruin ai enhance ai-analyst

Important: The command is bruin ai enhance, not bruin enhance. Don't forget the ai subcommand.

This command connects to your database to gather column statistics (null counts, distinct values, min/max ranges), then uses an AI model to:

  • Write descriptions for each table and column based on their names and data patterns
  • Add quality checks like not_null on primary keys, accepted_values on status columns, and range checks on numeric fields
  • Apply tags that group related assets by domain

The AI is conservative — it only adds metadata it's confident about, and existing content is never overwritten. If you've already written descriptions for some columns, they'll stay as-is.

Which AI provider? The command auto-detects which AI CLI you have installed (Claude Code, OpenCode, or Codex). If you have more than one installed and want to pick one explicitly, use the --claude, --opencode, or --codex flag:

bruin ai enhance ai-analyst --claude

See the ai enhance docs for all options.

Time estimate: The enhance step can take several minutes depending on how many tables you imported. For 15-20 tables, expect 3-5 minutes. For larger schemas (50+ tables), it may take 10+ minutes.

4. Review what was generated

Open one of the asset files to see what Bruin created:

cat ai-analyst/assets/<schema>/<table>.asset.yml

You'll see something like:

type: pg.source
description: "Customer orders with purchase details and fulfillment status"
tags:
  - ecommerce
  - orders
columns:
  - name: order_id
    type: INTEGER
    description: "Unique identifier for the order"
    checks:
      - name: not_null
      - name: unique
  - name: status
    type: VARCHAR
    description: "Current fulfillment status"
    checks:
      - name: accepted_values
        value: ["pending", "shipped", "delivered", "refunded"]

This is the context your AI agent will read. The better these descriptions are, the better the agent's queries will be. Feel free to edit any file — add business context, fix descriptions, or tighten quality checks. These are your files, version-controlled and human-readable.

Watch out for incorrect unique checks. The AI sometimes adds a unique check to columns that look like identifiers (e.g., ticker, customer_id) but aren't unique in the table. For example, in a quarterly financials table, ticker appears once per quarter — it's not unique per row. Always review generated checks against how the data actually works. Remove any unique constraint that doesn't apply.
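For instance, in a hypothetical quarterly financials asset where ticker repeats once per quarter, the fix is to delete the generated unique check while keeping the rest:

```yaml
# Hypothetical column entry after review: ticker is not unique per row,
# so the generated "unique" check has been removed
  - name: ticker
    type: VARCHAR
    description: "Stock ticker symbol"
    checks:
      - name: not_null
      # removed: - name: unique
```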

5. Verify your work

Check that your assets were created correctly:

ls ai-analyst/assets/

You should see folders for each schema you imported, with .asset.yml files inside each.

Troubleshooting

"bruin enhance" not found / "unknown command"

The correct command is bruin ai enhance (with ai as a subcommand), not bruin enhance. Run:

bruin ai enhance ai-analyst

"No AI CLI detected"

The bruin ai enhance command requires an AI CLI to be installed. Install one of:

  • Claude Code: curl -fsSL https://claude.ai/install.sh | bash (macOS/Linux) or irm https://claude.ai/install.ps1 | iex (Windows PowerShell)
  • OpenCode: See opencode.ai for installation
  • Codex: See OpenAI Codex docs

Command times out or hangs

For large schemas (50+ tables), the enhance step can take 10+ minutes. If it seems stuck:

  • Check your internet connection
  • Try enhancing a smaller subset first using --schema to limit which assets are processed
  • If the AI generates incomplete metadata, you can always re-run the command — it won't overwrite existing descriptions

AI generates incorrect or low-quality descriptions

The AI makes educated guesses based on column names and data patterns. If a description is wrong:

  • Edit the .asset.yml file directly — these are your files
  • Add a clarification to your AGENTS.md (Step 5) so the AI agent knows the correct interpretation
  • Consider adding a glossary entry for commonly misunderstood terms

Import command fails with "permission denied" or "access denied"

Your database connection doesn't have sufficient permissions. You need at least SELECT access on the target schema and its tables. For BigQuery, ensure your account has the BigQuery Data Viewer role.
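On Postgres, for example, a read-only grant might look like this (the schema name ecomm and role name bruin_reader are placeholders for your own):

```sql
-- Hypothetical: grant read access on schema "ecomm" to role "bruin_reader"
GRANT USAGE ON SCHEMA ecomm TO bruin_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA ecomm TO bruin_reader;
```

Other warehouses have equivalent read-only roles; consult your platform's access-control documentation for the exact grants.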

What just happened

Your project now contains asset files that map every table and column in your database, enriched with AI-generated descriptions and quality checks. This metadata is the foundation for everything that comes next — it's what turns a generic AI into one that actually understands your data.