Sanyukta Suman

Dataform Installation on Your Local Machine (VS Code)

Published on September 12, 2025

Prerequisites Checklist

  • Google Cloud account with billing enabled
  • Node.js (v12+) installed
  • Google Cloud SDK installed
  • Git installed
  • VS Code installed
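
Before going further, it can help to confirm that the tools in the checklist are actually on your PATH. The snippet below is a minimal sketch; the binary names (node, npm, gcloud, git, code) are assumed defaults and may differ on your machine, for example if the VS Code `code` launcher was never installed.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Report anything missing from the checklist
# (the binary names below are assumptions, adjust as needed).
missing="$(missing_tools node npm gcloud git code)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All prerequisites found."
fi
```

This only checks that the commands exist; it does not verify versions or that billing is enabled on your Google Cloud account.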

Step-by-Step Full Reset Process

1. Clean Up Existing Setup (If Any)

# Remove global Dataform CLI
npm uninstall -g @dataform/cli

# Remove local project dependencies
rm -rf node_modules package-lock.json

# Clear GCP authentication
gcloud auth revoke --all

2. Set Up Fresh Environment

A. Install Global Dependencies

# Install/Reinstall Dataform CLI
npm install -g @dataform/cli

# Install Google Cloud SDK (if not already)
# Follow: https://cloud.google.com/sdk/docs/install

B. Authenticate with Google Cloud

# Login to GCP
gcloud auth login

# Set default application credentials
gcloud auth application-default login

# Set your GCP project
gcloud config set project YOUR_PROJECT_ID
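
A common slip at this step is running the command with the literal YOUR_PROJECT_ID placeholder. As a rough guard, the sketch below checks that a value looks like a standard GCP project ID (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen). The function name and the PROJECT_ID variable are hypothetical, and older domain-scoped IDs would not pass this check.

```shell
# Succeed only if the argument looks like a standard GCP project ID.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Hypothetical usage: refuse to continue with the unedited placeholder.
PROJECT_ID="${PROJECT_ID:-YOUR_PROJECT_ID}"
if is_valid_project_id "$PROJECT_ID"; then
  echo "Would run: gcloud config set project $PROJECT_ID"
else
  echo "Set PROJECT_ID to a real project ID first (got: $PROJECT_ID)"
fi
```

The check is purely syntactic; it does not confirm that the project exists or that you have access to it.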

3. Repository Setup

A. Clone Your Repository

# Navigate to your desired directory
cd ~/projects

# Clone your GitHub repository
git clone https://github.com/your-username/your-dataform-repo.git
cd your-dataform-repo

B. Initialize Dataform Project

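# Initialize Dataform (creates dataform.json).
# Skip this step if the cloned repository already contains a Dataform project.
# Some CLI versions require arguments here; run `dataform help init` to check.
dataform init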

# Install project dependencies
npm install

4. Configure Dataform Project

A. Edit dataform.json

{
  "warehouse": "bigquery",
  "defaultDatabase": "your-gcp-project-id",
  "defaultSchema": "your_dataset_name",
  "assertionSchema": "dataform_assertions",
  "defaultLocation": "US"
}
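
If you set up projects often, the same file can be generated from the command line instead of edited by hand. This is only a sketch: `write_dataform_json` is a hypothetical helper, and it simply reproduces the fields shown above with your values substituted.

```shell
# Write a dataform.json for BigQuery into the given directory.
# Arguments: target directory, GCP project ID, dataset name, location (default: US).
write_dataform_json() {
  target_dir="$1"
  project="$2"
  dataset="$3"
  location="${4:-US}"
  cat > "$target_dir/dataform.json" <<EOF
{
  "warehouse": "bigquery",
  "defaultDatabase": "$project",
  "defaultSchema": "$dataset",
  "assertionSchema": "dataform_assertions",
  "defaultLocation": "$location"
}
EOF
}

# Example (commented out because it overwrites an existing file):
#   write_dataform_json . your-gcp-project-id your_dataset_name US
```

Note that the helper overwrites any existing dataform.json in the target directory.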

B. Set Up Directory Structure

your-repo/
├── definitions/
├── includes/
├── assertions/
├── dataform.json
├── package.json
└── .gitignore
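
The layout above can be created in one go. The sketch below assumes you also want `node_modules/` and `.df-credentials.json` (the credentials file that some Dataform CLI versions write) kept out of Git; those two .gitignore entries are my assumption, not part of the tree shown above.

```shell
# Create the Dataform folders and make sure .gitignore has the basic entries.
# Argument: project root (defaults to the current directory).
scaffold_dataform_dirs() {
  root="${1:-.}"
  mkdir -p "$root/definitions" "$root/includes" "$root/assertions"
  touch "$root/.gitignore"
  # Append each entry only if it is not already present as a whole line.
  for entry in node_modules/ .df-credentials.json; do
    grep -qxF "$entry" "$root/.gitignore" || echo "$entry" >> "$root/.gitignore"
  done
}

# Example:
#   scaffold_dataform_dirs ~/projects/your-dataform-repo
```

The function is safe to run repeatedly: existing folders are left alone and .gitignore entries are not duplicated.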

5. Test Your Setup

A. Verify Installation

# Check Dataform version
dataform --version

# Check GCP authentication
gcloud auth list

# Check project configuration
gcloud config list

B. Test Compilation

# Create a test file
echo 'config { type: "view" } SELECT 1 as test' > definitions/test_view.sqlx

# Compile project
dataform compile

# Dry run
dataform run --dry-run

6. VS Code Configuration

A. Install Recommended Extensions

  • SQL Tools
  • GitLens
  • Prettier (for code formatting)

B. Create VS Code Settings (optional)

Create .vscode/settings.json:

{
  "editor.formatOnSave": true,
  "files.associations": {
    "*.sqlx": "sql"
  }
}

7. First Real Run

A. Create Your First Table

definitions/first_table.sqlx:

config {
    type: "table",
    schema: "your_dataset",
    description: "My first Dataform table"
}

SELECT
  CURRENT_DATE() AS execution_date,
  COUNT(*) AS total_records
FROM
  `your-project.other_dataset.source_table`

B. Execute

# Compile and run
dataform run

# Run specific actions only
dataform run --actions your_dataset.first_table
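
Since a real run writes to BigQuery, one cautious pattern is to chain the commands used in this guide so that the run only happens if compilation and the dry run both succeed. The wrapper below is a sketch under that assumption; `safe_run` and the `DATAFORM_BIN` override are hypothetical names introduced here, not part of the Dataform CLI.

```shell
# Compile, then dry-run, then run; stop at the first stage that fails.
# Any arguments (for example --actions ...) are passed to both run stages.
# DATAFORM_BIN is a hypothetical override, useful for trying this without the CLI.
safe_run() {
  df="${DATAFORM_BIN:-dataform}"
  "$df" compile && "$df" run --dry-run "$@" && "$df" run "$@"
}

# Example:
#   safe_run --actions your_dataset.first_table
```

Because the stages are joined with `&&`, a compilation error or a failed dry run prevents the real run from starting.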

8. Set Up CI/CD (Optional but Recommended)

A. Create GitHub Actions Workflow

.github/workflows/dataform.yml:

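name: Dataform CI
on: [push]
jobs:
  dataform:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
    - run: npm install -g @dataform/cli
    - run: npm install
    # Compilation runs locally and does not contact BigQuery, so no credentials
    # are set here. Note that GOOGLE_APPLICATION_CREDENTIALS must point to a
    # key file path, not hold the key JSON itself, if you add a step that runs.
    - run: dataform compile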

9. Environment Setup Script (For Future Use)

Create setup.sh:

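#!/bin/bash
# Stop at the first failing command so a broken step is not silently skipped
set -euo pipefail

echo "Setting up Dataform environment..."

# Install global dependencies
npm install -g @dataform/cli

# Authenticate with GCP (both commands open a browser window)
gcloud auth login
gcloud auth application-default login

# Set project (replace YOUR_PROJECT_ID before running)
gcloud config set project YOUR_PROJECT_ID

# Install project dependencies (run this script from the project root)
npm install
echo "Setup complete! Run 'dataform compile' to test."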

Quick Reset Command Sequence

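# Complete fresh start (run from the project root)
npm uninstall -g @dataform/cli
gcloud auth revoke --all
rm -rf node_modules package-lock.json
# WARNING: deletes all untracked files and directories, including uncommitted
# .sqlx files. Preview with `git clean -nd` first.
git clean -fd
npm install -g @dataform/cli
gcloud auth login
gcloud auth application-default login
npm install
dataform compile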
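
The `git clean -fd` step in this sequence permanently deletes untracked files, so it is worth putting a confirmation in front of it. The helper below is a small sketch; `confirm` is a hypothetical function name, and the Git commands are left commented out so that copying the block does not delete anything.

```shell
# Ask a yes/no question on stdin; succeed only if the answer is "y" or "Y".
confirm() {
  printf '%s [y/N] ' "$1"
  read -r answer || return 1
  case "$answer" in
    y|Y) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage (commented out because it deletes files):
#   git clean -nd    # -n lists what would be removed without removing it
#   confirm "Delete the files listed above?" && git clean -fd
```

Anything other than an explicit "y", including an empty answer, leaves your files untouched.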

Troubleshooting Common Issues

If authentication fails:

# Reset credentials
gcloud auth application-default revoke
gcloud auth application-default login
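
To see whether the application-default login actually left a credentials file behind, you can check the location where Google client libraries look for it. This sketch assumes the Linux/macOS layout under `$HOME/.config/gcloud`; `adc_path` is a hypothetical helper name, and whether your Dataform CLI version reads these credentials or its own credentials file depends on the version.

```shell
# Print the path where Application Default Credentials are expected:
# the GOOGLE_APPLICATION_CREDENTIALS override if set, otherwise the
# well-known gcloud location under $HOME (Linux/macOS layout assumed).
adc_path() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    printf '%s\n' "$GOOGLE_APPLICATION_CREDENTIALS"
  else
    printf '%s\n' "$HOME/.config/gcloud/application_default_credentials.json"
  fi
}

if [ -f "$(adc_path)" ]; then
  echo "ADC file found at $(adc_path)"
else
  echo "No ADC file at $(adc_path); run: gcloud auth application-default login"
fi
```

A present file does not prove the credentials are still valid, only that a login was completed at some point.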

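If the project doesn't compile:

# Check for syntax errors
# (if your CLI version rejects --verbose, try --json to print the full compiled graph)
dataform compile --verbose

# Check warehouse connection
# The Dataform CLI does not appear to ship a test-connection command; run
# `dataform help` to list what your version supports. Re-creating credentials
# runs a test query against the warehouse (newer CLI versions omit the
# "bigquery" argument).
dataform init-creds bigquery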

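If you run into permission issues:

  • Ensure the account or service account you authenticate with has the BigQuery Admin role (or, for tighter access, BigQuery Data Editor plus BigQuery Job User)
  • Check IAM permissions in the GCP console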

This process gives you a clean, reproducible setup that you can use anytime you need to start over or set up on a new machine.