Dataform Installation on Your Local Machine (VS Code)
Published on September 12, 2025
Prerequisites Checklist
- Google Cloud account with billing enabled
- Node.js (v12+) installed
- Google Cloud SDK installed
- Git installed
- VS Code installed
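Before going further, it can help to confirm that each tool from the checklist is actually on the PATH. The following is a minimal sketch, not part of the original guide; the function name `missing_tools` is made up for illustration, and the usage line assumes the VS Code launcher is installed as `code`.

```shell
#!/usr/bin/env bash
# Print every command from the argument list that cannot be found on PATH.
# An empty result means all of the listed tools are installed.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}
```

Running `missing_tools node npm gcloud git code` prints one line per missing tool and nothing at all when the checklist is satisfied.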
Step-by-Step Full Reset Process
1. Clean Up Existing Setup (If Any)
# Remove global Dataform CLI
npm uninstall -g @dataform/cli
# Remove local project dependencies
rm -rf node_modules package-lock.json
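# Revoke all gcloud user credentials
# (Application Default Credentials are revoked separately with: gcloud auth application-default revoke)
gcloud auth revoke --all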
2. Set Up Fresh Environment
A. Install Global Dependencies
# Install/Reinstall Dataform CLI
npm install -g @dataform/cli
# Install Google Cloud SDK (if not already)
# Follow: https://cloud.google.com/sdk/docs/install
B. Authenticate with Google Cloud
# Login to GCP
gcloud auth login
# Set default application credentials
gcloud auth application-default login
# Set your GCP project
gcloud config set project YOUR_PROJECT_ID
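A common slip at this step is leaving the `YOUR_PROJECT_ID` placeholder in place. The sketch below checks a value against the usual project ID format (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen). The function name is hypothetical, and older or domain-scoped project IDs may not fit this pattern, so treat it as a sanity check rather than a definitive validator.

```shell
#!/usr/bin/env bash
# Return 0 when the argument looks like a GCP project ID:
# 6 to 30 characters, lowercase letters, digits and hyphens only,
# starting with a letter and not ending with a hyphen.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}
```

One way to use it: `is_valid_project_id "$PROJECT_ID" && gcloud config set project "$PROJECT_ID"`.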
3. Repository Setup
A. Clone Your Repository
# Navigate to your desired directory
cd ~/projects
# Clone your GitHub repository
git clone https://github.com/your-username/your-dataform-repo.git
cd your-dataform-repo
B. Initialize Dataform Project
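# Initialize Dataform (creates dataform.json; newer CLI releases create workflow_settings.yaml instead)
# Skip this step if the cloned repository already contains a Dataform project.
# Required arguments differ between CLI versions; check `dataform help init` if this fails.
dataform init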
# Install project dependencies
npm install
4. Configure Dataform Project
A. Edit dataform.json
{
  "warehouse": "bigquery",
  "defaultDatabase": "your-gcp-project-id",
  "defaultSchema": "your_dataset_name",
  "assertionSchema": "dataform_assertions",
  "defaultLocation": "US"
}
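If you set up several machines or projects, generating this file from the command line avoids hand-editing mistakes. The sketch below prints the same five settings shown above; the function name `write_dataform_json` and its argument order are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Print a dataform.json for BigQuery to stdout.
# Arguments: GCP project ID, default dataset, location (defaults to US).
write_dataform_json() {
  local project_id="$1"
  local dataset="$2"
  local location="${3:-US}"
  cat <<EOF
{
  "warehouse": "bigquery",
  "defaultDatabase": "$project_id",
  "defaultSchema": "$dataset",
  "assertionSchema": "dataform_assertions",
  "defaultLocation": "$location"
}
EOF
}
```

For example, `write_dataform_json your-gcp-project-id your_dataset_name EU > dataform.json` writes the file with an EU location.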
B. Set Up Directory Structure
your-repo/
├── definitions/
├── includes/
├── assertions/
├── dataform.json
├── package.json
└── .gitignore
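The layout above can be created in one go. This is a small sketch under the assumption that you want all three directories even if they start out empty; the function name `scaffold_dataform_dirs` is hypothetical, and the only ignore rule it adds is `node_modules/`.

```shell
#!/usr/bin/env bash
# Create the directories from the layout above under the given project root
# (defaults to the current directory) and make sure node_modules/ is ignored.
scaffold_dataform_dirs() {
  local root="${1:-.}"
  mkdir -p "$root/definitions" "$root/includes" "$root/assertions"
  # Append the ignore rule only if it is not already present,
  # so the function is safe to run more than once.
  touch "$root/.gitignore"
  grep -qxF 'node_modules/' "$root/.gitignore" \
    || printf '%s\n' 'node_modules/' >> "$root/.gitignore"
}
```

Call it from the repository root as `scaffold_dataform_dirs`, or pass a path such as `scaffold_dataform_dirs ~/projects/your-dataform-repo`.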
5. Test Your Setup
A. Verify Installation
# Check Dataform version
dataform --version
# Check GCP authentication
gcloud auth list
# Check project configuration
gcloud config list
B. Test Compilation
# Create a test file
echo 'config { type: "view" } SELECT 1 as test' > definitions/test_view.sqlx
# Compile project
dataform compile
# Dry run
dataform run --dry-run
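Note that the test file created above stays in definitions/ and would be built as a real view on the next full run. One possible way around that, sketched here with a hypothetical function name and file name, is to write a throwaway view, compile, and delete the view again regardless of the result.

```shell
#!/usr/bin/env bash
# Write a throwaway view, compile the project, then delete the view again
# so the smoke test leaves no stray file in definitions/.
# Argument: project directory (defaults to the current directory).
smoke_test_compile() {
  local project_dir="${1:-.}"
  local test_file="$project_dir/definitions/zz_smoke_test_view.sqlx"
  local status=0
  mkdir -p "$project_dir/definitions"
  printf '%s\n' 'config { type: "view" }' 'SELECT 1 AS test' > "$test_file"
  # Remember the exit code so the cleanup below always runs.
  dataform compile "$project_dir" || status=$?
  rm -f "$test_file"
  return "$status"
}
```

The function returns the exit code of `dataform compile`, so it can be chained as `smoke_test_compile && echo "compile OK"`.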
6. VS Code Configuration
A. Install Recommended Extensions
- SQLTools
- GitLens
- Prettier (for code formatting)
B. Create VS Code Settings (optional)
Create .vscode/settings.json:
{
  "editor.formatOnSave": true,
  "files.associations": {
    "*.sqlx": "sql"
  }
}
7. First Real Run
A. Create Your First Table
definitions/first_table.sqlx:
config {
  type: "table",
  schema: "your_dataset",
  description: "My first Dataform table"
}

SELECT
  CURRENT_DATE() AS execution_date,
  COUNT(*) AS total_records
FROM
  `your-project.other_dataset.source_table`
B. Execute
# Compile and run
dataform run
# Run specific actions only
dataform run --actions your_dataset.first_table
8. Set Up CI/CD (Optional but Recommended)
A. Create GitHub Actions Workflow
.github/workflows/dataform.yml:
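name: Dataform CI
on: [push]
jobs:
  dataform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g @dataform/cli
      - run: npm install
      # GOOGLE_APPLICATION_CREDENTIALS must be a file path, not the key itself,
      # so write the service account JSON from the secret to a temporary file first.
      - run: printf '%s' "$GCP_CREDENTIALS" > "$RUNNER_TEMP/gcp-key.json"
        env:
          GCP_CREDENTIALS: ${{ secrets.GCP_CREDENTIALS }}
      # Compilation runs locally; the credentials matter for steps that reach BigQuery.
      - run: dataform compile
        env:
          GOOGLE_APPLICATION_CREDENTIALS: ${{ runner.temp }}/gcp-key.json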
9. Environment Setup Script (For Future Use)
Create setup.sh:
#!/bin/bash
# Stop at the first failing command, unset variable, or failed pipeline stage
set -euo pipefail
echo "Setting up Dataform environment..."
# Install global dependencies
npm install -g @dataform/cli
# Authenticate with GCP
gcloud auth login
gcloud auth application-default login
# Set project
gcloud config set project YOUR_PROJECT_ID
# Install project dependencies
npm install
echo "Setup complete! Run 'dataform compile' to test."
Quick Reset Command Sequence
# Complete fresh start
npm uninstall -g @dataform/cli
gcloud auth revoke --all
rm -rf node_modules package-lock.json
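# WARNING: deletes all untracked files and directories, including uncommitted .sqlx files.
# Preview what would be removed first with: git clean -nd
git clean -fd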
npm install -g @dataform/cli
gcloud auth login
gcloud auth application-default login
npm install
dataform compile
Troubleshooting Common Issues
If authentication fails:
# Reset credentials
gcloud auth application-default revoke
gcloud auth application-default login
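Before resetting, it is often worth checking whether Application Default Credentials exist at all. The sketch below looks in the locations that Google client libraries commonly use on Linux and macOS: the file named by `GOOGLE_APPLICATION_CREDENTIALS` if that variable is set, otherwise `application_default_credentials.json` inside the gcloud config directory. The function names are made up for illustration, and Windows paths are not covered.

```shell
#!/usr/bin/env bash
# Print the file that Application Default Credentials would be read from:
# GOOGLE_APPLICATION_CREDENTIALS if set, otherwise the gcloud default location
# on Linux and macOS (CLOUDSDK_CONFIG overrides the gcloud config directory).
adc_file() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    printf '%s\n' "$GOOGLE_APPLICATION_CREDENTIALS"
  else
    printf '%s\n' "${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}/application_default_credentials.json"
  fi
}

# Return 0 when that file exists and is not empty.
has_adc() {
  [ -s "$(adc_file)" ]
}
```

For example, `has_adc || gcloud auth application-default login` only triggers the login flow when no credentials file is found.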
If the project doesn't compile:
# Check for syntax errors
dataform compile --verbose
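# Check warehouse connection
# (this subcommand may not exist in every CLI version; run `dataform help` to list the supported commands)
dataform test-connection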
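If you hit permission issues:
- Ensure the service account has sufficient BigQuery roles (BigQuery Admin works; BigQuery Data Editor plus BigQuery Job User is a narrower combination that is usually enough)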
- Check IAM permissions in GCP console
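If you prefer to grant roles from the command line, the sketch below prints the `gcloud projects add-iam-policy-binding` commands for two narrower BigQuery roles instead of running them, so they can be reviewed first. The function name is hypothetical, and the choice of `roles/bigquery.dataEditor` plus `roles/bigquery.jobUser` is an assumption about what a typical Dataform project needs; adjust it to your own policy.

```shell
#!/usr/bin/env bash
# Print (without running) the gcloud commands that would grant a service
# account two narrower BigQuery roles instead of BigQuery Admin.
# Arguments: GCP project ID, service account email.
print_bigquery_grants() {
  local project_id="$1"
  local member="serviceAccount:$2"
  local role
  for role in roles/bigquery.dataEditor roles/bigquery.jobUser; do
    printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
      "$project_id" "$member" "$role"
  done
}
```

Once the output looks right, pipe it to the shell, for example `print_bigquery_grants YOUR_PROJECT_ID your-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com | sh`.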
This process gives you a clean, reproducible setup that you can use anytime you need to start over or set up on a new machine.