
Create Airflow instance from Local Computer

Published on September 19, 2025

Introduction

Apache Airflow is a powerful tool for orchestrating workflows. Running Airflow from your local environment is a great way to develop, test, and iterate on DAGs before moving to more complex deployments. This guide walks you through the prerequisites, setting up your local environment, building and running Airflow in Docker, deploying a simple DAG, and the practical considerations and pitfalls to watch out for.

Table of Contents

  • Prerequisites
  • Local Environment Choices
  • Docker-based Quick Start (Single-container for Testing)
  • Multi-container Development Environment (Recommended for Ongoing Work)
  • Creating and Testing Your First DAG
  • Observability and Debugging
  • Environment and Deployment Considerations
  • Common Pitfalls and Watch-Outs
  • What to Do Next

Prerequisites

Before you start, ensure you have:

  • A computer with a modern OS (Linux, macOS, or Windows with WSL2).
  • Docker Desktop installed and running.
  • Basic command-line proficiency (bash/sh or PowerShell/zsh as appropriate).
  • A local directory for your DAGs (e.g., `~/airflow/dags`).
  • Optional: a Python project you’ll convert into Airflow tasks.

Local Environment Choices

There are several ways to run Apache Airflow locally. The two most common are:

  • Docker-based approach: Fast, isolated, and reproducible. Ideal for local development and testing.
  • Native Python environment: Running Airflow directly on your machine. Useful for simpler scenarios but less portable.

For this guide, I’ll focus on the Docker-based approach, covering both a single-container quick start for testing and a multi-container setup for a fuller development environment.

Docker-based Quick Start (Single-container for Testing)

This approach is excellent for a quick start and validates DAG logic without worrying about a full production setup.

What You’ll Need

  • Docker Desktop installed.
  • A local DAGs directory, e.g., `/path/to/your/dags`.

Steps

  1. Build your Airflow image (optional, only if you want to customize it). If you already have a Dockerfile, build the image with `docker build -t my-airflow:latest .`
  2. Run Airflow in a single container with SQLite (for quick testing). The command below starts Airflow, initializes the metadata database, and exposes the web UI on port 8080. Replace `/path/to/your/dags` with your actual DAGs directory.

Docker run command (copy-paste):

docker run -d --name airflow \  
    -p 8080:8080 \  
    -v /path/to/your/dags:/opt/airflow/dags \  
    my-airflow:latest \  
    bash -c "airflow db init && airflow webserver"
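
If the web UI does not come up, first confirm the container started cleanly. On Airflow 2.x images that use the default authentication you will also need a login user; a minimal sketch, assuming the container name `airflow` from the command above (the username and password are placeholders):

docker ps --filter name=airflow    # is the container running?
docker logs -f airflow             # follow startup output (Ctrl+C to stop)
docker exec -it airflow airflow users create \
    --username admin --password admin \
    --firstname Admin --lastname User \
    --role Admin --email admin@example.com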
  3. Access the Airflow UI. Open your browser and go to http://localhost:8080. The UI will show the DAGs found in `/opt/airflow/dags` inside the container.
  4. Optional: run the scheduler (recommended for testing). If you want to run the scheduler in a separate container (a common pattern for multi-container setups), you can run:

docker run -d --name airflow-scheduler \
    --link airflow:airflow \
    -v /path/to/your/dags:/opt/airflow/dags \
    my-airflow:latest \
    airflow scheduler

  • The scheduler is the component that triggers DAG runs and tracks task states. Note that `--link` is a legacy Docker feature; the Docker Compose setup later in this guide handles networking between containers for you. A quick way to confirm the scheduler is running follows below.
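
A quick check that the scheduler container is up and parsing your DAGs folder, assuming the container names used above:

docker ps --filter name=airflow-scheduler    # the container should show as "Up"
docker logs -f airflow-scheduler             # look for log lines about parsing /opt/airflow/dags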

Notes on Versions and Metadata Storage

  • If you are constrained to Python 3.7, use a conservative, cp37-compatible set of pinned package versions so you don’t pull in wheels that require Python 3.8+. If you later upgrade Airflow or Python, expect the dependency set to change; see the constraint-file example after this list.
  • In the simple single-container approach, SQLite is used by default for metadata storage. This is fine for learning and small tests, but it is not suitable for production or for running tasks in parallel.
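
As a concrete illustration of conservative pinning outside Docker: if you install Airflow into a local Python 3.7 virtual environment, the official constraint files keep the dependency set reproducible. The versions below are examples only; substitute the Airflow release you actually target:

python3.7 -m venv airflow-venv && source airflow-venv/bin/activate
pip install "apache-airflow==2.3.4" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.7.txt"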

Best Practices if You Plan to Scale Beyond Local Testing

  • Use a multi-container setup with a metadata database (PostgreSQL or MySQL) and a Redis backend if you plan to use CeleryExecutor for parallel task execution.
  • Use Docker Compose to orchestrate webserver, scheduler, and the metadata database.
  • Persist Airflow logs and DAGs to a mounted volume to avoid losing data on container recreation.
  • Keep Airflow version pinned to a known working version (e.g., 2.x) and avoid aggressive upgrades mid-project to minimize breaking changes.

Multi-container Development Environment (Recommended for Ongoing Work)

If you want a more realistic setup, use Docker Compose. Here’s a minimal, copy-ready Compose file you can adapt. It uses PostgreSQL for the metadata database, includes Redis so you can later switch to CeleryExecutor, and runs the Airflow webserver and scheduler together in a single Airflow container for simplicity; you can split them into separate services once the basics work.

`docker-compose.yml` (minimal)

version: '3'
services:
    postgres:  
        image: postgres:13  
        environment:    
            POSTGRES_USER: airflow    
            POSTGRES_PASSWORD: airflow    
            POSTGRES_DB: airflow  
        volumes:    
            - postgres_data:/var/lib/postgresql/data

    redis:    
        image: redis:6

    airflow:    
        image: my-airflow:latest    
        depends_on:      
            - postgres      
            - redis    
        ports:      
            - "8080:8080"    
        volumes:      
            - ./dags:/opt/airflow/dags      
            - ./logs:/opt/airflow/logs      
            - ./plugins:/opt/airflow/plugins    
        environment:
            AIRFLOW__CORE__EXECUTOR: LocalExecutor
            AIRFLOW__CORE__LOAD_EXAMPLES: "False"
            AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
            # The Celery settings below only take effect if you switch the
            # executor to CeleryExecutor and add a worker service.
            AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
            AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
        command: >
            bash -c "airflow db init &&
                     (airflow scheduler &) &&
                     airflow webserver --port 8080"
volumes:  
    postgres_data:
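
Assuming the file above is saved as `docker-compose.yml` in your project root (next to the `dags/`, `logs/`, and `plugins/` folders), you can bring the stack up and watch it start like this; older Docker installs use `docker-compose` instead of `docker compose`:

docker compose up -d              # start postgres, redis, and airflow in the background
docker compose ps                 # all services should report a running state
docker compose logs -f airflow    # follow webserver and scheduler output
# if the UI asks for a login, create a user inside the airflow service:
docker compose exec airflow airflow users create \
    --username admin --password admin \
    --firstname Admin --lastname User --role Admin --email admin@example.com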

Creating and Testing Your First DAG

  1. Create a simple DAG. Save a file in your dags directory, e.g., `/path/to/your/dags/hello_airflow.py`:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'you',
    'depends_on_past': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'hello_airflow',
    default_args=default_args,
    description='A simple hello Airflow DAG',
    schedule_interval=timedelta(minutes=1),
    start_date=datetime(2020, 1, 1),
    catchup=False,
) as dag:
    t1 = BashOperator(
        task_id='say_hello',
        bash_command='echo "Hello from Airflow!"',
    )
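
Before relying on the UI, a quick sanity check is to import the DAG file inside the container; syntax and import errors surface immediately. This assumes the single-container name `airflow` from the quick start:

docker exec -it airflow python /opt/airflow/dags/hello_airflow.py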
  2. Check the DAG in the UI. After the container starts, refresh the Airflow UI, and you should see the `hello_airflow` DAG.
  3. Trigger it manually and observe the logs.

Observability and Debugging

  • Logs: In the UI, click on a DAG, then on a specific task, and view logs to diagnose failures.
  • Web UI: Use the UI to inspect DAGs, runs, task statuses, and code.
  • CLI: You can also interact with Airflow via the CLI inside the container: `docker exec -it airflow bash`, then `airflow dags list` or `airflow tasks test hello_airflow say_hello 2020-01-01` (a couple more examples follow this list).
  • Local vs. remote: If you’re using Docker Compose with a database, logs and metadata persist in the Postgres container and mounted volumes.
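
A couple of additional commands that are often handy, assuming the container name from the quick start:

docker logs -f airflow                                                   # webserver output from the single-container setup
docker exec -it airflow airflow dags list-runs --dag-id hello_airflow    # recent runs and their states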

Environment and Deployment Considerations

  • Local testing vs. production: Local SQLite is convenient for learning, but production-grade deployments should use PostgreSQL/MySQL and a proper executor like CeleryExecutor or KubernetesExecutor.
  • Version control for DAGs: Keep your DAGs in a git repository and mount them into the container with a volume for easy updates.
  • Secrets and credentials: Do not bake credentials into your DAG files. Use environment variables, Airflow’s connections, or a secret backend (e.g., AWS Secrets Manager, GCP Secret Manager) managed outside the container; see the example after this list.
  • Resource limits: On your host, Airflow tasks can consume CPU and RAM. For local testing, keep expectations modest (e.g., 1–2 CPUs, 1–2 GB RAM per container).
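
For secrets, one minimal pattern is to pass Airflow connections as environment variables instead of hard-coding them in DAG files: Airflow reads any variable named `AIRFLOW_CONN_<CONN_ID>` as a connection URI. The connection ID and URI below are hypothetical:

docker run -d --name airflow \
    -p 8080:8080 \
    -v /path/to/your/dags:/opt/airflow/dags \
    -e AIRFLOW_CONN_MY_POSTGRES='postgresql://user:password@db-host:5432/mydb' \
    my-airflow:latest \
    bash -c "airflow db init && airflow webserver"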

Common Pitfalls and Watch-Outs

  • Environment drift: If you update the image or Python environment, re-run `airflow db upgrade` to align the metadata database schema (see the command after this list).
  • Volume permissions: Ensure the host directory permissions allow the container to read/write DAGs, logs, and plugins.
  • Scheduler vs. Webserver timing: In a single-container setup, starting the webserver and scheduler in the right order matters. A typical approach is to initialize the DB first, then start the webserver; add the scheduler as a separate container in production or when you’re ready for a multi-container setup.
  • Dependency drift: Pin Airflow version and dependencies to avoid surprises when upgrading. Use a `requirements.txt` or constraint files if you customize Python dependencies.
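
For the schema-alignment step mentioned above, the command can be run inside the existing container (newer Airflow releases also accept `airflow db migrate` as the replacement for `db upgrade`):

docker exec -it airflow airflow db upgrade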

What to Do Next

  • If you want a straightforward, copy-paste path, use the single-container quick start for rapid DAG testing, then migrate to Docker Compose for a more realistic environment with a real metadata database.
  • Adapt the `docker-compose.yml` above to your machine: your OS, whether you use PostgreSQL, and where your DAGs live.
  • When you’re ready to move toward production, migrate the metadata database to PostgreSQL, set up CeleryExecutor or KubernetesExecutor, and adopt best practices for credentials and observability.

Before adapting the Compose file to your own setup, decide on three things:

  • The host path to your DAGs directory
  • Whether you want SQLite (local and quick) or PostgreSQL (production-like)
  • Any plugins you plan to use, so they can be mounted into the containers