A Guide to Creating a Cloud Composer Environment in GCP
Cloud Composer is Google Cloud's managed service for running Apache Airflow. It takes care of the infrastructure for you, allowing you to focus on building and orchestrating data pipelines. If you're new to the service, setting up your first environment can feel like a big step. This guide will walk you through the process, step by step, so you can get your workflows running quickly.
What's the Difference: Composer 1 vs. Composer 2
Before you begin, it's important to understand the two versions of the service.
- Cloud Composer 1: The original version, built on GKE Standard. It's zonal, meaning all components run in a single zone.
- Cloud Composer 2: The newer version, built on GKE Autopilot. It autoscales workers with demand, which typically makes it more cost-effective, and its multi-zonal architecture makes it more resilient.
For all new projects, it is highly recommended to use Cloud Composer 2. This guide will focus on creating a Composer 2 environment.
Creating an Environment via the Cloud Console
This is the most straightforward method for a quick setup.
Step 1: Navigate to Cloud Composer
- Go to the Google Cloud Console.
- In the search bar at the top, type "Cloud Composer" and select the service from the results.
- On the Cloud Composer page, click the + CREATE button at the top.
Step 2: Choose Your Environment Version
On the creation page, you'll be prompted to choose a version. Select Cloud Composer 2.
Step 3: Configure Basic Environment Details
This is where you give your environment an identity.
- Name: Give your environment a unique, descriptive name (e.g., my-first-composer-env).
- Location: Choose the region closest to where your other Google Cloud resources are located to minimize data transfer costs and latency.
- Image Version: This sets both the Cloud Composer version and the Airflow version that will be installed. The latest stable version is selected by default, which is usually the best option.
- Service Account: You can either use the default Compute Engine service account or create a new, dedicated service account with the appropriate permissions. For production, a dedicated service account is a best practice for security.
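If you would rather prepare the dedicated service account from a terminal than from the console, the sketch below shows one way to do it. The project ID and account name are hypothetical placeholders, and the helper function name is made up for this guide. It grants only the Composer Worker role; your project may need further roles depending on what your DAGs touch.

```shell
# Hypothetical values: replace with your own project ID and account name.
PROJECT_ID="my-project"
SA_NAME="composer-env-sa"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the service account, then grant it the Composer Worker role
# at the project level. The second command only runs if the first succeeds.
create_composer_sa() {
  gcloud iam service-accounts create "$SA_NAME" \
    --project="$PROJECT_ID" \
    --display-name="Cloud Composer environment" &&
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/composer.worker"
}
```

Run create_composer_sa once, then pick the new account in the Service Account field.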
Step 4: Configure the Environment Scale
Cloud Composer 2 provides three pre-configured sizes to get you started.
- Small: Best for development or light workflows.
- Medium: A balanced option for a wider range of workflows.
- Large: For large-scale data processing and a high number of concurrent tasks.
For your first environment, Medium is a great place to start. You can always scale up later if needed.
Step 5: Finalize and Create
Before creating, you can review additional advanced settings under the "Advanced options" dropdown, such as networking, environment variables, and PyPI packages. For now, the default settings are fine.
When you are ready, click the CREATE button.
Step 6: The Waiting Game
Creating a Cloud Composer environment is a significant provisioning task. It typically takes about 20-30 minutes for all the necessary Google Cloud resources (GKE cluster, Cloud Storage bucket, and more) to be provisioned and configured.
You can monitor the creation progress directly from the Cloud Composer page in the console.
Creating an Environment via Cloud Shell (gcloud)
Using the command line is an excellent way to automate your infrastructure setup.
Step 1: Open Cloud Shell
- In the Google Cloud Console, click the Activate Cloud Shell button at the top of the window. A terminal will open at the bottom of your screen.
Step 2: Construct the gcloud command
Use the following template to create your environment. Replace the bracketed placeholders with your desired values.
gcloud composer environments create [ENVIRONMENT_NAME] \
  --location=[LOCATION] \
  --image-version=[IMAGE_VERSION] \
  --service-account=[SERVICE_ACCOUNT_EMAIL] \
  --environment-size=medium
- [ENVIRONMENT_NAME]: A unique name for your environment.
- [LOCATION]: The region where your environment will be created (e.g., us-central1).
- [IMAGE_VERSION]: The specific Composer and Airflow version, such as composer-2.7.3-airflow-2.7.3. It's a good practice to be explicit here.
- [SERVICE_ACCOUNT_EMAIL]: The email of the service account you wish to use.
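As an illustration, here is the same template with the placeholders filled in through shell variables, which keeps the values in one place if you reuse them in later commands. The name, region, and image version are the examples from this guide; the service account email is hypothetical, and the image version string may no longer be offered, so check it against what the console lists before running.

```shell
# Example values taken from this guide; adjust before use.
ENVIRONMENT_NAME="my-first-composer-env"
LOCATION="us-central1"
# Example string only: confirm this version is still available.
IMAGE_VERSION="composer-2.7.3-airflow-2.7.3"
# Hypothetical service account email.
SERVICE_ACCOUNT_EMAIL="composer-env-sa@my-project.iam.gserviceaccount.com"

# Wraps the create command so it can be reviewed before it is run.
create_environment() {
  gcloud composer environments create "$ENVIRONMENT_NAME" \
    --location="$LOCATION" \
    --image-version="$IMAGE_VERSION" \
    --service-account="$SERVICE_ACCOUNT_EMAIL" \
    --environment-size=medium
}
```

Calling create_environment runs the command. Quoting the variables protects against stray spaces in any of the values.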
Step 3: Address Common Issues (Troubleshooting)
Some issues can prevent a successful creation. If your command fails, check the following:
- API Permissions: Ensure the service account you are using has the composer.worker role (roles/composer.worker) and any other permissions needed to create resources in your project. A common error is a Protocol message has no "release_config" field message, which can sometimes be a red herring pointing to a permissions problem or an old DAG file still in the Airflow environment's GCS bucket.
- Billing: Make sure billing is enabled for your project. A failed creation with no clear error message can often be a billing issue.
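- Old DAGs: If you were previously testing, ensure any DAG files with errors are removed from the GCS bucket so they are not parsed during the provisioning process. A brand-new environment normally gets its own empty bucket, so this mainly matters when you are retrying against a bucket you have already used.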
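The first two checks above can also be run from Cloud Shell. The following is a rough sketch rather than a full diagnostic: the project ID and service account email are hypothetical, and the function names are invented for this guide. The role listing only covers roles granted at the project level.

```shell
# Hypothetical values: replace with your own.
PROJECT_ID="my-project"
SERVICE_ACCOUNT_EMAIL="composer-env-sa@my-project.iam.gserviceaccount.com"

# Shows whether billing is enabled for the project.
check_billing() {
  gcloud billing projects describe "$PROJECT_ID" \
    --format="value(billingEnabled)"
}

# Lists the project-level roles bound to the service account;
# roles/composer.worker should appear in the output.
list_sa_roles() {
  gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${SERVICE_ACCOUNT_EMAIL}" \
    --format="value(bindings.role)"
}

# Makes sure the Cloud Composer API is enabled in the project.
enable_composer_api() {
  gcloud services enable composer.googleapis.com --project="$PROJECT_ID"
}
```

Running check_billing and list_sa_roles before retrying the create command can rule out the two most common causes quickly.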
Step 4: Run the Command
Paste your constructed command into Cloud Shell and press Enter. gcloud reports the operation it started. By default it then waits for that operation to finish, so the terminal stays attached while the environment provisions. To get your prompt back immediately and let provisioning continue in the background, add the --async flag to the command.
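Whether or not your terminal is still attached to the create command, you can ask for the environment's current state at any time. The sketch below polls that state until it reaches RUNNING or ERROR. The variable values and function names are assumptions for illustration, and POLL_SECONDS is a made-up knob for the wait between checks.

```shell
# Hypothetical values: match these to the environment you created.
ENVIRONMENT_NAME="my-first-composer-env"
LOCATION="us-central1"

# Prints the environment's state, for example CREATING or RUNNING.
env_state() {
  gcloud composer environments describe "$ENVIRONMENT_NAME" \
    --location="$LOCATION" \
    --format="value(state)"
}

# Polls until the environment is RUNNING (returns 0) or ERROR (returns 1).
wait_for_environment() {
  while true; do
    state="$(env_state)"
    echo "state: $state"
    case "$state" in
      RUNNING) return 0 ;;
      ERROR)   return 1 ;;
    esac
    sleep "${POLL_SECONDS:-60}"
  done
}
```

Given the 20-30 minute provisioning time, a one-minute interval is plenty. The loop has no timeout, so stop it with Ctrl+C if the state never changes.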
Key Considerations After Creation
Once your environment is ready, it's important to remember a few things:
- DAGs Bucket: Cloud Composer automatically provisions a Cloud Storage bucket for your environment. This is where you will upload your Airflow DAG files.
- Airflow UI: The Airflow web interface is the main way to monitor and manage your workflows. You can access it via a link on your environment's details page.
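- Cost: Cloud Composer is a powerful service, and its cost is a combination of the underlying resources it uses. Be sure to delete the environment when it is no longer needed to avoid incurring charges. Deleting an environment does not delete its Cloud Storage bucket, so remove that separately if you no longer need it.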
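The three points above each have a command-line counterpart. This sketch assumes the example environment name and region used earlier in this guide; the function names are invented here, and my_dag.py in the usage note is a hypothetical file.

```shell
# Hypothetical values: match these to your environment.
ENVIRONMENT_NAME="my-first-composer-env"
LOCATION="us-central1"

# Prints the Cloud Storage path of the environment's DAGs folder.
dags_prefix() {
  gcloud composer environments describe "$ENVIRONMENT_NAME" \
    --location="$LOCATION" --format="value(config.dagGcsPrefix)"
}

# Prints the URL of the Airflow web interface.
airflow_url() {
  gcloud composer environments describe "$ENVIRONMENT_NAME" \
    --location="$LOCATION" --format="value(config.airflowUri)"
}

# Uploads a local DAG file (first argument) to the DAGs folder.
upload_dag() {
  gcloud composer environments storage dags import \
    --environment="$ENVIRONMENT_NAME" \
    --location="$LOCATION" \
    --source="$1"
}

# Deletes the environment; gcloud asks for confirmation first.
delete_environment() {
  gcloud composer environments delete "$ENVIRONMENT_NAME" \
    --location="$LOCATION"
}
```

For example, upload_dag my_dag.py copies a local file into the DAGs folder, and delete_environment tears the environment down once you are finished with it.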