DevOps [Google Cloud Track]
Currently available in Islamabad
Google Cloud Platform is where the internet runs at scale. Gmail, YouTube, Google Search, Google Maps – all of them run on the same infrastructure you will learn to operate in this course. GCP is the cloud that invented Kubernetes, pioneered serverless containers with Cloud Run, and built the data and AI platform that the world's largest companies depend on. In 2026 it holds the third-largest cloud market share globally and is growing fastest in data engineering, AI/ML workloads, and developer-friendly serverless deployments – precisely the areas where Pakistan's engineering talent is increasingly competitive.
The core program builds the GCP DevOps foundation in 4-5 weeks: Linux, containers, Google Cloud infrastructure, CI/CD with Cloud Build and GitHub Actions, Infrastructure as Code with Terraform and the Google Cloud CLI, observability with Cloud Monitoring and OpenTelemetry, and cost governance. Advanced specialisations – GKE, security, serverless, platform engineering, data and ML infrastructure, and SRE – are offered as separate add-on tracks so students can specialise without covering everything at once.
Why Google Cloud DevOps in 2026
- Google invented Kubernetes and remains one of its largest contributors – GKE (Google Kubernetes Engine) is the most mature managed Kubernetes offering and the reference implementation that other managed K8s services are measured against
- Cloud Run is the cleanest serverless container platform in the industry – deploy any container, pay per request, scale to zero, and get HTTPS automatically – the fastest path from Docker image to production URL
- Google's data and AI platform (BigQuery, Vertex AI, Dataflow, Pub/Sub, Looker) powers data infrastructure for a large share of Fortune 500 companies – creating strong demand for GCP data engineers and ML infrastructure engineers
- GCP's developer experience is frequently rated the best of the three major clouds – the gcloud CLI, Cloud Shell, and Artifact Registry are among the most polished infrastructure tools available
- Google Cloud's open-source commitment means GCP engineers spend less time on proprietary tooling and more time on skills that transfer: Kubernetes, Terraform, Prometheus, Grafana, and OpenTelemetry are all GCP-native or deeply integrated
- The Professional Cloud DevOps Engineer and Professional Cloud Architect certifications are among the most rigorous and respected in the industry – and this course aligns directly with both
- Remote opportunities: GCP skills are increasingly requested by international companies building data-heavy and AI-native products – the fastest-growing segment of Pakistan's IT export market
Core Program – 4 to 5 Weeks
(Foundation every GCP DevOps engineer needs before specialising)
Week 1 – Linux, Networking & Shell Automation
Every GCP Compute Engine VM, every GKE node, every Cloud Run instance, and every Cloud Build worker runs on Linux. This week builds the operating system and networking foundations that all GCP infrastructure sits on top of.
- Linux fundamentals for DevOps: process management, systemd services, file permissions, user and group management, and the /proc and /sys virtual filesystems
- Shell scripting in Bash: variables, conditionals, loops, functions, error handling with set -euo pipefail, and writing production-grade automation scripts for GCP operations
- Text processing tools: grep, awk, sed, cut, sort, uniq, jq for JSON (essential for gcloud CLI output parsing), and yq for YAML – the DevOps data transformation toolkit
- File system and storage: inodes, mount points, LVM, disk usage analysis, and attaching Google Persistent Disks to Linux VMs
- Networking fundamentals: TCP/IP, subnets (CIDR notation), routing tables, DNS resolution, NAT, and how packets flow through a Google Cloud VPC
- Linux networking tools: ip, ss, netstat, curl, wget, dig, nslookup, tcpdump, and nc – debugging connectivity in GCP VPC environments
- TLS/SSL: how certificates work, the certificate chain, and inspecting certificates with openssl – essential for Cloud Load Balancing and Cloud Run managed certificates
- SSH: key generation, SSH config files, OS Login for GCP VMs, and Identity-Aware Proxy (IAP) tunnel-based SSH without public IP addresses
- Google Cloud CLI (gcloud): installing, authenticating with service accounts and Application Default Credentials, scripting resource operations, and querying output with --format json and jq
- Cloud Shell: the browser-based Linux environment with gcloud, kubectl, Terraform, and Docker pre-installed – using it as a zero-setup development environment
- Git advanced workflows: rebasing, cherry-picking, reflog, and monorepo patterns – working with GitHub and Cloud Source Repositories (now deprecated for new projects)
- Python for GCP automation: scripts with the google-cloud Python client libraries, the GCP REST API, subprocess, pathlib, and argparse for complex infrastructure automation
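To make the gcloud-plus-jq workflow concrete, here is a minimal Python sketch of the same filtering pattern. The JSON below is a hypothetical sample shaped like `gcloud compute instances list --format json` output, not live data; in a real script you would capture it with subprocess.

```python
import json

# Hypothetical sample of `gcloud compute instances list --format json` output;
# in a real script you would capture this with subprocess.run(["gcloud", ...]).
raw = """[
  {"name": "web-1", "zone": "zones/us-central1-a", "status": "RUNNING"},
  {"name": "web-2", "zone": "zones/us-central1-b", "status": "TERMINATED"},
  {"name": "worker-1", "zone": "zones/us-central1-a", "status": "RUNNING"}
]"""

instances = json.loads(raw)

# Equivalent of: ... --format json | jq -r '.[] | select(.status=="RUNNING") | .name'
running = [i["name"] for i in instances if i["status"] == "RUNNING"]
print(running)
```

The same filter can be pushed into gcloud itself with `--filter` and `--format`, but doing it in Python keeps the result available for further automation.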
Week 2 – Docker & Containers in Depth
GCP is the most container-native cloud platform – Cloud Run, GKE, Cloud Build, and Artifact Registry all treat containers as first-class citizens. This week covers Docker from OS fundamentals to production-quality builds optimised for Google's container ecosystem.
- Container fundamentals: Linux namespaces, cgroups, and the kernel features that make containers possible – understanding what Cloud Run and GKE actually execute
- Docker architecture: Docker daemon, containerd, runc, image layers, and the OverlayFS union filesystem
- Writing production Dockerfiles: multi-stage builds, minimal base images (Google's own distroless, alpine, scratch), non-root users, and build cache optimisation
- Google Distroless images: Google's production-hardened base images – why they are the recommended base for GCP deployments and how to use them for Go, Java, Python, and Node.js
- Docker image security: scanning with Trivy and Google Artifact Analysis – continuous vulnerability scanning integrated into Artifact Registry
- Docker networking: bridge, host, and overlay drivers – inter-container communication and Cloud Run sidecar patterns
- Docker volumes: bind mounts vs named volumes, and mapping to Cloud Filestore (NFS) for persistent container storage on GKE
- Docker Compose: multi-container local development stacks – health checks, depends_on, environment files, and profiles for GCP-mirrored local environments
- Google Artifact Registry: creating repositories (Docker, npm, Python, Maven, Go), pushing and pulling images, regional vs multi-regional repositories, and repository-level IAM permissions
- Artifact Registry vulnerability scanning: Google Artifact Analysis – the Container Scanning API for OS and language package CVEs in stored images
- Container image tagging strategies: semver, Git SHA immutable tags, and GCP's recommended tagging conventions for Cloud Deploy promotion
- Cloud Build for Docker: building images in Cloud Build without a local Docker daemon – Dockerfile builds, kaniko for rootless builds, and layer caching with Cloud Storage
- Multi-platform builds: ARM64 + AMD64 images for GKE's T2A (Tau) Arm-based node pools using Docker buildx
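As a worked example of the multi-stage, distroless approach above, a minimal Dockerfile for a Go service might look like this. The module layout and binary name are hypothetical; the base images are real Google-maintained ones.

```dockerfile
# Stage 1: compile in a full Go toolchain image
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# Static binary so it can run on the dependency-free distroless/static base
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Stage 2: gcr.io/distroless/static has no shell or package manager,
# which minimises the attack surface and the image size
FROM gcr.io/distroless/static:nonroot
COPY --from=build /app /app
USER nonroot
ENTRYPOINT ["/app"]
```

The `:nonroot` tag ships a pre-created unprivileged user, so the container satisfies the non-root requirement without any extra setup.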
Week 3 – Google Cloud Core Services & Infrastructure as Code with Terraform
GCP from a DevOps engineer's perspective – provisioning everything as code with Terraform, designing VPC topology correctly, and managing identity and access with the Google Cloud IAM model.
GCP fundamentals for DevOps:
- GCP resource hierarchy: organisation, folders, projects, and resources – how IAM policies inherit down the hierarchy and why project-level isolation matters
- Google Cloud VPC: global VPC with regional subnets, VPC peering, Shared VPC, Private Google Access, Cloud NAT, and VPC Flow Logs
- Firewall rules and firewall policies: ingress and egress rules, priority, target tags vs service accounts, and hierarchical firewall policies at org/folder level
- Compute Engine: machine types (N2, C3, T2D, and the Arm-based T2A), custom machine types, preemptible and Spot VMs, managed instance groups (MIGs), and startup scripts via metadata
- Cloud Load Balancing: Global HTTP(S) Load Balancer, Regional Load Balancer, TCP/UDP Load Balancer, Internal Load Balancer – backend services, health checks, and SSL certificates
- Cloud Storage: storage classes (Standard, Nearline, Coldline, Archive), IAM vs ACL, bucket policies, signed URLs, lifecycle rules, and Pub/Sub notifications on object events
- Cloud SQL: managed PostgreSQL and MySQL – instances, read replicas, high availability with regional persistent disk, private connectivity (private services access / Private Service Connect), and point-in-time recovery
- Cloud Memorystore: managed Redis and Valkey – tiers, read replicas, in-transit encryption, and Private Service Access
- Google Cloud IAM: service accounts, IAM roles (basic, predefined, custom), Workload Identity Federation for external identity providers (GitHub Actions, AWS), and IAM Recommender for least-privilege enforcement
- Service account best practices: one service account per workload, service account impersonation, Workload Identity for GKE, and avoiding service account keys
- Cloud DNS: managed zones, record sets, private zones for internal VPC resolution, and DNS peering between VPCs
- Secret Manager: storing and rotating application secrets – secret versions, IAM-based access, and event-driven rotation with Cloud Functions
Terraform for GCP:
- Terraform with the Google provider: the google and google-beta providers – provider configuration, credentials, and project/region defaults
- Terraform state on GCP: the Cloud Storage backend with object versioning and the backend's built-in state locking
- Google Cloud Foundation Toolkit: Google's opinionated Terraform modules for VPC, GKE, Cloud SQL, IAM, and project creation – the enterprise starting point
- Terraform modules for GCP: writing reusable VPC, GKE, and Cloud Run modules – the terraform-google-modules GitHub organisation
- Workspaces and environment promotion: dev → staging → production with separate state files and variable files per environment
- Provisioning a complete GCP environment with Terraform: organisation setup, folder hierarchy, project creation, Shared VPC, GKE cluster, Cloud SQL, Cloud Storage, and IAM bindings
- Config Connector: the Kubernetes-native alternative to Terraform for GCP – managing GCP resources as Kubernetes custom resources (covered as awareness, not primary path)
- Atlantis on GCP: team-based IaC with plan-on-PR and apply-on-merge – running Atlantis on Cloud Run or GKE
- Checkov and tfsec for GCP: static analysis of Terraform code for GCP security misconfigurations before applying
- Policy Controller (OPA Gatekeeper-based): enforcing policy on Kubernetes and Config Connector-managed GCP resources, complementing the Organisation Policy Service at project and folder level
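A minimal Terraform sketch tying several of these pieces together: the google provider, a Cloud Storage state backend, and one versioned bucket. Project, bucket, and region names are hypothetical placeholders, and the state bucket is assumed to already exist with versioning enabled.

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
  # Remote state in Cloud Storage; the gcs backend handles state locking itself.
  backend "gcs" {
    bucket = "example-tf-state" # hypothetical, must already exist
    prefix = "envs/dev"
  }
}

provider "google" {
  project = "example-project-id" # hypothetical
  region  = "europe-west1"
}

resource "google_storage_bucket" "artifacts" {
  name                        = "example-project-artifacts" # bucket names are globally unique
  location                    = "EU"
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }
}
```

`terraform init` configures the backend, and `terraform plan`/`apply` then run against the shared state, so every teammate sees the same view of the environment.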
Week 4 – CI/CD with Cloud Build, Cloud Deploy & GitHub Actions
The complete delivery pipeline: from code push to production on Cloud Run and GKE – automated, secure, and progressively delivered. Google's native CI/CD tooling (Cloud Build and Cloud Deploy) is covered alongside GitHub Actions for teams that prefer it.
Cloud Build in depth:
- Cloud Build architecture: build triggers, build configs (cloudbuild.yaml), build steps, build workers, and the build lifecycle
- Build triggers: push to branch, pull request, tag push, manual, and Pub/Sub-triggered builds from external events
- Build steps: the community builder images (gcr.io/cloud-builders/*), custom builders, and parallel step execution with waitFor
- Cloud Build substitutions: built-in variables (COMMIT_SHA, BRANCH_NAME, BUILD_ID) and user-defined substitutions for parameterised pipelines
- Private pools: dedicated build workers peered to your VPC – for builds that need access to private resources without public internet exposure
- Build caching: Cloud Storage-backed layer caching for Docker builds – significantly reducing build times for large images
- Cloud Build IAM: the service account for builds, granting least-privilege access to Artifact Registry, Secret Manager, and GKE
- Security in Cloud Build: no inbound network, hermetic builds, and using Secret Manager for sensitive build-time variables
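A minimal cloudbuild.yaml sketch of the build, push, and deploy flow described above. The Artifact Registry path, service name, and region are hypothetical; the builder images are the standard Google-published ones.

```yaml
# Sketch: build an image, push it, deploy it to Cloud Run (names hypothetical).
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', '${_IMAGE}:$COMMIT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '${_IMAGE}:$COMMIT_SHA']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'my-service'
      - '--image=${_IMAGE}:$COMMIT_SHA'
      - '--region=us-central1'
substitutions:
  _IMAGE: 'us-central1-docker.pkg.dev/example-project/app-repo/my-service'
images:
  - '${_IMAGE}:$COMMIT_SHA'
```

The explicit push step matters: the top-level `images:` field only pushes after all steps finish, so a deploy step in the same build must push its image first.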
Cloud Deploy – continuous delivery:
- Cloud Deploy architecture: delivery pipelines, targets (GKE, Cloud Run, Anthos), releases, and rollouts
- Promotion workflow: promoting a release through dev → staging → production with manual approval gates
- Cloud Deploy with Cloud Run: creating releases, deploying to Cloud Run services, and progressive traffic splitting
- Cloud Deploy with GKE: Skaffold-based rendering, Helm and Kustomize manifest support, and GKE rollout verification
- Rollback in Cloud Deploy: automatic rollback on failed deployment verification and manual rollback to previous releases
- Canary deployments with Cloud Deploy: phased rollout with traffic percentage targets and automated verification
GitHub Actions for GCP:
- Workload Identity Federation from GitHub Actions to GCP: keyless authentication – no service account keys stored in GitHub secrets
- GCP-specific GitHub Actions: google-github-actions/auth, google-github-actions/setup-gcloud, google-github-actions/deploy-cloudrun, and google-github-actions/get-gke-credentials
- Complete CI/CD pipeline with GitHub Actions: push → Trivy scan → Cloud Build trigger or Docker build → push to Artifact Registry → Terraform plan → Cloud Deploy release → smoke test → Slack notification
- Reusable workflows for GCP: building a shared GitHub Actions workflow library for Cloud Run and GKE deployments
- Self-hosted GitHub Actions runners on GCP: Compute Engine VMs or GKE-based runners with Workload Identity – private network access to Cloud SQL and internal services
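Keyless authentication with Workload Identity Federation might look like the following workflow sketch. The pool, provider, service account, and image path are all hypothetical placeholders; the actions themselves are the real google-github-actions ones.

```yaml
# Sketch: deploy to Cloud Run from GitHub Actions with no stored keys.
name: deploy
on:
  push:
    branches: [main]
permissions:
  contents: read
  id-token: write   # required to mint the OIDC token that WIF exchanges
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/my-repo
          service_account: deployer@example-project.iam.gserviceaccount.com
      - uses: google-github-actions/deploy-cloudrun@v2
        with:
          service: my-service
          region: us-central1
          image: us-central1-docker.pkg.dev/example-project/app-repo/my-service:${{ github.sha }}
```

The `id-token: write` permission is the piece most often forgotten: without it the auth action cannot obtain the GitHub OIDC token to exchange for GCP credentials.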
Week 5 – Observability, Security Basics & Cost Management
The three disciplines that define a working GCP DevOps engineer's day: understanding system behaviour with Cloud Operations Suite, keeping workloads secure, and governing an often-surprising cloud bill.
Observability with Cloud Operations Suite:
- Cloud Operations Suite overview: Cloud Monitoring, Cloud Logging, Cloud Trace, Cloud Profiler, and Error Reporting – the integrated observability platform
- Cloud Logging: log sinks, log buckets, log-based metrics, log exclusions, and the Logs Explorer – querying with the Logging Query Language
- Cloud Monitoring: workspace setup, Metrics Explorer, monitored resources, and the thousands of built-in metrics from GCP services
- Cloud Monitoring dashboards: building operational dashboards with metrics, log panels, and SLO widgets
- Alerting policies: metric threshold alerts, log-based alerts, uptime checks, and notification channels (email, PagerDuty, Slack via Pub/Sub webhook)
- Cloud Trace: distributed tracing for Cloud Run, GKE, and App Engine – trace waterfall analysis and latency percentiles
- Error Reporting: automatic grouping of application errors – alerting on new error classes and spike detection
- Cloud Profiler: continuous CPU and memory profiling in production – flame graph analysis with negligible performance overhead
- OpenTelemetry with GCP: the Google Cloud OpenTelemetry exporter – sending OTel traces, metrics, and logs to Cloud Monitoring and Cloud Trace
- Structured logging best practices: using Google Cloud's structured logging format – JSON with severity, trace, spanId, and httpRequest fields for automatic log enrichment
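The structured-logging format in the last bullet can be sketched in a few lines of Python. The trace value and fields are illustrative, not a complete LogEntry schema; the key point is that Cloud Logging promotes recognised JSON keys (severity, the trace key, httpRequest) to first-class log fields.

```python
import json
import time

def make_entry(severity, message, trace=None, **fields):
    # Build one log record in Cloud Logging's structured (JSON) format.
    # Cloud Logging recognises "severity" and "logging.googleapis.com/trace"
    # and promotes them to first-class LogEntry fields.
    entry = {
        "severity": severity,
        "message": message,
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        **fields,
    }
    if trace:
        entry["logging.googleapis.com/trace"] = trace
    return entry

# One JSON object per line on stdout is all Cloud Run or GKE needs.
entry = make_entry(
    "INFO",
    "request handled",
    trace="projects/example-project/traces/0af7651916cd43dd",  # hypothetical
    httpRequest={"status": 200, "latency": "0.023s"},
)
print(json.dumps(entry))
```

Because the agent reads stdout line by line, no logging library or API call is required; printing one JSON object per line is enough.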
GCP security fundamentals:
- Secret Manager in production: secret rotation with Cloud Functions, automatic expiry, audit logging in Cloud Audit Logs, and application integration with Workload Identity
- Workload Identity Federation: eliminating service account keys for GitHub Actions, GitLab CI, AWS Lambda, and Azure workloads accessing GCP resources
- VPC Service Controls: creating service perimeters around GCP APIs to prevent data exfiltration – protecting BigQuery, Cloud Storage, and Secret Manager from outside the perimeter
- Binary Authorization: enforcing that only signed, approved container images are deployed to GKE and Cloud Run – policy-based admission control
- Security Command Center (SCC): the centralised security and risk platform – finding misconfigurations, vulnerabilities, and threats across the GCP organisation
- Cloud Audit Logs: Admin Activity logs, Data Access logs, System Event logs, and Policy Denied logs – configuring retention and exporting to Cloud Storage and BigQuery
- Identity-Aware Proxy (IAP): zero-trust access to Cloud Run services, GKE applications, and Compute Engine VMs – enforcing Google Identity authentication without a VPN
- Google Cloud Armor: web application firewall for Cloud Load Balancing – preconfigured OWASP (ModSecurity CRS) rules, rate limiting, and geo-based blocking
- Shift-left security in Cloud Build and GitHub Actions: Trivy (containers), Checkov (Terraform), Semgrep (SAST), and the Google Cloud container scanning integration
GCP cost management:
- Cloud Billing: billing accounts, billing exports to BigQuery, and the Google Cloud Pricing Calculator
- Labels and tags for cost allocation: mandatory label policies enforced via Organisation Policy – environment, team, project, and cost-centre labels
- Budget alerts: setting spend budgets with alert thresholds and Pub/Sub notifications for automated cost control actions
- Recommender and Active Assist: idle VM recommendations, right-sizing suggestions, unused IP address cleanup, and IAM role recommendations
- Committed Use Discounts (CUDs): 1-year and 3-year resource commitments for Compute Engine and Cloud SQL – 37-55% savings
- Spot VMs: using preemptible/Spot instances for CI/CD builds, batch jobs, and fault-tolerant stateless workloads – 60-91% savings with proper interruption handling
- Cloud Run scale-to-zero: eliminating idle costs for non-production environments – the biggest cost advantage of serverless containers
- Cloud Storage cost optimisation: lifecycle rules for automatic tier transitions (Standard → Nearline → Coldline → Archive), Object Versioning costs, and egress cost awareness
- Network egress pricing: understanding inter-region, inter-zone, and internet egress costs – the most consistently underestimated GCP bill item
- BigQuery cost control: on-demand vs capacity pricing, slot reservations, query cost estimation with dry run, and partitioned/clustered tables to reduce bytes processed
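The CUD arithmetic is worth seeing once. A rough sketch with an assumed (not official) on-demand rate; real prices come from the Google Cloud Pricing Calculator.

```python
# CUD savings sketch. The hourly rate below is assumed for illustration;
# real rates come from the Google Cloud Pricing Calculator.
on_demand_hourly = 0.097   # hypothetical VM list price, USD/hour
hours_per_month = 730
cud_discount_3yr = 0.55    # upper end of the quoted 37-55% CUD range

on_demand_monthly = on_demand_hourly * hours_per_month
cud_monthly = on_demand_monthly * (1 - cud_discount_3yr)

print(f"on-demand:  ${on_demand_monthly:.2f}/month")
print(f"3-year CUD: ${cud_monthly:.2f}/month "
      f"(saves ${on_demand_monthly - cud_monthly:.2f})")
```

The catch the course stresses: a CUD is billed whether or not the resources run, so commit only to your stable baseline and cover bursty load with Spot or on-demand capacity.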
Advanced Add-On Tracks
(Each track is 2-3 weeks – additional fee per track. Tracks can be taken in any order after completing the core program.)
Advanced Track 1: Kubernetes & GKE (3 weeks)
GKE is where Kubernetes was born and where it is most mature. Three weeks are dedicated to covering it properly – from Kubernetes fundamentals through GKE-specific operations to the GitOps delivery layer that production GKE clusters depend on.
Week 1 – Kubernetes fundamentals:
- Kubernetes architecture: control plane components (API server, etcd, scheduler, controller manager) and data plane (kubelet, kube-proxy, container runtime) – GKE manages the control plane; you manage the data plane
- Core workload objects: Pods, ReplicaSets, Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs – when each is appropriate
- Services: ClusterIP, NodePort, LoadBalancer (backed by Google Cloud Load Balancer), and ExternalName – DNS-based discovery with CoreDNS (kube-dns on GKE by default)
- Ingress and Ingress Controllers: GKE Ingress (backed by Google Cloud HTTP(S) LB), NGINX Ingress Controller, and the Gateway API – TLS with Google-managed certificates
- ConfigMaps and Secrets: baseline configuration injection – the foundation before using Secret Manager CSI integration
- Persistent Volumes: PV, PVC, StorageClass – the Compute Engine Persistent Disk CSI driver and the Filestore CSI driver for RWX workloads
- Namespaces and RBAC: isolating teams and workloads, ClusterRoles vs Roles, and binding Google identities to Kubernetes roles via GKE RBAC
- Resource requests and limits: LimitRanges, ResourceQuotas, and the three QoS classes (Guaranteed, Burstable, BestEffort)
- Health checks: liveness, readiness, and startup probes – writing probes that work reliably with Cloud Load Balancing health checks
- Pod scheduling: nodeSelector, node affinity/anti-affinity, pod topology spread constraints, taints and tolerations, and PodDisruptionBudgets
- Rolling updates and rollbacks: Deployment strategy, maxUnavailable, maxSurge, and progressive delivery
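Several of these fundamentals, probes, requests and limits, and rolling-update settings, meet in a single Deployment manifest. A minimal sketch (image path and health endpoint are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    rollingUpdate:
      maxSurge: 1        # one extra pod during rollout
      maxUnavailable: 0  # never drop below desired capacity
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: us-central1-docker.pkg.dev/example-project/app-repo/web:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 256Mi  # equal memory request/limit; CPU left burstable
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
```

With `maxUnavailable: 0` and a working readiness probe, a rollout only replaces a pod once its successor is actually serving, which is the behaviour Cloud Load Balancing health checks expect.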
Week 2 – GKE in production:
- GKE cluster modes: Standard vs Autopilot – when each is appropriate, cost model differences, and workload constraints in Autopilot
- GKE Autopilot: the fully managed Kubernetes mode – no node management, pod-based billing, and built-in security hardening
- GKE cluster provisioning with Terraform: the google_container_cluster resource, node pool configuration, network config, and the GKE Terraform module
- GKE networking: VPC-native clusters, alias IP ranges, Pod CIDR planning, GKE Dataplane V2 (eBPF-based), and Network Policy enforcement
- Workload Identity for GKE: binding Kubernetes ServiceAccounts to Google service accounts – the replacement for service account key files mounted in pods
- GKE node pools: standard, Spot, Arm (T2A), and GPU node pools – when and how to use each
- GKE cluster autoscaling: the Cluster Autoscaler and Node Auto-Provisioning (NAP) – letting GKE manage node pools automatically
- Vertical Pod Autoscaler (VPA) and Horizontal Pod Autoscaler (HPA): right-sizing pod resources and scaling on custom metrics with Cloud Monitoring
- GKE cluster upgrades: release channels (Rapid, Regular, Stable), auto-upgrade, maintenance windows, and blue/green node pool upgrades
- Secret Manager CSI integration for GKE: mounting Secret Manager secrets as files in pods via the Secrets Store CSI driver – avoiding long-lived Kubernetes Secret objects (environment variables require the driver's secret-sync feature)
- GKE observability: Google Managed Prometheus, Cloud Monitoring Container Insights, GKE dashboard, and Workload metrics
- GKE security hardening: Shielded GKE nodes, Confidential GKE nodes, Binary Authorization admission control, and Workload Identity enforcement
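A Terraform sketch of a VPC-native GKE cluster with Workload Identity, a release channel, and a Spot node pool. Project, network, and cluster names are hypothetical placeholders.

```hcl
resource "google_container_cluster" "main" {
  name     = "prod-cluster" # hypothetical
  location = "us-central1"

  # VPC-native (alias IP) networking, required for Workload Identity
  # and GKE Dataplane V2
  network    = "projects/example-project/global/networks/prod-vpc"
  subnetwork = "projects/example-project/regions/us-central1/subnetworks/gke"
  ip_allocation_policy {}

  workload_identity_config {
    workload_pool = "example-project.svc.id.goog"
  }

  release_channel {
    channel = "REGULAR"
  }

  # Manage node pools as separate resources instead of the default pool
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "default" {
  name    = "default-pool"
  cluster = google_container_cluster.main.id

  autoscaling {
    min_node_count = 1
    max_node_count = 5
  }

  node_config {
    machine_type = "e2-standard-4"
    spot         = true # Spot nodes for fault-tolerant workloads
  }
}
```

Removing the default node pool and declaring pools as separate resources is the usual pattern: pool changes then never force cluster recreation.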
Week 3 – GitOps and advanced delivery on GKE:
- Helm: packaging Kubernetes applications – writing charts, values files, hooks, and using Helm with Cloud Deploy
- Kustomize: environment-specific overlays – base + overlays for dev/staging/prod GKE clusters
- Config Sync: the GKE Enterprise GitOps operator – syncing from Git, OCI registries, and Cloud Storage
- ArgoCD on GKE: the community GitOps operator – Applications, ApplicationSets, the App of Apps pattern, and multi-cluster management
- Cloud Deploy with GKE: the managed delivery pipeline – Skaffold rendering, release promotion, and GKE rollout verification with custom metrics
- Argo Rollouts on GKE: canary and blue/green deployments with Cloud Monitoring-backed automated analysis
- Config Controller: the hosted Config Connector and Policy Controller – managing GCP resources and enforcing policies from Git
- Anthos Service Mesh (now Cloud Service Mesh): the managed Istio service mesh for GKE – mTLS, traffic management, mesh telemetry dashboards, and cross-cluster mesh
- Multi-cluster GKE: GKE Fleet management – registering clusters, Multi Cluster Ingress, and multi-cluster Services for global load balancing
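An ArgoCD Application pointing at a Kustomize overlay, as a minimal sketch; the repository URL, paths, and namespaces are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/gitops-config # hypothetical repo
    targetRevision: main
    path: overlays/prod   # Kustomize overlay per environment
  destination:
    server: https://kubernetes.default.svc
    namespace: web
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift in the cluster
```

With `selfHeal` enabled, any kubectl edit made directly against the cluster is reverted to what Git declares, which is the core GitOps guarantee.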
Advanced Track 2: GCP Security & Compliance Engineering (2 weeks)
Google Cloud has one of the most sophisticated security models of any cloud platform – built on BeyondProd zero-trust principles. This track covers GCP's security tooling, DevSecOps automation, and the compliance frameworks that regulated enterprise environments require.
Week 1 – GCP security services in depth:
- Security Command Center (SCC) Premium: security findings, compliance dashboards (CIS, PCI-DSS, NIST), threat detection, and automated finding export to Pub/Sub for SIEM integration
- Chronicle SIEM (now part of Google Security Operations): Google's cloud-native security operations platform – ingesting GCP logs, detecting threats with YARA-L rules, and automated response with SOAR playbooks
- VPC Service Controls in depth: service perimeters, access levels, access policies, and bridging perimeters for controlled data sharing
- Cloud KMS: customer-managed encryption keys (CMEK) for Cloud Storage, BigQuery, GKE, Cloud SQL, and Secret Manager – key rings, key versions, key rotation, and Cloud HSM
- Cloud Armor advanced: custom WAF rules with CEL expressions, Adaptive Protection (ML-based DDoS mitigation), and named IP lists for allow/deny
- Cloud IDS (Intrusion Detection System): network threat detection using Palo Alto Networks signatures – detecting malware, spyware, and C2 traffic in the VPC
- Assured Workloads: compliance boundaries for regulated workloads (FedRAMP, ITAR, IL4) – data residency controls and organisation policy constraints
- Organisation Policy Service: enforcing constraints across the GCP organisation – restricting public IPs, disabling service account key creation, requiring OS Login, and enforcing uniform bucket-level access
- Access Transparency and Access Approvals: visibility into Google personnel access to your data and requiring explicit customer approval
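Two of the Organisation Policy constraints above, expressed with Terraform's legacy-style google_organization_policy resource; the organisation ID is a placeholder, and the constraint names are the real built-in ones.

```hcl
# Block creation of downloadable service account keys org-wide
resource "google_organization_policy" "no_sa_keys" {
  org_id     = "123456789" # hypothetical organisation ID
  constraint = "constraints/iam.disableServiceAccountKeyCreation"

  boolean_policy {
    enforced = true
  }
}

# Deny external IP addresses on all Compute Engine VMs
resource "google_organization_policy" "no_external_ips" {
  org_id     = "123456789"
  constraint = "constraints/compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}
```

Applying these requires org-level IAM (for example the Organization Policy Administrator role); teams that need exceptions get them as narrower policies at folder or project level rather than org-wide carve-outs.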
Week 2 – DevSecOps pipeline and compliance automation:
- Shift-left security in Cloud Build and GitHub Actions: Trivy (containers), Checkov (Terraform), Semgrep (SAST), OSV-Scanner (Google's open-source vulnerability scanner), and licence compliance scanning
- Software Bill of Materials (SBOM): generating SBOMs with Syft and attesting them in Artifact Registry with Google Cloud's attestation framework
- Binary Authorization in depth: attestors, attestation authorities, Cloud KMS-signed attestations, and enforcing image signing policies in Cloud Deploy and GKE admission control
- SLSA (Supply-chain Levels for Software Artifacts): the framework Google originated, now an OpenSSF project – SLSA provenance from Cloud Build and verifying provenance in Binary Authorization
- Container Analysis and Artifact Registry scanning: the on-push and continuous scanning APIs – writing Pub/Sub-triggered remediation functions for newly discovered CVEs
- Terraform Sentinel and Policy Controller for GCP: enforcing organisation-wide infrastructure standards – no public Cloud Storage buckets, all VMs require OS Login, and all GKE clusters must have Binary Authorization
- GCP compliance frameworks: CIS Google Cloud Foundations Benchmark, PCI-DSS, ISO 27001, and SOC 2 – mapping controls to GCP services with SCC compliance dashboards
- Data Loss Prevention (Cloud DLP): discovering and redacting sensitive data (PII, credentials, PHI) in Cloud Storage, BigQuery, and Datastore – using DLP inspection in CI/CD pipelines
- Incident response on GCP: isolating compromised resources (firewall rule quarantine, service account key revocation, project shutdown), forensic investigation with Cloud Audit Logs and SCC findings, and automated runbooks with Cloud Functions
Advanced Track 3: Serverless & Event-Driven Architecture on GCP (2 weeks)
GCP has the most elegant serverless ecosystem of any major cloud – Cloud Run is the industry's best serverless container platform, Cloud Functions integrates seamlessly with every GCP event source, and Pub/Sub powers event-driven architectures at YouTube scale.
- Cloud Run deep dive: the execution model, concurrency model (up to 1,000 concurrent requests per instance), CPU allocation options (request-only vs always-on), and min/max instance configuration
- Cloud Run revisions and traffic splitting: progressive rollout, canary releases, blue/green deployments, and rollback – entirely managed, without Kubernetes
- Cloud Run jobs: running batch and scheduled workloads as containers – task parallelism, index-based job arrays, and job scheduling with Cloud Scheduler
- Cloud Run sidecars: the multi-container Cloud Run model – running a sidecar proxy, log shipper, or agent alongside the main container
- Cloud Run with VPC: Direct VPC egress and the VPC Access Connector – accessing Cloud SQL, Memorystore, and internal services from Cloud Run without the public internet
- Cloud Functions (2nd gen): the Cloud Run-based Functions runtime – event-driven functions backed by Eventarc, longer timeouts, larger instances, and concurrency support
- Eventarc: the unified eventing platform – routing events from GCP services, Pub/Sub, and third-party webhooks to Cloud Run and Cloud Functions
- Cloud Pub/Sub: the globally distributed message bus – topics, subscriptions (pull and push), dead-letter topics, filtering, message ordering, and exactly-once delivery
- Cloud Tasks: managed asynchronous task queues – HTTP task targets (Cloud Run, Cloud Functions), task deduplication, scheduling, and rate limiting
- Cloud Scheduler: managed cron jobs – triggering Pub/Sub, HTTP endpoints (Cloud Run), and Cloud Functions on a schedule
- Firebase (Realtime Database and Firestore): the mobile and web backend – real-time database, NoSQL document store, Firebase Auth, and Cloud Functions integration for serverless backend logic
- Cloud Endpoints and API Gateway: managed API gateways for Cloud Run and Cloud Functions – authentication, rate limiting, monitoring, and OpenAPI spec-based configuration
- Event-driven patterns on GCP: choreography with Pub/Sub + Eventarc, orchestration with Cloud Workflows, the transactional outbox pattern using Datastore + Pub/Sub, and fan-out with multi-subscription topics
- Cloud Workflows: the serverless orchestration service – defining multi-step workflows in YAML/JSON that call GCP APIs, HTTP endpoints, and Cloud Functions with error handling and retry logic
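A small Cloud Workflows sketch showing the retry and error-handling style mentioned above; the Cloud Run URL is a hypothetical placeholder.

```yaml
# Sketch: call an authenticated Cloud Run endpoint with retries and backoff.
main:
  steps:
    - callReport:
        try:
          call: http.get
          args:
            url: https://my-service-abc123-uc.a.run.app/report # hypothetical
            auth:
              type: OIDC   # Workflows attaches an identity token automatically
          result: resp
        retry:
          predicate: ${http.default_retry_predicate}
          max_retries: 3
          backoff:
            initial_delay: 1
            max_delay: 30
            multiplier: 2
    - returnBody:
        return: ${resp.body}
```

`http.default_retry_predicate` retries on the usual transient failures (429s, 5xx, connection errors), so the orchestration absorbs flaky downstream calls without custom code.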
Advanced Track 4: Platform Engineering & Advanced IaC on GCP (2 weeks)
Platform Engineering on GCP means building the foundation that development teams self-service from – project provisioning, environment creation, policy enforcement, and the developer portal that surfaces it all through a single interface.
- Platform Engineering principles for GCP: Google's Site Reliability Engineering culture applied to internal platform teams – golden paths, golden images, and measuring platform effectiveness with DORA metrics
- GCP Organisation structure for enterprise: a folder hierarchy (business unit, team, environment), the Shared VPC host project pattern, and the Google Cloud enterprise foundations blueprint
- Project factory with Terraform: automating project creation with standard VPC, IAM bindings, API enablement, and budget configuration – the google_project Terraform resource at scale
- Cloud Foundation Toolkit (CFT): Google's opinionated Terraform module library – the project-factory, network, kubernetes-engine, and IAM modules
- Fabric FAST: the stage-based foundation framework from the Cloud Foundation Fabric repository – a modular approach to deploying a GCP organisation with Terraform
- Backstage with GCP: the internal developer portal – Google Cloud plugin, GKE plugin, Cloud Run plugin, and a GCP resource entity provider for service catalogue integration
- Google Cloud Marketplace and Service Catalog: curating approved solutions that developers can deploy without direct IAM permissions – Terraform-backed deployments from the catalogue
- Config Connector advanced: managing the full GCP resource hierarchy as Kubernetes resources – organisation policy, IAM, and networking controlled from Git via Config Sync
- Terraform advanced patterns for GCP: the google_project_service resource for API enablement, provider aliasing for multi-project deployments, and testing Terraform modules with Terratest against real GCP projects
- Organisation Policy advanced: custom constraints using CEL expressions, policy simulation, and programmatic policy management with Terraform
- GCP resource management at scale: resource tagging strategy, label enforcement via Organisation Policy, and managing 100+ projects with a hub-spoke Shared VPC topology
- FinOps on GCP: BigQuery billing exports, Looker Studio (formerly Data Studio) cost dashboards, committed use discount optimisation, and FinOps Hub recommendations
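A stripped-down project-factory sketch: one google_project per map entry plus a baseline API enablement. Folder ID, billing account, and project prefix are hypothetical placeholders.

```hcl
variable "team_projects" {
  type = map(object({ folder_id = string }))
  default = {
    payments-dev = { folder_id = "1111111111" } # hypothetical folder
  }
}

resource "google_project" "team" {
  for_each        = var.team_projects
  name            = each.key
  project_id      = "acme-${each.key}"          # project IDs are globally unique
  folder_id       = each.value.folder_id
  billing_account = "000000-AAAAAA-BBBBBB"      # hypothetical billing account
}

# Enable a baseline API in every project the factory creates
resource "google_project_service" "compute" {
  for_each = google_project.team
  project  = each.value.project_id
  service  = "compute.googleapis.com"
}
```

In a real factory, the same map entry would also drive Shared VPC attachment, IAM bindings, and a budget, which is exactly what the CFT project-factory module packages up.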
Advanced Track 5: Data Engineering & ML Infrastructure on GCP (2 weeks)
GCP's data and ML platform is among the most complete of any cloud – BigQuery, Vertex AI, Dataflow, Pub/Sub, Dataproc, and Looker form an integrated stack that powers some of the world's most data-intensive applications. This track covers the infrastructure engineering side of operating these services in production.
- BigQuery infrastructure: dataset and table organisation, column-level access control, row-level security with row access policies, and partitioned and clustered table design for cost control
- BigQuery slot management: on-demand vs capacity pricing, slot reservations (editions), reservation assignments per project and folder, and monitoring slot utilisation
- Cloud Dataflow infrastructure: the managed Apache Beam service – worker pools, autoscaling, Streaming Engine for real-time pipelines, and Dataflow templates for reusable pipeline deployment
- Cloud Pub/Sub as a data pipeline backbone: exactly-once delivery, Dataflow-Pub/Sub integration, dead-letter topics, and message schema management with Pub/Sub schemas
- Cloud Dataproc: managed Apache Spark and Hadoop – cluster types (standard, high-availability, single-node), autoscaling policies, Dataproc Serverless for ephemeral Spark jobs, and Dataproc Metastore
- Cloud Composer (managed Apache Airflow): environment infrastructure – Airflow version, environment size, autoscaling workers, VPC-native networking, and DAG deployment from Cloud Storage
- Vertex AI platform infrastructure: the managed ML platform – Workbench (managed Jupyter), custom training jobs (CPU and GPU), Training Pipelines, and the Model Registry
- Vertex AI compute: N1/N2/A2/A3 GPU instances for training (T4, V100, A100, H100), Spot VM training jobs, and TPU pods for large-scale model training
- Vertex AI model serving: Dedicated Endpoints vs Serverless Prediction – traffic splitting for A/B testing models, autoscaling, and Vertex AI Explainability
- Vertex AI Pipelines: Kubeflow Pipelines on Vertex – building reproducible ML pipelines with component caching, artifact lineage, and CI/CD integration
- Vertex AI Feature Store: managed feature serving – online store for low-latency retrieval, offline store for training, and feature monitoring
- MLOps CI/CD on GCP: triggering Vertex AI Pipeline runs from Cloud Build, model evaluation gates, automatic deployment to Vertex AI Endpoints on approval, and monitoring model drift with Vertex AI Model Monitoring
- Data platform security: VPC Service Controls around BigQuery and Vertex AI, column-level encryption with Cloud DLP, and BigQuery authorized views for row-level access
- Data infrastructure cost control: BigQuery slot and storage cost analysis with BigQuery system tables, Dataflow job cost tracking, and Vertex AI training cost optimisation with Spot VMs
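The on-demand vs capacity pricing decision covered in the slot management topic above reduces to a breakeven calculation: on-demand cost grows linearly with bytes scanned, while a slot reservation is a flat monthly commitment. A minimal sketch – the dollar figures are illustrative placeholders, not current BigQuery list prices:

```python
def cheaper_pricing_model(tib_scanned_per_month,
                          on_demand_per_tib=6.25,
                          monthly_slot_commitment=2000.0):
    """Compare BigQuery on-demand vs reserved-slot (capacity) pricing.

    Placeholder rates, for illustration only: on-demand is charged
    per TiB scanned, while a slot reservation costs a flat monthly
    amount regardless of scan volume. Returns the cheaper model and
    its monthly cost.
    """
    on_demand_cost = tib_scanned_per_month * on_demand_per_tib
    if on_demand_cost < monthly_slot_commitment:
        return "on-demand", on_demand_cost
    return "capacity", monthly_slot_commitment

# A team scanning 100 TiB/month stays cheaper on-demand at these rates,
print(cheaper_pricing_model(100))   # → ('on-demand', 625.0)
# while 1000 TiB/month crosses the breakeven into a flat reservation.
print(cheaper_pricing_model(1000))  # → ('capacity', 2000.0)
```

The real decision also weighs editions tiers, autoscaling reservations, and per-query predictability, but the breakeven structure is the starting point for the cost analysis done with BigQuery system tables.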
Advanced Track 6: SRE & Advanced Observability on GCP (2 weeks)
Site Reliability Engineering was invented at Google, and GCP's observability and reliability tooling reflects that heritage. This track applies Google's own SRE principles to operating GCP infrastructure, with the tooling Google built to do it.
Week 1 – Advanced observability stack:
- Google Cloud Managed Service for Prometheus: the fully managed, Prometheus-compatible metrics backend built into Cloud Monitoring – scraping GKE workloads, rules evaluation, Grafana integration with no infrastructure to operate, differences from self-hosted Prometheus, and the cost model
- Grafana on GCP: Grafana Cloud from the GCP Marketplace vs self-hosted Grafana on Cloud Run – connecting Cloud Monitoring, Managed Service for Prometheus, Cloud Logging, and Cloud Trace as data sources
- Cloud Trace advanced: trace sampling configuration, custom span attributes, trace-log correlation, and analysing p99 latency across Cloud Run and GKE services
- Cloud Profiler in production: continuous profiling for Go, Java, Python, and Node.js – identifying hot functions and memory allocation patterns with negligible overhead
- OpenTelemetry Collector on GCP: deploying the OTel Collector as a GKE DaemonSet or Cloud Run sidecar – receivers (OTLP, Prometheus, Jaeger), processors (batch, filter, resource), and exporters (Cloud Monitoring, Cloud Trace, Cloud Logging)
- Log-based metrics and alerting: creating distribution metrics from structured log fields – latency histograms, error rate metrics, and alerting policies backed by log data
- Cloud Monitoring custom dashboards: using MQL (Monitoring Query Language) for advanced metric expressions, ratio metrics, and SLO-based widgets
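What a "p99 latency" widget or log-based distribution metric actually reports can be shown in a few lines. Cloud Monitoring computes percentiles from histogram buckets; the nearest-rank method below is a simplified stand-in, and the latency values are hypothetical log-derived samples:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample value that is
    >= pct percent of the sorted sample. A simplified stand-in for
    the bucketed percentiles Cloud Monitoring computes."""
    if not values:
        raise ValueError("empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical request latencies (ms) parsed from structured log entries
latencies = [12, 15, 14, 200, 18, 16, 13, 17, 15, 950]
print(percentile(latencies, 50))  # → 15  (median looks healthy)
print(percentile(latencies, 99))  # → 950 (the tail tells the real story)
```

The gap between the median and the p99 here is exactly why SLOs are usually defined on high percentiles rather than averages.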
Week 2 – SRE practice on GCP:
- SLIs, SLOs, and error budgets on GCP: defining SLIs from Cloud Monitoring metrics – the Cloud Monitoring SLO API for creating request-based and window-based SLOs natively
- Cloud Monitoring SLO dashboard: tracking error budget burn rate, burn rate alerts (fast burn + slow burn), and alerting before the budget is exhausted
- Multi-window multi-burn-rate alerting: implementing the Google SRE Book's recommended alerting strategy using Cloud Monitoring alerting policies
- Chaos engineering on GCP: GCP has no managed chaos service (unlike AWS FIS), so we build controlled failure experiments with Cloud Functions, Pub/Sub triggers, and the GCP APIs – simulating Compute Engine VM termination, Cloud SQL failover, Cloud Run scaling to zero, and Pub/Sub message delivery delays
- Resilience testing: validating Cloud SQL HA failover (regional persistent disk switchover), Cloud Run health check recovery, GKE node pool disruption handling, and multi-region load balancer failover
- Google Cloud Functions for operational automation: auto-remediating SCC findings, auto-stopping idle Compute Engine VMs, cleaning up orphaned Persistent Disks, and rotating credentials on schedule
- Pub/Sub for operational event streams: connecting Cloud Audit Logs → Pub/Sub → Cloud Functions for real-time, event-driven operational automation
- Toil reduction on GCP: identifying repetitive manual operations and automating them with Cloud Functions, Cloud Workflows, and Eventarc-triggered pipelines
- Blameless post-mortems for GCP incidents: reconstructing timelines using Cloud Audit Logs, Cloud Monitoring incident history, and SCC findings – action item tracking and GCP support case analysis
- Capacity planning on GCP: using Cloud Monitoring metrics trends, Committed Use Discount analyser, and Recommender right-sizing suggestions to plan ahead of traffic growth
- Google Cloud Status and incident communication: integrating Google Cloud Status RSS feed with Slack, setting up personalised incident notifications, and embedding GCP service health in SRE dashboards
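The multi-window multi-burn-rate strategy listed above can be sketched directly. Burn rate is the observed error ratio divided by the error budget ratio; the 14.4 threshold corresponds to consuming 2% of a 30-day error budget in one hour, the fast-burn page condition recommended in the Google SRE Workbook. The window sizes and error ratios below are illustrative:

```python
def burn_rate(error_ratio, slo_target):
    """Burn rate = observed error ratio / error budget ratio.

    With a 99.9% SLO the budget ratio is 0.001, so a 1% error
    ratio burns the budget 10x faster than sustainable."""
    return error_ratio / (1 - slo_target)

def should_page(long_window_errors, short_window_errors,
                slo_target=0.999, threshold=14.4):
    """Multi-window check: page only if BOTH the long window
    (e.g. 1h) and the short window (e.g. 5m) exceed the burn-rate
    threshold. The short window makes the alert reset quickly once
    the incident is over; 14.4 = 2% of a 30-day budget burned in 1h.
    """
    return (burn_rate(long_window_errors, slo_target) >= threshold
            and burn_rate(short_window_errors, slo_target) >= threshold)

# 2% errors in both windows against a 99.9% SLO: burn rate 20x -> page
print(should_page(0.02, 0.02))    # → True
# Errors stopped recently: short window is clean, so no page
print(should_page(0.02, 0.0001))  # → False
```

In Cloud Monitoring this becomes a pair of alerting policies (or one policy with two conditions) over the SLO's burn-rate time series; a slower-burn pair with a lower threshold and longer windows typically feeds a ticket queue instead of a page.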
Schedule & Timings
Choose only one group, based on your availability. Max 5 candidates per group to ensure individual attention and hands-on lab support.
Weekday Groups:
- Group 1: Mon–Wed, 10 AM – 1 PM
- Group 2: Mon–Wed, 4 PM – 7 PM
Weekend Groups:
- Group 3: Sat & Sun, 10 AM – 2 PM
- Group 4: Sat & Sun, 4 PM – 8 PM
Location: In-house training in Islamabad
Online option may be arranged for out-of-city participants
Core Program Tools & Technologies
- OS & Scripting: Ubuntu Linux, Bash, gcloud CLI, Cloud Shell, Python (google-cloud SDK)
- Containers: Docker, Docker Compose, Artifact Registry, Trivy, Google Distroless, Cloud Build
- GCP Services: VPC, Firewall, Compute Engine/MIG, Cloud Load Balancing, Cloud Storage, Cloud SQL, Memorystore, IAM, Workload Identity Federation, Cloud DNS, Secret Manager
- IaC: Terraform (google provider), Cloud Foundation Toolkit modules, Checkov, Atlantis
- CI/CD: Cloud Build, Cloud Deploy, GitHub Actions (OIDC to GCP, google-github-actions/*)
- Observability: Cloud Monitoring, Cloud Logging, Cloud Trace, Error Reporting, Cloud Profiler, OpenTelemetry
- Security: Secret Manager, Workload Identity Federation, Binary Authorization, Cloud Armor, VPC Service Controls, Security Command Center, Organisation Policy
- Cost: Cloud Billing, BigQuery billing export, Budget alerts, Recommender, Committed Use Discounts
Advanced Track Summary
- Track 1: Kubernetes & GKE – 3 weeks (Standard vs Autopilot, Workload Identity, Config Sync, Cloud Deploy, Anthos Service Mesh, Fleet)
- Track 2: GCP Security & Compliance Engineering – 2 weeks (SCC Premium, Chronicle, Binary Authorization, SLSA, Cloud DLP, Assured Workloads)
- Track 3: Serverless & Event-Driven Architecture – 2 weeks (Cloud Run, Cloud Functions 2nd gen, Eventarc, Pub/Sub, Cloud Workflows, Firebase)
- Track 4: Platform Engineering & Advanced IaC – 2 weeks (Fabric FAST, project factory, Config Connector, Backstage, Organisation Policy)
- Track 5: Data Engineering & ML Infrastructure – 2 weeks (BigQuery, Dataflow, Pub/Sub, Dataproc, Composer, Vertex AI, MLOps pipelines)
- Track 6: SRE & Advanced Observability – 2 weeks (Managed Prometheus, Grafana, Cloud Trace, Profiler, SLO API, chaos engineering, toil automation)
GCP Certifications Aligned
- Core program: Google Cloud Associate Cloud Engineer (ACE)
- Track 1 (GKE): Kubernetes CKA (Certified Kubernetes Administrator), Google Cloud Professional Cloud Developer
- Track 2 (Security): Google Cloud Professional Cloud Security Engineer
- Track 3 (Serverless): Google Cloud Professional Cloud Developer
- Track 4 (Platform): Google Cloud Professional Cloud DevOps Engineer, Google Cloud Professional Cloud Architect
- Track 5 (Data/ML): Google Cloud Professional Data Engineer, Google Cloud Professional Machine Learning Engineer
- Track 6 (SRE): Google Cloud Professional Cloud DevOps Engineer
Prerequisites
- Comfortable using the Linux command line (navigating, editing files, running commands)
- Basic understanding of how web applications and HTTP work
- Familiar with at least one scripting language (Bash or Python preferred)
- Git basics: clone, commit, push, pull, branch
- No prior GCP or DevOps experience required for the core program
Who This Is For
- Developers and system administrators transitioning into GCP cloud engineering and DevOps roles
- Data engineers and ML engineers who want to own the infrastructure layer of their data and AI pipelines
- Engineers targeting remote roles at product companies and AI-native startups that run on GCP
- DevOps engineers already experienced on AWS or Azure who want to add GCP expertise
- Anyone pursuing the Google Cloud Associate Cloud Engineer, Professional Cloud DevOps Engineer, or Professional Cloud Architect certifications
Course Fee & Booking
- Core Program Duration: 4–5 Weeks
- Each Advanced Track: 2–3 additional weeks (additional fee per track)
- Available Advanced Tracks: Kubernetes & GKE · Security & Compliance · Serverless & Event-Driven · Platform Engineering · Data Engineering & ML Infrastructure · SRE & Advanced Observability
- Seats: only 5 per group