AI & Data Series · 🔥 High Demand · Standalone or Add-On

AI Engineering Track
— Islamabad 2026

There is a significant difference between using AI and building AI systems. This track trains you to be on the engineering side of that divide — designing RAG pipelines, orchestrating LLM agents, managing vector databases, evaluating model outputs, and deploying AI features that work reliably in production. Python throughout. Integrates with any Full-Stack 2.0 backend.

Duration: 5 Weeks
👥 Seats / Batch: 5 Maximum
🔧 Language: Python First
📍 Location: Islamabad + Online
🔗 Mode: Standalone or Add-On
Core Stack 🤖 OpenAI API 🧠 Anthropic Claude ♊ Google Gemini 📚 LangChain 🦙 LlamaIndex 🕸️ LangGraph 🗄️ pgvector 📌 Pinecone 🐍 Python ⚡ FastAPI ☁️ AWS Bedrock 🔍 RAGAS

🎓 Program Overview

There is a significant difference between using AI and building AI systems. Millions of developers now use ChatGPT and GitHub Copilot daily — but only a fraction can build the production systems that power those experiences: the retrieval pipelines, the agent architectures, the evaluation frameworks, the cost-optimised inference layers, and the monitoring systems that keep AI features working reliably at scale.

This track trains you to be on the engineering side of that divide. You will work with the OpenAI, Anthropic, and Google Gemini APIs not as a user but as a builder — designing RAG pipelines, orchestrating multi-step LLM agents, managing vector databases, evaluating model outputs programmatically, and deploying AI features that behave predictably in production.

🔀 Using AI vs. Building AI Systems — This Track Is About Building

Using AI (what most people do)
  1. Prompt ChatGPT / Copilot
  2. Use AI tools day-to-day
  3. Copy-paste AI output
  4. Call an API with a simple prompt

Building AI Systems (this track)
  1. Design RAG pipelines for private knowledge
  2. Build LLM agents with tool use and memory
  3. Evaluate model outputs programmatically
  4. Deploy AI features that work at production scale

💡 Why AI Engineering in 2026

Every software product is being rebuilt with AI features — search, summarisation, recommendations, assistants, automation — and companies need engineers who can build them properly
Prompt engineering alone is not enough — companies need the full stack: retrieval, context management, evaluation, safety, and deployment
AI engineers command among the highest salaries in software engineering globally — the skills gap is enormous and will not close quickly
This skillset is framework-agnostic — it layers directly on top of your existing backend knowledge in Python, Node.js, Go, Laravel, or any other stack
Pakistan's IT export sector is seeing rapid growth in demand for AI engineering skills from international clients
This track is designed to be taken standalone or as a direct add-on to any Full-Stack 2.0 program — no prior AI/ML experience required

📚 Curriculum — Phases 0–8 + Capstone

Phase 0 · Week 1 — LLM Foundations for Engineers
9 topics · Mental models before APIs

Before calling a single API, you need to understand what language models actually are, how they work at a systems level, and what their capabilities and failure modes look like in production. This phase gives engineers the mental model needed to make good architectural decisions throughout the entire course.

  1. How large language models work: tokens, embeddings, attention, and the transformer architecture — explained for engineers, not researchers
  2. Tokenisation in practice: how text becomes tokens, why token counts matter for cost and context limits, and how to measure them with tiktoken
  3. Context windows: what they are, how they constrain system design, and current limits across GPT-4o, Claude 3.5, and Gemini 1.5 Pro
  4. Temperature, top-p, and sampling parameters: what they control and how to set them for different use cases
  5. LLM failure modes engineers must understand: hallucination, context loss, sycophancy, prompt injection, and positional bias
  6. Model comparison: GPT-4o vs Claude 3.5 Sonnet vs Gemini 1.5 Pro vs open-source (Llama 3, Mistral) — capabilities, pricing, and when to use each
  7. Open-source vs proprietary models: self-hosted inference with Ollama and vLLM vs API-based models
  8. Cost modelling: estimating and controlling LLM API spend at scale — token budgeting, caching, and model tiering strategies
  9. Setting up the AI engineering environment: Python, API keys, environment variable management, and rate limit handling
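The cost-modelling idea from topic 8 can be sketched in a few lines. This is a back-of-envelope version only: real token counts come from tiktoken and real prices from each provider's pricing page, so the ~4 characters per token heuristic and the prices below are illustrative assumptions, not published figures.

```python
# Back-of-envelope LLM cost model; a sketch only. Real token counts come from
# tiktoken and real prices from each provider's pricing page. The ~4 chars per
# token heuristic and the price table below are illustrative assumptions.

PRICES_PER_1M = {  # (input, output) USD per 1M tokens; placeholder figures
    "small-model": (0.15, 0.60),
    "large-model": (2.50, 10.00),
}

def estimate_tokens(text: str) -> int:
    """Approximate token count with the common ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def estimate_cost(model: str, prompt: str, expected_output_tokens: int) -> float:
    """Estimated USD cost of one call: input tokens plus expected output tokens."""
    in_price, out_price = PRICES_PER_1M[model]
    return (estimate_tokens(prompt) * in_price
            + expected_output_tokens * out_price) / 1_000_000
```

The same arithmetic underpins model tiering: running the estimate for a cheap and an expensive model side by side makes the routing decision explicit.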
Phase 1 · Week 1–2 — LLM APIs & Prompt Engineering for Production
19 topics across 3 providers

Working directly with OpenAI, Anthropic, and Google Gemini — and the prompt engineering techniques that make the difference between a toy prototype and a reliable production feature.

API Fundamentals — All Three Providers
  1. OpenAI API: chat completions, function calling, structured outputs, vision, and streaming with the Python SDK
  2. Anthropic API: messages API, system prompts, tool use, vision, and extended thinking with Claude
  3. Google Gemini API: multimodal inputs, long context, grounding, and the Gemini Python SDK
  4. Streaming responses: handling token-by-token output in APIs and surfacing it to users in real time
  5. Structured output: forcing models to return valid JSON using OpenAI structured outputs, Anthropic tool use, and the Instructor library
  6. Vision and multimodal inputs: sending images, PDFs, and documents to LLM APIs for analysis
  7. Batch API: processing thousands of requests asynchronously at lower cost with OpenAI Batch and Anthropic Batch
  8. Rate limiting and retry logic: exponential backoff, request queuing, and graceful degradation
  9. Provider abstraction: building a unified LLM client that can swap providers without rewriting application logic
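The retry pattern from topic 8 fits in a few lines of plain Python. The helper name and defaults here are illustrative, not from any provider SDK, but the shape (exponential backoff plus jitter) is the standard one:

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=0.5, retryable=(TimeoutError,)):
    """Call fn(), retrying retryable errors with exponential backoff and jitter.
    Helper name and defaults are illustrative, not from any provider SDK."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # 0.5s, 1s, 2s, 4s, ... plus up to 10% random jitter so many
            # workers do not retry in lockstep after the same outage
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random() * 0.1))
```

In practice the retryable tuple would hold the provider SDK's rate-limit and timeout exception types rather than the bare TimeoutError used here.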
Prompt Engineering for Engineers
  1. System prompts: writing effective prompts that define persona, behaviour, output format, and constraints
  2. Few-shot prompting: selecting and formatting examples that steer model behaviour reliably
  3. Chain-of-thought: making models reason step by step before producing output
  4. XML and structured prompt formatting: Anthropic's recommended approach for complex prompts
  5. Prompt templating: building dynamic prompts from user input and context using Jinja2 and f-strings
  6. Output formatting control: requesting JSON, markdown, tables, and code blocks reliably
  7. Prompt versioning: treating prompts as code — version control and A/B testing
  8. Prompt injection: understanding attack vectors and how to defend against them
  9. Context window management: summarisation, truncation, and prioritisation strategies
  10. Instruction following: writing prompts that models actually follow — specificity, positive framing
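The prompt-templating idea from topic 5, in miniature, using only the standard library. Jinja2 adds loops and conditionals on top of the same pattern; the prompt wording and variable names below are made-up examples:

```python
from string import Template

# Stdlib prompt templating; Jinja2 layers loops and conditionals on this idea.
# The prompt wording and variable names are made-up examples.
SYSTEM_PROMPT = Template(
    "You are a support assistant for $product.\n"
    "Answer only from the provided context.\n"
    "If the context is insufficient, say you do not know.\n\n"
    "Context:\n$context"
)

def build_prompt(product: str, context_chunks: list) -> str:
    # substitute() raises KeyError on missing variables, so template bugs
    # fail loudly at build time instead of shipping a broken prompt
    return SYSTEM_PROMPT.substitute(
        product=product,
        context="\n---\n".join(context_chunks),
    )
```

Keeping templates as named constants like this is also what makes prompt versioning (topic 7) practical: the prompt lives in version control next to the code that uses it.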
Phase 2 · Week 2 — Embeddings & Vector Databases
16 topics · The foundation of RAG

Embeddings are the foundation of semantic search, RAG pipelines, recommendation systems, and clustering. This phase covers generating embeddings and storing/querying them at scale using production vector databases.

Embeddings
  1. What embeddings are: converting text, images, and data into high-dimensional vectors that encode semantic meaning
  2. Embedding models: OpenAI text-embedding-3 (small/large), Cohere Embed v3, and open-source alternatives (sentence-transformers, BGE, E5)
  3. Embedding dimensions and model selection: accuracy vs cost vs latency trade-offs
  4. Similarity metrics: cosine similarity, dot product, and Euclidean distance — when each applies
  5. Batching embedding requests: efficient bulk generation for large document corpora
  6. Multimodal embeddings: text, images, and code — CLIP and OpenAI vision embeddings
  7. Embedding drift: how model updates can change embedding spaces and break existing indexes
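The similarity metrics from topic 4 are worth seeing as code once. A small sketch on toy vectors: for unit-normalised embeddings, which most embedding APIs return, cosine similarity and dot product give identical rankings, which is why providers often let you use the cheaper dot product.

```python
import math

# Cosine similarity vs dot product on toy vectors. For unit-normalised
# embeddings (what most embedding APIs return) the two give identical rankings.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalise(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = normalise([3.0, 4.0])
b = normalise([4.0, 3.0])
# cosine ignores magnitude, so on unit vectors it equals the dot product
```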
Vector Databases
  1. pgvector: vector similarity search in PostgreSQL — HNSW vs IVFFlat indexing, and querying from Python with SQLAlchemy
  2. Pinecone: managed vector database — indexes, namespaces, metadata filtering, and hybrid search
  3. Qdrant: open-source vector database — collections, payload filtering, and self-hosted deployment
  4. Choosing a vector database: decision framework based on scale, cost, latency, and infrastructure
  5. Hybrid search: combining dense vector search with sparse BM25 keyword search for better retrieval
  6. Metadata filtering: narrowing searches by document type, date, user, tenant, or structured fields
  7. Vector index performance: HNSW graph construction, ef_construction, and recall/latency trade-offs
  8. Re-ranking: using cross-encoders (Cohere Rerank, Voyage Rerank) to improve retrieval precision
  9. Amazon OpenSearch with vector engine: AWS-native vector search alternative for AWS deployments
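What every vector database above provides can be shown with a brute-force sketch: exact nearest-neighbour search with optional metadata filtering. The document shape and tenant field below are illustrative; real systems swap the linear scan for an approximate index (HNSW/IVF) to keep the same semantics fast at scale.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, docs, k=3, where=None):
    """Exact (brute-force) nearest-neighbour search with optional metadata
    filtering. Vector databases give the same semantics via approximate
    indexes such as HNSW, trading a little recall for large speedups."""
    candidates = [d for d in docs if where is None or where(d["meta"])]
    return sorted(candidates,
                  key=lambda d: cosine_similarity(query_vec, d["vector"]),
                  reverse=True)[:k]

# Illustrative corpus: tiny 2-d vectors with per-tenant metadata
docs = [
    {"id": 1, "vector": [1.0, 0.0], "meta": {"tenant": "a"}},
    {"id": 2, "vector": [0.0, 1.0], "meta": {"tenant": "b"}},
    {"id": 3, "vector": [0.9, 0.1], "meta": {"tenant": "a"}},
]
```

The `where` callable is the brute-force analogue of metadata filtering (topic 6): Pinecone, Qdrant, and pgvector all express the same idea as filter clauses evaluated alongside the vector search.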
Phase 3 · Week 2–3 — Retrieval-Augmented Generation (RAG)
19 topics · Basic to advanced production patterns

RAG is the most important pattern in production AI engineering — it solves the core limitations of LLMs (outdated training data, hallucination, private knowledge) by retrieving relevant context at inference time. This phase covers RAG from basic implementation through to advanced production patterns.

RAG Fundamentals
  1. Why RAG: the problem it solves, when to use it, and when fine-tuning is a better answer
  2. The basic RAG pipeline: ingest → chunk → embed → store → retrieve → augment → generate
  3. Document ingestion: loading PDFs, Word docs, web pages, Notion, and databases with LangChain loaders and LlamaIndex readers
  4. Text chunking strategies: fixed-size, recursive character splitting, semantic chunking, document-structure-aware
  5. Chunk size and overlap: how they affect retrieval quality and what to tune for different document types
  6. Metadata enrichment: adding source, page number, section headers, and timestamps to chunks
  7. Embedding and indexing: bulk ingestion pipelines with progress tracking and error handling
  8. Query embedding and similarity search: retrieving the top-k most relevant chunks
  9. Context assembly: formatting retrieved chunks into a coherent prompt context block
  10. Source attribution: citing which documents the answer was drawn from
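The chunking step (topics 4–5) in its simplest form, as a sketch: fixed-size character chunking with overlap. Recursive and semantic splitters choose better boundaries, but the chunk-size/overlap trade-off they tune is exactly the one visible here.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Fixed-size character chunking with overlap: the simplest strategy.
    Recursive and semantic splitters pick better boundaries but keep the
    same chunk-size/overlap shape. Defaults here are illustrative."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # each chunk starts `step` characters after the previous one, so
    # consecutive chunks share `overlap` characters of context
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```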
Advanced RAG Patterns
  1. Query transformation: rewriting user queries with an LLM before retrieval to improve recall
  2. HyDE (Hypothetical Document Embeddings): generating a hypothetical answer and using it as the retrieval query
  3. Multi-query retrieval: generating multiple query variants and merging their results
  4. Parent-child chunking: indexing small child chunks for precision, retrieving larger parent context
  5. Contextual compression: extracting only the relevant portion of a retrieved chunk
  6. Corrective RAG (CRAG): evaluating retrieval quality and falling back to web search when the knowledge base is insufficient
  7. Multi-vector retrieval: indexing documents by multiple representations (summary + full text + hypothetical questions)
  8. Agentic RAG: building retrieval as a tool that an agent calls dynamically
  9. RAG evaluation with RAGAS: measuring retrieval quality (context precision, recall) and generation quality (faithfulness, answer relevancy)
Phase 4 · Week 3 — LLM Orchestration: LangChain & LlamaIndex
17 topics · Two frameworks, practical decision guide

LangChain and LlamaIndex are the two dominant frameworks for orchestrating LLM applications — managing chains of calls, tool integrations, memory, and retrieval pipelines. Both are covered so you can choose the right tool for each job.

LangChain
  1. LangChain architecture: chains, runnables, and the LCEL (LangChain Expression Language) pipeline syntax
  2. Prompt templates, output parsers, and structured output chains
  3. LangChain retrieval chains: complete RAG pipelines with LCEL
  4. Conversation chains and memory: maintaining history across turns with different memory backends
  5. LangChain Tools: wrapping functions, APIs, and databases as tools LLMs can call
  6. LangSmith: tracing, debugging, and evaluating LangChain applications in production
LlamaIndex
  1. LlamaIndex architecture: nodes, indexes, query engines, and pipelines
  2. Document and node processing: readers, transformations, and metadata extractors
  3. Index types: VectorStoreIndex, SummaryIndex, KnowledgeGraphIndex, PropertyGraphIndex
  4. Query engines and chat engines: conversational interfaces over your data
  5. Sub-question query engine: decomposing complex questions across multiple data sources
  6. LlamaIndex Workflows: event-driven, step-based orchestration for complex multi-stage pipelines
  7. LlamaParse: managed document parsing for complex PDFs, tables, and mixed-format documents
Framework Decision Guide
  1. LangChain vs LlamaIndex vs building from scratch: practical decision framework with real trade-offs
  2. Using both together: LlamaIndex for retrieval, LangChain for orchestration
  3. When to avoid frameworks: cases where direct API calls produce simpler, more maintainable code
  4. Dependency pinning and version management: keeping orchestration framework upgrades from breaking production
Phase 5 · Week 3–4 — LLM Agents & Tool Use
19 topics · ReAct to multi-agent systems

Agents are LLMs that can take actions — calling tools, writing and executing code, querying databases, and orchestrating other AI models. This phase covers agent architectures from simple tool-calling to complex multi-agent systems.

Tool Use & Function Calling
  1. Function calling fundamentals: defining tools as JSON schemas and letting LLMs decide when and how to call them
  2. Parallel tool calls: models that call multiple tools simultaneously and merge results
  3. Tool design principles: naming, descriptions, and parameter schemas that LLMs use reliably
  4. Built-in tools: web search, code execution, and file reading across OpenAI, Anthropic, and Gemini
  5. Custom tools: wrapping REST APIs, database queries, Python functions, and external services as LLM tools
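A tool, as function-calling APIs see it, is two things: a JSON schema the model reads, and an implementation your code dispatches to. A minimal sketch, where the tool name, parameter schema, and stubbed weather data are all illustrative:

```python
import json

# A tool as function-calling APIs see it: a JSON schema the model reads,
# plus the Python implementation we dispatch to. Name, fields, and the
# stubbed weather value are illustrative.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 31}  # stub; a real tool would call an API

TOOLS = {
    "get_weather": {
        "schema": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
        "fn": get_weather,
    },
}

def dispatch(tool_call_json: str):
    """Execute the call a model returned as {'name': ..., 'arguments': {...}}."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]]["fn"](**call["arguments"])
```

Notice that the description and parameter names are the model's only documentation: tool design (topic 3) is largely about writing that schema so the model calls it correctly.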
Agent Architectures
  1. ReAct (Reasoning + Acting): the foundational agent loop — think, act, observe, repeat
  2. OpenAI Assistants API: threads, runs, tool calls, and file search — managed agent infrastructure
  3. LangGraph: stateful, graph-based agent workflows with cycles, branches, and human-in-the-loop
  4. LlamaIndex Workflows: event-driven agent pipelines with explicit step definitions
  5. Memory in agents: short-term (conversation buffer), long-term (vector memory), and entity memory
  6. Planning agents: breaking complex goals into sub-tasks and executing in order
  7. Code execution agents: agents that write Python, run it in a sandbox, and iterate on output
  8. Browser agents: agents that navigate web pages and extract information (Playwright + LLM)
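The ReAct loop from topic 1 can be sketched with a scripted stand-in for the model, to make the think, act, observe cycle concrete. Everything below (the tool, the scripted steps, the iteration guard) is illustrative; a real agent asks the LLM for each next step instead of reading a script.

```python
# Minimal ReAct-style loop with a scripted stand-in for the model, showing the
# think -> act -> observe cycle. A real agent asks the LLM for each next step;
# the tool, steps, and guard here are an illustrative sketch.

def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))  # demo only; not a real sandbox

TOOLS = {"calculator": calculator}

def react_loop(steps, max_iters=5):
    """steps: scripted (thought, action, argument) tuples standing in for
    model output. Stops at 'final_answer'; max_iters guards runaway loops."""
    trace = []
    for thought, action, arg in steps[:max_iters]:
        if action == "final_answer":
            return arg, trace
        trace.append((thought, TOOLS[action](arg)))  # observe the tool result
    raise RuntimeError("agent exceeded max_iters without a final answer")

answer, trace = react_loop([
    ("Need the total first", "calculator", "17 * 3"),
    ("I have the result now", "final_answer", "51"),
])
```

The max_iters guard is the same idea as the runaway-loop guardrails covered under multi-agent systems below, just at its smallest scale.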
Multi-Agent Systems
  1. Multi-agent patterns: supervisor agents that delegate to specialist sub-agents
  2. Agent-to-agent communication: how agents pass context, results, and instructions
  3. CrewAI: role-based multi-agent orchestration for structured collaborative workflows
  4. AutoGen: Microsoft's multi-agent conversation framework for complex task decomposition
  5. Guardrails in agent systems: preventing runaway loops, cost overruns, and unintended actions
  6. Human-in-the-loop: checkpoints where agents pause and request human approval before proceeding
Phase 6 · Week 4 — Fine-Tuning & Model Customisation
10 topics · When and how to fine-tune

Fine-tuning is not always the right answer — but when it is, it dramatically outperforms prompting alone. This phase covers when fine-tuning makes sense, how to do it correctly, and the alternatives that are often faster and cheaper.

  1. Fine-tuning vs RAG vs prompt engineering: the decision framework every AI engineer needs
  2. When fine-tuning wins: style consistency, format adherence, domain-specific terminology, and latency-sensitive use cases
  3. Dataset preparation: formatting training data as instruction-response pairs, quality filtering, and diversity
  4. OpenAI fine-tuning API: uploading datasets, running training jobs, evaluating fine-tuned models, and cost estimation
  5. Fine-tuning GPT-4o mini for classification, extraction, and structured output tasks
  6. LoRA and QLoRA: parameter-efficient fine-tuning of open-source models (Llama 3, Mistral) on consumer hardware
  7. HuggingFace PEFT library: implementing LoRA fine-tuning with the Trainer API
  8. Instruction tuning vs continued pre-training: understanding the difference and when each applies
  9. RLHF overview: how models are aligned with human preferences — conceptual understanding
  10. Deploying fine-tuned models: serving with vLLM, BentoML, or uploading to HuggingFace Hub
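Dataset preparation (topic 3) mostly comes down to format and quality gates. A sketch of the JSONL shape OpenAI-style chat fine-tuning expects, one full conversation per line ending with the assistant turn the model should learn; the example row is made up:

```python
import json

# OpenAI-style chat fine-tuning expects JSONL: one full conversation per line,
# ending with the assistant turn the model should learn to produce.
# The example row is made up for illustration.
examples = [
    {"messages": [
        {"role": "system", "content": "Extract the city as JSON."},
        {"role": "user", "content": "Flights to Karachi are delayed."},
        {"role": "assistant", "content": '{"city": "Karachi"}'},
    ]},
]

def to_jsonl(rows) -> str:
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)

def validate(jsonl: str) -> None:
    """Minimal quality gate: every line parses and ends with an assistant turn."""
    for line in jsonl.splitlines():
        messages = json.loads(line)["messages"]
        assert messages[-1]["role"] == "assistant", "target must be assistant turn"

blob = to_jsonl(examples)
validate(blob)
```

Real pipelines extend `validate` with deduplication, length limits, and diversity checks before any training job is launched.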
Phase 7 · Week 4–5 — AI Safety, Evaluation & Guardrails
16 topics · Making AI trustworthy in production

Production AI systems fail in ways that are hard to predict and hard to detect. This phase covers evaluation frameworks, guardrails, and safety layers that make AI features trustworthy in customer-facing applications.

LLM Evaluation
  1. Why LLM evaluation is hard: non-determinism, subjective quality, and the absence of ground truth
  2. Evaluation metrics: faithfulness, answer relevancy, context precision, context recall, and toxicity
  3. RAGAS: automated RAG evaluation — measuring retrieval and generation quality end-to-end
  4. LLM-as-judge: using a strong LLM to evaluate the outputs of another — prompting patterns and limitations
  5. Human evaluation: building annotation interfaces and rubrics for systematic human review
  6. Regression testing: building an evaluation dataset and running it on every prompt or model change
  7. LangSmith and Braintrust: platforms for logging, evaluating, and comparing LLM outputs across runs
  8. Evals as code: integrating LLM evaluation into CI/CD pipelines so regressions are caught before deployment
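Evals-as-code (topic 8) in miniature: a fixed dataset scored against the current model on every change, so regressions fail CI instead of reaching users. The model below is a stub and the scoring is naive substring matching; in practice the model is an LLM call and scoring uses RAGAS metrics or an LLM-as-judge.

```python
# Evals-as-code sketch: a fixed dataset scored on every prompt or model change.
# The model is a stub and the scoring is naive substring matching; real
# pipelines call an LLM and score with RAGAS or an LLM-as-judge.

EVAL_SET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def model(prompt: str) -> str:
    """Stand-in for the real LLM call under evaluation."""
    return {"2 + 2": "4", "capital of France": "Paris"}.get(prompt, "")

def run_evals(threshold: float = 1.0) -> float:
    """Score the eval set; raise (failing CI) if the pass rate drops below threshold."""
    passed = sum(1 for c in EVAL_SET if c["expected"] in model(c["input"]))
    score = passed / len(EVAL_SET)
    assert score >= threshold, f"eval score {score:.2f} below {threshold}"
    return score
```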
Guardrails & Safety
  1. Input guardrails: classifying and filtering user inputs before they reach the LLM
  2. Output guardrails: validating, filtering, and post-processing LLM outputs before they reach users
  3. Guardrails AI: declarative guardrail definitions with validators for PII, toxicity, and schema conformance
  4. Llama Guard: Meta's open-source safety classifier for screening inputs and outputs
  5. PII detection and redaction: identifying and masking personal data in inputs and outputs with Presidio
  6. Jailbreak and prompt injection defence: input sanitisation and instruction hierarchy patterns
  7. Content moderation: OpenAI Moderation API and custom classifiers for domain-specific policies
  8. Fallback strategies: graceful degradation when models fail, time out, or produce unsafe output
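To make the PII-redaction idea from topic 5 concrete, here is a toy output guardrail. Production systems use Presidio or trained classifiers; the two regexes below only catch obvious emails and simple phone formats and will miss plenty of real-world variants.

```python
import re

# Toy PII redaction. Production systems use Presidio or trained classifiers;
# these two regexes only catch obvious emails and simple phone formats and
# will miss many real-world variants.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each detected entity with a typed placeholder such as <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

The same function shape works on both sides of the model: run it on user input before the LLM sees it, and on model output before the user does.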
Phase 8 · Week 5 — Production Deployment & Observability
18 topics · Shipping AI to production reliably

Shipping an AI feature to production is different from shipping a traditional API — latency is higher, costs vary with usage, outputs are non-deterministic, and failures are often silent. This phase covers running AI systems reliably at scale.

AI Features in Real Applications
  1. AI feature architecture: synchronous vs asynchronous patterns in full-stack applications
  2. Streaming AI responses to the frontend: Server-Sent Events in FastAPI and Next.js
  3. Background AI jobs: document processing, embeddings, and batch inference with Celery / ARQ
  4. Caching LLM responses: semantic caching with GPTCache and Redis to reduce cost and latency
  5. LLM proxy layer: routing requests across providers, fallbacks, and usage tracking with LiteLLM
  6. Multi-tenancy: isolating AI features, vector namespaces, and usage quotas per user or organisation
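The caching pattern from topic 4, sketched as an exact-match cache keyed on (model, prompt). Semantic caches such as GPTCache key on embedding similarity instead, and production setups back the store with Redis; the control flow is the same. The class name is illustrative.

```python
import hashlib

# Exact-match LLM response cache keyed on (model, prompt). Semantic caches
# (e.g. GPTCache) key on embedding similarity instead, and production setups
# back the store with Redis; the control flow is identical.
class LLMCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, model: str, prompt: str) -> str:
        # hash to a fixed-size key so arbitrarily long prompts are cheap to store
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model: str, prompt: str, call):
        """Return the cached response, or call the model once and cache it."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self._store[key] = call(prompt)
        return self._store[key]
```

The hit counter doubles as a cheap observability signal: cache hit rate is one of the first numbers worth putting on a cost dashboard.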
AWS Deployment
  1. FastAPI on AWS ECS: containerised AI inference services with auto-scaling
  2. AWS Lambda for lightweight AI features: serverless LLM calls with cold start optimisation
  3. AWS Bedrock: accessing Claude, Llama, Titan, and other foundation models through AWS
  4. Amazon OpenSearch with vector engine: AWS-native vector search for RAG at scale
  5. Secrets management: storing and rotating API keys with AWS Secrets Manager
Observability for AI Systems
  1. LLM observability: what to log — prompts, completions, tokens, latency, cost, and user feedback
  2. Langfuse: open-source LLM observability — tracing, scoring, and dataset management
  3. OpenTelemetry for AI: tracing LLM calls as spans in distributed traces
  4. Cost monitoring: per-user, per-feature, and per-model spend with dashboards and budget alerts
  5. Latency monitoring: p50/p95/p99 tracking and alerting on degradation
  6. Hallucination monitoring: automated detection of factual inconsistencies in production
  7. User feedback loops: thumbs up/down signals and using them to improve prompts and retrieval
Week 5 · Final Project — Capstone: End-to-End AI Feature Build
Complete production-ready AI feature integrated into a full-stack application

The final week is a guided capstone project where each student builds a complete, production-ready AI feature integrated into a full-stack application. Example project options:

  1. Document Q&A System: upload any PDF, ask questions, get answers with cited sources — built with RAG + pgvector + FastAPI + Next.js
  2. AI Customer Support Agent: handles FAQs from a knowledge base, escalates to humans when uncertain — RAG + LangGraph + guardrails
  3. Semantic Search Engine: replaces keyword search with vector search + hybrid retrieval over a product catalogue
  4. Code Review Agent: analyses pull request diffs and produces structured feedback using multi-step tool use
  5. Content Generation Pipeline: brief → research → draft → review loop with multiple specialised agents

🛠️ Tools & Technologies Covered

LLM APIs

OpenAI · Anthropic · Gemini · AWS Bedrock

GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and open-source models (Llama 3, Mistral) via Ollama and vLLM

Vector Databases

pgvector · Pinecone · Qdrant · OpenSearch

PostgreSQL-native vectors, managed Pinecone, self-hosted Qdrant, and AWS OpenSearch vector engine

Orchestration

LangChain · LlamaIndex · LangGraph · CrewAI

Complete orchestration frameworks, stateful graph-based agents, and multi-agent workflows

Evaluation & Safety

RAGAS · LangSmith · Langfuse · Guardrails AI

Automated evaluation, LLM tracing, production observability, and input/output safety layers

Fine-Tuning

OpenAI Fine-Tuning · HuggingFace PEFT · LoRA

OpenAI's fine-tuning API, parameter-efficient LoRA/QLoRA for open-source models on consumer hardware

Infrastructure

FastAPI · Celery · LiteLLM · Docker · AWS ECS

Production deployment, background job processing, provider routing, containerisation, and serverless Lambda

📅 Schedule & Timings

📌
Choose one group based on your availability. Maximum 5 candidates per group — individual attention, real project feedback, and direct instructor access throughout.

Weekday Groups

Group 1 — Mon–Wed · 10 AM – 1 PM
Group 2 — Mon–Wed · 4 PM – 7 PM

Weekend Groups

Group 3 — Sat & Sun · 10 AM – 2 PM
Group 4 — Sat & Sun · 4 PM – 8 PM

📍 Location: In-house training, F-11 Markaz, Islamabad  ·  📱 Online option available for out-of-city participants

🎯 Who This Is For

Full-stack or backend developers adding AI features to products — regardless of which language or framework you use
Engineers who have completed any Full-Stack 2.0 program and want to add production AI capabilities as a direct add-on
Developers targeting AI engineering roles at product companies or AI startups — one of the highest-demand specialisations globally
Technical leads evaluating AI tooling and architecture decisions for their teams
Freelancers building AI-powered products for international clients
No prior AI or ML experience required — no mathematics background required — only Python comfort and basic REST API knowledge