🎓 Program Overview
There is a significant difference between using AI and building AI systems. Millions of developers now use ChatGPT and GitHub Copilot daily — but only a fraction can build the production systems that power those experiences: the retrieval pipelines, the agent architectures, the evaluation frameworks, the cost-optimised inference layers, and the monitoring systems that keep AI features working reliably at scale.
This track trains you to be on the engineering side of that divide. You will work with the OpenAI, Anthropic, and Google Gemini APIs not as a user but as a builder — designing RAG pipelines, orchestrating multi-step LLM agents, managing vector databases, evaluating model outputs programmatically, and deploying AI features that behave predictably in production.
💡 Why AI Engineering in 2026
📚 Curriculum — 9 Phases + Capstone
Before calling a single API, you need to understand what language models actually are, how they work at a systems level, and what their capabilities and failure modes look like in production. This phase gives engineers the mental model needed to make good architectural decisions throughout the entire course.
- How large language models work: tokens, embeddings, attention, and the transformer architecture — explained for engineers, not researchers
- Tokenisation in practice: how text becomes tokens, why token counts matter for cost and context limits, and how to measure them with tiktoken
- Context windows: what they are, how they constrain system design, and current limits across GPT-4o, Claude 3.5, and Gemini 1.5 Pro
- Temperature, top-p, and sampling parameters: what they control and how to set them for different use cases
- LLM failure modes engineers must understand: hallucination, context loss, sycophancy, prompt injection, and positional bias
- Model comparison: GPT-4o vs Claude 3.5 Sonnet vs Gemini 1.5 Pro vs open-source (Llama 3, Mistral) — capabilities, pricing, and when to use each
- Open-source vs proprietary models: self-hosted inference with Ollama and vLLM vs API-based models
- Cost modelling: estimating and controlling LLM API spend at scale — token budgeting, caching, and model tiering strategies
- Setting up the AI engineering environment: Python, API keys, environment variable management, and rate limit handling
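To make the tokenisation and cost-modelling bullets above concrete, here is a minimal sketch of counting tokens with tiktoken and estimating per-request spend. It assumes a tiktoken release that knows the GPT-4o encoding, and the per-million-token prices are placeholders rather than current rates.

```python
import tiktoken

# Illustrative prices in USD per 1M tokens -- placeholders, not current rates
PRICE_PER_MTOK = {"input": 2.50, "output": 10.00}

def estimate_cost(prompt: str, expected_output_tokens: int, model: str = "gpt-4o") -> float:
    """Count prompt tokens with tiktoken and estimate the cost of a single request."""
    enc = tiktoken.encoding_for_model(model)   # picks the tokenizer matching the model
    input_tokens = len(enc.encode(prompt))
    cost = (input_tokens * PRICE_PER_MTOK["input"]
            + expected_output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
    print(f"{input_tokens} input tokens, ~${cost:.6f} per request")
    return cost

estimate_cost("Summarise the attached incident report in three bullet points.",
              expected_output_tokens=150)
```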
This phase covers working directly with the OpenAI, Anthropic, and Google Gemini APIs, and the prompt engineering techniques that make the difference between a toy prototype and a reliable production feature.
- OpenAI API: chat completions, function calling, structured outputs, vision, and streaming with the Python SDK
- Anthropic API: messages API, system prompts, tool use, vision, and extended thinking with Claude
- Google Gemini API: multimodal inputs, long context, grounding, and the Gemini Python SDK
- Streaming responses: handling token-by-token output in APIs and surfacing it to users in real time
- Structured output: forcing models to return valid JSON using OpenAI structured outputs, Anthropic tool use, and the Instructor library
- Vision and multimodal inputs: sending images, PDFs, and documents to LLM APIs for analysis
- Batch API: processing thousands of requests asynchronously at lower cost with OpenAI Batch and Anthropic Batch
- Rate limiting and retry logic: exponential backoff, request queuing, and graceful degradation
- Provider abstraction: building a unified LLM client that can swap providers without rewriting application logic
- System prompts: writing effective prompts that define persona, behaviour, output format, and constraints
- Few-shot prompting: selecting and formatting examples that steer model behaviour reliably
- Chain-of-thought: making models reason step by step before producing output
- XML and structured prompt formatting: Anthropic's recommended approach for complex prompts
- Prompt templating: building dynamic prompts from user input and context using Jinja2 and f-strings
- Output formatting control: requesting JSON, markdown, tables, and code blocks reliably
- Prompt versioning: treating prompts as code — version control and A/B testing
- Prompt injection: understanding attack vectors and how to defend against them
- Context window management: summarisation, truncation, and prioritisation strategies
- Instruction following: writing prompts that models actually follow — specificity, positive framing
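As a small taste of the API work in this phase, here is a minimal sketch of a streamed chat completion wrapped in exponential-backoff retry logic using the official OpenAI Python SDK. The model name is illustrative, and the client expects an OPENAI_API_KEY environment variable.

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def stream_completion(system: str, user: str,
                      model: str = "gpt-4o-mini", max_retries: int = 5) -> str:
    """Stream a chat completion token by token, retrying with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model=model,
                messages=[{"role": "system", "content": system},
                          {"role": "user", "content": user}],
                stream=True,
            )
            chunks = []
            for chunk in stream:
                delta = chunk.choices[0].delta.content or ""
                print(delta, end="", flush=True)   # surface tokens to the user as they arrive
                chunks.append(delta)
            return "".join(chunks)
        except RateLimitError:
            time.sleep(2 ** attempt)               # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Rate-limited after all retries")

stream_completion("You are a concise assistant.",
                  "Explain what a context window is in two sentences.")
```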
Embeddings are the foundation of semantic search, RAG pipelines, recommendation systems, and clustering. This phase covers generating embeddings and storing/querying them at scale using production vector databases.
- What embeddings are: converting text, images, and data into high-dimensional vectors that encode semantic meaning
- Embedding models: OpenAI text-embedding-3 (small/large), Cohere Embed v3, and open-source alternatives (sentence-transformers, BGE, E5)
- Embedding dimensions and model selection: accuracy vs cost vs latency trade-offs
- Similarity metrics: cosine similarity, dot product, and Euclidean distance — when each applies
- Batching embedding requests: efficient bulk generation for large document corpora
- Multimodal embeddings: text, images, and code — CLIP and other multimodal embedding models
- Embedding drift: how model updates can change embedding spaces and break existing indexes
- pgvector: vector similarity search in PostgreSQL — HNSW vs IVFFlat indexing, and querying vectors alongside relational data with standard SQL
- Pinecone: managed vector database — indexes, namespaces, metadata filtering, and hybrid search
- Qdrant: open-source vector database — collections, payload filtering, and self-hosted deployment
- Choosing a vector database: decision framework based on scale, cost, latency, and infrastructure
- Hybrid search: combining dense vector search with sparse BM25 keyword search for better retrieval
- Metadata filtering: narrowing searches by document type, date, user, tenant, or structured fields
- Vector index performance: HNSW graph construction, ef_construction, and recall/latency trade-offs
- Re-ranking: using cross-encoders (Cohere Rerank, Voyage Rerank) to improve retrieval precision
- Amazon OpenSearch with vector engine: AWS-native vector search alternative for AWS deployments
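A minimal sketch of the embedding and similarity-search workflow above: embed a handful of documents with OpenAI's text-embedding-3-small model and rank them against a query by cosine similarity in memory. The in-memory index is a stand-in for what pgvector, Pinecone, or Qdrant do at scale; the documents are illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts in one API call and return an (n, dim) array."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "You can reset your password from the account settings page.",
]
doc_vecs = embed(docs)
query_vec = embed(["How long does a refund take?"])[0]

# Cosine similarity = dot product of L2-normalised vectors
doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_norm = query_vec / np.linalg.norm(query_vec)
scores = doc_norm @ query_norm

for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```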
RAG is the most important pattern in production AI engineering — it addresses the core limitations of LLMs (outdated training data, hallucination, and no access to private knowledge) by retrieving relevant context at inference time. This phase covers RAG from basic implementation through to advanced production patterns.
- Why RAG: the problem it solves, when to use it, and when fine-tuning is a better answer
- The basic RAG pipeline: ingest → chunk → embed → store → retrieve → augment → generate
- Document ingestion: loading PDFs, Word docs, web pages, Notion, and databases with LangChain loaders and LlamaIndex readers
- Text chunking strategies: fixed-size, recursive character splitting, semantic chunking, document-structure-aware
- Chunk size and overlap: how they affect retrieval quality and what to tune for different document types
- Metadata enrichment: adding source, page number, section headers, and timestamps to chunks
- Embedding and indexing: bulk ingestion pipelines with progress tracking and error handling
- Query embedding and similarity search: retrieving the top-k most relevant chunks
- Context assembly: formatting retrieved chunks into a coherent prompt context block
- Source attribution: citing which documents the answer was drawn from
- Query transformation: rewriting user queries with an LLM before retrieval to improve recall
- HyDE (Hypothetical Document Embeddings): generating a hypothetical answer and using it as the retrieval query
- Multi-query retrieval: generating multiple query variants and merging their results
- Parent-child chunking: indexing small child chunks for precision, retrieving larger parent context
- Contextual compression: extracting only the relevant portion of a retrieved chunk
- Corrective RAG (CRAG): evaluating retrieval quality and falling back to web search when the knowledge base is insufficient
- Multi-vector retrieval: indexing documents by multiple representations (summary + full text + hypothetical questions)
- Agentic RAG: building retrieval as a tool that an agent calls dynamically
- RAG evaluation with RAGAS: measuring retrieval quality (context precision, recall) and generation quality (faithfulness, answer relevancy)
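Tying the basic pipeline above together, here is a deliberately simplified end-to-end RAG sketch: naive fixed-size chunking, an in-memory vector index, and a grounded completion. Model names are illustrative; a production build would swap in a real vector database, smarter chunking, and the advanced retrieval patterns listed above.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(texts: list[str]) -> np.ndarray:
    """Embed texts and L2-normalise so dot product equals cosine similarity."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def answer(question: str, document: str, top_k: int = 3) -> str:
    """Ingest -> chunk -> embed -> retrieve -> augment -> generate."""
    chunks = chunk(document)
    index = embed(chunks)                                # in-memory 'vector store'
    scores = index @ embed([question])[0]
    context = "\n\n".join(chunks[i] for i in np.argsort(scores)[::-1][:top_k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. Say 'I don't know' if the answer is not there."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```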
LangChain and LlamaIndex are the two dominant frameworks for orchestrating LLM applications — managing chains of calls, tool integrations, memory, and retrieval pipelines. Both are covered so you can choose the right tool for each job.
- LangChain architecture: chains, runnables, and the LCEL (LangChain Expression Language) pipeline syntax
- Prompt templates, output parsers, and structured output chains
- LangChain retrieval chains: complete RAG pipelines with LCEL
- Conversation chains and memory: maintaining history across turns with different memory backends
- LangChain Tools: wrapping functions, APIs, and databases as tools LLMs can call
- LangSmith: tracing, debugging, and evaluating LangChain applications in production
- LlamaIndex architecture: nodes, indexes, query engines, and pipelines
- Document and node processing: readers, transformations, and metadata extractors
- Index types: VectorStoreIndex, SummaryIndex, KnowledgeGraphIndex, PropertyGraphIndex
- Query engines and chat engines: conversational interfaces over your data
- Sub-question query engine: decomposing complex questions across multiple data sources
- LlamaIndex Workflows: event-driven, step-based orchestration for complex multi-stage pipelines
- LlamaParse: managed document parsing for complex PDFs, tables, and mixed-format documents
- LangChain vs LlamaIndex vs building from scratch: practical decision framework with real trade-offs
- Using both together: LlamaIndex for retrieval, LangChain for orchestration
- When to avoid frameworks: cases where direct API calls produce simpler, more maintainable code
- Dependency pinning and version management: keeping orchestration framework upgrades from breaking production
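To give a feel for LCEL, here is a minimal sketch of a prompt template piped into a chat model and an output parser. It assumes the langchain-core and langchain-openai packages; exact import paths vary between LangChain releases, so treat this as the shape of a chain rather than a pinned recipe.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You write release notes in exactly three bullet points."),
    ("human", "Summarise these commit messages:\n{commits}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# LCEL: composable runnables chained with the | operator
chain = prompt | llm | StrOutputParser()

print(chain.invoke({
    "commits": "fix: retry on 429\nfeat: add streaming endpoint\nchore: bump deps"
}))
```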
Agents are LLMs that can take actions — calling tools, writing and executing code, querying databases, and orchestrating other AI models. This phase covers agent architectures from simple tool-calling to complex multi-agent systems.
- Function calling fundamentals: defining tools as JSON schemas and letting LLMs decide when and how to call them
- Parallel tool calls: models that call multiple tools simultaneously and merge results
- Tool design principles: naming, descriptions, and parameter schemas that LLMs use reliably
- Built-in tools: web search, code execution, and file reading across OpenAI, Anthropic, and Gemini
- Custom tools: wrapping REST APIs, database queries, Python functions, and external services as LLM tools
- ReAct (Reasoning + Acting): the foundational agent loop — think, act, observe, repeat
- OpenAI Assistants API: threads, runs, tool calls, and file search — managed agent infrastructure
- LangGraph: stateful, graph-based agent workflows with cycles, branches, and human-in-the-loop
- LlamaIndex Workflows: event-driven agent pipelines with explicit step definitions
- Memory in agents: short-term (conversation buffer), long-term (vector memory), and entity memory
- Planning agents: breaking complex goals into sub-tasks and executing in order
- Code execution agents: agents that write Python, run it in a sandbox, and iterate on output
- Browser agents: agents that navigate web pages and extract information (Playwright + LLM)
- Multi-agent patterns: supervisor agents that delegate to specialist sub-agents
- Agent-to-agent communication: how agents pass context, results, and instructions
- CrewAI: role-based multi-agent orchestration for structured collaborative workflows
- AutoGen: Microsoft's multi-agent conversation framework for complex task decomposition
- Guardrails in agent systems: preventing runaway loops, cost overruns, and unintended actions
- Human-in-the-loop: checkpoints where agents pause and request human approval before proceeding
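To illustrate the function-calling loop that underpins this phase, here is a minimal single-tool turn with the OpenAI API: the model decides whether to call a hypothetical get_weather tool, the application executes it, and the result is fed back for a final answer. The tool and model names are illustrative.

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """Hypothetical tool -- a real agent would call an actual weather API here."""
    return f"18C and cloudy in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Islamabad?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:                                   # the model chose to call a tool
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```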
Fine-tuning is not always the right answer — but when it is, it dramatically outperforms prompting alone. This phase covers when fine-tuning makes sense, how to do it correctly, and the alternatives that are often faster and cheaper.
- Fine-tuning vs RAG vs prompt engineering: the decision framework every AI engineer needs
- When fine-tuning wins: style consistency, format adherence, domain-specific terminology, and latency-sensitive use cases
- Dataset preparation: formatting training data as instruction-response pairs, quality filtering, and diversity
- OpenAI fine-tuning API: uploading datasets, running training jobs, evaluating fine-tuned models, and cost estimation
- Fine-tuning GPT-4o mini for classification, extraction, and structured output tasks
- LoRA and QLoRA: parameter-efficient fine-tuning of open-source models (Llama 3, Mistral) on consumer hardware
- HuggingFace PEFT library: implementing LoRA fine-tuning with the Trainer API
- Instruction tuning vs continued pre-training: understanding the difference and when each applies
- RLHF overview: how models are aligned with human preferences — conceptual understanding
- Deploying fine-tuned models: serving with vLLM, BentoML, or uploading to HuggingFace Hub
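A minimal sketch of the OpenAI fine-tuning workflow covered above: training examples as JSONL chat conversations, an upload through the Files API, and a fine-tuning job. The base-model snapshot name is illustrative; check which snapshots are currently fine-tunable before running this.

```python
import json
from openai import OpenAI

client = OpenAI()

# Each training example is one chat conversation ending in the desired assistant output
examples = [
    {"messages": [
        {"role": "system", "content": "Classify support tickets as billing, technical, or other."},
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
    ]},
    # ... hundreds more examples, diverse and quality-filtered
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

uploaded = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=uploaded.id,
    model="gpt-4o-mini-2024-07-18",   # illustrative snapshot name
)
print(job.id, job.status)
```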
Production AI systems fail in ways that are hard to predict and hard to detect. This phase covers evaluation frameworks, guardrails, and safety layers that make AI features trustworthy in customer-facing applications.
- Why LLM evaluation is hard: non-determinism, subjective quality, and the absence of ground truth
- Evaluation metrics: faithfulness, answer relevancy, context precision, context recall, and toxicity
- RAGAS: automated RAG evaluation — measuring retrieval and generation quality end-to-end
- LLM-as-judge: using a strong LLM to evaluate the outputs of another — prompting patterns and limitations
- Human evaluation: building annotation interfaces and rubrics for systematic human review
- Regression testing: building an evaluation dataset and running it on every prompt or model change
- LangSmith and Braintrust: platforms for logging, evaluating, and comparing LLM outputs across runs
- Evals as code: integrating LLM evaluation into CI/CD pipelines so regressions are caught before deployment
- Input guardrails: classifying and filtering user inputs before they reach the LLM
- Output guardrails: validating, filtering, and post-processing LLM outputs before they reach users
- Guardrails AI: declarative guardrail definitions with validators for PII, toxicity, and schema conformance
- Llama Guard: Meta's open-source safety classifier for screening inputs and outputs
- PII detection and redaction: identifying and masking personal data in inputs and outputs with Presidio
- Jailbreak and prompt injection defence: input sanitisation and instruction hierarchy patterns
- Content moderation: OpenAI Moderation API and custom classifiers for domain-specific policies
- Fallback strategies: graceful degradation when models fail, time out, or produce unsafe output
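To show what an evaluation harness looks like in code, here is a minimal LLM-as-judge sketch that scores an answer's faithfulness to retrieved context. The rubric and the 1-to-5 scale are illustrative; a production setup adds curated datasets, CI integration, and human review of judge disagreements.

```python
import json
from openai import OpenAI

client = OpenAI()

def judge_faithfulness(question: str, context: str, answer: str) -> dict:
    """Ask a strong model to grade whether the answer is supported by the context (1-5)."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "You are an evaluator. Score the answer's faithfulness to the context "
                        "from 1 (contradicts or fabricates) to 5 (fully supported). "
                        'Reply as JSON: {"score": <int>, "reason": "<short explanation>"}'},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer: {answer}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

print(judge_faithfulness(
    question="How long do refunds take?",
    context="Refunds are processed within 5 business days.",
    answer="Refunds usually take about a week.",
))
```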
Shipping an AI feature to production is different from shipping a traditional API — latency is higher, costs vary with usage, outputs are non-deterministic, and failures are often silent. This phase covers running AI systems reliably at scale.
- AI feature architecture: synchronous vs asynchronous patterns in full-stack applications
- Streaming AI responses to the frontend: Server-Sent Events in FastAPI and Next.js
- Background AI jobs: document processing, embeddings, and batch inference with Celery / ARQ
- Caching LLM responses: semantic caching with GPTCache and Redis to reduce cost and latency
- LLM proxy layer: routing requests across providers, fallbacks, and usage tracking with LiteLLM
- Multi-tenancy: isolating AI features, vector namespaces, and usage quotas per user or organisation
- FastAPI on AWS ECS: containerised AI inference services with auto-scaling
- AWS Lambda for lightweight AI features: serverless LLM calls with cold start optimisation
- AWS Bedrock: accessing Claude, Llama, Titan, and other foundation models through AWS
- Amazon OpenSearch with vector engine: AWS-native vector search for RAG at scale
- Secrets management: storing and rotating API keys with AWS Secrets Manager
- LLM observability: what to log — prompts, completions, tokens, latency, cost, and user feedback
- Langfuse: open-source LLM observability — tracing, scoring, and dataset management
- OpenTelemetry for AI: tracing LLM calls as spans in distributed traces
- Cost monitoring: per-user, per-feature, and per-model spend with dashboards and budget alerts
- Latency monitoring: p50/p95/p99 tracking and alerting on degradation
- Hallucination monitoring: automated detection of factual inconsistencies in production
- User feedback loops: thumbs up/down signals and using them to improve prompts and retrieval
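A minimal sketch of streaming model output to the browser with Server-Sent Events in FastAPI, as listed above. A production endpoint would add authentication, token/latency/cost logging, and error handling; the model name is illustrative.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()

@app.get("/chat/stream")
def chat_stream(q: str):
    """Proxy a streamed completion to the browser as Server-Sent Events."""
    def event_stream():
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": q}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            if delta:
                yield f"data: {delta}\n\n"          # SSE frame: 'data: <payload>' + blank line
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```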
The final week is a guided capstone project where each student builds a complete, production-ready AI feature integrated into a full-stack application. Example project options:
- Document Q&A System: upload any PDF, ask questions, get answers with cited sources — built with RAG + pgvector + FastAPI + Next.js
- AI Customer Support Agent: handles FAQs from a knowledge base, escalates to humans when uncertain — RAG + LangGraph + guardrails
- Semantic Search Engine: replacing keyword search with vector search + hybrid retrieval over a product catalogue
- Code Review Agent: analyses pull request diffs and produces structured feedback using multi-step tool use
- Content Generation Pipeline: brief → research → draft → review loop with multiple specialised agents
🛠️ Tools & Technologies Covered
OpenAI · Anthropic · Gemini · AWS Bedrock
GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and open-source models (Llama 3, Mistral) via Ollama and vLLM
pgvector · Pinecone · Qdrant · OpenSearch
PostgreSQL-native vectors, managed Pinecone, self-hosted Qdrant, and AWS OpenSearch vector engine
LangChain · LlamaIndex · LangGraph · CrewAI
Complete orchestration frameworks, stateful graph-based agents, and multi-agent workflows
RAGAS · LangSmith · Langfuse · Guardrails AI
Automated evaluation, LLM tracing, production observability, and input/output safety layers
OpenAI Fine-Tuning · HuggingFace PEFT · LoRA
OpenAI's fine-tuning API, parameter-efficient LoRA/QLoRA for open-source models on consumer hardware
FastAPI · Celery · LiteLLM · Docker · AWS ECS
Production API deployment, background job processing, provider routing, containerisation, and auto-scaled container services on AWS ECS
📅 Schedule & Timings
Weekday Groups
Weekend Groups
📍 Location: In-house training, F-11 Markaz, Islamabad · 📱 Online option available for out-of-city participants