LLM Evaluation Systems
Lead SDET | AI Quality Engineer | Automation Architect
Lead SDET and AI Quality Engineer with 12+ years of experience in automation architecture, AI evaluation, cloud-native quality engineering, and scalable testing platforms.
quality_signal: production_ready
LLM evals, automation architecture, and cloud quality systems.
Identity
Srihari operates at the intersection of automation architecture, LLM evaluation, agent validation, and cloud-native release confidence.
The portfolio is designed around one clear signal: Srihari helps teams ship AI-powered systems with measurable trust. That means test architecture, prompt evaluation, hallucination detection, API reliability, performance baselines, and release gates that leaders can understand.
> initializing_ai_quality_engineer.exe
> loading_playwright_framework...
> validating_llm_responses...
> scanning_agentic_tool_calls...
> publishing_quality_signal: PASS
AI Expertise
Designs repeatable eval harnesses for accuracy, refusal behavior, tool use, regressions, and multi-turn reasoning quality.
Builds adversarial test suites, groundedness checks, red-team prompts, and production scorecards for AI reliability.
Tests planners, memory, MCP tools, retrieval, action execution, fallback flows, and human-in-the-loop controls.
Creates scalable Playwright, Cypress, Selenium, API, and performance frameworks with CI-native observability.
Technical Skills
Experience Timeline
Enterprise quality leadership
Lead SDET / AI Quality Engineer
Education technology scale
Automation Architect
Sports technology platform
Senior QA Automation Engineer
Featured AI Projects
Voice AI Quality
Problem: Validate an AI voice agent that handles real-time user intent, tool calls, and ambiguous conversation paths.
Architecture: Voice pipeline with transcription, LLM orchestration, tool routing, conversation memory, telemetry, and eval gates.
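The eval gate at the end of this pipeline can be illustrated with a minimal sketch. It is not the project's actual implementation: `VoiceTurnResult` and `eval_voice_turn` are hypothetical names, and the latency budget is assumed to be in milliseconds. The sketch checks one agent turn against an expected intent, a latency budget, and a groundedness flag.

```python
from dataclasses import dataclass


@dataclass
class VoiceTurnResult:
    """Hypothetical record of one evaluated voice-agent turn."""
    detected_intent: str
    latency_ms: float   # assumed unit: milliseconds
    grounded: bool      # True if the reply was supported by tool/retrieval output


def eval_voice_turn(result, expected_intent, latency_budget_ms, require_grounded=True):
    """Return a list of failure reasons; an empty list means the turn passes."""
    failures = []
    if result.detected_intent != expected_intent:
        failures.append(
            f"intent: expected {expected_intent!r}, got {result.detected_intent!r}"
        )
    if result.latency_ms > latency_budget_ms:
        failures.append(
            f"latency: {result.latency_ms:.0f} ms exceeds budget of {latency_budget_ms:.0f} ms"
        )
    if require_grounded and not result.grounded:
        failures.append("groundedness: reply not supported by tool or retrieval output")
    return failures


# Illustrative values only, not measured results.
turn = VoiceTurnResult(detected_intent="schedule_demo", latency_ms=950.0, grounded=True)
failures = eval_voice_turn(turn, expected_intent="schedule_demo", latency_budget_ms=1200)
```

Returning a list of reasons rather than a single boolean keeps the gate explainable in a release report, since each failed dimension is named separately.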
await evalVoiceAgent({ intent: 'schedule_demo', latencyBudget: 1200, grounded: true })
Agentic Search
Problem: Improve solution discovery across complex product knowledge while reducing hallucinated recommendations.
Architecture: RAG workflows, prompt regression tests, retrieval quality scoring, citation checks, and agent trace review.
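One of the pieces listed above, the citation check, might look roughly like the following. This is a sketch under stated assumptions: answers are assumed to cite sources with bracketed numeric markers such as `[2]`, and `check_citations` is a hypothetical helper, not part of any named tool.

```python
import re

# Assumed citation format: bracketed integers such as "[1]" or "[12]".
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, retrieved_ids: set) -> dict:
    """Compare citation markers in an answer against the retrieved document ids."""
    cited = {int(m) for m in CITATION_PATTERN.findall(answer)}
    return {
        "cited": sorted(cited),
        # Cited ids that were never retrieved are a hallucination signal.
        "unsupported": sorted(cited - retrieved_ids),
        "has_citation": bool(cited),
    }


report = check_citations("Use plan A [1] and plan B [3].", retrieved_ids={1, 2})
```

Here `report["unsupported"]` contains `3`, because the answer cites a source that was not in the retrieved set. A check like this only verifies that citations point at real retrieved documents; whether the cited text actually supports the claim still needs a groundedness grader or trace review.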
promptfoo eval --config solution-scout.yaml --grader groundedness
AI Trust & Safety
Problem: Catch policy-risk responses, jailbreak attempts, and low-confidence model behavior before release.
Architecture: Safety test matrix, synthetic adversarial prompts, confidence thresholds, audit reports, and CI release blocks.
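A confidence-threshold assertion of the kind this architecture describes could be sketched as follows. Everything here is an assumption for illustration: the `SCORERS` registry is hypothetical, and the keyword scorer is a deliberately naive placeholder where a real system would call a classifier or an LLM grader.

```python
from typing import Callable, Dict

# Hypothetical registry: policy name -> function scoring a response in [0, 1].
SCORERS: Dict[str, Callable[[str], float]] = {}


def toy_academic_integrity_scorer(response: str) -> float:
    """Placeholder heuristic only; a real scorer would be a trained grader."""
    flagged_phrases = ("here is your finished essay",)
    text = response.lower()
    return 0.0 if any(phrase in text for phrase in flagged_phrases) else 1.0


SCORERS["academic_integrity"] = toy_academic_integrity_scorer


def assert_safety(response: str, policy: str, min_score: float) -> float:
    """Raise AssertionError if the policy score falls below min_score."""
    score = SCORERS[policy](response)
    if score < min_score:
        raise AssertionError(
            f"policy {policy!r}: score {score:.2f} is below threshold {min_score:.2f}"
        )
    return score


score = assert_safety(
    "I can help you outline your essay.", policy="academic_integrity", min_score=0.92
)
```

Because the function raises `AssertionError`, it can run unchanged inside a test runner such as pytest, which is one simple way to turn a safety score into a CI release block.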
assert_safety(response, policy='academic_integrity', min_score=0.92)
Cloud QA Platform
Problem: Scale automation and quality telemetry across web, API, data, and AI-powered learning workflows.
Architecture: Cloud execution grid, contract tests, Playwright suites, API checks, perf baselines, and quality dashboards.
npx playwright test --project=chromium --grep @critical --shard=1/4
Metrics
Critical E2E automation coverage
Regression cycle reduction
Execution acceleration
Flaky test reduction
Years in quality engineering
AWS-native quality systems
AI Testing Philosophy
AI quality is not a single assertion. It is a living system of scenario design, model behavior scoring, retrieval checks, tool-call validation, safety coverage, latency budgets, trace review, and release governance.
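To make the "not a single assertion" idea concrete, here is a minimal sketch of a release gate that combines several independent signals. The `Signal` type, the signal names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Signal:
    """One quality signal with its own pass threshold (hypothetical type)."""
    name: str
    value: float
    threshold: float
    higher_is_better: bool = True

    def passed(self) -> bool:
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def release_gate(signals: List[Signal]) -> Tuple[bool, List[str]]:
    """Pass only if every signal passes; also report which ones failed."""
    failing = [s.name for s in signals if not s.passed()]
    return (not failing, failing)


# Illustrative numbers, not real measurements.
ok, failing = release_gate([
    Signal("groundedness", value=0.95, threshold=0.90),
    Signal("safety", value=0.90, threshold=0.92),
    Signal("p95_latency_ms", value=1100, threshold=1200, higher_is_better=False),
])
```

In this example the gate blocks the release and names `safety` as the failing signal, which is the kind of output a non-specialist reviewer can act on.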
Certifications / Awards
Open Source Contributions
A Playwright library for measuring and asserting Web Vitals metrics (LCP, FID, CLS) in automated tests. Essential for performance-driven QA.
Quality & Performance
Enhanced HTML reporting for Playwright tests with detailed traces, screenshots, videos, and failure analysis. Makes debugging test failures intuitive.
Automation Excellence
Production-grade test framework combining Playwright, Page Object Model, CI/CD integration, and scalable test organization. Built for enterprise-scale testing.
Architecture & Scale
Blog / Insights
Contact
Available for lead SDET, AI quality engineering, automation architecture, and LLM testing specialist roles.
Local recruiter-facing guide