Featured AI Projects | Srihari Naidu

Projects

AI quality systems with architecture, code signal, and measurable impact.

> initializing_ai_quality_engineer.exe

> loading_playwright_framework...

> validating_llm_responses...

> scanning_agentic_tool_calls...

> publishing_quality_signal: PASS

Voice AI Quality

Problem: Validate an AI voice agent that handles real-time user intent, tool calls, and ambiguous conversation paths.

Architecture: Voice pipeline with transcription, LLM orchestration, tool routing, conversation memory, telemetry, and eval gates.

OpenAI API, LangChain, Playwright, DeepEval, AWS

eval.spec.ts

await evalVoiceAgent({ intent: 'schedule_demo', latencyBudget: 1200, grounded: true })

32% faster triage

18% higher intent pass rate

24/7 eval suite
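
The eval gate named in the architecture could look roughly like the sketch below. It is illustrative only, not the project's actual implementation: `VoiceTurnResult` and `passes_eval_gate` are hypothetical names, and the 1200 ms default simply mirrors the `latencyBudget` shown in the snippet above.

```python
from dataclasses import dataclass


@dataclass
class VoiceTurnResult:
    """One evaluated voice-agent turn (hypothetical shape, assumed for this sketch)."""
    expected_intent: str
    predicted_intent: str
    latency_ms: float
    grounded: bool


def passes_eval_gate(result: VoiceTurnResult, latency_budget_ms: float = 1200.0):
    """Return (passed, failures) for a single turn.

    A turn passes only if the intent matches, the response arrived within
    the latency budget, and the response was judged grounded.
    """
    failures = []
    if result.predicted_intent != result.expected_intent:
        failures.append("intent_mismatch")
    if result.latency_ms > latency_budget_ms:
        failures.append("latency_over_budget")
    if not result.grounded:
        failures.append("ungrounded_response")
    return (not failures, failures)
```

Returning the list of failure reasons, rather than a bare boolean, is one way to keep triage quick when a gate blocks a build.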

Agentic Search

Problem: Improve solution discovery across complex product knowledge while reducing hallucinated recommendations.

Architecture: RAG workflows, prompt regression tests, retrieval quality scoring, citation checks, and agent trace review.

LangGraph, PromptFoo, MCP, Python, OpenAI API

eval.spec.ts

promptfoo eval --config solution-scout.yaml --grader groundedness

41% fewer bad answers

2.3x faster QA review

traceable responses
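
Citation checks and retrieval quality scoring, as listed in the architecture, might be approximated as follows. This is a minimal sketch under stated assumptions: answers cite sources with bracketed numeric markers such as `[2]`, documents are identified by integers, and `check_citations` and `recall_at_k` are hypothetical helper names rather than part of any library.

```python
import re

# Assumed citation style: bracketed integers such as "[3]".
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, retrieved_ids: list[int]) -> dict:
    """Flag answers that cite nothing or cite documents that were never retrieved."""
    cited = {int(match) for match in CITATION_PATTERN.findall(answer)}
    return {
        "uncited": not cited,
        "unknown_citations": sorted(cited - set(retrieved_ids)),
    }


def recall_at_k(retrieved_ids: list[int], relevant_ids: list[int], k: int) -> float:
    """Fraction of the known-relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # Convention chosen for this sketch when nothing is labeled relevant.
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)
```

An answer that cites a document outside the retrieved set is a cheap, deterministic hallucination signal that can run before any model-graded check.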

AI Trust & Safety

Problem: Catch policy-risk responses, jailbreak attempts, and low-confidence model behavior before release.

Architecture: Safety test matrix, synthetic adversarial prompts, confidence thresholds, audit reports, and CI release blocks.

DeepEval, Python, CI/CD, AWS, API Testing

eval.spec.ts

assert_safety(response, policy='academic_integrity', min_score=0.92)

58% expanded risk coverage

zero critical escapes

release-ready evidence
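
A CI release block driven by a safety test matrix could be sketched as below. The names `SafetyCase`, `release_blockers`, and `gate_exit_code` are hypothetical, scores are assumed to come from whichever grader the pipeline uses, and the 0.92 default only echoes the `min_score` in the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyCase:
    """One cell of the safety test matrix (hypothetical shape for this sketch)."""
    policy: str    # e.g. "academic_integrity"
    attack: str    # e.g. "jailbreak_roleplay"
    score: float   # 0.0-1.0 safety score from an external grader (assumed)


def release_blockers(cases: list[SafetyCase], min_score: float = 0.92) -> list[SafetyCase]:
    """Return every case whose score falls below the release threshold."""
    return [case for case in cases if case.score < min_score]


def gate_exit_code(cases: list[SafetyCase], min_score: float = 0.92) -> int:
    """Exit code for a CI step: 0 lets the release proceed, 1 blocks it."""
    return 1 if release_blockers(cases, min_score) else 0
```

Keeping the blockers as data rather than raising on the first failure means the same run can feed an audit report listing every policy and attack pair that fell short.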

Cloud QA Platform

Problem: Scale automation and quality telemetry across web, API, data, and AI-powered learning workflows.

Architecture: Cloud execution grid, contract tests, Playwright suites, API checks, perf baselines, and quality dashboards.

Playwright, AWS, Selenium, Robot Framework, Grafana

eval.spec.ts

npx playwright test --project=chromium --grep @critical --shard=1/4

80%+ E2E coverage

30% faster runs

20% fewer flakes
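
One way a quality dashboard might derive a flake metric from run telemetry is sketched below. It assumes a simple definition, namely that a test is flaky when the same run records both a failing and a passing attempt for it. `flaky_tests` and `flake_rate` are hypothetical names, not Playwright APIs.

```python
from collections import defaultdict


def flaky_tests(attempts: list[tuple[str, bool]]) -> list[str]:
    """Return ids of tests that both failed and passed within one run.

    `attempts` holds (test_id, passed) pairs in execution order, retries included.
    """
    outcomes = defaultdict(set)
    for test_id, passed in attempts:
        outcomes[test_id].add(passed)
    return sorted(test_id for test_id, seen in outcomes.items() if seen == {True, False})


def flake_rate(attempts: list[tuple[str, bool]]) -> float:
    """Share of distinct tests in the run that were flaky."""
    distinct = {test_id for test_id, _ in attempts}
    if not distinct:
        return 0.0
    return len(flaky_tests(attempts)) / len(distinct)
```

Under this definition a test that fails on every attempt counts as a real failure rather than a flake, which keeps the two signals separate on a dashboard.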