11 years engineering enterprise networks at scale. Now channeling that practitioner depth into research — applying LLMs to automate fault diagnosis, configuration management, and self-healing infrastructure.
Bridging the gap between industry practice and academic rigor — asking questions production environments never had time to answer.
"Networks generate enormous volumes of operational data — logs, configs, telemetry, incident tickets — that human operators cannot process at speed and scale. I believe LLMs, grounded in real-world operational context, can bridge this gap: automating diagnosis, accelerating remediation, and enabling self-healing infrastructure."
Benchmarking and fine-tuning large language models for fault diagnosis, log interpretation, root cause analysis, and configuration generation in enterprise networks.
Agentic AI systems that plan, execute, and verify multi-step network operations — from intent-based configuration to automated remediation — with minimal human intervention.
Applying AI to anomaly detection, threat classification, and autonomous policy enforcement — grounded in real firewall, IDS, and SIEM data from production environments.
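The plan, execute, and verify pattern behind the agentic systems above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Step` structure in which every planned action carries its own post-check and rollback, and it uses a plain dictionary in place of real device state. An actual agent would generate the plan with an LLM and act on devices through their APIs. In this sketch, the first failed verification undoes every applied step in reverse order and escalates the task to a human.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One planned operation (hypothetical structure, for illustration only)."""
    name: str
    execute: Callable[[dict], None]   # applies the change to the (simulated) device state
    verify: Callable[[dict], bool]    # post-check: did the change have the intended effect?
    rollback: Callable[[dict], None]  # undoes the change


def run_plan(plan: List[Step], state: dict) -> dict:
    """Execute steps in order, verifying each; roll back everything on the first failed check."""
    done: List[Step] = []
    for step in plan:
        step.execute(state)
        done.append(step)
        if not step.verify(state):
            # Undo every applied step, newest first, then hand off to a human operator.
            for applied in reversed(done):
                applied.rollback(state)
            return {"status": "escalated", "failed_step": step.name}
    return {"status": "ok", "failed_step": None}


if __name__ == "__main__":
    device: dict = {}
    plan = [
        Step("add VLAN 10",
             execute=lambda s: s.update(vlan=10),
             verify=lambda s: s.get("vlan") == 10,
             rollback=lambda s: s.pop("vlan", None)),
        Step("enable OSPF",
             execute=lambda s: s.update(ospf="init"),  # simulated: adjacency never reaches FULL
             verify=lambda s: s.get("ospf") == "full",
             rollback=lambda s: s.pop("ospf", None)),
    ]
    print(run_plan(plan, device))  # {'status': 'escalated', 'failed_step': 'enable OSPF'}
    print(device)                  # {}
```

Rolling back in reverse order matters because later steps may depend on earlier ones; undoing them newest-first returns the device to its pre-change state.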
Enterprise network operations generate vast quantities of unstructured data (incident logs, syslog streams, configuration diffs, and ticketing records) that exceed the capacity of human operators to process at speed and scale. This paper introduces NetLLM-Eval, a benchmark framework grounded in real-world enterprise network incidents, covering three core tasks: (1) fault symptom classification from raw syslog and SNMP data, (2) root cause identification from multi-source incident context, and (3) remediation step generation for common network failure scenarios. Drawing on practitioner experience across 500+ managed devices spanning Palo Alto, Cisco, Juniper, and Fortinet platforms, we evaluate GPT-4o, Llama-3, and Mistral under zero-shot, few-shot, and chain-of-thought prompting regimes. Our results show that domain-adapted prompting yields up to 78% accuracy on fault classification, outperforming zero-shot baselines by 23 percentage points. We also identify systematic failure modes in multi-vendor configuration reasoning and provide a reproducible evaluation harness for the research community.
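The accuracy comparison in the abstract can be made concrete with a small scoring sketch. It assumes the model calls have already been made and each result is recorded as a hypothetical `(regime, gold_label, predicted_label)` tuple. Exact matching after case and whitespace normalization is an assumed scoring rule, and the fault labels in the example are invented for illustration, not NetLLM-Eval data. A gain in percentage points is the plain difference between two accuracies: 0.60 against 0.50 is a 10-point gain, even though it is a 20% relative increase.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (prompting regime, gold fault label, model-predicted label). Hypothetical format.
Record = Tuple[str, str, str]


def accuracy_by_regime(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of correct predictions for each prompting regime."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for regime, gold, pred in records:
        total[regime] += 1
        # Assumed rule: ignore case and surrounding whitespace when comparing labels.
        if gold.strip().lower() == pred.strip().lower():
            correct[regime] += 1
    return {r: correct[r] / total[r] for r in total}


def gain_over_baseline(acc: Dict[str, float], baseline: str = "zero-shot") -> Dict[str, float]:
    """Absolute improvement of every other regime over the baseline, in percentage points."""
    base = acc[baseline]
    return {r: round(100 * (a - base), 1) for r, a in acc.items() if r != baseline}


if __name__ == "__main__":
    toy = [
        ("zero-shot", "link-flap", "link-flap"),
        ("zero-shot", "bgp-session-down", "link-flap"),
        ("few-shot", "link-flap", "link-flap"),
        ("few-shot", "bgp-session-down", "bgp-session-down"),
    ]
    acc = accuracy_by_regime(toy)
    print(acc)                      # {'zero-shot': 0.5, 'few-shot': 1.0}
    print(gain_over_baseline(acc))  # {'few-shot': 50.0}
```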
I'm a network engineer and automation expert with over 11 years of hands-on experience designing, securing, and automating large-scale enterprise network infrastructure. I've managed 500+ devices across Palo Alto, Cisco, Juniper, and Fortinet platforms, not in theory but in production.
The problems I kept encountering — slow fault diagnosis, manual compliance checks, knowledge locked in engineers' heads — are not just operational problems. They're research problems. That realization pushed me to bridge the gap between industry practice and academic rigor.
My research is driven by production pain points and validated against real environments. I believe the most impactful work happens when the person running the experiment has also been on call at 2 AM diagnosing the failure.
More than a decade of multi-vendor, multi-domain expertise, from BGP to LLM agents.
Production systems built to solve real operational problems, not proofs of concept.
LLM agents automating L1/L2 tasks, reducing manual workload and enhancing operational scalability across managed environments.
Platform to streamline knowledge transfer and improve onboarding efficiency for engineers across the team.
Automated log collection and analysis pipeline that improves diagnostic accuracy and reduces mean time to resolve (MTTR).
Automated device hardening achieving 95% security compliance, eliminating 8 hours of manual reporting per week.
Scripted upgrade pipeline for 500+ network devices across multi-vendor environments with zero downtime during business hours.
Single-pane dashboard aggregating telemetry across multiple customer environments for faster detection of critical events.
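The device-hardening project reports compliance as a percentage. The sketch below shows one way such a figure could be computed, assuming a hypothetical regex-based baseline. The rule names and patterns are illustrative Cisco IOS lines, not the baseline actually used; real baselines such as vendor hardening guides are much larger. Each (device, rule) pair counts as one check, and compliance is passed checks divided by total checks.

```python
import re
from typing import Dict, List

# Hypothetical hardening baseline: rule name -> regex that must match the running config.
BASELINE: Dict[str, str] = {
    "ssh-v2-only": r"^ip ssh version 2$",
    "no-telnet": r"^\s*transport input ssh$",
    "password-encryption": r"^service password-encryption$",
    "exec-timeout": r"^\s*exec-timeout \d+",
}


def check_device(config: str) -> Dict[str, bool]:
    """Evaluate every baseline rule against one device's configuration text."""
    return {name: re.search(pattern, config, re.MULTILINE) is not None
            for name, pattern in BASELINE.items()}


def fleet_compliance(configs: Dict[str, str]) -> float:
    """Percentage of (device, rule) checks that pass across the whole fleet."""
    results: List[bool] = [ok for cfg in configs.values() for ok in check_device(cfg).values()]
    if not results:
        raise ValueError("no devices to check")
    return 100 * sum(results) / len(results)


if __name__ == "__main__":
    fleet = {
        "sw1": "service password-encryption\nip ssh version 2\n"
               "line vty 0 4\n transport input ssh\n exec-timeout 5 0\n",
        "sw2": "ip ssh version 2\nline vty 0 4\n transport input telnet ssh\n",
    }
    print(check_device(fleet["sw2"]))
    print(fleet_compliance(fleet))  # 62.5
```

Counting per check rather than per device rewards partial hardening; a stricter report could instead count only devices that pass every rule.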