[Remote] Senior Technical Product Manager, Observability
Note: This is a remote position open to candidates in the USA. The company is on a mission to make high-performance infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. It is seeking a highly skilled Senior Technical Product Manager to own its Observability Platform, focusing on telemetry ingestion, querying, visualization, and alerting for large-scale GPU clusters.
Responsibilities
- Own the end-to-end Observability Platform roadmap across telemetry ingestion, querying, visualization, alerting, and retention for large-scale GPU clusters and multi-tenant environments
- Define the company's observability strategy across bare metal, VMs, Kubernetes, and managed services, aligned with the infrastructure roadmap, reliability goals, and customer experience
- Drive the customer-facing observability surface across dashboards, APIs, telemetry pipelines, and topology-aware insights
- Translate low-level signals across GPU, CPU, memory, storage, and network into actionable health views, alerts, and debugging workflows for customers
- Work closely with engineering on technical tradeoffs across metrics agents, data models, telemetry pipelines, APIs, and retention architecture
- Build products for distributed AI environments by understanding how training and inference workloads behave across nodes, clusters, schedulers, and network fabrics
- Define health models that help customers quickly identify degraded nodes, performance anomalies, and cluster bottlenecks at fleet scale
- Ensure new infrastructure and platform launches are observable by design through strong partnership with compute, network, and platform teams
- Stay current on modern observability stacks and AI infrastructure trends, including how GPU workloads change performance analysis, cost attribution, and operational workflows
Skills
- 7+ years of product management experience in infrastructure, observability, monitoring, or developer platforms
- Deep understanding of observability and monitoring systems, including metrics, logging, tracing, alerting, and telemetry pipeline architecture
- Experience defining product strategy and roadmaps for platform or infrastructure products at scale
- Strong technical background — ability to engage with engineering on telemetry agents, data models, query engines, retention, and distributed systems
- Experience with GPU, AI/ML, or HPC infrastructure monitoring and the unique observability challenges of training and inference workloads
- Track record of shipping developer- and operator-facing products with measurable impact on reliability, time-to-detect, or operational efficiency
- Experience working across cross-functional teams (engineering, design, marketing, sales) in a fast-paced environment
- Excellent written and verbal communication skills, with the ability to translate complex technical concepts for diverse audiences
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
Benefits
- 100% company-paid insurance premiums for employee medical, dental, and vision plans
- 401(k) plan that matches 100% up to 4%, with immediate vesting
- Professional Development Reimbursement of $2,500 each year
- 11 Holidays + Paid Time Off Accrual + Rollover Plan
- Increased PTO at the 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + Anniversary Bonus each year
- $500 stipend for home office setup in the first year + $400 each following year
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company-paid Wellable subscription