About the job
As a Senior Fullstack Software Engineer on the CosmicAC team, you will play a critical role in building and scaling our GPU-accelerated cloud services platform. You'll work on both the backend infrastructure that powers AI/ML workloads at scale and the frontend interfaces that make these capabilities accessible to developers and data scientists. This position requires deep technical expertise in distributed systems, strong backend development skills, and the ability to create intuitive user experiences.
You'll be joining a team that's pushing the boundaries of decentralized AI infrastructure, working on everything from Kubernetes orchestration and managed inference services to API design and real-time monitoring dashboards. Your work will directly impact how developers interact with and deploy AI models in production environments.
Responsibilities
- Backend Development: design and implement robust backend services and APIs that handle AI model inference, resource orchestration, and workload distribution across distributed GPU infrastructure
- Frontend Implementation: build responsive and intuitive web interfaces for training job management, model deployment workflows, and real-time monitoring dashboards using modern JavaScript frameworks
- Distributed Systems Architecture: contribute to the design and implementation of distributed systems using peer-to-peer technologies (Holepunch stack)
- API Design & Integration: develop and maintain APIs that support both synchronous and asynchronous inference patterns, ensuring compatibility with industry standards
- Platform Reliability: implement monitoring, logging, and telemetry solutions to ensure high availability and performance of platform services
- Cross-functional Collaboration: work closely with DevOps, AI/ML engineers, and product teams to deliver integrated solutions that meet technical and business requirements
- Code Quality & Best Practices: maintain high standards for code quality through peer reviews, testing, and documentation while championing security best practices
Requirements
- 5+ years of experience in full-stack development with a strong emphasis on backend systems
- Expert-level proficiency in Node.js/JavaScript for backend development and in React for frontend development
- Proven experience building and scaling distributed systems or event-driven architectures
- Strong understanding of API design and implementation, including authentication, rate limiting, and versioning
- Experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
- Proficiency with databases and a deep understanding of data modeling and optimization
- Solid understanding of networking, security principles, and best practices for production systems
- Experience with real-time data streaming and RPC implementations
- Ability to work independently in a remote environment and communicate effectively across time zones
Preferred
- Experience with peer-to-peer technologies (Hyperswarm, libp2p, WebRTC) or similar distributed communication protocols
- Familiarity with AI/ML inference APIs and OpenAI-compatible endpoints
- Previous experience building AI SaaS or PaaS platforms
- Knowledge of GPU resource management and ML framework infrastructure
- Experience with message queuing and streaming systems (Redis, RabbitMQ, Kafka)
- Familiarity with observability tools (Prometheus, Grafana, ELK stack)
- Understanding of WebAssembly or edge computing paradigms
- Contributions to open-source projects in relevant domains