Principal Site Reliability Engineer, Platform Lead

Helius

Software Engineering · Full-time
Remote
225,000 – 350,000 USD per year
Posted on Thursday, July 25, 2024

About Helius

Helius is a leading developer platform on the Solana blockchain, dedicated to inspiring and accelerating the creation of crypto-powered software. Our mission is to empower developers and innovators by providing them with the tools and resources they need to build the future of decentralized applications. We specialize in developing RPCs, indexers, webhooks, APIs, and more for the Solana ecosystem.

We are looking for a Principal Engineer to found our Platform Engineering Team. You will be responsible for building the internal developer platform and accelerating our engineering team. Your work will involve scaling our host and software management solutions to support thousands of bare-metal servers, including our Solana validator(s) and our RPC fleet. You will build service frameworks that provision and manage globally distributed services, ensuring high reliability and uptime as they handle billions of customer requests per day.

If you have a passion for blockchain technology and a strong sense of ownership, we'd love to hear from you!

(US or Canadian applicants only)

Key Responsibilities

  • Design, implement, and manage automated systems for deploying, monitoring, and maintaining our bare-metal servers and services.
  • Develop and maintain CI/CD pipelines to streamline the deployment process.
  • Enhance the security of our infrastructure and networks by implementing best practices and proactive measures.
  • Monitor system performance, and identify and resolve issues to ensure high availability and reliability.
  • Lead incident response and root cause analysis for system outages and issues.
  • Implement robust security measures to safeguard sensitive data and protect against cyber threats and attacks.
  • Collaborate with the engineering team to optimize performance and scalability of our services.
  • Establish and enforce policies and procedures to ensure compliance with industry standards and regulations.

Requirements

  • A minimum of 8 years of experience in a DevOps or Site Reliability Engineering role, preferably in a high-performance, low-latency environment.
  • Experience managing and optimizing bare-metal server environments.
  • Expert scripting and programming skills (e.g., Bash, Python, Go).
  • Experience in Rust, Golang, Java, or a similar language.
  • Proficiency with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Strong knowledge of automation tools and frameworks (e.g., Ansible, Terraform, Puppet, Chef).
  • Expertise in CI/CD tools and practices (e.g., Jenkins, GitLab CI, CircleCI).
  • Excellent problem-solving skills and the ability to troubleshoot complex issues.
  • Strong communication skills and the ability to collaborate effectively with cross-functional teams.
  • Ability to work independently and take ownership of projects from start to finish.

Preferred Qualifications

  • Experience with blockchain technologies, particularly Solana.
  • Prior experience in a startup or fast-paced environment.

Helius is an equal opportunity employer.