Site Reliability Engineer, HPC & Distributed Compute Job at Boom Supersonic, Denver, CO

  • Boom Supersonic
  • Denver, CO

Job Description

Help Supersonic Software Take Flight

At Boom, we're scaling supersonic innovation. That means pushing petabytes, taming thousands of cores, and building a compute backbone that lets our engineers simulate, analyze, and design the next era of aviation faster than ever. Sound like your kind of puzzle?

As a Site Reliability Engineer, you’ll sit at the intersection of aerospace and infrastructure, building the environments that keep Boom’s engineers moving faster than the speed of sound. From auto-scaling cloud systems to hands-free Linux workstation provisioning, you’ll streamline and safeguard everything behind the scenes. You’ll work shoulder-to-shoulder with engineering users, solve tough problems, and ship tools that make supersonic development possible.

This isn’t just uptime and metrics—this is aviation-grade reliability. If that sounds exciting, we’re ready for you to dive in.

Role Overview


  • Architect and scale our on-prem and cloud-based HPC infrastructure—supporting GPU, CPU, and hybrid workflows

  • Optimize job scheduling and distributed workload management (e.g., SLURM, AWS Batch, Kubernetes) for massively parallel simulations

  • Engineer storage solutions that balance IOPS, throughput, and cost—across object, block, and parallel file systems

  • Embed with simulation and data teams to understand real bottlenecks—and then eliminate them

  • Level up observability across dozens of internal applications—unifying monitoring, alerting, and diagnostics into a single view

  • Automate everything from Linux workstation provisioning to dependency management and source-control enforcement

  • Own infrastructure reliability across cloud (AWS) and on-prem environments

  • Automate everything: deployments, upgrades, health checks, and recovery processes

  • Collaborate with aerospace engineers and IT partners to eliminate friction and reduce failure modes

  • Champion SRE best practices, mentoring teammates and influencing broader software lifecycle strategy

Ideal Candidate


  • Have professional experience in a blend of Linux systems administration and software development

  • Write clean, maintainable code (especially in Python and Bash; Go experience is a plus) in structured, team-oriented development environments with code review and source control

  • Have deployed and monitored distributed systems, such as microservices or client/server architectures

  • Have hands-on experience designing and managing petabyte-scale storage systems (Lustre, BeeGFS, Ceph, ZFS)

  • Know how to wrangle fleets of Linux workstations with configuration management and automation tools

  • Are familiar with containerization (Docker, Singularity) and infrastructure-as-code (Terraform, Ansible, CDK)

  • Are comfortable coordinating backups and disaster recovery with IT stakeholders

  • Thrive in fast-paced environments and high-ownership teams

  • Are endlessly curious and hungry to learn—especially about aerospace systems and the people building them

What Will Set You Apart


  • Prior experience in aerospace, defense, biotech, or other simulation-intensive industries, supported by large-scale, auto-scaling infrastructure

  • Familiarity with EDA, CAE, or CFD pipelines and their unique compute/storage needs

  • You’ve debugged distributed or concurrent code, such as Go programs built on goroutines

  • You’ve built notification tooling that integrates with Slack, SMS, or email

  • You’ve hosted and secured modern SPAs and APIs in production environments

  • You’ve improved performance with distributed caching and content delivery strategies

  • Fearless curiosity—you chase down obscure kernel tuning flags and understand what they do

  • History of mentoring others in system reliability, automation, or performance optimization

Compensation

The base salary range for this position is $140,000 - $177,000 per year. Actual salaries will vary based on factors including, but not limited to, location, experience, and performance. The range listed is just one component of Boom’s total rewards package for employees. Other rewards may include long-term incentives/equity, a flexible PTO policy, and many other progressive benefits.

There is no set deadline to apply for this job opportunity. Applications will be accepted on an ongoing basis until the search is no longer active.

ITAR Requirement

To conform to U.S. Government aerospace technology export regulations (ITAR and EAR), the applicant must be a U.S. citizen, a lawful permanent resident of the U.S., a protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.

Boom is an equal opportunity employer and we value diversity. All employment is decided on the basis of qualifications, merit, and business need.


Job Tags

Permanent employment, Full time, Flexible hours
