Sr. DevOps Engineer

EMPLOYER: bitHeads Inc


bitHeads Inc has an immediate need for a Senior DevOps / Site Reliability Engineer to join our dynamic team of talented professionals. This is a full-time, permanent role.


Lead and Mentor

  • Evangelize & support technology and best practices from the SRE team
  • Lead tactical strategies for the SRE team
  • Plan future architecture of core services technologies
  • Develop and drive balanced and fair service level objectives
  • Optimize on-call rotations and processes
  • Document tribal knowledge for operating technologies



  • Strategize and plan with IT on production and CI/CD infrastructure
  • Strategize and plan with the Engineering team on the core platform
  • Collaborate with the Engineering team on performance bottlenecks, security risks and process improvements
  • Partner with Engineering to improve services through rigorous testing and release procedures


Develop & Operate

  • Develop and support software and systems that help manage platform infrastructure and applications, and that assist the operations and support teams
  • Practice sustainable incident response and blameless postmortems
  • Operate the production environment by monitoring availability and taking a holistic view of system health
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Provide primary operational support and engineering for multiple large distributed software applications



  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Contribute ideas and code to core frameworks that drive our technology and product roadmap
  • Research new technologies that could be used to improve our products


Qualifications

  • Minimum 7 years of experience as a DevOps engineer or SRE
  • 3+ years of experience with Linux operating systems
  • Automation skills in shell (Bash), Python, and/or other languages
  • Basic understanding of C#, Java and JavaScript
  • Advanced proficiency with one of Python, C#, Java, JavaScript/TypeScript, or Go
  • Advanced proficiency in managing infrastructure on Azure and AWS or GCP is nice to have
  • 2+ years of experience with Docker and Kubernetes, or similar technologies
  • 5+ years with Git, Perforce, or other version control software
  • Experience using Terraform
  • Experience working with SQL and NoSQL databases such as MongoDB and Cassandra
  • Strong understanding of virtualization and hypervisor technologies
  • Understanding of databases and data modelling
  • Experience with automatically managing dozens or hundreds of servers
  • Focus on performance bottlenecks and performance improvement techniques
  • Strong networking knowledge of TCP/IP
  • Experience with monitoring/data aggregation tools and platforms such as Splunk, Grafana, and New Relic
  • Experience with workflow and issue management tools such as Jira
  • Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Able to work in a collaborative, global, agile/lean development environment
  • Excellent time-management, organization, and communication skills


Pay: $75,000-$115,000 per annum

Location: Remote

START DATE: 12/01/2021

Let us know if you are interested in this position. Complete this short form and we will follow up with you promptly.