EMPLOYER: bitHeads Inc
JOB DESCRIPTION:
bitHeads Inc has an immediate need for a Senior DevOps Site Reliability Engineer to join our dynamic team of talented professionals. This is a full-time, permanent role.
Lead and Mentor
- Evangelize & support technology and best practices from the SRE team
- Lead tactical strategies for the SRE team
- Plan future architecture of core services technologies
- Develop and drive balanced and fair service level objectives
- Optimize on-call rotations and processes
- Document tribal knowledge for operating technologies
Collaborate
- Strategize and plan with IT on production and CI/CD infrastructure
- Strategize and plan with the Engineering team on the core platform
- Collaborate with the Engineering team on performance bottlenecks, security risks and process improvements
- Partner with Engineering to improve services through rigorous testing and release procedures
Develop & Operate
- Develop and support software and systems that help manage platform infrastructure and applications, and that assist the operations and support teams
- Practice sustainable incident response and blameless postmortems
- Operate the production environment by monitoring availability and taking a holistic view of system health
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Provide primary operational support and engineering for multiple large distributed software applications
Innovate
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Contribute ideas and code to core frameworks that drive our technology and product roadmap
- Research new technologies that could be used to improve our products
Requirements
- Minimum 7 years of experience in a DevOps or SRE role
- 3+ years of experience with Linux operating systems
- Automation skills in shell scripting (Bash), Python, and/or other languages
- Basic understanding of C#, Java and JavaScript
- Advanced proficiency with one of Python, C#, Java, JavaScript/TypeScript, or Go
- Advanced proficiency in managing infrastructure on Azure; experience with AWS or GCP is nice to have
- 2+ years with Docker and Kubernetes, or similar technologies
- 5+ years with Git, Perforce, or other version control software
- Experience using Terraform
- Experience working with SQL and NoSQL databases such as MongoDB and Cassandra
- Strong understanding of virtualization and hypervisor technologies
- Understanding of databases and data modelling
- Experience with automatically managing dozens or hundreds of servers
- Focus on performance bottlenecks and performance improvement techniques
- Strong networking knowledge of TCP/IP
- Experience with monitoring and data aggregation tools and platforms such as Splunk, Grafana, and New Relic
- Experience with workflow and issue management tools such as Jira
- Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Able to work in a collaborative, global, agile/lean development environment
- Excellent time-management, organization, and communication skills
Pay: $75,000-$115,000 per annum
Location: Remote
START DATE: 12/01/2021
Let us know if this position interests you. Fill out this short form and we will contact you promptly.