Site Reliability Engineer
Detroit, MI

Posted 4/03/2019

Required Skills

  • Bachelor’s degree in computer science or a similar field
  • Minimum of four years of Linux systems administration experience
  • Experience hosting Apache, Tomcat, WildFly, .NET, or .NET Core applications
  • Experience with Puppet and Ansible
  • Experience with MS Azure, AWS, or Google Compute cloud environments
  • Expert-level skill in scripting and automation
  • Expertise in high-availability and load-balancing technologies
  • Willingness to document technical processes and share knowledge with others
  • Capable of following and composing process and procedure documentation, as well as training other users on complex topics
  • Ability to participate in an on-call rotation performing weekend and after-hours support

Job Description

  • The Site Reliability Engineer is responsible for ensuring that our digital space is up and running at all times and performing at its peak capability
  • This position will be responsible for planning, deploying and troubleshooting application stacks in support of our growing ecommerce business
  • The Site Reliability Engineer will interact with several functional areas across all levels of the organization
  • Work closely with our Store, ESB, and Development teams to design our application stack for an enhanced digital presence, performance, and availability on multiple cloud services
  • Conduct post-mortem reviews of system downtime with internal stakeholders to put short- and long-term solutions in place to eliminate repeat occurrences
  • Conduct risk analysis to review system shortcomings that present risk of downtime for application stacks
  • Continuously improve our internal processes and controls to ensure optimal performance
  • Implement DevOps changes and rollouts, and shepherd deployments in a manner leading to optimal results
  • Combine software and systems engineering to build and run large-scale, distributed, fault-tolerant systems
  • Ensure that our internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping a close watch on capacity and performance
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Use configuration management tools to create repeatable environments
  • Create dashboards that communicate overall system health to less technical colleagues and alert on problems
  • Develop system configuration management templates, and audit systems against those templates over the system lifecycle
  • Work with developers to quickly identify and address issues to provide smooth code rollouts and seamless change back-out when there are problems

Reliance One, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.


This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.

Contact Us

RELIANCE ONE, INC.

1700 Harmon Road, Suite One

Auburn Hills, MI 48326

248-922-4500


Copyright 2015 Reliance One, Inc.

All rights reserved.