Site Reliability Engineer I – IT

Monitoring & Incident Management:

  • Improve the studio’s reliability through monitoring, rapid response, communication and coordination.
  • Develop and manage the deployment architecture for the application; develop the monitoring architecture; and implement monitoring agents, dashboards, escalations and alerts.
  • Routinely identify operational problems by observing and studying system architecture, functionality and performance results. Troubleshoot procedures within the overall studio architecture, investigate surfaced issues, and handle incidents.
  • Identify operational priorities by assessing operational objectives; determining project objectives such as efficiency, cost savings, energy conservation, operator convenience, safety, and environmental quality; and estimating relevance, time, and costs.

Development & Data Analysis:

  • Develop operational solutions by defining, studying, estimating, and screening alternative solutions; calculating economics; and determining impact on the total system.
  • Create new tools to facilitate automated monitoring of the studio’s operational environment.
  • Anticipate operational problems by studying operating targets, modes of operation, and unit limitations, and by monitoring unit performance.
  • Improve operational quality by studying, evaluating, and recommending process re-architecture; implementing changes; and contributing information and opinion to unit design and modification teams.
  • Provide operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.
  • Update job knowledge by participating in educational opportunities, reading professional publications, maintaining personal networks, and participating in professional organizations.
  • Accomplish the engineering and organizational mission by completing related results as needed.

Operations Engineer Skills and Qualifications: 

Mastery of Linux systems and networking administration

  • Strong systems engineering and troubleshooting skills
  • Scripting (Bash & PHP)
  • Strong TCP/IP understanding and ability to produce detailed documentation
  • Write new technical documentation and maintain existing documentation
  • Ability to administer networking firewalls, routers, and switches
  • S3 Maintenance, Apache maintenance, Load Balancer Management
  • Puppet Management

Cloud Management

  • AWS expertise (VPC, RDS, Route 53 DNS integration)

Database fundamentals

  • Administer and maintain MySQL and other open-source databases
  • Write and perform basic queries to evaluate database stability, integrity and performance
  • Large/Big Data Management
  • Administer and maintain Aurora infrastructure

Monitoring Systems

  • System Level (Nagios, Munin, Check_MK)
  • Writing checks & scripts
  • Log/Application Level (Splunk, Elasticsearch, Apache)
  • Ability to diagnose infrastructure as a whole!

Careers Region: North America

Careers Category: IT, Development Operations & Security

Careers Location: Austin, TX

Careers Type: Full-Time