A Living Ecosystem!
About the Company
With over 18 years of experience and operations spanning 10 countries, this company bridges talent, projects, and cutting-edge technology to deliver innovative solutions that transform today and shape the future of engineering. By uniting creative professionals with advanced technologies, it tackles real-world challenges with tailored, sustainable solutions across diverse industries. Specializing in sustainable engineering, information technology, telecommunications, energy, aviation, and smart urban mobility, the company stands at the forefront of innovation. Dedicated to simplifying technological complexity, it fosters efficiency, collaboration, and positive impact in every project it undertakes.
About the Role
The company is seeking a skilled Site Reliability Engineer to join a telecom-focused project team. The role involves managing and optimizing cloud infrastructure, container orchestration, and automation to ensure system reliability and performance.
Key Responsibilities
- Manage and maintain cloud infrastructures across AWS, GCP, Azure, and Oracle platforms
- Administer Kubernetes clusters and containerized environments using Docker
- Automate configuration and deployments leveraging tools like Ansible, Terraform, Helm, and custom scripts
- Develop and maintain operational scripts using Shell, Python, Go, or Perl
- Perform system deployments, migrations, and tuning for security and performance
- Oversee CI/CD pipelines utilizing Jenkins, GitLab, and Helm charts
- Configure and monitor application components such as NGINX, Tomcat, and Apache
- Apply Site Reliability Engineering best practices and collaborate with international teams following ITIL guidelines
Mandatory Skills and Experience
- Strong Linux system administration and shell scripting skills
- Experience with cloud platforms: GCP, AWS, Azure, Oracle
- Proficiency in Kubernetes (native or vendor-specific) and Docker/container management
- Hands-on experience with cloud service deployment and migration (IaaS, Kubernetes, etc.)
- Expertise in automation tools including Ansible, Terraform, Helm
- Programming skills in Shell scripting, Perl, Python, or Go
- System tuning and security optimization for JVM, Tomcat, NGINX, etc.
- CI/CD pipeline management with Jenkins, GitLab, and Helm charts
- Solid understanding of networking concepts: TCP/IP, subnetting, iptables
- Knowledge of load balancing technologies (NGINX, F5, HAProxy)
- Familiarity with ITIL processes for service transition, operation, and continuous improvement (ITIL Foundation certification is a plus)
- Good grasp of system architectures, SOA, BPM, messaging, and database replication
- Experience installing and configuring Web/App tier components (IIS, Tomcat, Apache)
- Database skills including SQL query writing
- Understanding of Hadoop ecosystem
- Strong focus on system and application security
- Excellent troubleshooting and postmortem analysis abilities
- Adherence to the SRE mindset: shared responsibility and blameless postmortems
- Native Portuguese speaker with technical proficiency in English