Lead Site Reliability Engineer (remote)

  • Tutuka
  • Remote, Other
  • Apr 13, 2019
Full time Data Developer Finance Information Technology JAVA SCRUM

Job Description

At Tutuka, we think everyone should have access to user-friendly payment services. We make connecting easy by making simple, safe payments happen for people around the globe. We enable payments via virtual and physical cards for partners such as banks, telcos, retailers, developers and fintechs across the world.

As the Lead Site Reliability Engineer (SRE) at Tutuka, you'll work closely with the entire technical team to ensure the reliability of enterprise-level, highly scalable, highly secure financial processing systems. These systems power tens of millions of transactions and connect to the web, mobile and API interfaces that make it easy for people to issue, redeem and reconcile prepaid cards all over the world.

We already have a team of amazing developers who work out of our local offices in Johannesburg, South Africa, as well as remotely across Europe and Southeast Asia. Now we need you to drive improvements in our reliability, scalability and efficiency.

What you will be doing

You'll find every day an exciting challenge as you help our technical team transform a monolithic enterprise processing environment with bank-level security and 99.95% uptime into a sleek, nimble, microservice-based, serverless processing environment with better than bank-level security and 99.99% uptime.

If it were easy, we would already have done it! This role may or may not involve the following:

  • Work closely with software engineering teams to improve availability, latency, performance, efficiency, monitoring, emergency response, and capacity planning
  • Work across a hybrid cloud environment made up of a hosted data centre and AWS
  • Handle upgrades of infrastructure and services through automation
  • Identify, gather and document key performance metrics, logs, and alerts, and automate responses to them
  • Find optimizations and other efficiencies to scale the application
  • Develop playbooks and tools to streamline processes and shorten problem resolution time
  • Perform periodic on-call duties
  • Maintain an infrastructure-as-code management process


We love taking on team members with a variety of skill levels, from intern to PhD. But there's no getting around the fact that we need this person to know what they're doing and to hit the ground running.

You should already be an SRE guru with:

  • A solid understanding of operational principles, such as capacity planning, monitoring and incident handling
  • Experience automating manual processes, leveraging cloud (preferably AWS) platforms
  • Knowledge of telemetry, tracing, logging, and alerting best practices
  • Experience implementing monitored and seamless deployment pipelines
  • A grasp of internet fundamentals: HTTP/S, DNS, TCP/IP, security by design, caching

Extra kudos are awarded for:

  • JVM performance tuning
  • Experience monitoring cloud-based systems
  • Knowledge of automated testing frameworks and methodologies
  • Experience with some scripted and compiled/VM-based languages (for example, JavaScript and Go/Java)

Additional Information

Lots of space to challenge yourself:

  • Learning how the payments industry works
  • Working with global clients and partners
  • Working across multiple teams
  • Helping to grow our technology by understanding your customers' needs and translating them into tangible applications

What's in it for you:

  • Working at the cutting edge of payment innovation
  • International exposure and experience

If you can see yourself in this role and feel you can add to the ongoing success of Tutuka, then please get in touch and apply. (If you do not have Site Reliability Engineering experience, your application cannot be considered!)

Tutuka looks to build strong, diverse teams drawn from different backgrounds, experiences and identities.