Production Engineer

  • Rubikloud
  • Toronto, ON, Canada
  • Mar 25, 2022
Full time Developer

Job Description


Named one of Deloitte’s Fast 50 2015 Companies to Watch, we are growing rapidly. We are looking for a Production Engineer to join our team, where you will get the chance to work with various team members to execute on the Rubikloud vision.


As a Production Engineer, you are a hybrid software and system engineer who is responsible for the operability of the production system. You treat operations as if it’s a software problem.

You will ensure proper monitoring is in place to help identify issues as early as possible, and you will troubleshoot and resolve those issues effectively. You will build automation to prevent issues from recurring, and work with engineering teams to make the system more fault-tolerant. You will ensure deployment processes are well-defined and automated to facilitate rapid product rollouts while minimizing system downtime. You will build tools and automation services to make your job as efficient as possible - you just don’t enjoy doing manual work.


The ideal candidate will be passionate about an operations role that involves deep knowledge of software and system engineering.


What You Will Do:

  • Own Rubikloud's entire SaaS production platform, from the data warehouse, data ingestion and processing pipelines, and machine learning engines to the front-end web applications, user authentication service, logging system, and API gateway.
  • Analyze, troubleshoot and resolve production system issues.
  • Automate deployment and upgrade procedures to facilitate rapid product release cycles.
  • Work with engineering teams to ensure systems are designed and implemented with operability in mind.
  • Define and track operational metrics to ensure system performance, scalability, stability, and security.
  • Conduct software performance analysis and system tuning.
  • Serve as the second-tier escalation contact for service incidents and take part in the on-call rotation.
  • Develop processes, tools and documentation for production operations.


Requirements:

  • 5+ years in a UNIX-based large-scale SaaS operations role.
  • Bachelor’s degree in Computer Science, Computer/Electrical Engineering or equivalent experience.
  • Systems background with a strong command of Linux, containers, and networking.
  • Proficient in Python and Linux shell scripting.
  • Experience with cloud computing platforms (one of AWS, Azure, or GCP).
  • Experience with deployment tools and configuration management (Docker, Terraform, SaltStack, Chef).
  • Experience with container formats (Docker, RKT) and orchestration systems (Kubernetes, DC/OS).
  • 3+ years in a Continuous Deployment environment.
  • Ability to decompose large complex systems and find failure scenarios.
  • Expertise in analyzing and troubleshooting large-scale distributed systems.
  • Strong sense of ownership and adaptability in a fast-paced environment.
  • Detail-oriented, with strong analytical skills.
  • Strong written and verbal communication skills.


Nice to have:

  • Experience building a large-scale distributed system from scratch, complete with deployment tools.
  • Experience working with large-scale distributed big data systems (Spark/Hadoop).
  • Experience coding in Java/Scala.
  • Experience analyzing the security of systems.


We are a group of intellectually curious people who are passionate about making a big splash in the world of retail. We offer competitive compensation including equity options - we want all the members of Rubikrew to own part of the Company. You will have a full health benefits package including extended health care, dental, vision, etc. Lunch is catered Monday to Thursday so you won’t have to worry about packing a lunch! Healthy snacks and drinks are also provided to keep you energized. We want our employees to feel like they’re always developing personally and professionally, so we offer a personal development budget that you are free to use for knowledge expansion or as a fitness allowance.