Senior Site Reliability Engineer (Tokyo, Japan)
Cookpad is looking for software engineers to join our Site Reliability Engineering team. Our Site Reliability Engineers use systems engineering and software development skills to build and operate the platform behind the largest recipe sharing community in the world. You will be building the systems and tools that enable our engineers to operate and develop Cookpad's global service.
Cookpad is a tech company building a community platform that enables people to share recipe ideas and cooking tips. It’s a global platform used, on average, by around 100 million people every month across the world. Over 4 million recipes have been created by people in almost 70 countries.
Our mission is to make everyday cooking fun, because we believe that cooking is the key to a happier and healthier life for people, communities and the planet.
Our heritage is unique: Cookpad was founded in Japan in 1997 and is a listed company in Tokyo. We set up our international HQ in the UK and here we’re a start-up, building the global platform and working with our colleagues around the world.
Cookpad is growing at speed and we’re looking for exceptional people who make things happen and create solutions on the scale we're looking for.
This role is based in Tokyo, Japan.
What you will do:
As a Site Reliability Engineer, you will be responsible for building and operating our infrastructure platform through the following activities.
- Build our highly available, performant and scalable container deployment platform with AWS.
- Design, develop and implement solutions that improve the stability, scalability, availability, and performance of Cookpad's global service.
- Improve the observability of our platform and applications to make the troubleshooting process straightforward.
- Support our product team's availability goals by introducing Site Reliability Engineering solutions such as resilience engineering and chaos engineering.
- Accelerate the software delivery cycle based on quantitative metrics such as deployment frequency, mean time to repair, development lead time, and change fail percentage.
- Participate in operational responsibilities. In the case of incidents, you will be involved in analysing and mitigating root causes as part of our blameless post-mortem culture, and in building solutions and automation to prevent them from happening again.
This is a senior level role and we are looking for the following skills and experience:
- Experience in software engineering and automation
- SRE/DevOps experience, and comfort operating software in a Linux-based environment
- Experience deploying containerized applications with ECS or Kubernetes
- Familiar with at least one Cloud environment, for example, AWS, GCP, or Azure
- Strong communication skills in English and the ability to build working relationships with coworkers in locations around the globe
- Strong coding skills in at least one programming language, and a willingness to learn Ruby
- Familiar with Infrastructure as Code
- Passion for solving problems using open source software
- Experience with AWS
- Experience operating Ruby on Rails applications
- Solid foundation in deploying and managing Linux systems at large scale
- Understanding of large-scale complex systems from a reliability perspective
- Solid competency with SQL (ideally in a federated database environment; MySQL a plus)
- Understanding of event sourcing architecture for microservices (Apache Kafka a plus)
- Experience collecting system and application metrics for observability (Prometheus a plus)
- Deep network analysis experience
- Strong Linux system-level analysis capabilities (Ubuntu a plus)
- Knowledge of and experience with highly available and scalable architectures for multi-region services is a big plus
- Experience delivering Cloud Native Computing Foundation software in production (Kubernetes, Envoy, Prometheus, Fluentd, etc.)
- Contributions to open source
- Location: Ebisu, Tokyo
- Assistance with Japan work visa
- Working Hours: Flex Time (no mandated core time, but you need to join daily stand-up meetings at 18:00 JST)
- Monthly contract payment (reviewed yearly)
- Work pattern: Monday-Friday
- Holidays: paid leave provisions
The benefits we offer are based on how we can best support your personal and professional well-being.
What happens next?
We’re building a global product with a global team that’s full of world-class talent. Our hiring process is designed to let your talent shine and for us to get to know each other so we know we’re the right fit.
- When we receive your application, it’s reviewed by one of your peers to see if your experience and skills are a match for what we’re looking for in the role. If they are, one of our Talent team will get in touch for a chat. If you’re a developer, it’s great to see some of your sample code via a GitHub, Stack Overflow, or Bitbucket profile.
- We’ll then ask you to complete a tech assignment or task, or perhaps to provide some sample code.
- The final step is to meet the team: mainly your team but also people from other teams, including the leadership team.
The Cookpad team is made up of an incredible, diverse range of people. We are proud to be an equal opportunity employer. We do not discriminate based on race, ethnicity, colour, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected status.