Site Reliability Engineer

Shein
San Diego, CA

About SHEIN

SHEIN is a global online fashion and lifestyle retailer, offering SHEIN-branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all and to promoting its industry-leading, on-demand production methodology for a smarter, future-ready industry.

Position Summary

We are looking for an experienced Site Reliability Engineer (official title: Site Reliability Engineer I) to join our Site Reliability Engineering team. Site Reliability Engineers at SHEIN are hybrid software/systems engineers whose overarching goal is to ensure that production services are “always on.” They strive to build the most reliable and performant systems on the planet.

SREs work closely with cross-functional teams to ensure we have the right set of tools to generate, collect, analyze, visualize and alert on operational data. The aim is to know exactly what is happening across the ecosystem, identify problems before they occur and address them as quickly as possible.

They are also responsible for improving operational efficiency, utilization and system resiliency of the platform. They own critical open-source software that our platform relies on and they are core participants in every significant engineering effort underway in the platform.

Additionally, SREs are tasked with driving forward the operability of the platform to reduce both the number of incidents and MTTR. To accomplish this, the team combines software development, networking and systems engineering expertise with a desire to be challenged by issues of scale and complexity, making our service better for our customers.

Job Responsibilities


  • Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production system.

  • Monitor capacity and utilization, and work closely with cross-functional teams to orchestrate scaling services up and down.

  • Own and operate critical open-source services like Elasticsearch, Kafka, RabbitMQ, Redis.

  • Build tools and design processes that help improve observability and system resiliency of the platform.

  • Triage site availability incidents and proactively work towards reducing MTTR for customer-impacting incidents.

  • Partner with service owners to implement service level metrics and service level objectives that act as service level health indicators.

  • Establish design patterns for monitoring, benchmarking and deploying new features for the backend services.

  • Develop and maintain technical documentation, network diagrams, runbooks, and procedures.

  • Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices.

  • Respond to production incidents, leveraging experience in software development, systems engineering, and networking to prevent recurring issues.

  • Provide relief and sustainable resolution to issues within our infrastructure.

  • Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.

  • Join a culture that does not tolerate manual activity, resulting in a highly automated environment that delivers scalable solutions.

  • Drive efficiencies through software improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability.


Job Requirements


  • Bachelor's degree in Computer Science or Information Systems or equivalent technical discipline.

  • 3+ years of working experience in an enterprise 24/7 production environment supporting mission-critical, real-time, high-traffic applications, especially in cloud environments.

  • Systematic problem-solving approach, combined with a sense of ownership and drive.

  • Full-stack debugging and performance optimization ability, including knowledge of cloud systems (load balancing, caching, content distribution, etc.), continuous integration/build systems, Java, and SQL and NoSQL databases.

  • Track record of monitoring and analyzing system performance and isolating issues or bottlenecks that could impact reliability, performance and scalability.

  • Strong experience with observability tools such as Grafana, Prometheus, Zabbix, etc.

  • Experience with a scripting or programming language such as Python or Go.

  • Familiarity with container technologies such as Docker, Kubernetes, and Mesos.

  • Strong verbal and written communication skills; able to work effectively with geographically remote teams.

  • Experience with one or more OSS technologies like Elasticsearch, Kafka and Redis.

  • Proficient with SRE concepts and practices, including advocating for the elimination of toil and driving simple solutions.

Nice to Have


  • Experience operating and maintaining big data components (Hadoop, YARN, HBase, Hive, Spark, etc.).

  • Solid understanding of Linux systems.

Benefits and Perks


  • Bonus and RSU eligible

  • Healthcare (medical, dental, vision, prescription drugs)

  • Health Savings Account with Employer Funding

  • Flexible Spending Accounts (Healthcare and Dependent care)

  • Company-Paid Basic Life/AD&D insurance

  • Company-Paid Short-Term and Long-Term Disability

  • Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)

  • Employee Assistance Program

  • Business Travel Accident Insurance

  • 401(k) Savings Plan with discretionary company match and access to a financial advisor

  • Vacation, paid holidays, floating holiday and sick days

  • Employee discounts

  • Free weekly catered lunch

  • Dog-friendly office (available at select locations)

  • Free gym access (available at select locations)

  • Free swag giveaways

  • Annual Holiday Party

  • Invitations to pop-ups and other company events

  • Complimentary daily office snacks and beverages

Pay Range

$101,400 - $148,700 USD

Posted 2025-09-22
