VP, Site Reliability Engineer, Group Consumer Banking and Big Data Analytics Technology
Singapore, Singapore | DBS Bank
Industry: Banking / Investment Banking
Functions: Financial Services Professional
IT / Information Technology
Job Description:
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
This role is within the Branch and Self Service Banking Technology platform (Consumer Banking Group), which supports, maintains and develops our ongoing capability for all Consumer Banking Retail Branches and the Self Service channel (ATMs, VTMs, etc.) in Singapore and across multiple regions: HK, IN, ID, TW.
Our platform is undergoing a modernisation program, and this is a unique opportunity to build modern, system-critical operational capabilities into state-of-the-art platforms built on cloud-native architecture patterns.
This role is dual purpose:
To define & build our future state Site Reliability Engineering (SRE) practices, processes, tools & capabilities across the team.
To lead the resolution of operational incidents on critical systems when they occur.
The position involves an approximately equal focus on software development and operations. It also involves developing software to automate operational processes and coding for deliverables on the shared engineering backlog.
The ideal candidate will have a blend of experience across:
Infrastructure deployment & operations (a true understanding of how the underlying infrastructure works)
Software development – you are expected to code on the job and build automation
Monitoring & Observability – implementation and operation, to measure availability, latency, performance and efficiency
Site reliability engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics.
Develop software to automate manual operational work.
Engage with both the development and support teams throughout the life cycle to help build for reliability.
The workload for the position is multifaceted and would include:
Collaborate closely with development and application support teams throughout the SDLC to maintain and improve the service against established Service Level Objectives by applying software engineering principles.
Be responsible for the availability, performance, change management, monitoring, and capacity management of the services supported.
Manage and troubleshoot business-critical incidents, conduct post-mortems, and ensure incidents are permanently closed.
Analyse patterns of production incidents, develop permanent remediation plans, and use software engineering to implement automation that prevents recurrence.
Manage the split of effort between manual operational work and engineering work.
Work with partner organizations and vendors to provide solutions to current business issues.
Participate in a shift model covering 24x7x365 support.
Experience with CI/CD pipelines & tooling (for example: Jira, Jenkins, Maven, SonarQube, Fortify, NexusIQ)
Good understanding of cloud-native architecture, microservices, data management principles, big data, middleware technologies & distributed computing
Proven experience with cloud platforms (AWS, PCF) is preferred.
5+ years of scripting/software experience (Bash, Python, Java and Perl)
Familiarity and working experience with DevOps testing and release techniques (e.g. A/B testing, blue/green deployments and canary releases)
Working knowledge of DevOps tools/technologies (Docker, Kubernetes, OpenShift) is preferred.
Knowledge of database technologies (MariaDB, MySQL, etc.)
Strong understanding of Linux security best practices
Extensive experience in application/system/network performance and availability monitoring (Grafana, Vizceral, Tivoli, Splunk, etc.)
Proven technical leadership experience, including the ability to quickly understand an issue, troubleshoot it efficiently and in detail, and direct a swift resolution.
Strong ability to take ownership of issues and drive resolution across teams.
Assertive personality, with the drive to push improvements across the environment.
Effective written and verbal communication skills.
Ability to develop strong client relationships and partner with technology engineering teams.