Career Profile

Experienced Big Data consultant who has worked on data-intensive Big Data projects for multiple clients across the Banking, Healthcare, Supply Chain, E-Commerce, and Retail domains.

Experience

Senior Big Data Consultant

2019 - Present
DBS Bank, Singapore

Working as lead of a five-developer ingestion framework team, maintaining two ingestion frameworks and their related services, which run thousands of daily ingestions across six environments.

  • Successfully released a new generation of the metadata-driven ingestion framework with a user interface during the pandemic. Developed jobs and utilities to migrate workloads from old clusters to new clusters.
  • Work with the architect and project manager to plan and deliver new features and enhancements.
  • Responsible for sprint story delivery and backlog grooming.

Digital Data Consultant

2017 - 2019
Accenture AI, India

Worked with one of the largest banks in APAC to build an Enterprise Big Data Platform using open-source tools and design principles focused on flexibility and transparency.

  • Led an offshore team of 5 developers and helped project teams with ingestion issues and queries.
  • Worked on a Spark-based ingestion framework; developed and maintained Jenkins pipelines.
  • Developed a test framework to automate testing of use cases.
  • Worked on a Talend-based ingestion framework for loading data into the data lake, including features such as REST API ingestion and a JSON data parser.
  • Worked on an IBM MQ and Kafka based streaming ingestion POC on top of Pivotal Cloud Foundry and a Hadoop cluster; as part of this, built an XML-to-text parser to convert XML messages from MQ.

Associate Consultant

2016 - 2017
CoreCompete LLP, India

Worked on a forecasting-based inventory management platform for inventory replenishment for a leading automotive aftermarket parts provider in the USA.

  • Developed AWS EMR based Spark modules for replenishing stores in a store family after assortment changes, route changes, and store closures.
  • Developed a PySpark framework to run SQL files on a Spark cluster.
  • Developed an automated script to launch an AWS EMR cluster along with the required input Hive tables.
  • Worked on a project to migrate a SAS-based data pipeline to Hadoop.
    • Converted SAS jobs to Hive and Spark based data pipelines.
    • Developed Sqoop-based scripts to automate data loading from databases such as SAP HANA, DB2, and SQL Server.
  • Worked with the SAS Data Science team to prepare data pipelines using PySpark and Spark SQL on an AWS cluster.
  • Consulted and guided peers on Big Data related problems and solutions.
  • Helped debug issues in Hadoop, Hive, and Spark jobs.
  • Involved in the Big Data engineer recruitment panel.

Senior Software Engineer

2013 - 2016
Zaloni Technologies, India

Zaloni's data management and governance tool, Bedrock, helps manage data ingestion pipelines and their metadata. I worked on Bedrock's first production project and successfully deployed it for a Fortune 500 healthcare company.

  • Developed a MapReduce-based framework for data ingestion in Bedrock, with features including record counting, tokenization, watermarking, schema validation, and duplicate-file checks.
  • Developed a shell-script-based framework on top of Bedrock for data extraction from database sources using Sqoop, Hive, and HCatalog, supporting full and incremental loads.
  • Worked as tech lead for many data offloading projects and POCs using Bedrock.
  • Worked on a project to migrate and merge all existing data ingestion pipelines into a single Hadoop MapReduce workflow.

Software Engineer

2012 - 2013
Imababa Solutions and Services, India

At this startup, I worked on many POCs and product concepts.

  • Involved in requirement gathering and technical discussions to formulate development plans.
  • Developed web applications and backend modules.
  • Helped frontend developers with the necessary APIs.
  • Contributed to discussion while designing database models.

Technical Skills

Java, SQL, Scala, Python, Groovy, Shell Script, SAS Base

Spark, Kafka, Hadoop, HBase, PySpark, Airflow, Talend, S3, Hive, Sqoop, Hue, Ranger, Avro, Parquet, Snappy, Oozie, etc.

CDH, AWS, HDP, MapR, PHD, Alluxio, etc.

Jenkins, OpenShift, Pivotal Cloud Foundry, Git, Maven, Jira, Bitbucket, Collibra.