I am a data enthusiast and professional with 6.5 years of experience working with Healthcare and Financial data using Python and SQL. I have always been a numbers person, with exceptional statistical, mathematical, and computer skills, and I have successfully led and managed multiple projects at the same time while keeping client satisfaction the highest priority. As much as I enjoy data manipulation, it is the analysis of data that gets me going: I like to explore the relationships between numbers and translate digits and spreadsheets into stories. I take pride in my ability to make data accessible to both executive decision-makers and frontline sales staff. On a personal level, I am detail-oriented, organized, precise in my work, and eager to learn new things. I have strong communication skills and a knack for clear, illuminating presentations. I am comfortable facing the numbers on my own, but I enjoy being part of a motivated team of smart people. I have acquired the ability to analyze large datasets, performing statistical and quantitative analysis of trends to provide strategic direction to the business. Over the course of my career I have become well versed in Data Engineering, Data Management, Data Warehousing, Business Intelligence, Data Analytics, Data Science, Application Development, Web Development, and Project Management, working with big data cloud-based technologies such as Azure, AWS, and GCP.
Chewy is the most trusted and convenient destination for pet parents.
• Designed analytical solutions for business users that provided the data needed to operate the business.
• Designed, developed, implemented, tested, documented, and operated large-scale, high-volume, high-performance, fault-tolerant data structures for the analytics platform using Snowflake and AWS.
• Implemented both real-time and batch data ingestion routines (full loads as well as incremental loads) using best practices in data modeling and ETL/ELT processes.
• Resolved common data integration challenges like converting data types, handling errors, and translating data using PySpark on Databricks and Python.
• Integrated data from flat files such as CSV and Parquet and from data sources such as Postgres, SQL Server, and CloverDX using SQL.
• Orchestrated new data pipelines using Airflow and promoted the code from development to staging and production environments using Terraform.
• Built and deployed automated Jenkins jobs for running Terraform code.
• Worked with AWS cloud technologies such as Amazon Managed Workflows for Apache Airflow (MWAA), S3, Lambda, Kinesis, DynamoDB, Secrets Manager, and Transfer Family to build end-to-end data solutions.
• Participated in technical architecture decisions, reviewed code, and gave meaningful feedback that improved the quality of the application.
• Monitored and troubleshot data mart issues and performed regular backups and disaster recovery procedures.
CVS Health is the leading health solutions company, delivering care like no one else can.
• Conducted walk-throughs and requirements-gathering meetings, working with stakeholders to gather specifications.
• Implemented a large cloud data warehouse platform to host Clinical Trials Services business on Microsoft Azure and Snowflake.
• Implemented the strategy, system architecture design, and roadmap to establish and maintain a comprehensive patient database across multiple lines of business.
• Assisted in building the data model, design process, and governance for mapping patients to diseases.
• Developed data pipelines to extract and transform different data types (including structured and semi-structured) from multiple sources, scaling and containerizing them using an orchestration tool built with Python and Kubeflow on Azure Kubernetes Service (AKS).
• Designed, developed, tested, deployed, and continually improved existing data framework using Python and SQL.
• Imported and exported data to and from multiple data sources using API calls, RDBMS, flat files, Parquet files, etc.
• Worked on multiple feasibility study data analyses that uncovered business worth $123 million in total sales.
• Worked with cross-functional teams to design and develop the product, implement features, and deploy them, coordinating with the IT project team, Data Science team, Release Management/DevOps (CI/CD) team, and other partners.
• Implemented automation using Python to run standard feasibility SQL queries, which led to a 70% reduction in manual effort.
• Identified, assessed, and resolved complex analytical problems, including data sourcing, ingestion, and transformation issues and potential risks; facilitated issue resolution and risk mitigation; and implemented data standardization.
• Mentored new team members to bring them up to speed on the product implementation and assisted with technical difficulties.
• Conducted code review and suggested improvements in existing workflow/data processes.
• Led and managed offshore resources supporting day-to-day operations, standardized the escalation hierarchy, and established SOPs/guidelines to meet SLAs per customer needs.
NeoXam is an international publisher of software solutions in the financial sector.
• Designed and updated existing data models based on the requirements after interacting with non-technical business users.
• Collaborated with enterprise data warehouse, data governance, and business teams on data quality issues and architecture of data repositories.
• Wrote high-quality, complex SQL queries using joins, nested subqueries, and temporary tables to generate reports for stakeholders.
• Created an end-to-end data pipeline from multiple databases such as MS SQL Server, Oracle, and MySQL, transforming data into a staging layer to provide a single source of truth ready for end users.
• Developed and tested extraction, transformation, and load (ETL) processes using various sources such as XML, Excel, CSV, REST and SOAP APIs, web services, and JMS messages.
• Automated multiple routine tasks for end users using Python and shell scripting, which reduced the manual work by 95%.
• Implemented data wrangling (cleaning, transforming, merging, and reshaping data frames) using PySpark on Databricks.
• Hosted and managed DataHub application on the cloud using AWS.
• Developed Tableau visualizations and dashboards integrating data from multiple data sources using Data Blending.
• Provided project management leadership following Kanban methodology to streamline project execution.
• Analyzed and implemented support procedures to solve various issues faced by clients working on DataHub.
• Performed quality assurance testing and code migration from Dev to Test and Production environments.
GMO LLC is an Asset Management firm.
• Applied Agile and Scrum methodologies, following the full software development life cycle (SDLC): gathering requirements, designing, developing, documenting, testing, and deploying Business Intelligence solutions.
• Wrote optimized SQL queries and designed and implemented tables, functions, and views to generate verification reports.
• Designed ETL data pipeline templates for populating enterprise data warehouse and built scripts to generate direct load, SCD 1 and SCD 2 SSIS mappings.
• Migrated QlikView dashboards to Tableau following OLAP and data modeling concepts to generate integrated, dynamic data visualizations for data analysis and key performance indicators.
• Optimized pre-existing stored procedures joining multiple tables to implement business logic in MS SQL Server, reducing execution times by 2 minutes on average.
Coursework: Advanced Database Management, Data Warehousing and Business Intelligence, Big Data Analytics, Advanced Data Science, Project Management, Application and Web Development, Web Designing
GPA: 3.5/4.0
Coursework: Database Management, Data Warehousing and Data Mining, Data Analysis, Data Science, Application Development, Data Structures and Algorithms, Web Designing
GPA: 3.8/4.0
Python, SQL, Java, Linux/Shell, R, HTML, CSS, JavaScript.
Snowflake, Teradata, MS SQL Server, MySQL, Oracle, PostgreSQL, SSIS, Talend, Alteryx, DataHub.
Microsoft Azure, ADLS, Data Factory, Synapse, Event Hubs, AWS, Glue, S3, EC2, Kinesis, Redshift, EMR, QuickSight, Lambda, GCP, BigQuery, Dataflow, Dataproc, Cloud Composer.
Hadoop, Spark, Databricks, Kafka, MapReduce, Hive, Pig, HBase, MongoDB.
Kubernetes, Docker, Kubeflow, Airflow.
Tableau, QlikView, QlikSense, Power BI, SSRS, Gretl.
Linear Regression, Logistic Regression, Random Forest, Decision Tree, Support Vector Machine, Convolutional Neural Networks, Deep Belief Network, Bayesian Network.
Git, GitLab, GitHub, Bitbucket, JIRA, Rally, SharePoint, Confluence.
Built a Product Recommendation System from historical order logs of an online retail store using AWS technologies.
Built three interactive games using Python.
Built the end-to-end life cycle of a typical data engineering project on AWS. The purpose of the project was to understand how AWS components interact and chain together in order to set up a data warehouse in Amazon Redshift from scratch.
Built an automated data pipeline to fetch aircraft data from the FlightAware website. The app fetches the requested aircraft data and performs analysis on top of it.
Analyzed the Olympic Athlete dataset using QlikSense to draw insights.
Built a complete, robust geodemographic segmentation model for churn investigation on a bank dataset. Applied logistic regression and drew the Cumulative Accuracy Profile curve.
Scraped the reviews, stemmed them, performed sentiment analysis, and created a sparse representation. Performed classification with Support Vector Machines (SVM), Convolutional Neural Networks, and Deep Belief Networks on the sparse output.
Developed an enterprise-level Java Swing application (Internet of Things) to help manage traffic congestion using a complex algorithm. Implemented an ecosystem model using polymorphism, abstract classes, the Singleton pattern, enumerations, and the Factory design pattern, along with the concepts of work requests and work queues. Integrated the JFreeChart API for graphically analyzing data aggregated from congestion sensors.
Managed a project for Asset Management Firms by preparing Project Goals, Timeline, Milestones, Risks, Dependencies, Functional Requirements and Non-Functional Requirements.
Built a social networking website where one can make friends, send and receive messages, upload photos, see one's list of friends, and use other networking features.
Loaded a fact table integrating data from cross tables, QVDs, Excel, Microsoft Access, and XML. Removed synthetic keys, resolved circular loops, generated data in the script, added a preceding load to the script, created a master calendar and a link table, handled slowly changing dimensions using the IntervalMatch function, and added metadata and Section Access for security.
Visualized an ad-hoc A/B test, created bins and classification tests for analyzing variables, combined multiple charts, and validated mined data using a chi-squared test in Tableau.
Derived insights from a sales dataset in Tableau using bar and line charts, pie charts, geographic maps, scatter plots, groups, area charts, filters, dual-axis charts, crosstabs, and calculations.
Analyzed the Chicago criminal activity dataset and plotted graphs using Python, making use of pandas, NLTK, NumPy, dplyr, scikit-learn, datetime, Matplotlib, Seaborn, DataFrames, and other libraries.
Cleaned the insurance data and applied Principal Component Analysis, linear regression, decision tree, and SVM regression to predict results in R.
Analyzed New York Times data on different types of articles and community reactions related to the presidential elections. Data was scraped using various RESTful APIs such as the article-search API and community API.
Observed and analyzed different types of emails and when they were sent. Jeffrey Skilling, the CEO of Enron, was involved in the scandal; his email directory was analyzed to find the other users he interacted with most.