I am a data enthusiast and professional with 6.5 years of experience working with Healthcare and Financial data using Python and SQL. I have always been a numbers person, with exceptional statistical, mathematical, and computer skills, and I have successfully led and managed multiple projects at the same time while keeping client satisfaction the highest priority. As much as I enjoy data manipulation, it is the analysis of data that gets me going: I like to explore the relationships between numbers and translate digits and spreadsheets into stories. I take pride in my ability to make data accessible to both executive decision-makers and frontline sales staff. On a personal level, I am detail-oriented, organized, precise in my work, and eager to learn new things. I have strong communication skills and a knack for clear, illuminating presentations. I am comfortable facing the numbers on my own, but I enjoy being part of a motivated team of smart people. I have acquired the ability to analyze large datasets, performing statistical and quantitative analysis of trends to provide strategic direction to the business. Over the course of my career I have become well versed in Data Engineering, Data Management, Data Warehousing, Business Intelligence, Data Analytics, Data Science, Application Development, Web Development, and Project Management, working with big data cloud-based technologies such as Azure, AWS, and GCP.
Chewy is the most trusted and convenient destination for pet parents.
• Designed analytical solutions for business users that provided the data needed to operate the business.
• Designed, developed, implemented, tested, documented, and operated large-scale, high-volume, high-performance, fault-tolerant data structures for the analytics platform using Snowflake and AWS.
• Implemented both real-time and batch data ingestion routines (full loads as well as incremental loads) using best practices in data modeling and ETL/ELT processes.
• Resolved common data integration challenges like converting data types, handling errors, and translating data using PySpark on Databricks and Python.
• Integrated data from flat files such as CSV and Parquet and from data sources such as Postgres, SQL Server, and CloverDX using SQL.
• Orchestrated new data pipelines using Airflow and promoted the code from development to staging and production environments using Terraform.
• Built and deployed automated Jenkins jobs for running Terraform code.
• Worked with AWS cloud technologies such as Amazon Managed Workflows for Apache Airflow (MWAA), S3, Lambda, Kinesis, DynamoDB, Secrets Manager, and Transfer Family to build end-to-end data solutions.
• Participated in technical architecture decisions, reviewed code, and gave meaningful feedback that improved the quality of the application.
• Monitored and troubleshot data mart issues and performed regular backups and disaster recovery procedures.
CVS Health is the leading health solutions company, delivering care like no one else can.
• Conducted walk-throughs and requirements-gathering meetings, working with stakeholders to gather specifications.
• Implemented a large cloud data warehouse platform to host Clinical Trials Services business on Microsoft Azure and Snowflake.
• Implemented the strategy, system architecture design, and roadmap to establish and maintain a comprehensive patient database across multiple lines of business.
• Assisted in building the data model, design process, and governance for mapping patients to diseases.
• Developed data pipelines to extract and transform different data types (including structured and semi-structured) from multiple sources, scaling and containerizing them using an orchestration tool built with Python and Kubeflow on Azure Kubernetes Service (AKS).
• Designed, developed, tested, deployed, and continually improved existing data framework using Python and SQL.
• Imported and exported data to and from multiple data sources using API calls, RDBMS, flat files, Parquet files, etc.
• Worked on multiple feasibility study data analyses that uncovered business worth $123 million in total sales.
• Worked with cross-functional teams to design and develop the product, implement features, and deploy them, coordinating with the IT project team, Data Science team, Release Management/DevOps (CI/CD) team, and other partners.
• Implemented automation using Python to run standard feasibility SQL queries, which led to a 70% reduction in manual effort.
• Identified, assessed, and resolved complex analytical problems, including data sourcing, ingestion, and transformation issues and potential risks; facilitated issue resolution and risk mitigation; and implemented data standardization.
• Mentored new team members to bring them up to speed on the product implementation and assisted with technical difficulties.
• Conducted code review and suggested improvements in existing workflow/data processes.
• Led and managed offshore resources supporting day-to-day operations, standardized the escalation hierarchy, and established SOPs/guidelines to meet SLAs per customer needs.
NeoXam is an international publisher of software solutions in the financial sector.
• Designed and updated existing data models based on the requirements after interacting with non-technical business users.
• Collaborated with enterprise data warehouse, data governance, and business teams on data quality issues and architecture of data repositories.
• Wrote high-quality, complex SQL queries using joins, nested subqueries, and temporary tables to generate reports for stakeholders.
• Created an end-to-end data pipeline from multiple databases such as MS SQL Server, Oracle, and MySQL, transforming data into a staging layer to provide a single source of truth ready for end users.
• Developed and tested extraction, transformation, and load (ETL) processes using various sources such as XML, Excel, CSV, REST and SOAP APIs, web services, and JMS messages.
• Automated multiple routine tasks for end users using Python and shell scripting, which reduced the manual work by 95%.
• Implemented data wrangling (cleaning, transforming, merging, and reshaping data frames) using PySpark on Databricks.
• Hosted and managed DataHub application on the cloud using AWS.
• Developed Tableau visualizations and dashboards integrating data from multiple data sources using Data Blending.
• Provided project management leadership following Kanban methodology to streamline project execution.
• Analyzed and implemented support procedures to solve various issues faced by clients working on DataHub.
• Performed quality assurance testing and code migration from Dev to Test and Production environments.
GMO LLC is an Asset Management firm.
• Applied Agile and Scrum methodologies, following the full software development life cycle (SDLC): gathering requirements, designing, developing, documenting, testing, and deploying Business Intelligence solutions.
• Wrote optimized SQL queries and designed and implemented tables, functions, and views to generate verification reports.
• Designed ETL data pipeline templates for populating enterprise data warehouse and built scripts to generate direct load, SCD 1 and SCD 2 SSIS mappings.
• Migrated QlikView dashboards to Tableau following OLAP and data modeling concepts to generate integrated, dynamic data visualizations for data analysis and key performance indicators.
• Optimized pre-existing stored procedures joining multiple tables to implement business logic in MS SQL Server, reducing execution times by 2 minutes on average.
Coursework: Advanced Database Management, Data Warehousing and Business Intelligence, Big Data Analytics, Advanced Data Science, Project Management, Application and Web Development, Web Designing
GPA: 3.5/4.0
Coursework: Database Management, Data Warehousing and Data Mining, Data Analysis, Data Science, Application Development, Data Structures and Algorithms, Web Designing
GPA: 3.8/4.0
Python, SQL, Java, Linux/Shell, R, HTML, CSS, JavaScript.
Snowflake, Teradata, MS SQL Server, MySQL, Oracle, PostgreSQL, SSIS, Talend, Alteryx, DataHub.
Microsoft Azure, ADLS, Data Factory, Synapse, Event Hubs, AWS, Glue, S3, EC2, Kinesis, Redshift, EMR, QuickSight, Lambda, GCP, BigQuery, Dataflow, Dataproc, Cloud Composer.
Hadoop, Spark, Databricks, Kafka, MapReduce, Hive, Pig, HBase, MongoDB.
Kubernetes, Docker, Kubeflow, Airflow.
Tableau, QlikView, QlikSense, Power BI, SSRS, Gretl.
Linear Regression, Logistic Regression, Random Forest, Decision Tree, Support Vector Machine, Convolutional Neural Networks, Deep Belief Network, Bayesian Network.
Git, GitLab, GitHub, Bitbucket, JIRA, Rally, SharePoint, Confluence.
Built a Product Recommendation System from historical order logs of an online retail store using AWS technologies.
Built three interactive games using Python.
Built the end-to-end life cycle of a typical data engineering project on AWS. The purpose of the project was to understand how AWS components interact and chain together in order to set up a data warehouse in Amazon Redshift from scratch.
Built an automated data pipeline to fetch aircraft data from the FlightAware website. The app fetches the requested aircraft data and performs analysis on top of it.
Analyzed the Olympic Athlete dataset using QlikSense to draw insights.
Built a complete, robust geodemographic segmentation model for churn investigation on a bank dataset. Applied logistic regression and drew the Cumulative Accuracy Profile curve.
Scraped the reviews, stemmed them, performed sentiment analysis, and created a sparse representation. Performed classification with Support Vector Machines (SVM), Convolutional Neural Networks, and Deep Belief Networks on the sparse output.
Developed an enterprise-level Java Swing application (Internet of Things) to help manage traffic congestion using a complex algorithm. Implemented an ecosystem model using polymorphism, abstract classes, the Singleton pattern, enumerations, and the Factory design pattern, along with the concepts of work requests and work queues. Integrated the JFreeChart API for graphically analyzing data aggregated from congestion sensors.
Managed a project for Asset Management Firms by preparing Project Goals, Timeline, Milestones, Risks, Dependencies, Functional Requirements and Non-Functional Requirements.
Built a social networking website where one can make friends, send and receive messages, upload photos, see one's list of friends, and use other networking features.
Loaded a fact table integrating data from cross tables, QVDs, Excel, Microsoft Access, and XML. Removed synthetic keys, resolved circular loops, generated data in the script, added a preceding load to the script, created a master calendar and a link table, handled slowly changing dimensions using the IntervalMatch function, and added metadata and Section Access for security.
Visualized an ad-hoc A/B test, created bins and classification tests for analyzing variables, combined multiple charts, and validated mined data using a chi-squared test in Tableau.
Derived insights from a sales dataset in Tableau using bar and line charts, pie charts, geographic maps, scatter plots, groups, area charts, filters, dual-axis charts, crosstabs, and calculations.
Analyzed the Chicago criminal activity dataset and plotted graphs using Python, making use of pandas, NLTK, NumPy, dplyr, scikit-learn, datetime, Matplotlib, Seaborn, DataFrames, and other libraries.
Cleaned the insurance data and applied Principal Component Analysis, linear regression, decision tree, and SVM regression to predict results in R.
Analyzed New York Times data on different types of articles and community reactions related to the presidential elections. Data was scraped using various RESTful APIs such as the article-search API and community API.
Observed and analyzed different types of emails and when they were sent. Jeffrey Skilling, the CEO of Enron, was involved in the scandal; his email directory was analyzed to find the other users he interacted with most.