Profile Summary
Data Engineer with a strong understanding of distributed systems and parallel processing, and hands-on experience designing and building scalable data pipelines.
Professional Experience
- Highly proficient in data warehousing and business intelligence.
- Experience designing, documenting, and implementing ETL pipelines using Informatica PowerCenter and Python libraries (see the sketch after this list).
- Hands-on experience creating PL/SQL procedures, functions, and triggers.
- Experienced in SQL and PL/SQL performance tuning.
- Extensively used Informatica PowerCenter transformations such as Source Qualifier, Expression, Lookup, Filter, Update Strategy, Router, Normalizer, and Sequence Generator.
- Implemented complex SQL queries while developing reports in SAP BusinessObjects.
- Experience migrating a big data analytics application from Java to Python.
- Worked with AWS cloud services such as S3, Redshift, EMR, and RDS.
- Strong knowledge of big data technologies: Hadoop, Hive, and PySpark.
- Exposure to a Scaled Agile Framework (SAFe) working environment.
- Strong understanding of the software development lifecycle (SDLC).
- Good knowledge of version control systems such as Git.
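By way of illustration, here is a minimal sketch of the ETL pattern mentioned above, using pandas and SQLAlchemy. The file path, connection string, table name, and the "amount" column are hypothetical placeholders, not details from any specific engagement.

```python
# Minimal ETL sketch: extract from a CSV file, transform with pandas, load to a SQL table.
# All paths, connection details, and column names below are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

def run_etl(csv_path: str, db_url: str, table: str) -> None:
    # Extract: read the raw source file.
    df = pd.read_csv(csv_path)

    # Transform: drop duplicates, normalize column names, fill missing values.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.fillna({"amount": 0})

    # Load: append the cleaned rows to the target table.
    engine = create_engine(db_url)
    df.to_sql(table, engine, if_exists="append", index=False)

if __name__ == "__main__":
    run_etl("sales_raw.csv", "postgresql://user:password@host/db", "sales_clean")
```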
Skills
• Operating System: Linux
• Languages & Methods: Python (proficient), SQL, statistics
• Frameworks/Libraries: scikit-learn, TensorFlow, Keras, NLTK, spaCy, Flask
• ML Tools: Amazon SageMaker, Weka, RapidMiner, Dataiku, DataRobot, Azure ML Studio
• Big Data Stack: Spark and Hadoop ecosystem
• IaaS Platforms: AWS, Azure
• Orchestration Framework: Airflow
• Visualization Tools: Excel, Power BI, Tableau
• Version Control & Project Tracking: Git, DVC, Jira
• Soft Skills: Written and Oral Communication, Lifelong Learning, Collaboration, Business Acumen, Reliability and Consistency
Senior Data Engineer
- Gathered requirements for all the PPM segments from FE. Designed and implemented cloud-based application programs and analytical data structures on Azure.
- Designed and implemented batch processing of historical data using Azure Databricks and Azure Data Factory (ADF), and performed analytics using PySpark (a sketch follows this list).
- Set up data pipelines in ADF and CI/CD in Azure DevOps.
- Prepared documentation and analytical reports, efficiently delivering summarized results, analysis, and conclusions to stakeholders.
- Monitored incoming data analytics requests, executed analytics, and efficiently distributed results to support development strategies.
- Managed production go-live activities from the client side, working as the single resource for the overall project.
- Handled all project management, client communication, and overall development and productionization of the data asset.
- Published the reporting asset to Dremio and Power BI.
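As a rough illustration of the batch-processing work described above, the following PySpark sketch shows the general shape of such a job on Azure Databricks. The storage paths, column names, and the particular aggregation are assumptions for illustration only.

```python
# Batch-processing sketch: read historical data, aggregate daily metrics, and
# write partitioned output. Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("history-batch").getOrCreate()

# Read historical records from cloud storage (e.g., an ADLS path mounted in Databricks).
history = spark.read.parquet("/mnt/raw/history")

# Aggregate daily metrics per segment.
daily = (
    history
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("segment", "event_date")
    .agg(
        F.count("*").alias("events"),
        F.sum("amount").alias("total_amount"),
    )
)

# Write results partitioned by date for downstream analytics and reporting.
daily.write.mode("overwrite").partitionBy("event_date").parquet("/mnt/curated/daily_metrics")
```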
Data Engineer
Building Intuitive Reports from Raw Data [ETL]
- Tech stack: Spark, Scala, AWS, Databricks, Azure DevOps, Codeway.
- Developed Spark jobs in Scala to perform data cleansing activities on the input data.
- Developed Spark jobs to apply transformations to the data, create reports, and store the results in AWS Redshift.
- Worked with DataFrames to store and report data; scheduled jobs through Databricks and Airflow and provided production support in case of failures.
- Migrated Spark jobs from AWS EMR to Databricks.
Data Ingestion and Sync Process
- Crafted a hybrid Spark Structured Streaming job optimized to mitigate small files, metadata growth, data retention issues, and data lag (see the streaming sketch after this list).
- Successfully conducted a POC on Delta Lake and Apache Hudi as the streaming application's sink and source.
- Dynamically partitioned big, critical datasets, thereby reducing processing time (see the partitioning sketch after this list).
Data Analytics and Machine Learning Operations
- Maintained and enhanced machine learning services.
- Performed data pre-processing in Python (PySpark) and Databricks to feed data to machine learning models.
- Developed a thorough understanding of how different classification models work, such as random forest, and of time-series forecasting with the ARIMA model.
- Published daily data insights reports while maintaining data quality.
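The streaming sketch referenced above: a minimal Structured Streaming job showing common small-file and state-size mitigations (foreachBatch with repartitioning, a watermark, and a longer trigger interval). The Kafka source, Delta sink, schema, and paths are assumptions for illustration, not the actual pipeline.

```python
# Structured Streaming sketch: dedup with a watermark to bound state (data lag /
# retention), and compact each micro-batch on write to avoid small files and
# metadata growth. All topic names, paths, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-sink").getOrCreate()

# Read a stream of events (Kafka assumed here purely for illustration).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp").alias("event_ts"),
    )
)

# The watermark bounds deduplication state, keeping state size and data lag in check.
deduped = (
    events
    .withWatermark("event_ts", "15 minutes")
    .dropDuplicates(["payload", "event_ts"])
)

def write_batch(batch_df, batch_id):
    # Repartitioning each micro-batch keeps the output file count low, which
    # mitigates small files and metadata growth at the sink table.
    batch_df.repartition(4).write.format("delta").mode("append").save("/mnt/sink/events")

query = (
    deduped.writeStream
    .foreachBatch(write_batch)
    .trigger(processingTime="5 minutes")  # longer trigger -> fewer, larger files
    .option("checkpointLocation", "/mnt/chk/events")
    .start()
)
query.awaitTermination()
```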
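And the partitioning sketch: one common PySpark approach to dynamically partitioning a large dataset so downstream reads can prune partitions instead of scanning everything. Dataset paths and column names are hypothetical placeholders.

```python
# Dynamic partitioning sketch: derive a date column, repartition on it, and
# overwrite only the partitions touched by this run. Names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dynamic-partitioning").getOrCreate()

# Overwrite only the partitions present in this run, not the whole dataset.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df = spark.read.parquet("/mnt/raw/transactions")

(
    df.withColumn("txn_date", F.to_date("txn_ts"))
    # One shuffle keyed on the partition column so each date writes few files.
    .repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("/mnt/curated/transactions")
)
```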