Main Responsibilities and Required Skills for Big Data Engineer


A Big Data Engineer is responsible for designing and developing big data applications and data visualization tools, collecting and presenting data for reporting, and building data pipelines. In this blog post, we describe the primary responsibilities and the most in-demand hard and soft skills for Big Data Engineers.


Main Responsibilities of Big Data Engineer

The following list describes the typical responsibilities of a Big Data Engineer:

Access

Access and act on the right cross-channel KPIs, dashboards, reports and AI-powered insights.

Address

Address area-level risks, and provide and implement mitigation plans.

Analyze

  • Analyze and develop data set processes for data ingestion, modeling and mining.

  • Analyze and solve problems at their root, stepping back to understand the broader context.

  • Analyze, recommend and implement improvements to support Corporate initiatives for EDP.

Architect

Architect and build data pipelines for both real-time telemetry and data warehousing.
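
For the real-time telemetry side, here is a minimal sketch of a consumer feeding a warehouse load, assuming the kafka-python client and a hypothetical telemetry topic; names and batch sizes are placeholders, not a prescribed design:

```python
# Minimal sketch: consume real-time telemetry from Kafka and buffer it
# in warehouse-friendly batches. Assumes the kafka-python client and a
# hypothetical "telemetry" topic; adapt names to your environment.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "telemetry",                           # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 1000:                 # flush in batches, not per event
        # load_to_warehouse(batch)         # hypothetical loader, target-specific
        batch.clear()
```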

Assemble

Assemble large, complex data sets that meet functional / non-functional business requirements.

Assist in

  • Assist in building a sustainable big-data platform.

  • Assist with prototyping emerging technologies.

Author

Author clear technical documentation.

Automate

  • Automate CI and deployment processes and best practices for the production data pipelines.

  • Automate test coverage for data pipelines, as sketched below.
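
A minimal sketch of such a test, assuming a hypothetical cleanse_records transformation and the pytest runner; wiring it into the CI pipeline is what turns it into automated coverage:

```python
# Minimal sketch of automated test coverage for one pipeline step.
# cleanse_records is a hypothetical transformation; run with pytest.
def cleanse_records(records):
    """Drop rows with a missing id and normalize email casing."""
    return [
        {**r, "email": r["email"].lower()}
        for r in records
        if r.get("id") is not None
    ]

def test_cleanse_records_drops_missing_ids_and_normalizes_email():
    raw = [
        {"id": 1, "email": "A@Example.COM"},
        {"id": None, "email": "b@example.com"},
    ]
    assert cleanse_records(raw) == [{"id": 1, "email": "a@example.com"}]
```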

Benchmark

Benchmark the performance in line with the non-functional requirements.

Build

  • Build an AI / ML model-based alert mechanism and anomaly detection system for the product (a simple rule-based sketch follows this list).

  • Build and maintain a framework.

  • Build a product to process large amounts of data / events for AI / ML and data consumption.

  • Build complex Data Engineering workflows.

  • Build complex queries using MongoDB, Oracle, SQL Server, MariaDB, and MySQL.

  • Build data lake on Azure cloud.

  • Build data products that reduce friction to enable our marketing initiatives to pivot quickly.

  • Build high-performance algorithms, prototypes, and proof of concepts.

  • Build large-scale data processing systems using cloud computing technologies.

  • Build out strong development unit-test practices, with a goal of automated regression testing.

  • Build the infrastructure required for optimal extraction, transformation, and loading of data.
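
For the alerting responsibility above, here is a deliberately simple rule-based sketch; a production system would train an actual ML model, and the baseline window and threshold here are assumptions:

```python
# Rule-based anomaly check: flag a new reading whose z-score against a
# recent baseline exceeds the threshold. Values are illustrative only.
from statistics import mean, stdev

def is_anomaly(baseline, new_value, threshold=3.0):
    """Return True if new_value is more than `threshold` standard
    deviations away from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold

print(is_anomaly([10, 11, 9, 10, 12, 10], 11))  # False: within normal range
print(is_anomaly([10, 11, 9, 10, 12, 10], 95))  # True: flag and alert
```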

Collaborate with

  • Collaborate in the design, development, test and maintenance of scalable data management solutions.

  • Collaborate with IT and business area partners on work groups and initiatives.

  • Collaborate with other teams.

Collect

Collect and present data for reporting and planning.

Communicate

  • Communicate systems issues at the appropriate technical level for each audience.

  • Communicate with internal teams and stakeholders to understand project requirements.

Conduct

Conduct root-cause analysis of data issues.

Configure

Configure and manage connection processes.


Continue

Continue to build knowledge of the company, processes and customers.

Contribute to

  • Contribute significantly to architectural decisions around our data.

  • Contribute towards shaping the architecture, design and scalability of our processes and pipelines.

Create

  • Create all necessary documents and communicate to the team in support of the project.

  • Create and develop data pipelines for new sources and uses of data across Mojio.

  • Create and maintain data warehouse schemas and ETL processes.

  • Create and maintain optimal data pipeline architecture.

  • Create and maintain optimal data pipeline architecture to meet business needs.

  • Create a startup mentality to accelerate the introduction of new capabilities and transform teams.

  • Create complex data solutions and build data pipelines.

  • Create data tools for analytics and data scientist team.

  • Create documentation to support knowledge sharing.

Define

  • Define data retention policies.

  • Define metrics for tracking how customers are interacting with products and service.

  • Define standards and best practices for the end-to-end development lifecycle.

Design

  • Design and build data processing solutions, and improve current ones.

  • Design and build the infrastructure for data.

  • Design and code (Java, Scala, Spark) solutions to support common and strategic data sourcing needs.

  • Design and develop big data applications and data visualization tools.

  • Design and develop highly scalable and extensible data pipelines from internal and external sources.

  • Design and implement components of our Next Generation Platform.

  • Design and scale databases and pipelines across multiple physical locations on cloud.

Determine

Determine best course of action for meeting business needs.

Develop

  • Develop a data model around stated use cases to capture client's KPIs and data.

  • Develop and automate data quality checks.

  • Develop and enhance platform best practices.

  • Develop and maintain ETL processes using SSIS, Scripting and data replication technologies.

  • Develop and operate our data pipeline & infrastructure.

  • Develop code using Python, Scala, and R.

  • Develop data models and mappings.

  • Develop data processing scripts using Spark.

  • Develop data profiling, deduping logic, matching logic for analysis.

  • Develop expertise in developing microservices and hosting them on our platform.

  • Develop expertise in Golang / microservices.

  • Develop HA strategies, including replica sets and sharding, for highly available clusters.

  • Develop highly scalable and extensible data pipelines from internal and external sources.

  • Develop innovative solutions to Big Data issues and challenges within the team.

  • Develop parallel algorithms and data processing using the Apache big-data stack (like Hadoop and Kafka).

  • Develop parallel data-intensive systems using Big Data technologies.

  • Develop Python, PySpark, and Spark scripts to filter / cleanse / map / aggregate data (see the sketch after this list).

  • Develop set processes for data mining, data modeling, and data production.

  • Develop solutions that put clients first.

  • Develop robust and monitorable data pipelines and related services.
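
A minimal PySpark sketch of such a filter / cleanse / map / aggregate script, assuming a hypothetical events.json source with user_id, country, and amount fields:

```python
# Minimal PySpark cleanse-and-aggregate sketch; source path, column
# names, and the aggregation are all assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleanse-aggregate").getOrCreate()

events = spark.read.json("events.json")          # hypothetical source

totals_by_country = (
    events
    .filter(F.col("user_id").isNotNull())        # cleanse: drop orphan rows
    .withColumn("country", F.upper("country"))   # map: normalize casing
    .groupBy("country")                          # aggregate
    .agg(F.sum("amount").alias("total_amount"))
)
totals_by_country.show()
```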

Document

Document and communicate product feedback in order to improve user experience.

Drive

Drive and support automation and integration of infrastructure and system processes.

Elevate

Elevate code into the development, test, and production environments on schedule.

Ensure

  • Ensure self and peers are actively seeking ways to objectively measure productivity.

  • Ensure systems meet business requirements and industry practices.

  • Ensure that objects are modeled appropriately.

  • Ensure the Hadoop platform can effectively meet performance & SLA requirements.

Estimate

Estimate engineering work effort and effectively identify and prioritize the high impact tasks.

Evaluate

  • Evaluate and provide feedback on future technologies and new releases / upgrades.

  • Evaluate the efficiency of software / product releases and conduct read outs on results.

Execute

Execute basic to moderately complex functional work tracks for the team.

Expand

Expand and grow data platform capabilities to solve new data problems and challenges.

Explain

Explain technical considerations at related meetings, including those with internal clients.

Explore

Explore and evaluate new ideas and technologies.

Follow

  • Follow architecture standards.

  • Follow build and automation practices to support continuous integration and improvement.

  • Follow industry-standard agile software design methodology for development and documentation.

  • Follow software development methodology.

Help

Help design and implement components of Next Generation Platform.

Identify

  • Identify and develop Big Data sources & techniques to solve business problems.

  • Identify and communicate technical problems, process and solutions.

  • Identify and resolve issues, bugs, and impediments.

  • Identify, design, and implement internal processes.

Implement

  • Implement and manage large scale ETL jobs on Hadoop / Spark clusters in Amazon AWS / Microsoft Azure.

  • Implement security measures by encrypting sensitive data, as sketched below.
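
A minimal sketch of field-level encryption using the cryptography package's Fernet recipe; the record layout is hypothetical, and a real deployment would load the key from a secrets manager rather than generate it inline:

```python
# Encrypt a sensitive field before it lands in storage.
from cryptography.fernet import Fernet

key = Fernet.generate_key()    # in practice, fetch from a KMS / vault
cipher = Fernet(key)

record = {"user_id": 42, "ssn": "123-45-6789"}   # hypothetical record
record["ssn"] = cipher.encrypt(record["ssn"].encode("utf-8"))

# Decrypt only where the plaintext is genuinely needed.
plaintext = cipher.decrypt(record["ssn"]).decode("utf-8")
```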

Improve

Improve database tables, views, processes and storage to be more efficient and save costs.

Influence

Influence the team on using Big Data systems effectively to solve business problems.

Initiate

Initiate and conduct code reviews, create code standards, conventions and guidelines.

Integrate

  • Integrate platform into the existing enterprise data warehouse and various operational systems.

  • Integrate third-party products.

  • Integrate these solutions with the architecture used across the company.

Interface with

Interface with customers, understanding their requirements and delivering complete data solutions.

Investigate

  • Investigate and integrate up-and-coming big data technologies into existing requirements.

  • Investigate issues reported by testing teams to determine impact, root cause, and solve them.

Lead

  • Lead functional and architectural design of assigned areas.

  • Lead in prototyping emerging technologies.

  • Lead others to solve complex problems.

  • Lead technical efforts, including design and code reviews, and mentor staff appropriately.

  • Lead work and deliver elegant and scalable solutions.

Learn

  • Learn from deep subject matter experts through mentoring and on the job coaching.

  • Learn how to use our application platform.

Maintain

Maintain and incrementally improve existing solutions.

Make

  • Make a significant contribution towards Infoblox's big data pipeline.

  • Make our data lake run like a core service.

  • Make significant contributions towards design and development.

  • Make sure design decisions on the project meet architectural and design requirements.

Manage

  • Manage and implement data processes (Data Quality reports).

  • Manage own learning and contribute to technical skill building of the team.

  • Manage system / application environment and ongoing operations.

Optimize

Optimize queries, data models, and storage formats to support common usage patterns.
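
One common example of such an optimization, sketched below with hypothetical paths and a hypothetical event_date column: writing columnar Parquet partitioned by date, so date-bounded queries scan only the partitions and columns they need:

```python
# Rewrite raw JSON as date-partitioned Parquet to match a common
# usage pattern (date-bounded queries). Paths/columns are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("optimize-storage").getOrCreate()

events = spark.read.json("raw/events.json")

(events
    .repartition("event_date")        # co-locate rows for each partition
    .write
    .partitionBy("event_date")        # one directory per date
    .parquet("curated/events", mode="overwrite"))
```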

Own

Own one or more key components of the infrastructure.

Participate in

  • Participate in an on-call support rotation.

  • Participate in development of data marts for reports and data visualization solutions.

  • Participate in infrastructure and system design of the NCR Data Lake.

  • Participate in periodic team on-call rotations supporting all our Big Data platforms.

  • Participate in strategic planning discussions with technical and non-technical partners.

Perform

  • Perform a range of assignments related to job discipline.

  • Perform code reviews and support SQL optimization and tuning.

  • Perform on-call activities as needed for the environment and technologies.

  • Perform optimization, debugging and capacity planning of a Big Data cluster.

  • Perform security remediation, automation, and self-healing as required.

  • Perform tasks such as writing scripts, writing SQL queries, etc.

Plan

Plan / schedule tasks, lead small development teams, and mentor junior colleagues.

Present

Present ideas and recommendations to management on the best use of Hadoop and other technologies.

Process

Process unstructured data into structured data, manage schema of new data.
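
A minimal sketch of that step, parsing a hypothetical log format into records with named fields; real inputs would need a richer pattern and error handling:

```python
# Turn unstructured log lines into structured records with named fields.
import re

LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"  # assumed format
)

def parse_line(line):
    """Return a structured dict, or None for lines that do not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

print(parse_line("2024-05-01T12:00:00Z ERROR connection reset by peer"))
# {'timestamp': '2024-05-01T12:00:00Z', 'level': 'ERROR',
#  'message': 'connection reset by peer'}
```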

Provide

  • Provide follow-up production support.

  • Provide leadership by mentoring junior DBAs and by leading internal projects and initiatives.

  • Provide ongoing operations and support for production systems to meet defined SLAs.

  • Provide oversight and guidance to our Data Engineering development team.

  • Provide RDBMS support for MasterCard applications.

  • Provide support, on-going maintenance, and required modifications to multiple Hadoop environments.

  • Provide technical assistance to junior team members and to colleagues across the organization.

  • Consistently search for improved methods of providing customer service.

  • Provide verifiable technical solutions to support operations at scale and with high availability.

Recommend

  • Recommend and implement solutions to improve performance, resource consumption, and resiliency.

  • Recommend technological application programs to accomplish long-range objectives.

  • Recommend ways to improve data reliability, efficiency and quality.

Recruit

Recruit, mentor, build and motivate the IT teams that will positively impact our business.

Research

  • Research, design, implement and test technology solutions.

  • Research modern technologies to solve unique challenges.

  • Research new uses for existing data.

  • Research opportunities for data acquisition and new uses for existing data.

Resolve

Resolve alerts and perform remediation activities.

Review

  • Review and test code changes in lower environments.

  • Review code and provide feedback relative to best practices and improving performance.

Seek

Seek to understand the data being worked with, as it often consists of unstructured data sets.

Specialize

  • Specialize in data egestion (from the Enterprise Data Lake to analytical and operational systems).

  • Specialize in data governance and security of data assets.

  • Specialize in making trusted data available and accessible to the users.

Submit

Submit change control requests and documents.

Suggest

Suggest technical and functional improvements to add value to the product.

Support

  • Support Cloud Initiatives.

  • Support data pipelines with bug fixes and additional enhancements.

  • Support enterprise Big Data platforms in AWS including EMR, Presto, Spark, and Ranger.

  • Support IaaS and DevOps initiatives for infrastructure delivery transformation.

  • Support MercuryPlus Data delivery effort.

  • Support storage retention and disposition of data.

  • Support TMX internal / external users for application related inquiries.

Take

Take ownership of design and implementation of scalable and fault tolerant projects.

Test

Test deliverables against a user story's acceptance tests.

Train

Train and mentor staff with less experience.

Transform

Transform the data to create a consumable data layer for various application uses.

Understand

  • Understand deeply how to build data warehouses and data marts.

  • Understand merging medically coded data across coding types such as SNOMED CT, ICD-10, CPT, CCS, etc.

Use

Use Spark to implement truly scalable ETL processes.
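
A minimal sketch of such a job, with hypothetical paths and table names: extract from CSV, transform, and load into a warehouse table. Spark parallelizes each stage across the cluster, which is where the scalability comes from:

```python
# Minimal Spark ETL sketch: extract, transform, load. All names are
# illustrative; the target assumes an existing "warehouse" database.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

orders = spark.read.csv("raw/orders.csv", header=True, inferSchema=True)

cleaned = (
    orders
    .dropDuplicates(["order_id"])
    .withColumn("order_total", F.col("quantity") * F.col("unit_price"))
)

cleaned.write.mode("overwrite").saveAsTable("warehouse.orders")
```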

Work with

  • Work closely with other teams to ensure that features meet business needs.

  • Work closely with team members from across Mastercard to identify functional and system requirements.

  • Work closely with the engineering team.

  • Work closely with various cross-functional product teams.

  • Work in a small agile team to deliver highly optimized batch and real-time data processing.

  • Work on Data and Analytics Tools in the Cloud.

  • Work on deployment and making sure products are production-ready and function smoothly.

  • Work on a geographically dispersed team embracing Agile and DevOps principles.

  • Work on performance tuning and increasing operational efficiency on a continuous basis.

  • Work to establish Hadoop efficiencies on our Cloudera stack.

  • Work to identify gaps and improve the platform's quality, robustness, maintainability, and speed.

  • Work with infrastructure, security, and other partners.

  • Work with senior stakeholders to develop a clear understanding of requirement drivers.

Write

  • Write programs, develop code, test artifacts, and produce reports.

  • Write the system / technical portion of assigned deliverables.

Most In-demand Hard Skills

The following list describes the most in-demand technical skills of a Big Data Engineer:

  1. Python

  2. Spark

  3. Java

  4. Hive

  5. Scala

  6. AWS

  7. Kafka

  8. Hadoop

  9. SQL

  10. Azure

  11. HBase

  12. HDFS

  13. Cassandra

  14. Big Data Technologies

  15. Sqoop

  16. Pig

  17. CS

  18. Git

  19. NoSQL databases

  20. CE

  21. EE

  22. GCP

  23. Design

  24. Designing

  25. ETL

  26. Oozie

  27. Docker

  28. Jenkins

  29. Big Data

  30. MongoDB

  31. Storm

  32. Cloud

  33. Hadoop Ecosystem

  34. Kubernetes

  35. Microservices Architecture

  36. Batch

  37. Data Warehousing

  38. DynamoDB

  39. EMR

Most In-demand Soft Skills

The following list describes the most in-demand soft skills of a Big Data Engineer:

  1. Written and oral communication skills

  2. Problem-solving attitude

  3. Analytical ability

  4. Organizational capacity

  5. Interpersonal skills

  6. Collaborative

  7. Curious

  8. Leadership

  9. Innovation

  10. Attention to detail

  11. Creative

  12. Passion for deep technical excellence

  13. Personal qualities

  14. Tenacity

  15. Multi-task

  16. Passion for learning

  17. Adaptable to changes

  18. Time-management

  19. Flexible

  20. Presentation

  21. Team player

  22. Teamwork

  23. Troubleshooting skills
