Appen

Data Engineer

Job Locations: US-Remote
Posted Date: 12/9/2021 2:10 PM
ID: 2021-5708
Category: Engineering

Overview


Appen’s mission is to enable customers to build better AI by creating large volumes of high-quality, unbiased training data faster. To meet customer needs and accelerate product growth, Appen is building a modern data lake and data warehouse. Data Engineers develop modern data architecture and tools to provide end-to-end data solutions and meet key business objectives.

 

The Data Engineer will be an integral member of the Data Engineering team.

 

Responsibilities:

 

  • Design and build data models to support structured and unstructured data
  • Design, build, and deploy scalable, high-volume data pipelines to move data across systems
  • Lead the architecture and implementation of batch and real-time data pipelines with instrumentation
  • Design and build data transformations, metrics, and KPIs consistent with data governance and data privacy policies
  • Build a centralized data lake, data warehouse, and visualizations that support multiple use cases across different products for engineering and the enterprise
  • Work with the Product team to deliver features on time
  • Build data subject matter expertise and own data quality
  • Design and develop software and data solutions that help product, engineering, and business teams make data-driven decisions
  • Own existing processes running in production, including problem solving and optimization
  • Partner with the data science team to provide quality data for model development and to productionize machine learning models
  • Partner with the analytics team to build datasets that support visualizations
  • Conduct design and code reviews to deliver production-quality code

 

Qualifications:

 

  • 4+ years of experience in the data warehouse space
  • 4+ years of experience in custom ETL/ELT design, patterns for efficient data integration, change data capture, implementation, and maintenance
  • 4+ years of experience in query writing (SQL & NoSQL), schema design, normalized data models, and dimensional models
  • 2+ years of experience with Kafka, Java, Python, APIs, Git, CI/CD, and AWS Cloud
  • 2+ years of experience with any MPP database (AWS Redshift, Snowflake, etc.) and RDBMS (PostgreSQL, MySQL)
  • Experience processing a variety of data sources (structured, unstructured, semi-structured, SQL, Pub/Sub, API, and event-based) in cloud-based infrastructure and data services
  • Experience with Airflow, S3, and DBT
  • Excellent communication and collaboration skills
  • Strong coding skills in Java and Python
  • A passion for building flexible datasets that enable current and future use cases
  • Experience analyzing large volumes of data to provide data-driven insights and identify gaps
  • Experience with development environments such as Docker and Kubernetes
