SENIOR DATA ENGINEER – ETL
Appen is the global leader in data for the AI Lifecycle. With over 25 years of experience in data sourcing, data annotation, and model evaluation by humans, we enable organizations to launch the world’s most innovative artificial intelligence systems. Our expertise includes a global crowd of over 1 million skilled contractors who speak over 235 languages, in over 70,000 locations and 170 countries, and the industry’s most advanced AI-assisted data annotation platform. Our products and services give leaders in technology, automotive, financial services, retail, healthcare, and governments the confidence to launch world-class AI products. Founded in 1996, Appen has customers and offices globally.
RESPONSIBILITIES
- Design and build data models that support structured and unstructured data
- Design, build, and deploy scalable, high-volume ETL pipelines to move data across systems
- Lead the architecture and implementation of batch and real-time data pipelines with instrumentation
- Design and build data transformations, metrics, and KPIs that comply with data governance and data privacy policies
- Build a centralized data lake, data warehouse, and visualizations that support multiple use cases across products for engineering and the enterprise
- Serve as technical lead for new and existing product initiatives and help define product direction
- Work with the Product team to deliver features on time
- Build data subject matter expertise and own data quality
- Design and develop software and data solutions that help product, engineering and business teams make data-driven decisions
- Own existing processes running in production, including troubleshooting and optimization
- Partner with data scientists and machine learning engineers to provide quality data for model development and to productionize machine learning models
- Partner with the analytics team to build datasets that support visualizations
- Conduct design and code reviews to deliver production quality code
QUALIFICATIONS
- BS or MS in Computer Science or a related technical discipline (or equivalent)
- Excellent understanding of computer science fundamentals, data structures, and algorithms
- Excellent problem-solving skills
- 6+ years of software development experience, including at least 4 years in the data warehouse space
- 4+ years of experience designing, implementing, and maintaining custom ETL/ELT pipelines, including efficient data integration patterns and change data capture
- 4+ years of experience writing queries (SQL and NoSQL) and designing schemas, including normalized and dimensional data models
- 4+ years of experience with APIs, Git, CI/CD, AWS, and Java or Python
- 2+ years of experience with an MPP database (e.g., AWS Redshift, Snowflake) and an RDBMS (e.g., PostgreSQL, MySQL)
- 1+ year of experience with Docker and Kubernetes
- Experience processing a variety of data sources (structured, unstructured, and semi-structured), including SQL, pub/sub, API, and event-based sources, on cloud-based infrastructure and data services
- Experience with Hadoop, MapReduce, SQL, Kafka/Storm, Elasticsearch, Airflow, and dbt
- Excellent communication and collaboration skills in English
- Strong coding skills in Java or Python
- Proven track record of results-oriented delivery in a high-velocity, high-quality environment