SENIOR DATA ENGINEER
Appen’s mission is to enable customers to build better AI by creating large volumes of high-quality, unbiased training data faster. To meet customer needs and accelerate product growth, Appen is building a modern data lake and data warehouse. Data Engineers develop modern data architecture and tools to provide end-to-end data solutions and meet key business objectives.
The data engineer will be an integral member of the Data Engineering team.
- Design and build data models to support structured and unstructured data
- Design, build and deploy scalable high-volume data pipelines to move data across systems
- Lead architecture and implementation of batch and real-time data pipelines with instrumentation
- Design and build data transformations, metrics and KPIs in line with data governance and data privacy policies
- Build a centralized data lake, data warehouse and visualizations that support multiple use cases across different products for engineering and enterprise
- Work with the Product team to deliver features on time
- Build data subject matter expertise and own data quality
- Design and develop software and data solutions that help product, engineering and business teams make data-driven decisions
- Own existing processes running in production, including problem solving and optimization
- Partner with the data science team to provide quality data for model development and productionizing machine learning models
- Partner with the analytics team to build datasets that support visualizations
- Conduct design and code reviews to deliver production-quality code
- 4+ years of experience in the data warehouse space
- 4+ years of experience in custom ETL/ELT design, patterns for efficient data integration, change data capture, implementation, and maintenance
- 4+ years of experience in query writing (SQL & NoSQL), schema design, normalized data models and dimensional models
- 2+ years of experience in Python, Spark, APIs, Git, CI/CD, and AWS Cloud
- 2+ years of experience in any MPP database (AWS Redshift, Snowflake, etc.) and RDBMS (PostgreSQL, MySQL)
- Experience processing a variety of data sources (structured, unstructured, semi-structured, SQL, PubSub, API and event-based) in cloud-based infrastructure and data services
- Experience in Airflow, S3, dbt
- Excellent communication and collaboration skills
- Strong coding skills in Python
- A passion for building flexible data sets that enable current and future use cases
- Experience analyzing large volumes of data to provide data-driven insights and identify gaps
- Experience using development environments such as Docker and Kubernetes
Appen is a global leader in the development of high-quality, human-annotated datasets for machine learning and artificial intelligence. Appen brings over 20 years of experience capturing and enriching a wide variety of data types including speech, text, image and video. With deep expertise in more than 180 languages and access to a global crowd of over 1 million skilled contractors, Appen partners with technology, automotive and eCommerce companies — as well as governments worldwide — to help them develop, enhance and use products that rely on natural languages and machine learning.
At Appen, we value performance, honesty, humility, and grit. We persevere and remain focused, whilst maintaining agility to achieve quality outcomes and exceed expectations. We’re truth tellers – respectfully of course. We take accountability for our actions and believe in giving and receiving direct feedback. We give credit where credit is due and show gratitude to others for their contributions. We seek diverse perspectives as we recognize the value in teamwork and collaboration. Through grit, we take ownership, and we don’t give up.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.