Appen is a leading global technology company providing high-quality language and data services for machine learning and artificial intelligence. Appen is headquartered in Sydney, Australia, and has subsidiaries in the United States, United Kingdom, Philippines, and China. Appen has more than 20 years of experience in collecting and processing a variety of data types, including speech, text, and images. Appen has a pool of more than 1 million prequalified crowd workers around the world who provide data collection and processing services in more than 180 languages. Appen’s clients include governments and global leaders in high-tech, automotive, and e-commerce, whom it helps develop and improve products and technologies based on natural language understanding and machine learning. Appen is listed on the Australian Securities Exchange under the stock code APX.
Appen’s mission is to enable customers to build better AI by creating large volumes of high-quality, unbiased training data faster. To meet customer needs and support product growth, Appen is building a modern data lake and data warehouse. Data Engineers develop modern data architecture and tools to provide end-to-end data solutions and meet key business objectives.
The senior data engineer will be an integral member of the Data Engineering team.
- Design and build data models to support structured and unstructured data
- Design, build and deploy scalable high-volume data pipelines to move data across systems
- Lead architecture and implementation of batch and real-time data pipelines with instrumentation
- Design and build data transformations, metrics, and KPIs in line with data governance and data privacy policies
- Build a centralized data lake, data warehouse, and visualizations that support multiple use cases across different products for engineering and enterprise
- Work with the Product team to deliver features on time
- Build data subject matter expertise and own data quality
- Design and develop software and data solutions that help product, engineering and business teams make data-driven decisions
- Own existing processes running in production, including problem solving and optimization
- Partner with the data science team to provide quality data for model development and to productionize machine learning models
- Partner with the analytics team to build datasets that support visualizations
- Conduct design and code reviews to deliver production-quality code
- Strong coding skills in Python and SQL.
- Detail-oriented and goal-focused, with an analytical mindset, willingness to take on challenges, good time management, and good communication (in English) and collaboration skills.
- Experience in data analytics and data processing.
The following are nice to have:
- 5+ years of experience in the data warehouse space
- 5+ years of experience in custom ETL/ELT design, patterns for efficient data integration, change data capture, implementation, and maintenance
- 5+ years of experience in query writing (SQL and NoSQL), schema design, and normalized and dimensional data models
- 3+ years of experience in Python, Spark, APIs, Git, CI/CD, and AWS Cloud
- 3+ years of experience in any MPP database (AWS Redshift, Snowflake, etc.) and RDBMS (PostgreSQL, MySQL)
- Experience processing a variety of data sources (structured, unstructured, semi-structured, SQL, Pub/Sub, API, and event-based) in cloud-based infrastructure and data services
- Experience with Airflow, S3, and dbt
- A passion for building flexible data sets that enable current and future use cases
- Experience analyzing large volumes of data to provide data-driven insights and identify gaps
- Experience using development environments such as Docker and Kubernetes.