What you will be doing


Design and develop pipelines using Python, PySpark, and SQL
Use GitLab as the version control system
Utilize S3 buckets for storing large volumes of raw and processed data
Implement and manage complex data workflows using Apache Airflow (MWAA) to orchestrate tasks
Utilize Apache Iceberg (or similar) for managing and organizing data in the data lake
Create and maintain data catalogs using AWS Glue Catalog to organize metadata
Use AWS Athena for interactive querying
Apply data modeling techniques to support analytics and reporting requirements, with an understanding of the data journey stages within a data lake (Medallion Architecture)


What we are looking for


Ideally, a degree in Information Technology, Computer Science, or a related field
Ideally, 5+ years of experience in the Data Engineering landscape
Strong expertise in Python, PySpark, SQL, and the overall AWS data ecosystem
Strong problem-solving and analytical skills
Ability to explain technical concepts to non-technical users
Proficiency working with GitHub
Terraform and CI/CD pipelines are a great nice-to-have