Job Description
About the position
AgileEngine is one of the Inc. 5000 fastest-growing companies in the U.S. and a top-3 ranked dev shop according to Clutch. We create award-winning custom software solutions that help companies across 15+ industries change the lives of millions. If you like a challenging environment where you’re working with the best and are encouraged to learn and experiment every day, there’s no better place - guaranteed! :)
Responsibilities
• Design, develop, and maintain ETL pipelines to extract, transform, and load data across various data sources (cloud storage, databases, APIs)
• Use Apache Airflow for orchestrating workflows, scheduling tasks, and managing pipeline dependencies (see the sample DAG sketch after this list)
• Build and manage data pipelines on Azure and GCP clouds
• Design and support a Data Lake
• Write Python scripts for data cleansing, transformation, and enrichment using libraries like Pandas and PySpark
• Analyze logs and metrics from Airflow and cloud services to resolve pipeline failures or inefficiencies.
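To give a sense of the day-to-day orchestration work, here is a minimal sketch of the kind of Airflow DAG involved: an extract task, a Pandas transform, and a load step wired together as dependent tasks on a daily schedule. It assumes Airflow 2.x with the TaskFlow API; the DAG name, task names, file paths, and transform logic are hypothetical placeholders, not part of any actual project.

```python
# Minimal sketch of a daily ETL DAG, assuming Airflow 2.x with the TaskFlow API.
# Task names, file paths, and the transform logic are illustrative placeholders.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule_interval="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_orders_etl():
    @task
    def extract() -> str:
        # In a real pipeline this would pull from cloud storage, a database, or an API.
        raw_path = "/tmp/raw_orders.csv"
        pd.DataFrame({"order_id": [1, 2], "amount": [10.0, None]}).to_csv(raw_path, index=False)
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        # Simple Pandas cleansing/enrichment step: drop incomplete rows, add a derived column.
        df = pd.read_csv(raw_path)
        df = df.dropna().assign(amount_with_tax=lambda d: d["amount"] * 1.2)
        clean_path = "/tmp/clean_orders.csv"
        df.to_csv(clean_path, index=False)
        return clean_path

    @task
    def load(clean_path: str) -> None:
        # Placeholder for loading into a warehouse or data lake table.
        print(f"would load {clean_path}")

    # Airflow infers the task dependencies (extract -> transform -> load) from these calls.
    load(transform(extract()))


daily_orders_etl()
```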
Requirements
• Experience (2+ years) writing efficient and scalable Python code, especially for data manipulation and ETL tasks (using libraries like Pandas, PySpark, Dask, etc.)
• Knowledge of Apache Airflow for orchestrating ETL workflows, managing task dependencies, scheduling, and error handling
• Experience in building, optimizing, and maintaining ETL pipelines for large datasets, focusing on data extraction, transformation, and loading
• Familiarity with cloud-native storage solutions
• Understanding of and hands-on experience with different file formats
• Expertise in writing efficient SQL queries for data extraction, transformation, and analysis
• Familiarity with complex SQL operations (joins, aggregations, window functions, etc.); a window-function sketch follows this list
• Familiarity with IAM (Identity and Access Management), data encryption, and securing cloud resources and data storage on both Azure and GCP
• Upper-Intermediate English level.
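As a reference point for the SQL and PySpark expectations above, the sketch below shows a typical window-function aggregation expressed with the PySpark DataFrame API, the equivalent of SUM(amount) OVER (PARTITION BY customer ORDER BY order_date) in SQL. The dataset and column names are hypothetical examples rather than project code.

```python
# Illustrative PySpark window-function sketch: a per-customer running total.
# Data and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window-function-sketch").getOrCreate()

orders = spark.createDataFrame(
    [
        ("acme", "2024-01-01", 100.0),
        ("acme", "2024-01-02", 40.0),
        ("globex", "2024-01-01", 75.0),
    ],
    ["customer", "order_date", "amount"],
)

# Partition by customer and order by date, then accumulate the amount column.
running_total = Window.partitionBy("customer").orderBy("order_date")
orders.withColumn("running_total", F.sum("amount").over(running_total)).show()
```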
Nice-to-haves
• Experience using Java libraries to request data from APIs
• Knowledge of data governance practices and the implementation of data lineage and metadata management in cloud environments.
Benefits
• Professional growth through mentorship, TechTalks, and personalized growth roadmaps
• Competitive USD-based compensation and budgets for education, fitness, and team activities
• Exciting projects with modern solutions development and top-tier clients including Fortune 500 enterprises
• Flextime options for optimal work-life balance, including working from home or in the office.