Data Engineer, Data Platform
Sigma360
About Sigma360
Sigma360 is an MIT-incubated, venture-backed Series B company that provides AI-driven global data and analytics to help clients manage risk. We convert the world’s messy data into actionable insights for financial institutions, corporates, and governments, powering workflows like name screening, investigations, and risk research.
We are a collaborative team that values ownership, clarity, and practical problem solving.
Why this role matters
Our data platform runs critical pipelines and scheduled jobs that feed our products and AI workflows. As we scale coverage and complexity, we’re investing in shared ownership of the platform so it stays reliable, maintainable, and easy to evolve.
What you’ll do
You’ll join the data engineering team and take ownership of a meaningful portion of our production platform.
- Operate and maintain scheduled jobs and pipelines (triage failures, debug issues, ship fixes, improve stability)
- Build and enhance data pipelines and integrations (APIs, file feeds, light web scraping when needed, normalization, and QA)
- Improve performance and cost efficiency (targeted PySpark optimizations, job tuning, pragmatic refactors)
- Improve maintainability and onboarding (runbooks, documentation, operational playbooks)
- Support data investigations and internal requests (trace lineage, validate outputs, answer “what happened?” questions)
- Provide day-to-day technical review (PR reviews, design feedback) to keep quality high and help the team ship safely
This role focuses on building and operating scalable, production-grade data pipelines in Databricks with a strong emphasis on reliability and integration.
Tech stack
- Databricks for development, orchestration, and scheduled workflows
- Python + PySpark + pandas for pipelines and tooling
- Data shipped downstream to Postgres and Neo4j, supporting a Golang backend
- AWS
What we’re looking for
This role is ideal for a junior-to-mid-level data engineer with ~3–7 years of experience who wants to grow into broader platform ownership.
We value autonomy, adaptability, and ownership: someone who can learn quickly and drive work to completion with high quality.
Required:
- 3+ years of professional data engineering experience
- Strong Python, SQL, and pandas
- Experience owning production data pipelines (debugging, monitoring, incident response, data quality)
- Strong written communication and ability to work autonomously in a remote environment
- Ability to overlap with NYC business hours (9am–5pm ET) for at least 4 hours per day
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field, or equivalent practical experience
Nice to have:
- Databricks experience (Jobs/Workflows, notebooks, production operations)
- PySpark experience (or other distributed processing systems)
- AWS experience
- Experience integrating external data sources (REST APIs, bulk downloads, semi-structured data, light web scraping)
- Familiarity with Delta Lake / lakehouse patterns
What success looks like (first 6–12 months)
- Own and operate a set of critical pipelines and scheduled jobs end-to-end
- Improve job reliability in the areas you touch (fewer repeat failures, clearer root causes, faster recovery)
- Create runbooks and documentation that reduce time-to-debug and improve onboarding for new engineers
- Ship a steady cadence of pipeline enhancements and integrations while maintaining production stability
- Become a trusted reviewer for pipeline and integration changes, keeping quality high and unblocking the team with minimal escalation
What we offer
- Remote-first team with high autonomy and ownership
- Competitive compensation and meaningful equity
- Health, dental, vision, and other benefits (or local equivalent)
- Generous time off and a culture that supports learning and growth
Sigma360 is an equal opportunity employer. We are committed to fair hiring practices and to creating a welcoming environment for all team members. All qualified applicants will receive consideration without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, disability, age, familial status, or veteran status.