Job D5368 Description
To apply: Please attach your resume to your email.
SOFT's client, located in New York, NY (Remote), is looking for a Data Lake Developer - Python for a long-term contract assignment.

BACKGROUND:
Data & Analytics is building a data lake and associated pipeline infrastructure to replace a custom, multi-pipeline system. The new system will support expanded and improved (1) datasets and tools for agency operations management, (2) performance evaluation systems for agency management and oversight reporting, and (3) open data platforms for public access to datasets that support oversight group analysis and application development. The data that is or will be included in the system is generated and/or used by virtually all parts of the agency. Types of data range from ridership to on-time performance to employee work hours and overtime use to administrative functions.

Some datasets add a million or more records each day, and the current data infrastructure does not have the capacity to support user needs and future growth in the range and volume of the datasets.

New technologies and techniques will expand functionality and enhance the timeliness and responsiveness of tools.

AIM:
Our plan requires skills in data engineering, coding in Python and other languages, and report/dashboard development in Power BI and other data visualization tools.

Specific tasks include:
1. Designing data structures and writing code to collect, combine and transform datasets to meet business needs.
2. Developing data lake architecture to automate data extraction and transformation of raw data to more complex and calculation-based tables.
3. Documenting work in a thorough manner consistent with team standards so that it can be easily understood by teammates and future users.
4. Designing and carrying out testing processes and quality controls on output data for validity, accuracy and usability by the desired audience.
5. Generating data visualization outputs.

REQUIREMENTS:
• Skills and experience programming in Python and SQL – 3+ years
• Skills and experience in using data lake tools and demonstrated ability to learn new tools quickly
• Skills and experience using Power BI
• Ability to clearly document all work (commented code, readme files, diagrams, etc.) so that work is easily transferred back to internal employees
• Excellent attention to detail and QC skills to ensure errors are found and corrected before outputs are made available
• Good verbal and written communication abilities for internal collaboration