Job Duties/Essential Functions
  • Design, build, and test Databricks workflows and jobs for both batch and streaming.
  • Design clinical data architecture; develop and implement common data model frameworks (OMOP, CDISC, HL7 FHIR, etc.) and techniques to organize clinical data from disparate sources.
  • Design a secure, cloud-based platform for acquiring and aggregating patient data for consumption by downstream analytics workflows.
  • Collaborate with clinical data experts to select and implement ontologies (ICD-10, SNOMED, RxNorm, etc.) and to translate between disparate ontologies.
  • Implement repeatable techniques and methods for data transfer, pipeline testing, and platform infrastructure and management.
  • Create extract tools that use change data capture (CDC), APIs, and SDKs to pull data from source systems such as Medrio and other EDCs and hydrate the Lakehouse (see the sketch after this list).
  • Create SQL and PySpark notebooks and packages to facilitate the movement, cleaning, and storage of data.
  • Enhance the functionality and scalability of client services through technology innovation.
  • Adhere to and promote high standards in testing and integration, including writing unit tests as well as integration and pipeline tests.
  • Collaborate with other teams, including biological imaging, cancer biology, clinical data, and engineering.
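To give a concrete flavor of the extraction and hydration duties above, here is a minimal sketch of a batch ingestion notebook. It is illustrative only: the EDC endpoint, credential, payload fields, and table names are assumptions, not details from this posting.

```python
# Minimal sketch: pull form data from a hypothetical EDC REST endpoint and
# land it as a Delta table in the Lakehouse bronze layer. Intended for a
# Databricks notebook, where PySpark and `requests` are available.
import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

resp = requests.get(
    "https://edc.example.com/api/v1/forms/labs",  # hypothetical endpoint
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=30,
)
resp.raise_for_status()

# Convert the JSON payload to a DataFrame and stamp lineage columns.
raw = spark.createDataFrame(resp.json()["records"])
bronze = (
    raw.withColumn("_ingested_at", F.current_timestamp())
       .withColumn("_source", F.lit("edc_labs"))
)

# Append to a bronze table; downstream jobs clean and conform it to the CDM.
bronze.write.format("delta").mode("append").saveAsTable("bronze.edc_labs")
```

In practice a job like this would be scheduled as a Databricks workflow and wrapped in the unit and pipeline tests the duties above call for.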
Competencies
  • Attention to detail
  • Ability to work both independently and as part of a team
  • Ability to design and develop robust, scalable cloud-based solutions
Required Education and Experience
  • Degree in Computer Science or related field desired
  • 5-7 years of relevant experience
  • Must have experience with one or more pipeline or orchestration tools (Databricks, Snowflake Snowpipe, Apache Airflow, SQL Server Integration Services, AWS Step Functions, Azure Synapse, etc.); Databricks is preferred.
  • Must have experience working with SQL (any dialect) and be able to craft advanced queries (an illustrative example follows this list).
  • Understands REST APIs and how to use them to acquire and send data.
  • Working understanding of Git and source control.
  • Must have experience working with and implementing data lakes.
  • Must have advanced ability in at least one major language: Python, PySpark, SQL, or Scala.
  • Must have experience with clinical data, maintaining the clinical data lifecycle, and implementing common data models. Preference is given to candidates who have combined data from multiple disparate sources into a single ontology.
  • Prior experience working with the OMOP CDM and FHIR preferred
  • Prior experience working in biotech, clinical, healthcare, or life sciences preferred
  • Experience with modern mathematical and statistical software desired; experience with AI/ML frameworks preferred
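As an illustration of the advanced SQL and ontology-translation experience listed above, here is a sketch that maps ICD-10-CM diagnosis codes to SNOMED using OMOP-style vocabulary tables and keeps each patient's most recent diagnosis with a window function. The source table and schema names are hypothetical; the concept and concept_relationship tables follow the standard OMOP vocabulary layout.

```python
# Sketch: translate ICD-10-CM codes to SNOMED via OMOP vocabulary tables and
# keep the latest diagnosis per patient. Table/schema names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

latest_snomed_dx = spark.sql("""
    WITH mapped AS (
        SELECT d.patient_id,
               d.diagnosis_date,
               snomed.concept_code AS snomed_code,
               ROW_NUMBER() OVER (
                   PARTITION BY d.patient_id
                   ORDER BY d.diagnosis_date DESC
               ) AS rn
        FROM silver.diagnoses d                    -- hypothetical source table
        JOIN vocab.concept icd
          ON icd.concept_code = d.icd10_code
         AND icd.vocabulary_id = 'ICD10CM'
        JOIN vocab.concept_relationship cr         -- OMOP 'Maps to' mappings
          ON cr.concept_id_1 = icd.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN vocab.concept snomed
          ON snomed.concept_id = cr.concept_id_2
         AND snomed.vocabulary_id = 'SNOMED'
    )
    SELECT patient_id, diagnosis_date, snomed_code
    FROM mapped
    WHERE rn = 1
""")
latest_snomed_dx.show()
```

The window function combined with vocabulary joins is the kind of "advanced query" the requirement points at; on Databricks the same SQL runs unchanged in a notebook cell.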
Certifications/Specializations
  • No certifications or specializations required.
  • Databricks or other lakehouse platform (Fabric, Snowflake, GCP, AWS, etc.) certifications are preferred.
