Data Scientist

Dandelion Health Inc

Data Science · Full-time
New York, NY, USA · Remote
100,000 – 130,000 USD per year
Posted on Friday, January 27, 2023

New York or Remote / Full-Time / Early to Mid-Career / Immediate Start

Resumes, questions, and requests for assistance or an accommodation due to a disability may be directed to Dandelion’s Director of Clinical Informatics, mara@dandelionhealth.ai.

References are required.

Salary Range: $110K - $130K + equity (negotiable)

Our Team

Dandelion Health was founded in 2020 by experts in health tech, hospital systems, academia, and clinical AI. We are building the world’s largest AI training and validation platform. Today, we pride ourselves on our ability to make data access as easy as possible for AI developers, while raising the bar for patient safety and data quality. Tomorrow, we will be the place where any AI developer can go to build a responsible clinical AI product. Our culture is all about learning from data and improving, so we can help our clients improve health through AI.

Our Data and Cloud Stack

We partner with health systems to safely and ethically make their de-identified patient data available to AI developers. Currently, the data comes from Sharp HealthCare and Sanford Health in the United States, with three additional U.S. health systems joining soon.

We have clinical data dating back to July 1, 2016. This data represents over 10 million patients and includes, but is not limited to:

  • Structured data (e.g., 100% of the EMR, including some claims)
  • Unstructured text (e.g., clinical notes, radiology reports)
  • Images, video (e.g., PACS, pathology)
  • Waveforms, streaming inpatient data (e.g., ECGs, ICU and bed monitors)

We are an AWS and Python shop. Our clients, partners, and other internal teams may work with other languages and platforms from time to time (e.g., R, MATLAB, GCP, Azure).

Your Role

You work directly with the Director of Clinical Informatics and Head of Engineering to help (i) build the de-identification and ELT pipeline for multimodal clinical data, from our hospital partners’ source systems to our AWS environment; and (ii) create AI-ready datasets for our clients in our AWS environment. This will involve working across the organization with both engineers and clinical scientists to build our own clean, reproducible code base. Your team’s ultimate goal is to deliver the highest-quality data products possible to our clients, who are AI developers building products that improve patient health.

You have an extremely high level of attention to detail because you understand that the quality of AI algorithms follows directly from the quality of the datasets on which they are trained, and that seemingly innocuous errors in underlying datasets can propagate into massive problems in training and validating algorithms.

As a result, you are not afraid to dig into massive, confusing, disorganized new datasets and get them under control. You are excited to learn new environments, languages, and skills. This is a small, early-stage company with enormous ambitions, and everyone pitches in to do everything.

You are a great data scientist in terms of technical skills (clinical data and programming). You also have a strong ability to identify and foresee issues, proactively map out solutions and scenarios, communicate your ideas and plans to all levels of the organization, and document and share your learnings.

Technology Experiences and Skills

We don’t expect anyone to have all of the following skills or experiences, but we do seek candidates who are interested in growing their skill sets and working with healthcare data in all its glorious complexity. The Data Team works closely with our Growth and Engineering Teams to put our work into production and meet client needs.

Required Skills

  • SQL
  • Python and/or R
  • Prior experience querying enterprise data warehouses (EDWs) or other databases and creating reports or analytics from healthcare data

Nice-to-have skills and tools that we like

  • Familiarity with the data aspects of electronic medical record systems, especially Epic, Cerner, and Allscripts
  • Any experience working with DICOM or other imaging modalities
  • Prior experience working with insurance claims data
  • Familiarity with medical terminologies or controlled vocabularies such as ICD-10, SNOMED-CT, LOINC, CPT/HCPCS, NDC, and RxNorm
  • Any medical ontology experience
  • Git and version control
  • Familiarity with writing regular expressions and NLP more broadly
  • Experience working with non-relational data (e.g., XML and JSON)
  • Experience with the OMOP Common Data Model or other common data models
  • Automated reporting (e.g., Quarto)
  • Data viz in Python or R
  • Experience with Python data libraries (e.g., pandas, NumPy, Matplotlib)
  • Basic AWS knowledge (e.g., EC2, S3, SageMaker)

Note that familiarity with machine learning model development and deployment is not required for this position, but familiarity with high-level ML concepts is a plus.

Potential Work Examples

  • Learning about EHR data models and querying data based on client needs
  • Researching client requests and investigating how the needed data is collected, stored, and used by the health system
  • Working with analytic and clinical staff to understand how data is stored
  • Cleaning and wrangling data
  • Normalizing data from across healthcare systems
  • Using and deploying our own ML tools
  • Automating repetitive tasks
  • Documenting your work for future reuse and sharing knowledge across the company
  • Refining and iterating work based on internal and client feedback

Nature of our work

Our work is fast-paced and iterative. We are growing, and we want to support our team members in growing their skills as well. We are building a team that approaches problems with a diversity of perspectives, values experimentation, and refines its approach based on that experimentation. We work with the full spectrum of healthcare data, including tabular data, images, video, and waveforms. If a health system collects it, we might work with it!

If this role looks like even a partial fit, please reach out. We would love to tell you more about our work so you can decide whether it is a good fit for you.

The role involves occasional travel for in-person company working days, roughly once a quarter.

Team Benefits

  • Professional development days to build your skills
  • Access to physician and clinician informaticists who can help you understand health data from a clinical perspective
  • Collegial work environment
  • An academic bent toward inquiry and problem-solving, combined with start-up speed and flexibility
  • Remote work and flexible hours (you need to be available for meetings, which we try to keep to a healthy minimum)
  • A good balance of focus time for projects and easy access to team members for discussing issues and working collaboratively
  • Dandelion is a mission-driven company dedicated to improving patient care!

Dandelion Health Inc is an equal opportunity employer.