👋🏽

Hello, I am

Saumya

📊 Data Scientist

🤖 Machine Learning Researcher

🏊🏼‍♂️ Swimmer


Creating Intelligent Systems
for a Smarter World

Software Development


Experienced in software architecture, design patterns, and best practices to create efficient and robust software solutions.

Machine Learning


Committed to delivering high-quality, scalable machine learning systems that meet business needs.

Computational Biology


Committed to contributing to the open source community with a focus on developing computational biology and bioengineering solutions that drive innovation and improve human health.

Data Science


Skilled in communicating technical concepts to non-technical stakeholders to drive data-driven decision-making.

ML and Software Development Endeavors


Image Stitching 

Selected stitching points in both images using ORB and removed outliers with RANSAC. Transformed the images using a homography matrix, which allowed the two images to be stitched together.

Computer Vision

Bayesian Weather Prediction 

Used Bayesian linear regression with Zellner's g-prior to predict future weather patterns.

Data Analysis

Personal Portfolio 

Personal portfolio website built with React.js and Sanity.

Software Development

Mental Health EDA 

In this project, we analyzed the factors responsible for depression across countries, exploring the relationship of depression with age, gender, substance use disorders, and economic factors.

Data Analysis

Optical Character Recognition 

I use a two-part problem-solving approach to implementing OCR using Bayes and Viterbi algorithms. For Bayes, emission probability was used to compare all 72 outcomes, resulting in great but sometimes noisy outcomes. For Viterbi, the algorithm used emission, transition, and initial probabilities to pick the most likely letter sequences, resulting in slightly better outcomes.

Computer Vision
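
The Viterbi step described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the two hidden letters ("a", "b"), the three observed glyph classes ("x", "y", "z"), and every probability below are made-up values, and the observation list is assumed to be non-empty.

```python
import math


def viterbi(observations, states, initial, transition, emission):
    """Return the most probable hidden-state sequence for the observations."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at step t
    best = [{s: log(initial[s]) + log(emission[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(transition[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emission[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: hidden letters and noisy glyph observations.
states = ["a", "b"]
initial = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}}
emission = {
    "a": {"x": 0.5, "y": 0.4, "z": 0.1},
    "b": {"x": 0.1, "y": 0.3, "z": 0.6},
}
decoded = viterbi(["x", "y", "z"], states, initial, transition, emission)
print(decoded)  # ['a', 'a', 'b']
```

The plain Bayes variant corresponds to picking the state with the highest emission probability at each position independently, ignoring the transition table.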

SARSA with Linear Function Approximation 

Implementation of the SARSA algorithm with linear function approximation for FrozenLake, CartPole, and LunarLander.

Machine Learning
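
As a rough sketch of the idea (not the project's code), one semi-gradient SARSA update for a linear action-value function looks like this. The feature vectors, step size, and discount factor are illustrative assumptions; how features are built for each environment is left out.

```python
def q_value(weights, features):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def sarsa_update(weights, phi, reward, phi_next, alpha=0.1, gamma=0.99, terminal=False):
    """Return new weights after one semi-gradient SARSA step.

    phi is the feature vector of the current (state, action) pair and
    phi_next that of the next (state, action) pair chosen by the policy.
    """
    target = reward if terminal else reward + gamma * q_value(weights, phi_next)
    td_error = target - q_value(weights, phi)
    # For a linear model the gradient of q with respect to the weights is phi.
    return [w + alpha * td_error * x for w, x in zip(weights, phi)]


# Hypothetical one-hot features for two (state, action) pairs.
new_weights = sarsa_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], alpha=0.5, gamma=0.9)
print(new_weights)  # [0.5, 0.0]
```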

DDPG Experiments 

Experiments on OpenAI Gym environments using Deep Deterministic Policy Gradients (DDPG).

Machine Learning

Scene Segmentation using Unsupervised Learning 

This project focuses on using unsupervised learning techniques, including K-means and convolutional neural networks (CNN), to improve image segmentation. The goal is to improve feature extraction and clustering functions so that spatially continuous pixels with comparable features can be allocated to the same label, and the number of unique clusters can be increased.

Machine Learning

Part of Speech Tagging 

I use a three-part problem-solving approach to implementing a POS Tagger using Bayes, Viterbi, and Gibbs Sampling algorithms. For Bayes, emission probability was used to compare all 12 tags, while for Viterbi, the algorithm used emission, transition, and initial probabilities to select the most common POS tags of given words, resulting in slightly better outcomes.

Natural Language Processing

The Game of Quintris 

Created a Quintris AI bot using heuristic search and the expectiminimax algorithm.

Artificial Intelligence
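
For readers unfamiliar with expectiminimax, here is a minimal sketch on a hand-built tree. The tuple-based tree encoding and the leaf values are assumptions made for illustration; the actual bot's board representation and heuristic are not shown.

```python
def expectiminimax(node):
    """Evaluate a game tree with max, min, and chance nodes.

    A node is either a number (leaf value) or a (kind, children) tuple,
    where chance children are (probability, child) pairs.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical tree: the player picks a move, then a random piece is drawn.
tree = ("max", [
    ("chance", [(0.5, 4), (0.5, 8)]),
    ("chance", [(0.5, 10), (0.5, 0)]),
])
value = expectiminimax(tree)
print(value)  # 6.0
```

The second move has the single best outcome (10), but its expected value is 5.0, so the first move with expected value 6.0 wins.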

Image mapping using K means 

Mapping images via feature mapping. Used ORB to compute feature keypoints and descriptors, PCA to analyse components, and K-means clustering to group similar images.

Computer Vision

CompuCell3D 

Open Source contributions to CompuCell3D

Computational Biology

Time Series in R 

Time Series analysis concepts coded using R

Data Analysis

Graph Traversal using Heuristics and A* 

Solved graph traversal using segment-, distance-, and time-based heuristics with A* search.

Artificial Intelligence
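
A compact A* sketch on a toy graph shows how the heuristic steers the search. The graph, edge costs, and heuristic values are invented for illustration, and the heuristic is assumed to never overestimate the remaining cost.

```python
import heapq


def a_star(graph, start, goal, heuristic):
    """Return (cost, path) for the cheapest route, or None if unreachable.

    graph maps each node to a list of (neighbor, step_cost) pairs.
    """
    # Each frontier entry is (f = g + h, g, node, path so far).
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(neighbor), new_cost, neighbor, path + [neighbor]),
                )
    return None


# Hypothetical road graph; costs could stand for segments, distance, or time.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
estimate = {"A": 3, "B": 2, "C": 1, "D": 0}
result = a_star(graph, "A", "D", lambda n: estimate[n])
print(result)  # (4, ['A', 'B', 'C', 'D'])
```

Swapping the cost and heuristic (segment count, distance, or travel time) changes which route counts as cheapest without changing the search itself.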

Image Transformations 

Created transformation matrices based on command-line options, including translation, Euclidean (rigid) transformation, and affine transformation. Created an inverse warp by multiplying each homogeneous point with the inverse transformation matrix, and used bilinear interpolation to fill in the remaining gaps.

Computer Vision
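
The inverse-warp idea can be sketched for the translation case as follows. This is a simplified illustration on a nested-list grayscale image, not the project's code; the function names and the zero fill value for out-of-bounds samples are assumptions.

```python
import math


def bilinear(image, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y).

    Returns None when the point falls outside the image.
    """
    h, w = len(image), len(image[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def inverse_warp_translate(image, tx, ty, fill=0.0):
    """Translate an image by (tx, ty) using an inverse warp.

    Every output pixel is mapped back through the inverse matrix, so no
    holes appear in the output.
    """
    inverse = [[1, 0, -tx], [0, 1, -ty], [0, 0, 1]]
    h, w = len(image), len(image[0])
    out = []
    for v in range(h):
        row = []
        for u in range(w):
            # Homogeneous point (u, v, 1) multiplied by the inverse matrix.
            sx = inverse[0][0] * u + inverse[0][1] * v + inverse[0][2]
            sy = inverse[1][0] * u + inverse[1][1] * v + inverse[1][2]
            value = bilinear(image, sx, sy)
            row.append(fill if value is None else value)
        out.append(row)
    return out


image = [[0, 10], [20, 30]]
print(bilinear(image, 0.5, 0.5))                # 15.0
print(inverse_warp_translate(image, 0.5, 0.0))  # [[0.0, 5.0], [0.0, 25.0]]
```

Rigid and affine transforms follow the same pattern with a different inverse matrix.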

Sliding and Picking objects with Robot Arm using Reinforcement Learning 

In this project, I attempt to solve the Fetch pick-and-place and slide OpenAI Gym environments with Hindsight Experience Replay, and then I experiment with Prioritised Experience Replay to see if there are any performance improvements.

Machine Learning
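
The core trick of Hindsight Experience Replay is goal relabeling, sketched below with the "final" strategy. The transition dictionary layout, the field names, and the sparse 0 / -1 reward with a 0.05 distance tolerance are assumptions for illustration, not the project's actual replay buffer.

```python
def sparse_reward(achieved, goal, tolerance=0.05):
    """Return 0.0 when the achieved position is within tolerance of the goal, else -1.0."""
    distance = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if distance <= tolerance else -1.0


def relabel_final(episode):
    """Copy an episode, pretending its final achieved position was the goal.

    Each step is a dict with at least the keys "achieved" (position reached
    after the action) and "goal"; other keys are copied unchanged.
    """
    new_goal = episode[-1]["achieved"]
    return [
        {**step, "goal": new_goal, "reward": sparse_reward(step["achieved"], new_goal)}
        for step in episode
    ]


# Hypothetical failed episode: the object never reaches the original goal.
episode = [
    {"achieved": (0.0, 0.0), "goal": (5.0, 5.0), "reward": -1.0},
    {"achieved": (1.0, 0.0), "goal": (5.0, 5.0), "reward": -1.0},
]
relabeled = relabel_final(episode)
print([step["reward"] for step in relabeled])  # [-1.0, 0.0]
```

Storing both the original and the relabeled transitions gives the agent at least some successful examples even when every real attempt fails.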

Abusive Language Detection in User Tweets (SemEval-2021) 

In this project, I work on sentiment analysis at two levels. First, I identify offensive tweets; then I identify the level of offence and the target audience.

Natural Language Processing

Improving Image Captions with Depth Maps 

In this project, I test the hypothesis that adding depth map information to the Visual Genome dataset generates better scene graphs. The newly generated relations (subject - predicate - object) are used to train an image captioning model.

Computer Vision

Horizon Detection 

I try a three-part problem-solving approach to identifying the air-ice and ice-rock boundaries in images. The first part uses a simple Bayesian network, but the results are unsatisfactory, so part two uses a Hidden Markov Model (HMM) with the Viterbi algorithm to improve results using initial, emission, and transition probabilities. Part three incorporates human feedback, running the Viterbi algorithm in two directions from a known point on the boundary to the first and last columns, which gives the best results for all images.

Computer Vision

Autograder 

Implemented the Hough transform and Canny edge detection to extract OMR regions from the answer sheet. Used K-means clustering to detect filled regions. Encrypted the correct answers and injected them into the answer sheet using barcodes.

Computer Vision

2021 Puzzle 

Solved 2021 puzzle using heuristics and A* search algorithm

Artificial Intelligence

Skills & Experiences

Jenkins

Docker

PyTorch

Airflow

Terraform

Node.js

GraphQL

Spark

C++

Python

TensorFlow

AWS

Kafka

PostgreSQL

Apache Flink

06/2023 - Present

Senior Data Scientist, AI & ML

Carnegie Learning

Developing Adaptive Curriculum Recommendation algorithms

Instrumenting Data Pipelines capable of handling 100K clickstream records per second

05/2022 - 08/2022

Software Development Engineer Intern

Pearson (Savvas)

Streamlined the data analytics pipelines and automated query scheduling on AWS for ETL tasks

Developed Adaptive Curriculum Recommendation algorithms

06/2018 - 08/2021

Data Scientist

Playpower Labs

Led the research and development of AI-powered, paper-based formative learning products

Spearheaded and directed cutting-edge machine learning research in Computer Vision and Natural Language Processing

Created highly scalable and reliable data processing pipelines able to process a million records per second

Optimised machine learning algorithms for performance and scalability and deployed them in real-time environments using MLflow

01/2022 - 05/2023

Software Developer

Indiana University

Designed and implemented high-performance processing pipelines for 3D cell simulations using CUDA

Created a scalable architecture that can handle 100+ parallel simulations

Implemented a logger feature for researchers to monitor their simulations

Publications


Using Curriculum Pacing in Learnsphere to Visualize Student Learning Trajectories 

The paper proposes the development of a Curriculum Pacing workflow component within the LearnSphere environment, a visual learning analytic method for observing student learning trajectories. The component will enable multiple education stakeholders to make data-driven decisions, such as identifying difficult content areas, ensuring expected pace, and building hypotheses about student learning behavior. Additionally, the data can be used by instructional designers to compare progress with expectations, and by data scientists to develop content recommendation algorithms.

Software Development

Advanced Chemical Transport Modeling in Dynamic Multicellular Contexts Using CompuCell3D 

The paper discusses the limitations of the Glazier-Graner-Hogeweg (GGH) method, a cell-based spatial model used in mathematical modeling of the transport of soluble signals in living organisms. It proposes a numerical reaction-diffusion solver for GGH models based on surfaces and an algorithm to account for the effects of cellular dynamics on coupled chemical distributions. This allows for more accurate modeling of tissues that were previously difficult to represent using the GGH method.

Computational Biology

Testimonials


Saumya possesses a wide range of skills in Machine Learning, Artificial Intelligence, Computer Vision, Software and Android Development. His thirst for knowledge and eagerness to learn new technologies is unparalleled. He consistently shares his knowledge with his peers, making him a valuable team member. Saumya is an excellent collaborator and has a keen ability to lead productive discussions and brainstorm innovative ideas. He places great importance on community building and works tirelessly to build relationships with others. As a senior, Saumya was not only a mentor but also a friend. He was always willing to lend a helping hand whenever I needed it. His patience and willingness to guide others made him an invaluable resource on our team. In light of Saumya's extensive skillset, strong work ethic, and collaborative attitude, I strongly recommend him for any position he may be pursuing. He would make a valuable asset to any organization lucky enough to have him.

Pratik Prajapati

Playpower Labs