Weeks 15 - 18

Happy New Year!!

Last year began with a lot of uncertainty. Balancing graduate coursework, on-campus work, the internship search, and personal growth required constant adjustment. Being persistent was definitely the key.

The first half of the year was a steep learning curve. I deepened my understanding of big data architecture, statistics, and machine learning, and applied these concepts through hands-on projects such as ResearchPod, BankIntel, and DoctorVisits. One aspect of education here that I truly value is its emphasis on practice: translating theory into real, end-to-end projects that mirror how data work happens outside the classroom.

Over the summer, I spent time back home in India. Returning to a familiar environment helped me reset, reflect, and regain clarity before coming back to continue building.

I completed my role as a Data Engineer with CU Libraries, where I worked on the digitization and metadata extraction of historical tsunami marigram scans from the 1800s. This project deepened my appreciation for data quality, archival workflows, and reproducibility. Knowing that this work will support climate and oceanographic research made the effort especially meaningful.

I also joined the Research & Innovation Office as an Impact Intern, where I learned extensively about research funding ecosystems and broader impacts planning. I supported faculty workshops, assisted in developing broader impact strategies, and built a website that surfaces automatically refreshed funding opportunities from Pivot-RP. This tool simplifies how researchers discover funding without manual searching, and I’m excited to continue improving it. I also managed the JEDIA website using WebExpress, which was a new platform for me, but I quickly adapted to maintaining live institutional content.

In December, I started my internship at Parsyl as a Data Scientist. Getting here took persistence in navigating applications, interviews, and the added complexity of being an international student. I’m grateful I kept applying and showing up, even when the outcome felt uncertain.


I’m currently working on the Claims team, where the focus is on making the claims process faster, more efficient, and more accurate. I’m enjoying learning the domain, understanding how data supports real operational decisions, and finding areas where I can contribute meaningfully. The team has been welcoming and generous with their time, and I’m excited about the growth ahead.


Coursework

Natural Language Processing 

My team and I wrapped up the modeling part for SemEval Task 5, and it has honestly been one of my favorite projects this semester. We built our full pipeline, finished the codebase, cleaned up the repo, and we’re now waiting for the official test set that drops in January. We’re all excited to see how the model performs in the wild, but the whole process taught me a ton, from handling dataset quirks to understanding how subtle lexical cues can completely change classification outcomes. Check out the repo here!

In class, we covered a huge chunk of material over the last two weeks: IR vs RAG, the shift from sparse to dense retrieval, single-encoder vs bi-encoder architectures, and the whole family of retrieval-augmented workflows. We also dug into machine translation, from the linguistic chaos of 7,000 world languages to morphological variation, word-order typology, segmentation issues, and why MT evaluation is basically an eternal headache (BLEU, chrF… all of them imperfect in different ways).
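To make the sparse-to-dense shift concrete, here’s a minimal bi-encoder retrieval sketch using sentence-transformers. The model choice and toy documents are my own illustration, not from the course materials:

```python
# Bi-encoder retrieval: encode queries and documents independently,
# then rank documents by cosine similarity of their embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

docs = [
    "BM25 ranks documents using sparse term statistics.",
    "Dense retrieval compares learned vector embeddings.",
    "MT evaluation metrics like BLEU are imperfect proxies.",
]
doc_emb = model.encode(docs, convert_to_tensor=True)

query_emb = model.encode("How does dense retrieval work?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)[0]  # one similarity score per document
print(docs[int(scores.argmax())])
```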

I’ve been keeping up my NLP 101 Blog, too. I wrote about Why Early Language Models Failed: Data Sparsity and the Classical Fixes. Check out the Blog here!
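Since that post is about the classical fixes, here’s the one-screen version of the most classical fix of all, add-one (Laplace) smoothing, with toy counts I made up:

```python
# Add-one (Laplace) smoothing: unseen bigrams get a small nonzero
# probability instead of zeroing out the whole sequence.
from collections import Counter

tokens = "the cat sat on the mat the cat slept".split()  # toy corpus
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size

def p_laplace(w1, w2):
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_laplace("the", "cat"))  # seen bigram: relatively high
print(p_laplace("cat", "on"))   # unseen bigram: small but nonzero
```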


Neural Networks & Deep Learning

For Neural Networks, I wrapped up the semester with EcoSort, a project that I’ve genuinely enjoyed building from scratch. It’s a waste classification system that uses a fine-tuned ResNet-18 model to classify items into cardboard, glass, metal, paper, plastic, and trash, which is a surprisingly tricky task when items visually overlap. After training, validating, debugging, retraining, explaining decisions with Grad-CAM, and building the entire website, the model reached a solid 96% accuracy.
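The core of the setup is standard transfer learning; here’s a hedged sketch of the head swap (the hyperparameters are placeholders, the real values are in the write-up):

```python
import torch.nn as nn
from torch.optim import Adam
from torchvision import models

# Start from ImageNet weights, then replace the classifier head
# with a 6-way output: cardboard, glass, metal, paper, plastic, trash.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 6)

criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=1e-4)  # placeholder learning rate
```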

Beyond the numbers, EcoSort taught me how small dataset biases can push models in weird directions, and how interpretability tools like heatmaps actually help diagnose misclassifications. The full write-up, visuals, and code are available here!
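For the Grad-CAM part, here’s a minimal sketch using the pytorch-grad-cam package; EcoSort’s actual implementation may differ, and the input here is a stand-in tensor rather than a real image:

```python
import torch
from torchvision import models
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = models.resnet18(weights=None)  # stand-in; use the fine-tuned model in practice
model.eval()

# Attribute the prediction to the last conv block and get a heatmap
# showing which regions of the image drove the class score.
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
input_tensor = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
heatmap = cam(input_tensor=input_tensor,
              targets=[ClassifierOutputTarget(3)])[0]  # heatmap for one class
```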


Information Visualization

In Info Viz, we ended with individual dashboard presentations, and I built an interactive Streamlit dashboard analyzing global EV sales trends. I focused on keeping the layout clean and making comparisons intuitive: country-level stats, year-over-year changes, top performers, and a few surprises you only notice when you visualize the shifts instead of reading static tables.

Building the dashboard reinforced how much good visualization is about simplifying without oversimplifying. Getting the right level of detail, avoiding clutter, and letting interaction do the heavy lifting was basically the theme.
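For a sense of how little code “letting interaction do the heavy lifting” takes in Streamlit, here’s a rough skeleton; the file and column names are stand-ins, not the actual project’s:

```python
import pandas as pd
import streamlit as st

df = pd.read_csv("ev_sales.csv")  # stand-in for the real dataset

# One widget drives every view below it; Streamlit reruns the script on change.
country = st.sidebar.selectbox("Country", sorted(df["country"].unique()))
subset = df[df["country"] == country].sort_values("year")

latest, prev = subset["sales"].iloc[-1], subset["sales"].iloc[-2]
st.metric("EV sales", f"{latest:,.0f}", delta=f"{(latest - prev) / prev:+.1%}")
st.line_chart(subset.set_index("year")["sales"])
```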


Future-Proofing SAP Data Integration for RISE with SAP


Even though I’m not an SAP customer, I joined this session out of pure curiosity, mostly because SAP’s ecosystem genuinely feels like its own universe. It was surprising to see how much pressure organizations are under as ECC sunsets. The ODP (Operational Data Provisioning) ban hit harder than I expected: removing third-party access to core data sources basically forces companies to rethink long-standing integrations.

My biggest takeaway was that migration is a continuity problem. Teams have to keep analytics running, maintain compliance, and avoid breaking business-critical reports while transitioning to S/4HANA. Hearing how companies use CData’s connectivity layer to keep data accessible, even across mixed setups involving Azure, AWS, Google Cloud, and on-prem systems, helped me understand what “future-proofing” actually looks like in the enterprise world.

The session made me appreciate how fragile these pipelines are and how much engineering goes into making sure nothing collapses during modernization. It was the first time I really understood why hybrid and multi-cloud architectures require so much planning, and why a single SAP policy change can cause a massive ripple effect across entire data ecosystems.


Boulder Climate Ventures: Geoengineering Conversations

BCV hosted a really thoughtful session featuring Katja Friedrich, Julie Pullen, and Ryan Orbuch on geoengineering: what it is, how it fits into climate strategies, and where innovation (and investment) is heading. 

I hadn’t realized how deeply data-driven the entire space is. Every geoengineering proposal, whether it’s atmospheric modeling or ocean-based interventions, depends on massive climate datasets, simulation accuracy, uncertainty quantification, and long-term forecasting.

Hearing scientists and investors discuss the need for reliable modeling, real-time monitoring, and transparent data pipelines made me realize how strongly data science sits at the center of climate innovation. It connected everything I’m studying (modeling, analytics, ML, and interpretability) to real-world problems like climate risk, mitigation strategies, and evaluating planetary-scale interventions. It was a great conversation that balanced scientific possibility with ethical and environmental considerations, and underscored how essential good data and good models are in shaping climate decisions.


Agentic Data Engineering with DuckDB


This talk by Shaheen Essabhoy was honestly one of my favorites. She walked through how her team migrated from BigQuery to DuckDB using agentic AI workflows and cut months of manual rewriting. The focus was on cost efficiency, performance gains, and the way DuckDB enables local analytics without the overhead of large cloud engines. It was a reminder that “small but optimized” stacks can sometimes do more than massive systems.
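If you haven’t tried it, the appeal is how little ceremony local analytics needs; here’s a minimal sketch (the file and columns are hypothetical):

```python
import duckdb

# In-process analytics: no cluster or warehouse, just query local files.
con = duckdb.connect()
df = con.execute("""
    SELECT category, SUM(amount) AS total
    FROM read_parquet('events.parquet')  -- hypothetical local file
    GROUP BY category
    ORDER BY total DESC
""").df()
print(df)
```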


Tech Careers at Visa: Non-Linear Journeys

Visa’s session was genuinely encouraging. Every speaker talked about how their career path was anything but linear: people who moved from finance to product, from academic backgrounds to industry roles, and others who discovered tech halfway through something completely different. The main message was that your existing experiences always translate, even if the path looks unconventional.


They also shared insight into roles at the intersection of business, tech, and analytics, where domain knowledge and technical skill meet. It was great to hear directly from Visa employees about these unconventional career paths.


Career Launchpad (MS-DS)

The Career Launchpad event focused on resume strategy, navigating DS/ML roles, and core GenAI skills, all tailored specifically for MS-DS students. Akhilesh P. R. shared practical insights into career paths and what differentiates roles such as Data Analyst, Data Scientist, and Machine Learning Engineer, clarifying how responsibilities and expectations evolve across them.

Pranjal Pathak then walked through how to tailor resumes for data science positions and highlighted the GenAI skills companies are actively looking for today. The session offered a clear view of how technical skills, role positioning, and storytelling come together in the current job market. Overall, a very focused and valuable discussion.



Snowflake Badges: GenAI + Data Engineering

  • Build 2025-2026: Gen AI Bootcamp

    The GenAI Bootcamp was actually a great way to understand how Snowflake operationalizes LLMs inside the platform. I got hands-on with Cortex LLMs, built a couple of quick RAG-style mini apps, and experimented with how vector search and retrieval actually behave at scale. The best part was seeing how Snowflake abstracts away so much infrastructure, letting you focus on prompt logic, embeddings, and evaluation instead of wrestling with servers. (A quick sketch of the vector-search piece follows after this list.)

  • Build 2025-2026: Data Engineering Bootcamp


    The Data Engineering Bootcamp covered the other side of the stack: ingestion patterns, data modeling, cost-aware pipeline design, and optimizing queries for performance. I spent time fixing bottlenecks, rewriting a few transformations in more efficient ways, and understanding when to use Snowpark vs SQL vs external UDFs. It was very practical and tied back to real-world pipeline decisions, especially around minimizing compute costs and avoiding unnecessary data movement. (A Snowpark sketch also follows after this list.)
    Always fun to see progress tracked like this - small wins but meaningful.
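As promised above, here’s roughly what the vector-search piece from the GenAI bootcamp looks like; the table, columns, and model name are my stand-ins, not the bootcamp’s exact labs:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="...", user="...", password="...",  # placeholder credentials
)

# Embed the question and rank stored chunks by cosine similarity,
# all inside Snowflake via Cortex functions.
query = """
    SELECT doc_id, chunk,
           VECTOR_COSINE_SIMILARITY(
               embedding,
               SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', %s)
           ) AS score
    FROM doc_chunks  -- hypothetical table with precomputed embeddings
    ORDER BY score DESC
    LIMIT 3
"""
for row in conn.cursor().execute(query, ("What is the refund policy?",)):
    print(row)
```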
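And on the data engineering side, the Snowpark-vs-SQL decision mostly comes down to where the computation runs; here’s a hedged Snowpark sketch with a made-up table:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

params = {"account": "...", "user": "...", "password": "..."}  # placeholders
session = Session.builder.configs(params).create()

# Snowpark builds a lazy query plan and pushes execution to Snowflake;
# no rows move to the client until an action like show() or collect().
daily = (
    session.table("ORDERS")  # hypothetical table
    .filter(col("STATUS") == "SHIPPED")
    .group_by("ORDER_DATE")
    .agg(sum_(col("AMOUNT")).alias("TOTAL"))
)
daily.show()
```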


Towards the end of the year, I had the chance to travel to Amherst and Rhode Island to meet a friend from undergrad, and later visited Aspen. These places were strikingly beautiful.

There’s an awe that comes from standing among places and spaces much bigger than you. Those moments make you pause and reflect on how far you’ve come. There’s been a lot of change over the past year or two, and it’s pushed me to adapt faster than I expected. I don’t know exactly what’s next, and I’m learning to be comfortable with that.

I’m ending the year feeling grateful. Grateful for the opportunities I’ve had, for everything I’ve learned, and for the fact that I get to do work I genuinely enjoy. 

Looking ahead, I want to stay more present and intentional. As responsibilities grow, prioritizing health, routine, learning, and balance feels essential to sustaining long-term growth. I’m excited to carry that mindset into the year ahead.

Here’s to 2026!
