Stories by Data Science Club @ NYU (Center for Data Science) on Medium

Looking Back at DSC x BAC x Corner’s Datathon 2025

Data Science Club @ NYU (Center for Data Science) — Wed, 08 Apr 2026 20:11:15 GMT

Last year’s highlights. This year’s inspiration.

The semesterly datathon has become a cornerstone tradition for the NYU Data Science Club. More than just a competition, it is the perfect opportunity for students studying data science to sharpen their creative and technical skills and collaborate with other like-minded students across different disciplines to tackle problems grounded in real-world relevance.

Our 2025 Datathon sponsor

Our partner for the 2025 Datathon was Corner, a rising social mapping app aiming to redefine how Gen Z users discover places in the world around them. Rivaling review-based platforms like Yelp and Beli, Corner focuses on capturing the vibe of a place (who typically goes there and how different types of people experience it), using sentiment and demographic insights to help users discover spots that truly match their preferences, whether that’s a restaurant, a nightlife spot, or a hidden gem worth exploring.

It should come as no surprise, then, that last year’s two prompts revolved around understanding and leveraging data to draw insights about our generation’s preferences and habits. Both challenges were real problems that Corner was actively working on, and participants were provided with real, anonymized data from the company’s production systems.

The first challenge pushed students to develop meaningful ways to segment Corner’s user base into consumer archetypes based not just on demographic features, but on taste and interest in places. The second challenge (and it certainly was a challenge!) asked participants to create a RAG-based system that turns a user’s query into relevant results using semantic rather than purely lexical matching, so that search feels more human and personal. For example, instead of searching for “Italian food” or “restaurant” as on traditional search systems, participants were tasked with building a ‘Vibe Search’ that handled more natural prompts such as “where to find hot guys” or “dance-y bars that have disco balls.” Because the prompts were intentionally open-ended, teams had the freedom to experiment, and they delivered solutions with impressive originality and creativity.
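The retrieval step at the heart of a vibe search can be sketched in a few lines. This minimal sketch assumes each place and each query has already been turned into a vector by some sentence-embedding model; the three-dimensional vectors and place names below are hypothetical toy values, not Corner’s data, and a full RAG system would also pass the retrieved places to a language model for the generation half.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def vibe_search(query_vec: list[float], place_vecs: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the names of the top_k places whose embeddings are most similar to the query."""
    ranked = sorted(place_vecs, key=lambda name: cosine(query_vec, place_vecs[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical toy embeddings; the dimensions loosely mean (dance-y, quiet, food-focused).
places = {
    "Disco Lounge": [0.9, 0.1, 0.2],
    "Reading Cafe": [0.1, 0.9, 0.3],
    "Pizza Spot": [0.2, 0.2, 0.9],
}
# Hypothetical stand-in for the embedding of "dance-y bars that have disco balls".
query = [1.0, 0.0, 0.1]
print(vibe_search(query, places, top_k=2))  # → ['Disco Lounge', 'Pizza Spot']
```

Ranking by cosine similarity rather than keyword overlap is what lets a query with no shared words (“disco balls”) still land on the right place, provided the embedding model places similar vibes near each other.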

Through our conversations with participants, one of the most striking aspects of the datathon that we came to appreciate was its diversity. Students from many majors and schools came together to form teams, and it was exciting to see a data science major in CAS partnering with a math major and a Stern finance student, each bringing a unique lens to the same problem.

Kicking off the Datathon

For many participants, the datathon was an opportunity to apply classroom knowledge and work toward a solution to a real-life problem. “My goal with the datathon was to get a feel of the field at NYU, and to tackle a project specifically,” said Hugo Lee, a Computer Science major. His teammate, Yuyang, echoed this sentiment, stating that he was there to apply some of the skills he’d learned this year to a realistic scenario. “I’m in the Business Analytics Club’s machine learning team and we’ve learned a lot that can be applied to a project like this. I want to engage with some real world implementations.”

Hugo and Yuyang’s team approached the second challenge by dividing their project into two components. They first encoded and embedded written reviews of geographic locations and places, using neural networks to predict which place best matched those reviews. Their front-end allowed users to input a description of a feeling or “vibe” that the user was looking for in a location, which was then encoded and fed into the neural network to be processed into recommendations.

But they didn’t stop there. Wanting to push boundaries and be more innovative, according to Hugo, the team added a creative spin to the prompt: in addition to vibe prompting for locations, they wanted to extend their system to use song lyrics for the same purpose. With this, users could input a song whose melody, instrumentals, and lyrics would be analyzed by neural networks to infer a specific mood and translate it into location suggestions.

Participants hard at work tackling real-world data challenges

Across the room, other teams came up with entirely different approaches to the two problems. “Now, we are basically trying to ideate the process of clustering,” said Jacob, a sophomore studying finance and data science, describing his team’s work. “We’ve already clustered places using an algorithm and are now going to cluster user information and reviews.” As straightforward as this may sound, the execution proved complex. Jacob’s teammate, Daniel, said that the main difficulty for their team was determining the bucket sizes, that is, how many categories their clustering algorithm should produce. “If there are too few categories, then all the user profiles would be too similar; but with too many categories we would have trouble comparing users and reviews at all.”

Moments like these captured the essence of the datathon. Beyond coding and modeling, finding a proper solution to the prompt required dedication, creativity, and critical thinking about trade-offs, edge cases, and the intricacies of real data, mirroring the kinds of challenges faced in industry. Indeed, for many students, the datathon offered a glimpse into what working in the data science industry actually feels like. Jacob reflected on the event as a good learning experience: “Before this, I didn’t even know much about clustering algorithms, but now I can definitely see myself working on projects like this full-time.”
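Daniel’s trade-off can be made concrete with k-means, one common clustering algorithm (the article does not say which algorithm the team used). In this minimal sketch, the within-cluster sum of squares, or inertia, tends to shrink as the number of clusters k grows and reaches zero when every profile gets its own cluster: the “too many categories” extreme, where nothing is shared to compare. The two-dimensional user profiles are hypothetical.

```python
import random


def sq_dist(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (centers, inertia). It may stop at a local optimum."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


# Hypothetical user profiles described by two taste scores.
users = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0)]
for k in range(1, len(users) + 1):
    _, inertia = kmeans(users, k)
    print(k, round(inertia, 2))
```

A common heuristic is to look for the “elbow” where adding another cluster stops reducing inertia much; the final choice, as Daniel noted, still depends on whether the resulting categories are useful for comparing users.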

Participating is always worth it for the free food

Of course, no datathon would be complete without its lighter moments, and last year delivered plenty. From meme and trivia contests to creative challenges like building personalized lists on Corner, participants found ways to have fun alongside the competition. And, as always, there was the universal motivator: free food.

And finally, after a weekend of innovation and hard work, one winning team was selected for each challenge. Corner’s founder and CTO, Jake Xia, stepped in as our judge, reviewing each project and selecting the ones that stood out for both technical depth and creativity. The winners were:

Easy Challenge: Cornerstones — Andy Li, Taichung Wu & Travis Chew
Hard Challenge: The Unfortunate Vectors — Tomas Gutierrez, Robin Chen & Yarden Morad

As we reflect on the success of the 2025 Datathon, it’s clear that the event continues to grow, not just in size, but in ambition and impact. This year, we’re excited to partner with Pulse Foundry AI to bring Datathon 2026, sponsored by Vercel and v0. Participants can look forward to prizes and recognition for top teams, employment opportunities at startups, connections to Google and other leading companies, as well as exclusive access to sponsors and industry networks.

Tune in for Datathon 2026!

With the event just around the corner, taking place April 10–12, 2026, we’re excited to see what new ideas, innovations, and collaborations will emerge. Whether you’re a returning participant or considering joining for the first time, one thing is certain: the datathon remains one of the best opportunities at NYU to build, learn, and create something meaningful.

We hope to see you there!

Article written by Franklin Dong, Ananya Mittal, and Grace Wang.

Looking Back at DSC x BAC x Corner’s Datathon 2025 was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

DSC Project Exposition 2024 Recap

Data Science Club @ NYU (Center for Data Science) — Tue, 18 Feb 2025 13:31:39 GMT

Celebrating Innovation: Highlights from DSC Project Exposition 2024

Over the fall semester, NYU’s Data Science Club hosted a series of hands-on workshops featuring industry experts who shared their knowledge and insights on the real-world applications of data science. These workshops guided students through the process of building a data science project, from ideation and an introduction to pandas through machine learning techniques, supported by office hours where graduate students provided mentorship on project-related questions. The club members’ semester-long work and preparation culminated in the Project Exposition, an event designed to give students the opportunity to present their projects to a panel of esteemed, experienced judges and to gain valuable feedback and exposure to industry applications of data science and analytics.

A series of workshops leading up to the Project Expo

On the day of the Project Expo, six project teams presented to a panel of five judges: Manisha Arora, Data Science Leader at Google; Daniel Shan, Software Engineer at Datadog; Jude Francis, Director of Data Science at Johnson & Johnson; Sophia Tee, Senior Director of Data Science at Penguin Random House; and Shaumik Daityari, Data Science Leader at American Express. These accomplished industry leaders came together to offer the participants comprehensive feedback based on their industry experience and insight, making the DSC’s annual Project Exposition event uniquely rewarding and fulfilling for the participants.

Our amazing judges at the Project Expo!

And The Contestants Were…

Vogue Fashion Data — Kikki Liu and Prim (Maymani) Adhiphandhuamphai

Kikki and Prim presenting “Vogue Fashion Data”

Inspired by the movie Clueless, this project aimed to help people pick out their everyday outfits by generating recommendations from the clothing collections of various fashion houses. The team initially planned to use image data but later pivoted to a different approach. They incorporated cross-validation, demonstrating their commitment to refining their model. The judges noted that while the data processing could have been more structured, the project delivered strong results with a highly practical application.

Presidential Speech Analysis — Yuyang Hu, Angelina Wu, and Isaac Zhang

Yuyang, Angelina, and Isaac presenting “Presidential Speech Analysis”

This project examined presidential speeches throughout U.S. history to determine whether these addresses have become simpler over time in terms of syntax and phrasing. Using transcripts from the Miller Center’s collection of presidential speeches, the team analysed the complexity of the speeches using the Gunning Fog Index, a readability score based on average sentence length and the share of words with three or more syllables. Their results showed a clear downward trend: a line of best fit indicated a decrease of nearly 15 points in the average Gunning Fog Index over the past 250 years.
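For readers unfamiliar with the metric, the Gunning Fog Index is 0.4 × (words per sentence + 100 × share of complex words), where complex words have three or more syllables. The sketch below is a minimal, assumption-laden version: syllables are estimated by counting vowel groups, and the standard exclusions (proper nouns, familiar compound words, common suffixes) are skipped, so its scores will differ somewhat from published readability tools. The article does not say how the team implemented the index. The `trend_slope` helper is a hypothetical addition showing how a least-squares line of best fit turns per-speech scores into a trend.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (treating 'y' as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Gunning Fog Index: 0.4 * (words per sentence + 100 * complex words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def trend_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of the ordinary least-squares line of best fit through (x, y) pairs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


print(round(gunning_fog("The cat sat. The dog ran."), 2))  # → 1.2
```

Given (year, score) pairs for every speech, a negative `trend_slope` multiplied by the span of years gives the kind of overall decrease the team reported.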

While this project lacked an immediate practical application, its thorough analysis and well-executed methodology stood out. The team also did an excellent job of retrieving and cleaning their data, and their code quality was superior. Although the readability index was a good choice, the judges believed that a more complex model could have been used in place of linear regression. Still, given the challenges of converting speech into structured data, the judges were very impressed by the team’s execution and insights.

Predicting Job Applications — Adrian Eddy and Thomas Pequengnot

Adrian and Thomas presenting “Predicting Job Applications”

This project aimed to build a predictive system that analyzes a resume to identify a user’s most influential skills and experiences and estimate expected salaries. After importing CSV files containing company- and job-specific information, the team cleaned and preprocessed the data and then performed exploratory data analysis (EDA). They initially attempted a linear regression to predict salaries but later moved to other methods, such as logistic regression and PCA, eventually reaching an accuracy of 0.64. Their workflow was well organized, and their segmented code was clean and easy to read. While the non-linearity of their data could have led the team to explore alternative models, the judges believed their approach was solid. The judges also suggested that some visualisations could be improved; for example, some graphs would have benefitted from smaller numerical labels and thicker lines. They added that a more careful choice of dimensions and variables could have enhanced predictive power. Ultimately, however, the project demonstrated a strong conceptual foundation with valuable real-world applications.

ForkCast — Krystal Wu

Krystal presenting “ForkCast”

Krystal’s project sought to investigate the relationship between customer sentiment, inspection data, and restaurant openings and closures in NYC. To achieve this objective, she decided to build a model to predict a restaurant’s opening and closing status given feedback data. After collecting data points from NYC restaurant inspections and Yelp, Krystal cleaned the data and conducted exploratory data analysis to conclude that customer sentiment appeared to be more strongly associated with restaurant success than inspection results. However, she also stated that more data would be needed to predict openings and closures with greater accuracy.

The presentation was polished, visually appealing, and well-structured, earning praise from the judges, who referred to it as a “future Shark Tank” project. However, while her data processing approach was creative, the chosen dataset was difficult to work with. The judges believed that a larger sample size and alternative datasets would have improved the model’s predictive power. Despite the less-than-ideal results, the project demonstrated strong analytical thinking and an entrepreneurial mindset.

Hoop-seer — Bryan Ko and Andy Li

Bryan and Andy presenting “Hoop-seer”

This group aimed to develop a web app that predicts the winners of NBA matchups using statistical data going back to 1996. To this end, Bryan and Andy leveraged the NBA API (an open-source tool that queries the league’s official stats database) to extract specific features for their predictive model. Their efforts eventually produced an AUC as high as 0.80, indicating that their models could reliably distinguish likely winners from likely losers across many games.
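Because AUC is easy to confuse with accuracy, a short sketch may help. AUC (area under the ROC curve) equals the probability that a randomly chosen win receives a higher predicted win probability than a randomly chosen loss, with ties counting as half. The outcomes and probabilities below are hypothetical, and this pairwise version is quadratic in the number of games; scikit-learn’s `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise AUC: share of (win, loss) pairs where the win got the higher score; ties count 0.5."""
    win_scores = [s for y, s in zip(labels, scores) if y == 1]
    loss_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not win_scores or not loss_scores:
        raise ValueError("AUC needs at least one win and one loss")
    credit = 0.0
    for win_score in win_scores:
        for loss_score in loss_scores:
            if win_score > loss_score:
                credit += 1.0
            elif win_score == loss_score:
                credit += 0.5
    return credit / (len(win_scores) * len(loss_scores))


# Hypothetical outcomes (1 = win) and predicted win probabilities for four games.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.4, 0.6]
print(roc_auc(outcomes, predicted))  # → 0.75
```

In this toy example, thresholding at 0.5 would classify only two of the four games correctly (accuracy 0.5) even though the AUC is 0.75, which is why the two metrics should not be read interchangeably.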

The judges noted that while the team created good variables, their data preprocessing could have been stronger, and they should have considered a broader audience less familiar with sports. Additionally, while the project had great monetization potential and a very catchy name, its practical applications were not fully explored. However, their presentation was light-hearted, creative, and personal, which made them stand out. The team also used a metric called a ‘Kill’, which counts each occurrence of a team making three consecutive successful defensive possessions, and incorporated strong visualisations in their analysis. They were also the only team to show a real-time demo during their presentation, demonstrating the predictions of four different models and rounding out a unique, well-executed project with significant potential.
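The ‘Kill’ metric is simple to compute once each defensive possession is labeled as a stop or not. This minimal sketch assumes a hypothetical list of booleans in possession order and counts each non-overlapping run of three consecutive stops as one kill, so six straight stops count as two; the article does not say how the team handled longer streaks, so that choice is an assumption.

```python
def count_kills(stops: list[bool], run_length: int = 3) -> int:
    """Count non-overlapping runs of `run_length` consecutive defensive stops (assumed definition)."""
    kills = 0
    streak = 0
    for stopped in stops:
        if stopped:
            streak += 1
            if streak == run_length:
                kills += 1
                streak = 0  # start counting the next kill from scratch
        else:
            streak = 0  # a score by the opponent breaks the streak
    return kills


# Hypothetical sequence of ten defensive possessions (True = stop).
possessions = [True, True, True, False, True, True, True, True, True, True]
print(count_kills(possessions))  # → 3
```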

Analysing Startup Trends and the Future of Funding — Faizan Shamsi

Faizan presenting “Analysing Startup Trends and the Future of Funding”

Faizan’s project aimed to identify key trends in startup funding and see whether there were any correlations between economic indicators and investment patterns. After web scraping and thoroughly cleaning the acquired data, he chose specific economic indicators, such as GDP and inflation rates, to see whether there was any relationship between the state of the economy and startup funding. After performing linear regression analyses within specific industries, he concluded that AI startups have been gaining significant funding, that GDP and inflation show a clear correlation with funding trends, and that grants and seed funding dominate across industries, indicating the importance of early-stage support for startups.

Faizan’s project was structured around clearly defined key questions and featured an excellent presentation flow. While the project successfully merged external factors with funding trends, further exploration of practical applications could have strengthened the project. Additionally, the judges commented that incorporating more complex predictive models, instead of just linear regression, could have enhanced Faizan’s analysis. However, Faizan excelled in data sourcing, scraping, and cleaning, demonstrating strong technical skills.

And The Winning Team Was…

In speaking with Bryan and Andy of Hoop-seer, we had the opportunity to learn more about the thought process behind their project and to gather their reflections on their experience at the Project Expo.

“I love watching and playing basketball… Another thing is that a lot of the major sports leagues have very comprehensive databases,” Andy explained when discussing the origins of this project. “So it was kind of a perfect idea where it was something that we wanted to make and it was something that put a lot of modeling techniques we learned in our coursework to real-life applications.”

Although this project was inspired by personal passion, and their coursework gave them the skills needed to complete it, piecing it all together was no small task and required a lot of deliberation and precision. “The preprocessing part is like 80% of the work because it takes so long, and also your methodology is so crucial, because if you spend 16 hours preprocessing a dataset but you had the wrong way of doing it, you’re going to have to restart and spend another 16 hours,” Bryan commented.

Despite their success, Andy and Bryan recognised areas for improvement. When asked what they would change about their process if they were to undertake it again, Andy said that it would be beneficial to “have a clearer vision of what we want to do… making sure we knew what we were doing and being able to execute that because I feel like a lot of our time was spent kind of chasing dead ends.” He also commented on the importance of feature engineering, stating that it is “super crucial to the performance of the models and it’s not enough to just take the commonly accepted features. You gotta see if you can do some engineering to make your own features, get better results.”

A big thank you to all the judges and participants present at the NYU DSC Project Exposition 2024!

To Conclude…

The most crucial part of learning is applying theoretical knowledge from the classroom to real-life situations, crafting exciting projects that tackle issues across disciplines. That mission is what led DSC to create the annual Project Exposition. The event was made possible by an extraordinary panel of esteemed judges and a group of ambitious students who worked hard to put together incredibly innovative projects. As we look forward to next year’s Project Expo, we aim to build on this foundation, encouraging even more students to participate and showcase groundbreaking ideas that have the potential to shape the future.

Article written by Franklin Dong and Eesha Lingala

DSC Project Exposition 2024 Recap was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

DSC Fall 2024 Recap

Data Science Club @ NYU (Center for Data Science) — Wed, 29 Jan 2025 13:37:04 GMT

Here’s a recap of the events and workshops held by the Data Science Club @ NYU to kick off the Fall 2024 semester with fun and intellectual flair!

Info Session (09/12)

Our first event of the semester kicked off with an engaging information session where we presented our club’s purpose and previewed this semester’s upcoming events. We also took the opportunity to introduce the Executive Board and highlighted the various teams within DSC: Publication, Events, Outreach, and Marketing.

The board of directors at DSC NYU (from left to right): Vishesh Goyal (Director of Events), Namay Jindal (Director of Outreach), Ritti Bhogal (President), Jia Qi, Abie Paguio (Co-directors of Marketing), & Ananya Mittal (Director of Publications)

Pizza & Python (10/02)

Our first social event of the year was Pizza & Python, where aspiring data scientists gathered for Python-filled discussions and cheesy pizza! It was a perfect blend of learning and fun, setting the tone for an exciting year ahead!

Pizza & Python

PMC X DSC Panel with Subhash Saravanan, Product Manager @ HelloFresh (10/11)

An informative session with Subhash Saravanan on data-driven product management.

We collaborated with the Product Management Club @ NYU to host an insightful event featuring Subhash Saravanan, Product Manager at HelloFresh. He captivated the audience as he shared his professional journey, key strategies, and the pivotal role that data and in-depth analytics play in shaping his decisions and driving product strategy at HelloFresh.

Bagels & Bytes (11/1)

Our second social event of the semester, Bagels & Bytes, brought students together for a delightful blend of fellowship and fun. The event featured engaging games, networking opportunities, and, naturally, lots of bagels. It was a fantastic way to build connections and strengthen our community!

Engaging conversations paired with freshly baked bagels

Of course, this was not all! We hosted a series of workshops and events leading up to our marquee event for the semester, Project Expo 2024, which showcased an array of exciting projects from our talented data science community. Keep an eye out for an upcoming article diving into all the highlights! Thank you for all your enthusiasm and support last semester, and we can’t wait to see you at our events this semester!

Article written by Joshua Alfred Jayapal

DSC Fall 2024 Recap was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

DSC Project Exposition 2023 Recap

Data Science Club @ NYU (Center for Data Science) — Fri, 22 Dec 2023 18:05:42 GMT

Nurturing the Diversity of Data Science

How it all Began…

From early October to late November, NYU’s Data Science Club ran a series of workshops in which invited speakers mentored students in the essential skills they would need to start their own data science projects. The workshops led students to build unique projects, which they presented at the NYU Data Science Club Project Exposition on December 1st.

The series started off with a Project Development workshop led by Jason Sheu from XiFin, which focused on identifying the problems that exist within each participant’s field of interest and deciding what type of project (prediction, classification, or data analysis) to pursue. Participants focused on the holistic development of projects so that they could apply these concepts to a wide range of data.

Data Science Students at the Project Development Workshop

The level of excitement demonstrated in the previous workshop was also seen at the “How to Process Data” workshop with Aditya Singhal, the founder of the NYU Data Science Club and current SWE at Amazon Health. Aditya led an interactive session using a Reddit dataset from Kaggle, performing beginner-level data cleaning techniques to help participants begin their data science projects.

Following this workshop, the Data Science Club hosted the GitHelp Workshop with Yucen (Lily) Li, an ML PhD student at NYU and former SWE at Meta, where participants also picked up tips for data visualization. Throughout the workshop, participants used R, Python, and SQL to work on problem statements such as predicting housing prices, classifying email messages, and analyzing climate-related data to discover patterns.

Many of those who attended the workshops intended to apply the skills they learned, from GitHub logistics to data cleaning, to their own projects! Everyone who attended the first workshop started with little more than ideas, and some didn’t even have that. However, workshop after workshop, students built the foundational skills needed to bring these ideas to life.

Data Science Students at the GitHelp Workshop

Participants meticulously selected their dataset, preprocessed it, and created their visualizations. By choosing either the data visualization or machine learning route (or both), each group utilized a unique method to present their solution to a contemporary problem.

On the day of the project expo, four industry professionals judged the projects as participants presented the findings of their months-long effort: Ren Liu, an NYU math graduate with a master’s in data analytics from UChicago and 4 years of industry experience; Alex Kass, a Yale cognitive science graduate with a master’s in statistics from Columbia and director of data science at MongoDB; Sophia Tee, a Northwestern math graduate with a master’s in statistics from Yale and 10 years of experience at companies like Verizon and Samsung; and Will David, who holds a computer science degree from Johns Hopkins and a master’s in engineering, with 3 years of experience at companies like JPMorgan Chase and Microsoft.

Taking a Closer Look at Participants and Projects

A total of 9 projects were submitted, and judges chose the top 2 projects in the batch. Participants were given 7 minutes to present their analysis and findings, followed by a 3-minute round of questioning from the judges.

The final day of the Project Expo series, the NYU Data Science Club Project Exposition

The event commenced with four professionals attentively evaluating seven teams, who confidently took center stage to exhibit their analyses, complete with compelling graphs and insightful inferences. The atmosphere was electric, as attendees eagerly awaited the unveiling of these data-powered solutions. Alex Frzysucha, who presented a project on French manuscript translation, said, “This is more than just a project presentation; it’s my opportunity to demonstrate how data can breathe new life into historical texts.” The audience was treated to a diverse array of projects spanning a wide spectrum of domains, from plastic waste management to French literature. Here are the details of the exceptional projects that made an unforgettable impression at Project Expo 2023:

1. EloBAR: Predicting Live Win Probabilities — Jihwang Sung

EloBAR aims to predict live win probabilities for sports events, focusing on the Korean Volleyball League. It combines the Elo and BAR rating algorithms to calculate win probabilities based on the sets won by each team. Fusing the two ratings provides a more comprehensive outlook on a game’s outcome, considering both live game scores and player statistics, and produced impressive results in predicting the winning team.
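The Elo half of EloBAR rests on a standard formula, sketched below; the BAR rating and the way the two were fused are not described here, so they are not reproduced. Team A’s expected score against team B is 1 / (1 + 10^((R_B − R_A) / 400)), and after a match each rating moves by K times the gap between the actual and expected result. The K-factor of 20 and the example ratings are assumptions chosen for illustration.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0) -> tuple[float, float]:
    """Return both teams' new ratings after one match; points gained by one side are lost by the other."""
    expected_a = elo_expected(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: an evenly matched pair, then a favorite against an underdog.
print(elo_update(1500, 1500, a_won=True))  # → (1510.0, 1490.0)
print(round(elo_expected(1900, 1500), 3))  # → 0.909
```

A pre-match Elo probability like this says nothing about the current score, which is presumably why a live model would need a second, in-game component such as the sets won so far.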

2. Plastic Waste Analysis — Edward Wu and Vincent Qin

This project examined plastic waste data from 2010 and 2019, using GDP per capita and population density as features to analyze plastic waste contribution across countries. The analysis compared how plastic waste trends changed between the two years across various economic factors. A multiple regression analysis improved accuracy to 75% and revealed a positive correlation between a country’s GDP and its plastic waste generation in both 2010 and 2019.

Edward Wu and Vincent Qiu presenting “Plastic Waste Around the World” (left). Jihwang Sung presenting “EloBAR: Predicting Live Win Probabilities” (right).

3. Predicting Musical Trends with Spotify Data — Annie Zhang and Krystal Wu

This project explores the prediction of future music trends using Spotify data. The authors conducted data cleaning and made use of features like acoustic quality and valence, and highlighted trends popular within the music community. Their work is an exploratory data analysis (EDA), and they plan to delve into time series analysis in future iterations.

4. Analyzing Professor Feedback — Ibrahim Sheikh

This project focuses on analyzing three years’ worth of weekly feedback data from a professor’s classes. After preprocessing the data with web scraping and regex, Ibrahim employed natural language processing (NLP) techniques to predict whether students would find the next week’s class confusing. His SVM model achieved an accuracy of 83.75%, though he anticipates that transformer models would yield better text analysis.

5. Training AI Agents with Personalities — Zekai Zhang and Chan Hyun Yoo

This project aims to fine-tune AI agents around specific personalities by interacting with them in hypothetical scenarios. Each response generated by the agent becomes a data point, contributing to the development of its personality. Although there were challenges in UX and training, the project demonstrates the potential for creating AI versions of individuals.

6. Society Viewed Through Popular Music — Samuel Lee

Samuel analyzed the connection between music and society. Using data from Spotify and the Genius Lyrics API, he conducted data cleaning and lemmatization, revealing recurring themes in music, such as love. He intends to continue this analysis by broadening the scope of his work beyond the U.S. and implementing Natural Language Processing (NLP) techniques.

Annie Zhang and Krystal Wu presenting “Predicting Musical Trends with Spotify Data” (left). “Training AI Agents with Personalities” presented by Zhang and Hyun Yoo (middle). “Society Viewed by Popular Music” presented by Samuel Lee (right).

7. Manuscript Translation using Computer Vision — Alexandra Frzysucha

Alex’s project seeks to translate French manuscripts using Computer Vision (CV). Current progress involves collecting scanned images of manuscripts and using Optical Character Recognition (OCR) to convert the scanned text into machine-readable strings. The next phase will incorporate Convolutional Neural Networks (CNNs) to further improve translation. Future plans include extending the project to manuscripts in other languages.

At the Project Expo 2023, two standout projects left a lasting impression on both the judges and the audience — the French Manuscript Translation project and the Class Feedback analysis. These projects not only showcased remarkable technical expertise but also the ability to effectively communicate complex concepts.

The French Manuscript Translation project, spearheaded by Alexandra, drew admiration for its clear objective. What truly captivated the audience was her phenomenal awareness and exceptional presentation skills, making the project both informative and engaging.

On the other hand, the Class Feedback analysis project, presented by Ibrahim, left judges impressed with its comprehensive approach. The project encompassed a well-structured presentation that covered data collection, analysis, model development, and prediction results in a logical order.

Alexandra Frzysucha presenting “Manuscript Translation using Computer Vision” (right). Ibrahim Sheikh presenting “Analyzing Professor Feedback” (left).

The Judges’ Corner: Remarks and Feedback

At the end of the expo, the judges shared their final thoughts, with an emphasis on clarity in defining a project’s results and purpose. They encouraged participants to reflect deeply on their project choices, urging them to consider the real-world applicability of their findings. Another notable expectation from the judges was the creation of an end product from the data analyses. Be it a unique application, a product, or even a piece of algorithmically generated music, the final product serves as a tangible testament to the practical utility and innovative spirit inherent in data science.

Transparency about methodologies and assumptions in data processing was another critical point of feedback. The judges stressed the importance of acknowledging any limitations or caveats in the approach. This transparency not only enhances the project’s credibility but also provides an educational component, offering insights for others in the field.

While many presentations at the expo were lauded for their design, the judges highlighted that the delivery of these presentations is just as vital. Effective storytelling in data science is about more than just conveying information; it’s about engaging the audience and making complex concepts understandable and relatable. This skill elevates a project from merely informative to genuinely engaging, connecting the audience with the data on a more profound level.

Now, let’s pivot for a moment to the versatility of data science, a theme raised by Alex Kass, one of the judges at the Project Expo. In a candid interview, Kass described his remarkable journey and how it showed him that data science is an unusually adaptable and widely applicable field. Before ultimately finding his passion in data science, Kass pursued several career paths, from prospective doctor to musician. Reflecting on that journey, Kass expressed astonishment at how adaptable data science has proven in every one of his jobs: “You can apply it anywhere and in any context” [1].

Judges’ panel: Alex Kass (first from left), Sophia Tee (second), Ren Liu (third), and Will David (fourth)

Like many other data scientists, Kass began his academic journey in an entirely different discipline: cognitive science. He got his first taste of programming in a computer science class on building neural networks, generating syntax, and constructing rules for multiple languages. Kass continued his education at NYU with a master’s degree in music, aiming for a career at the intersection of all his interests. Over the next six years, he gradually realized how important data analysis is to producing and managing music technology, and so decided to pursue a master’s degree in statistics. As his passion for data analytics grew, he began building machine learning models for companies across industries such as commercial real estate and music technology. No matter where he works, he finds that data analytics skills lie at the heart of business problem solving. “You never run out of things to learn as a data scientist” [1], Kass commented.

After a long journey, Kass has found his passion in applying machine learning to solve technological problems in business. Drawing on his multifaceted background, he placed heavy emphasis on the practicality of the projects students displayed at the Project Expo. In particular, he focused on how useful each project was in providing solutions to, or deepening our understanding of, real-world problems. Another aspect of the evaluation was whether students used storytelling techniques to get the audience excited about the story they were conveying. Overall, the judges agreed that students brought outstanding enthusiasm and creativity to their projects and successfully used data interpretation and visualization skills to address real-world issues.

Kass’ experience serves as a mirror for students to reflect on their own journeys. Coming from varied backgrounds and academic interests, participants found that Kass’ story resonated with them, as they too are still crafting their own paths. What they know for certain is that data science matters in any intellectual endeavor they take on. From multilingual translation and environmental waste calculation to music generation and financial analysis, data plays a critical role in every one of the students’ diverse interests.

All participants and judges

We like to call ourselves data scholars, as we seek to use data to tell stories about the real world. Some philosophers argue that we improve the world not by solving its problems but by understanding them better. The Project Expo served as an intellectual gathering where data scholars and professionals could connect and produce better interpretations of the world with data. The past two months have shown us the amazing possibilities of data science. Not only did we end the year with a bang by celebrating the hard work and enthusiasm of the participants, but we also experienced the thrill of watching data deepen our knowledge of real-world problems and bring new ideas to life.

Article written by Connor, Kathy Tran Anh Ngan and Joshua Alfred Jayapal

References

[1] K. Tran (Interviewer), Interview with Alex Kass (December 1, 2023)

DSC Project Exposition 2023 Recap was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

DSC September-October Recap

Data Science Club @ NYU (Center for Data Science) — Thu, 30 Nov 2023 20:32:41 GMT

Curious to see what the Data Science Club was up to the past few months? Read more to learn about how we started off the Fall 2023 semester with a bang!

Pizza & Python

A club-wide meet-up where members got to indulge in good pizza and even greater company!

Committee Meetup

The first get-together of the year for Data Science Club’s committee members! An event for our Events, Marketing, Outreach, and Publication committees to get to know each other better as we kick off a new, data-filled semester! (P.S.: Free merch was included :) )

Project Development Workshop

This was the first in a series of workshops and open office hours that the Data Science Club held to help students learn from industry mentors and develop a data science project of their own! The Project Development Workshop was led by Jason Sheu, a data scientist at XiFin, who walked through the steps of building such a project, from choosing the topic and methods to producing the final write-up.

Data Processing Workshop

Continuing the series of workshops leading up to the Project Exposition, the Data Processing Workshop was led by Aditya Singhal, the founder of NYU’s Data Science Club and a Software Engineer at Amazon Health. It gave attendees hands-on experience using Python and pandas to work with SQL files and uncover what datasets have to offer.

Spooktacular Soiree

You thought there couldn’t be anything scarier than getting a p-value greater than 1 when you have 5 minutes left in your DS4E final, but the Data Science Club is here to prove you wrong. At our Spooktacular Soiree, co-hosted with CDS’ Graduate Student Community-Building Group and the Women in Data Science group on October 27, our attendees showed us their alternate (and perhaps better) career options through their pumpkin paintings, battled each other in a Halloween-themed data science Kahoot, and discovered new fashion choices by wrapping each other in toilet paper to conclude this month’s events!

This series of amazing events will continue for the rest of the semester and beyond, so if you haven’t already, come and enjoy what our events committee has in store for you! In particular, keep an eye out for our Project Exposition on December 1st, where you can showcase your projects and network with industry experts!

Signing off,

Ananya Mittal and Shannon Li

DSC September-October Recap was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

NYU Data Science Review: Writing Prompts

Data Science Club @ NYU (Center for Data Science) — Tue, 01 Nov 2022 18:31:54 GMT

Unsure of what to write about? Read on for a set of data science writing prompts that will spark inspiration!

Photo by Glenn Carstens-Peters on Unsplash

Let me set the scene. You’re brimming with excitement because you’ve just found out about the NYU Data Science Review. You can’t wait to put pen to paper (or fingers to keyboard) and write an amazing article.

You settle in at your desk, whip out your laptop, and —

blank.

What to write about?

Sometimes, despite your passion, it can be difficult to settle on an article topic. Struggle no more! We’ve put together a list of brainstorming questions and data science topics to help you generate ideas.

Prompts:

What was an assignment you struggled with? How did you overcome it, and what skills did you use?
When did you last learn a new skill (coding language, etc.)? Can you teach it?
What intersection of data science and your second major/minor interests you?
Have you used data to address a real-world problem?
When did you last feel excited by a lecture topic? How can you dive deeper?
Why is data important? Do you have a personal or academic experience in which you were reminded of this?
What are the hidden ways that data infiltrates our daily lives?
What are your favorite hobbies and passions? How is data applied or related to them?
What is a recent research project you worked on?
How do you want to use (or how have you used) data science to impact others’ lives?

Topics

Current Events

Misrepresentation of data in the news
Deepening understanding of current issues using data
Data driven solutions to real world problems

Artificial Intelligence/Machine Learning Applications

Natural Language Processing
AI & Art (DALL-E, Stable Diffusion, MidJourney)
Chatbots (LaMDA)
Computer vision
Self driving cars
Large language models (GitHub Copilot, Jasper.ai)
Quantum computing in data science
Network Security/Cryptography (fraud detection, homomorphic encryption)

Data Handling

Databases — How to use them/what exists?
Data visualization tools and applications
Data/web scraping

Ethical Data Science

Data lifecycle
How biases and discrimination arise in modeling
Performance analysis and metrics
Information use and privacy

Theory

Reinforcement learning (Markov modelling)
Deep learning (Convolutional Neural Networks, black box models)
Probability and statistical testing (bootstrapping, hypothesis testing, etc.)

Miscellaneous

Google Cloud Platform + AWS
Coding for data science

We hope these prompts sparked some inspiration for you! Now, go forth and write! When you’re ready to submit an article for publication with us, visit this guide for instructions and the submission link.

NYU Data Science Review: Writing Prompts was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

Publish Your Writing in the NYU Data Science Review!

Data Science Club @ NYU (Center for Data Science) — Tue, 15 Feb 2022 20:33:20 GMT

A complete guide to writing for the NYU Data Science Club’s publication

Write! — Graphic by Meghana Kakubal

Introduction
Guidelines
Editing Process
Submission

Our mission is for the NYU Data Science Review to amplify the many voices in the NYU Data Science community. We’re always seeking students, faculty, and alumni to submit their writing for publication.

To understand the quality of work and the types of content we expect, please refer to these publications and articles:

Towards Data Science (comprehensive guide to all things data science)
Burn Rate (Torch NYU) (discussion about personal experiences at NYU and career advice). An example article from this blog is Into the Interview.
Harvard Data Science Review (data science in relation to research/projects). An example article from this journal is Why is Data Visualization Important?
Also refer to the content already published at the NYU Data Science Review!

Whether it’s an editorial reflection on a data science experience, or a technical write-up after a research project, the NYU Data Science Review is the ideal place to showcase your work. We will further promote your writing on Medium, Instagram, and LinkedIn.

Keep reading for a thorough guide to what kind of content we’re looking for, stylistic guidelines, our editing process, and how you can submit.

Guidelines

There are some necessary components of a quality Medium article. The following is a quick list of things to check with your own draft.

Does your article have a compelling, unique title?
Does your article have a great, informative subtitle?
Does your article include a royalty-free or properly cited image (see the following link)?
Have you included five tags that you feel are relevant to what you’ve written about?
Are your grammar, spelling, and punctuation correct?
Is your article factually accurate? Please double- and triple-check.
Does your article’s reading time fall within the 5–7 minute range?
See below for citation and reference requirements.

Citation and References:

Please provide in-text hyperlinks to any resources you refer to in text. Additionally, list your references at the end of the article in this format:

[X] N. Name, Title (Year), Source

For example,

[1] R. Abelson, C. Jewett, When Omicron Isn’t So Mild (2022), The New York Times

Editing Process

Our submission form has two options: you can pitch an initial idea for an article, or you can submit a full draft.

If you haven’t begun writing yet but want to know whether the NYU Data Science Review will publish an article you have in mind, pitch it to us! An editor will work with you to workshop and approve the idea, and will answer any questions you might have while writing.

If you have a full draft ready, first make sure it meets all of our guidelines. Then use the form to submit.

After you submit your draft, you will be assigned an editor who will work directly with you. The editor’s role is to make sure your draft is ready for publication, not to revise it thoroughly, so please have your writing as refined as possible before submitting.

The draft will also be reviewed by the NYU Data Science Club’s faculty advisors, who will provide additional feedback as necessary.

After approval, we will add you as a writer on Medium and give you the go-ahead to submit your draft to the NYU Data Science Review publication.

Currently, the editors for Fall 2023 are Ritti Bhogal, Ananya Mittal and Shannon Li. We’re here to support your writing process, so please do reach out to us if you have any questions at all.

Submission

If you’ve got an idea ready to pitch, or a draft ready for editing, visit this Google form to submit: https://bit.ly/rf-submit

We look forward to reading and publishing your work!

Publish Your Writing in the NYU Data Science Review! was originally published in NYU Data Science Review on Medium, where people are continuing the conversation by highlighting and responding to this story.