Stories by Data Science Society at Berkeley on Medium

From Data to Decisions: Dynamic Pricing on Databricks Using Lakebase, Agent Bricks, and OpenAI’s…

Data Science Society at Berkeley — Wed, 01 Apr 2026 00:02:20 GMT

From Data to Decisions: Dynamic Pricing on Databricks Using Lakebase, Agent Bricks, and OpenAI’s ACP

Written by Arthur Rodrigues Baruzzi Lopes, a Data Science Senior at UC Berkeley and member of the Data Science Society at Berkeley (DSS).

Revised and edited by Aditi Bordia (Junior) and Rakshan Reddy (Sophomore), who are students at UC Berkeley studying Data Science and are also part of DSS.

Project members included Arthur Baruzzi (DSS TPM), Aditi Bordia (DSS TPM), Rakshan Reddy (DSS consultant), Tanish Kher (DSS consultant), Brandon Concepcion (DSS consultant), Kevin Hartman (Databricks), and Vijay Balasubramaniam (Databricks).

Context

Giants like Amazon and Uber have long had dynamic pricing as a backbone to their business. Other companies have been slower to adopt this practice, citing technical difficulties with implementation. However, what started as a “nice-to-have” has now slowly but surely become more of a necessity.

After all, we have entered the “now” economy, in which consumers expect immediate, personalized experiences and rapid fulfillment. This shift arose from the rise of digital platforms, on-demand services, and e-commerce, whose seemingly unstoppable growth was accelerated by the COVID-19 pandemic.

Databricks has noticed this economic trend and has created tools that enable both established and growing corporations to implement their own dynamic pricing solutions. In June 2025, the company announced a powerful new product, Lakebase: a lightweight, serverless, low-latency OLTP database architecture built on Postgres that separates storage from compute and is designed for the efficient development and use of AI agents and live data.

On the same day, Databricks also announced Agent Bricks: a native framework for building and managing AI agents directly within the Databricks workspace. With Agent Bricks, teams can create agents that reason over both structured and unstructured data, generate insights, and even act on them. From Knowledge Assistants that interpret internal documents, Genie Agents that turn natural language into SQL, to Multi-Supervisor Agents that orchestrate them all, Agent Bricks transforms Databricks from a data platform into an intelligent, conversational workspace that illuminates the AI and data black box.

Other prominent companies have noticed how important the “now” economy is. Among them is OpenAI, which since September 2025 has allowed ChatGPT users to complete purchases entirely on its platform through its Agentic Commerce Protocol (ACP). The service, initially restricted to a few businesses such as Etsy, is now expanding to large, multinational businesses worldwide. We are seeing great demand from businesses to start advertising their products through chatbots like ChatGPT, which led to the creation of the following project.

Project Explanation

During Spring 2025, the Data Science Society at Berkeley (DSS) partnered with Databricks and Accenture’s AI Refinery team to create a dynamic pricing solution native to the Databricks platform that connects to OpenAI’s ACP through Accenture’s AI Refinery. (Prior to this, Accenture and Databricks had already collaborated to introduce the Databricks Genie Agent to AI Refinery.) This project shows how corporations using Databricks and Accenture’s AI Refinery can enable dynamic pricing (the practice of adjusting prices in response to changes in demand and supply) and advertising within ChatGPT.

In general, for a dynamic pricing solution to function, engineers need (at a minimum) to create a model that predicts future demand of products given specific inputs (e.g., current price, quantity of units sold, quantity of units in stock) and a price optimization algorithm (given predicted demand at different price points).
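As a rough illustration of these two minimum components, the sketch below fits a deliberately simple linear demand model and then picks the most profitable price from a grid of candidates. It is a minimal Python sketch, not the project's actual model: the linear form, the function names (fit_linear_demand, predict_demand, best_price), the unit cost, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean


def fit_linear_demand(prices, units_sold):
    """Least-squares fit of units_sold ~ intercept + slope * price (assumed model form)."""
    p_bar, q_bar = mean(prices), mean(units_sold)
    sxx = sum((p - p_bar) ** 2 for p in prices)
    sxy = sum((p - p_bar) * (q - q_bar) for p, q in zip(prices, units_sold))
    slope = sxy / sxx
    intercept = q_bar - slope * p_bar
    return intercept, slope


def predict_demand(model, price):
    """Predicted units sold at a given price, floored at zero."""
    intercept, slope = model
    return max(0.0, intercept + slope * price)


def best_price(model, candidate_prices, unit_cost, units_in_stock):
    """Return the candidate price with the highest predicted profit."""
    def profit(price):
        # Sales cannot exceed what is in stock.
        units = min(predict_demand(model, price), units_in_stock)
        return (price - unit_cost) * units
    return max(candidate_prices, key=profit)


# Hypothetical history for one product: price charged and units sold.
history_prices = [8.0, 9.0, 10.0, 11.0, 12.0]
history_units = [120, 110, 100, 90, 80]

model = fit_linear_demand(history_prices, history_units)
candidates = [8.0 + 0.5 * i for i in range(17)]  # 8.00, 8.50, ..., 16.00
chosen = best_price(model, candidates, unit_cost=6.0, units_in_stock=150)
print(model, chosen)
```

With this toy data the fitted curve is 200 - 10 * price, and the grid search settles on a price of 13.0. A production system would replace the linear fit with a trained forecasting model, but the optimization step keeps the same shape: score each candidate price with the model and keep the best one.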

Beyond this, the project also aimed to address gaps in visibility and compatibility in dynamic pricing solutions. We aimed to give visibility into the entire dynamic pricing architecture and its performance, so that employees can not only easily view data on price changes, supply, and demand, but also fully understand different model evaluation metrics and discover patterns across different pieces of data. To accomplish this, we used Agent Bricks to create a chatbot that employees can interact with.

Then, we aimed to provide a dynamic pricing solution that is compatible with OpenAI’s ACP. To do so, we designed a database schema following OpenAI’s guidelines, which Accenture’s AI Refinery team would use to link our dynamic pricing solution to the ACP. The following image describes the schema in question:

Figure 1. Database schema for our project given OpenAI’s ACP product specifications.

To implement this solution, the following tools were used:

Databricks’ workspace: allowed seamless integration between our Python files and notebooks with Lakebase and Agent Bricks.
— Notebooks and Python files: where the code for the sample demand forecasting model, price optimization algorithm, and policy verifier is located.
Lakebase: this low-latency OLTP database allowed for quick updates in the “product_details” table.
Lakehouse: stored tables that were rarely updated (and thus not needed in Lakebase), such as our “inventory” and “model_updates” tables.
Agent Bricks: enabled transparency for employees to see and understand different pricing policies and verify how well the dynamic pricing model is performing.
— Genie Agent: Chatbot with structured data (tables) containing information about price changes, predicted and actual demand, etc.
— Knowledge Agent: Chatbot with unstructured data (e.g., PDF documents) containing information such as pricing policies, quarterly earnings, etc.
— Multi-supervisor Agent: Chatbot that uses both the Genie and Knowledge Agents to answer complex questions posed by employees, which needs understanding of both structured and unstructured data.

The framework of our solution is divided into three parts. The solution’s backbone is the ML portion (the middle section of Figure 2), which changes the pricing of products. First, the company stores its data in Lakebase and Lakehouse. It then feeds the necessary data tables to the demand forecasting model (either to retrain the model on the new data or just to run predictions, at the company’s discretion) to predict demand for all products for the next timestamp. The price optimization algorithm then uses this model to predict demand at different price points and chooses the price point with the highest predicted profit. In the last step of the process, a policy verifier acts as a safety net to ensure the new price follows the company’s guidelines (e.g., the number of price updates allowed within a specific time interval). If the new price passes the verifier, the “price” attribute of the “product_details” table is updated, and a new row is added to the “model_updates” table to document the change.
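The last two steps of this flow (checking a proposed price against policy, then updating “product_details” and logging to “model_updates”) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the tables are in-memory stand-ins rather than Lakebase or Lakehouse tables, every column name other than “price” is hypothetical, and the cap on a single price jump is an assumed extra rule added next to the update-frequency rule mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PricingPolicy:
    """Hypothetical guidelines; a real policy would come from the company."""
    max_updates_per_window: int   # e.g. at most 1 price change ...
    window: timedelta             # ... per hour
    max_relative_change: float    # assumed extra rule: cap on a single price jump


def passes_policy(policy, current_price, proposed_price, past_update_times, now):
    """Return True if the proposed price respects every rule in the policy."""
    recent = [t for t in past_update_times if now - t <= policy.window]
    if len(recent) >= policy.max_updates_per_window:
        return False
    relative_change = abs(proposed_price - current_price) / current_price
    return relative_change <= policy.max_relative_change


def apply_price_update(product_details, model_updates, product_id,
                       proposed_price, policy, now):
    """Update the price and log the change only if the verifier approves."""
    row = product_details[product_id]
    history = [u["updated_at"] for u in model_updates
               if u["product_id"] == product_id]
    if not passes_policy(policy, row["price"], proposed_price, history, now):
        return False
    # Document the change first, then overwrite the live price.
    model_updates.append({
        "product_id": product_id,
        "old_price": row["price"],
        "new_price": proposed_price,
        "updated_at": now,
    })
    row["price"] = proposed_price
    return True


# In-memory stand-ins for the two tables.
product_details = {"sku-1": {"price": 10.0}}
model_updates = []
policy = PricingPolicy(max_updates_per_window=1, window=timedelta(hours=1),
                       max_relative_change=0.20)

start = datetime(2025, 1, 1, 12, 0)
first = apply_price_update(product_details, model_updates, "sku-1", 11.0,
                           policy, start)
second = apply_price_update(product_details, model_updates, "sku-1", 11.5,
                            policy, start + timedelta(minutes=10))
third = apply_price_update(product_details, model_updates, "sku-1", 20.0,
                           policy, start + timedelta(hours=2))
print(first, second, third, product_details["sku-1"]["price"])
```

Here the first update is accepted, the second is rejected because it arrives within the same one-hour window, and the third is rejected because the jump exceeds the assumed 20% cap. Keeping the verifier as a separate function means new rules can be added without touching the forecasting or optimization code.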

Another portion of the framework is the one that adds visibility to the dynamic pricing solution using Agent Bricks (upper portion of Figure 2). The Genie Agent uses the structured data located in Lakebase and Lakehouse. The Knowledge Agent uses unstructured data (like text files) to extract information. The Multi-Supervisor Agent uses both of these agents to actually deliver answers to employees’ questions. Sample questions that this agent can answer include:

What are the RMSE and MAE of the demand forecasting model for each product from January 1st, 2025, to April 1st, 2025?
How many times did the price for products A, B, and C change over this week?
Give me a summary of the pricing policies.
How correlated are pricing changes between each product? For example, when a specific product decreases its price, does it tend to have a positive or negative correlation to other products?
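To make the first sample question concrete, the sketch below shows the calculation behind a per-product RMSE and MAE. It is an illustration only: in the project the agent answers such questions over the stored tables, whereas this Python version works on a hypothetical list of (product, predicted demand, actual demand) rows, and the function name error_metrics_by_product is an assumption.

```python
import math
from collections import defaultdict


def error_metrics_by_product(rows):
    """Compute RMSE and MAE of demand forecasts, grouped by product.

    rows: iterable of (product_id, predicted_demand, actual_demand).
    """
    errors = defaultdict(list)
    for product_id, predicted, actual in rows:
        errors[product_id].append(predicted - actual)
    return {
        product_id: {
            # RMSE penalizes large misses more heavily than MAE does.
            "rmse": math.sqrt(sum(e * e for e in errs) / len(errs)),
            "mae": sum(abs(e) for e in errs) / len(errs),
        }
        for product_id, errs in errors.items()
    }


# Hypothetical forecast log for two products.
rows = [
    ("A", 100, 90),
    ("A", 80, 90),
    ("B", 50, 53),
    ("B", 47, 43),
]
metrics = error_metrics_by_product(rows)
print(metrics)
```

For product A both forecasts miss by 10 units, so RMSE and MAE are both 10. For product B the misses are 3 and 4 units, giving an MAE of 3.5 and a slightly higher RMSE, which shows how RMSE weights the larger error more.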

The last portion of the framework connects the dynamic pricing solution to OpenAI’s ACP so that products can be displayed and sold in ChatGPT. It involves creating a Payment and Checkout Agent, which Accenture’s AI Refinery team is currently developing.

Figure 2. Dynamic Pricing project framework which highlights the predictive ML (middle), employee visibility chatbot (top), and ACP connection (bottom) portions.

The project deliverables were: a presentation explaining the project, technical documentation of all the work, and the fully functional, parametrized dynamic pricing solution in the Databricks workspace. The workspace had the database (with sample tables), all the code for the project (including a demo showing how it all worked), and the agents created for the employee chatbot functionality.

Conclusion

This dynamic pricing framework is greatly beneficial for a wide array of users, such as large-scale retailers, small businesses, and marketplace providers. For all of these users, this framework unlocks dynamic pricing for every product, powered by demand patterns, discounts, corporate policies, and more. It additionally allows employees to explore retail insights with the Multi-Supervisor Agent while leveraging OpenAI’s ACP to increase consumer reach. Overall, this enables businesses to maximize revenue, stay ahead of competitors, and streamline complex pricing across thousands of SKUs.

Dynamic pricing is not just about increasing revenue, however. This solution has the potential to improve customer satisfaction by specifically aligning prices with customer demand and market conditions, thus maintaining competitive and fair pricing. It also increases operational efficiency, as this automated system reduces the manual effort of pricing by streamlining the entire process and enabling rapid response to market changes.

Overall, this project shows that dynamic pricing can be much more than a forecasting model. When combined with Databricks infrastructure, agent-based visibility, and compatibility with OpenAI’s ACP, it becomes an aid for businesses to act faster, price smarter, and prepare for a more conversational future of commerce. As customer expectations and technology continue to evolve at a nearly unmanageable speed, solutions like this will no longer be a competitive advantage, but a business necessity.

Digital Da Vincis: Exploring the AI Renaissance

Data Science Society at Berkeley — Fri, 03 Nov 2023 03:11:42 GMT

AI technology is in a constant state of development, and its image recognition and generation techniques are now used to create art, edit photos, and recognize art styles, movements, and individual pieces. Recently, AI tools that can create digital art and edit uploaded photos to follow a particular theme, such as turning photos into professional headshots, have been gaining popularity thanks to their presence on a variety of social media platforms, especially TikTok. The possibilities for art creation using image generation seem endless; however, there have been controversies about whether AI-generated pieces are outperforming real art.

With image generation, art has become a lot more accessible and widespread, and artists themselves have utilized this resource for inspiration in their creative process. With Generative Adversarial Networks (GANs) paired with deep learning algorithms, original artworks can be created, becoming a source of inspiration for aspiring artists. Not only has image generation influenced art and design, but it has also revolutionized the photography industry through generative photo editing and creating new photographs from given photographs. AI image recognition and generation have provided technologies to transfer art styles of famous artists to newly created artworks, analyze art styles and attribute them to certain time periods or past artists, and additionally help in art restoration techniques. AI image recognition and generation have expanded the possibilities within the art field, offering an endless variety of new tools and concepts for artists, art enthusiasts, and the creative space in general.

But while AI has created novel opportunities, it has also raised important ethical and philosophical questions about the nature of art and human creativity. The rise of AI-generated art has prompted discussions on authorship, copyright, and the nature of creativity. It challenges traditional notions of art, raises questions about the role of the artist in the creative process, and brings up the most important controversy of whether what is generated is truly qualified as “art”.

This was seen especially in Colorado, where an AI-generated picture by Jason M. Allen won the award for emerging digital artists at the Colorado State Fair in 2022, leaving many artists frustrated. Many argued that it was unfair, and anxiety-provoking, that art created by AI could receive such a high honor over the work of all other artists. The rising controversy has led many artists to question what is now defined as art in this modern day and age.

Photographed by Kevin J. Beaty/Denverite, Jason Allen and his AI-assisted illustration, “Theatre D’opera Spatial” in Denver’s Pugasus Studio. Sept. 2, 2023

A similar incident occurred during a photography competition in March of this year. The artist, Boris Eldagsen, made headlines by declining a prestigious award at the World Photography Organization’s Sony World Photography Awards. The reason behind this unprecedented decision was the nature of the winning artwork itself: an AI-generated creation that challenged the very foundations of conventional artistic recognition. Eldagsen, the recipient of the award for his piece titled The Electrician, pushed the boundaries of what we consider traditional art. The work, resembling an aged photograph, “was made by submitting language to an AI generator many times over. In the process, the work was altered using techniques known as inpainting, outpainting, and prompt whispering”. The award was given in March, and in April Eldagsen announced his decision to decline it, calling for a dialogue within the photography world regarding the ever-evolving boundaries of the art form and whether AI creations belong within its scope. In response to Eldagsen’s actions and subsequent statements, the World Photography Organization expressed its discontent, citing his deliberate attempts to mislead it and the resulting invalidation of the warranties he had provided. As a result, the organization suspended its activities with Eldagsen and removed him from the competition.

Boris Eldagsen, “The Electrician”

Although image recognition and generation have created powerful tools for the public to feed their creativity, art generated by AI itself has been a subject of debate and discussion in the art world, with controversies related to authorship, originality, and the role of technology in creative processes. Such incidents often lead to thought-provoking conversations about the boundaries of art and the impact of AI on the art community. AI image generation remains controversial as a way of creating artwork, and the technology is not yet perfect. It has developed greatly and now reaches well into the creative realm, but it may never possess the touch that a real artist gives to their piece.

Key Points:

AI technology, particularly in image recognition and generation, is rapidly advancing and impacting various aspects of the creative world, from art creation to photo editing.
The limitless potential of AI in art creation has raised questions about whether AI-generated art can outperform traditional human-created art.
AI’s influence extends beyond art and design, revolutionizing the photography industry with generative photo editing, style replication, art analysis, and restoration.
The rise of AI-generated art has led to a range of debates within the art community. These debates encompass concerns about authorship, the authenticity of AI-generated art, and the evolving nature of art in the digital age.
Instances like the AI-generated artwork winning a competition at the Colorado State Fair in 2022 and Boris Eldagsen’s decision to decline a prestigious photography award highlight the controversy and shifting perceptions of AI in the art world.

Further Questions to Consider:

What is the future of AI-generated art, and how will it impact the traditional art world in the coming years?
How can we establish guidelines or criteria to distinguish AI-generated art from human-created art?
What are the implications for copyright and intellectual property rights in the context of AI-generated art?
Is there a need for a new category or classification system in art competitions to accommodate AI-generated works?
How can AI-generated art coexist with human art, and can it offer new opportunities for artists?
To what extent can AI replicate the depth and emotion in art that is often associated with human creativity?
What ethical and legal challenges arise when AI-generated art becomes a significant part of the art market?

Further Resources to Look Into:

https://www.cpr.org/2023/09/06/jason-allens-ai-art-won-colorado-fair-feds-deny-copyright-protection/
https://www.artnews.com/art-news/news/ai-generated-image-world-photography-organization-contest-artist-declines-award-1234664549/
https://news.harvard.edu/gazette/story/2023/08/is-art-generated-by-artificial-intelligence-real-art/
[Berkeley students] UGBA 39E: AI Ethics for Leaders — This course enables future leaders to understand and navigate the many decisions necessary to ethically leverage AI in professional settings, in business, government and nonprofit settings. It covers the basics of AI, relevant ethical frameworks, prominent ethical issues, and how organizations have addressed those issues previously. By the end of the course, students will be fluent in evaluating the impact of AI systems on individuals and society. They will be able to draw upon cases where leaders have been confronted with ethical issues and some of the tools that have helped resolve them.

Digital Da Vincis: Exploring the AI Renaissance was originally published in Data Science Society at Berkeley on Medium, where people are continuing the conversation by highlighting and responding to this story.

Acing the Data Science Interview: Words of Wisdom From Past Data Science Interns

Data Science Society at Berkeley — Tue, 17 Oct 2023 22:27:00 GMT

Lillian Jiang — Data Science intern @ JP Morgan

Lillian Jiang is a third year studying Computer Science and Business Administration. She was a data science intern at JP Morgan in Chicago last summer and made the most out of her chill 8–4pm corporate lifestyle while also learning all about debit cards 💳. Lillian is currently the VP of Marketing for DSS and has been a social good consultant for the last academic year.

How did you approach the process of data science recruitment? What was your general timeline like?

I started recruiting for data science roles at the beginning of September. Over the next few months, I had online assessments and interviews for companies. Most companies start wrapping up their internship recruitment process around late November and December, which was when I received my offers.

The recruitment process varies by company, but it typically consists of a resume drop, online assessment, recruiter phone screen, technical on-site, and cultural interview.

What were the biggest challenges in data science recruitment? How did you overcome these?

My biggest challenge was knowing what would be asked in data science interviews. Since the data science process is so broad, there are so many different things that can be asked in the interviews. I felt overwhelmed thinking about all the different concepts I had to prepare for.

Someone from DSS recommended the “Ace the Data Science Interview” book. After flipping through some of the pages, I immediately bought it and read the whole book. It was extremely helpful in giving me a step-by-step walkthrough of the data science interview process. Whenever I was unsure about how to apply to companies and what to expect in interviews, I just consulted the book.

What did you learn from your internship? How would you describe the overall experience?

I enjoyed my internship experience very much! I filmed a vlog during my internship which is pretty representative of my time there. I learned a lot about how to make decisions using data and attended a lot of intern speaker events which exposed me to new industries and topics!

How did you deal with rejection and motivate yourself during the recruitment process?

Some weeks when I was feeling really sad, I would just take a break from recruitment. I reminded myself of other people’s journeys in data science. It was comforting to know that no matter what happens, you will find your way.

Last fall when I was recruiting, I knew that one of my top goals and priorities was to get an internship, which is why I took a lighter course load. When I was feeling unmotivated about recruitment, I would think about the situation in which I would have to recruit during the following semester with heavier classes, which scared me. This fear kept me motivated.

Bella Chang — Data Analyst Intern @ NerdWallet

Bella Chang is a current senior studying Data Science with a minor in Public Policy. This past summer she worked as a Data Analyst Intern in the Marketing Analytics & Business Intelligence department at NerdWallet, looking mostly at SEM/SEO campaign analysis and modeling revenue return 💰 from marketing campaigns. Bella has spent 2 semesters working as a Project Manager within DSS in the Social Good committee, and has also been an Academic Development mentor and director in the past for DSS’s DeCal Introduction to Real-World Data Science.

How did you approach the process of data science recruitment? What was your general timeline like?

I started recruiting around October but didn’t seriously begin looking for jobs until winter break. I wanted to apply to as many companies as possible in a wide breadth of areas I was interested in: I ended up applying to data science, data analyst, and even consulting roles. Many of these appeared at different times, generally with data science internships near the beginning of the school year (early fall) and data analyst/consulting roles in early winter (Jan/Feb). I practiced for OAs and technical assessments throughout the start of spring semester, around Jan-March. I received many of my offers around March/April and my offer for NerdWallet in May.

What were the biggest challenges in data science recruitment? How did you overcome these?

The biggest challenges were: 1) catering my application to every single role and 2) preparing adequately for technical and regular interviews along with my regular schedule.

Here’s how I dealt with the first challenge: I had one master list of all my past roles and their descriptions. For each application, I would choose the roles that I believed were most applicable to the position and use those as the experiences on my resume. I would also make sure to read the role descriptions properly and carefully scan through my resume for any skills/accomplishments that might correspond with the requirements/desired skills.

For the second challenge: I made plans for rehearsing SQL that catered specifically to my schedule, and also asked a lot of my recruiters if they could give me any tips on what they’d recommend I prepare. This worked pretty well!

How did you deal with rejection and motivate yourself during the recruitment process?

In terms of rejection, I think it was a journey to get to a place of feeling okay with everything. It was extremely frustrating for me at first but eventually, I learned that rejection is an inevitable part of recruitment. You always need to remind yourself that even the best, most qualified people are rejected by companies too. This really takes time but honestly, I’d recommend just starting to apply and getting familiar with the feeling.

In terms of motivation, I think it was nice to set limits and goals for myself, such as creating schedules and to-do lists that I updated based on my availability. I also limited myself from talking too much about recruitment unless it was in a net positive direction. Berkeley culture often paints recruitment as an integral part of life that everyone must be succeeding in, and sometimes people will get into conversations about recruitment to gauge how much time others are spending on it and even to boast about how much time they’re spending on their own recruitment process. Remember everyone is different and you do not have to engage in these conversations if you don’t want to!!

What did you learn from your internship? How would you describe the overall experience?

I learned a lot about SEM/SEO campaign analysis: how marketing campaigns are conducted, run, and randomized! I became familiar with campaign tools like Google Ads and SA360. I also learned a lot about how to present data science findings to different teams and how data gets processed and sent over to the marketing teams. TLDR: lots of communication between different teams, including Business and Data Engineering. My time with the data science/analysis team also gave me some insight into how modeling/predictions work and what limitations there are in modeling. Overall, the experience was super eye-opening to me. I got to see how data science operates on a daily and immediate level (through our Google search engine) and got a glimpse into data science from the business lens! I also learned a lot about consumer experience and how to think like a consumer, and how that influences a data scientist’s experimentation. These were definitely things I couldn’t have learned in the classroom, and I’m super appreciative of being exposed to that opportunity.

Nikki Trueblood — Data Analyst Intern @ LeanTaaS

Nikki Trueblood is a current junior studying Data Science with an emphasis in Social Welfare, Health, and Poverty. This past summer she worked out of Santa Clara as a Data Analyst Intern at LeanTaaS, a late-stage healthcare tech startup that uses data science to improve operational efficiency in healthcare 🏥 (optimizing patient wait times, reducing nurse overtime hours, etc). Nikki has been involved with DSS for 5 semesters as a Social Good Consultant and Project Manager, working with companies such as Change.org, The Nature Conservancy, and Medic.

How did you approach the process of data science recruitment? What was your general timeline like?

This was my first time going through data science recruitment, so my approach was mainly mass-applying to roles including data science, data analyst, and occasionally SWE/consulting/PM. I began working on my resume, practicing interview questions, and applying to companies in the fall and got an offer in November. Although I was tempted to take this first offer, it didn’t align with what I wanted out of an internship: connections and learning experiences in a field I was passionate about. Ultimately, I moved on and continued to recruit, which was really scary because I felt like I was starting all over again at square one. However, I’m so glad I did, because during March/April I got an interview and then an offer from LeanTaaS, and I was much more excited and passionate about the work I would be doing.

What were the biggest challenges in data science recruitment? How did you overcome these?

One of my biggest challenges was frustration with the overall system of the recruitment process. Given that this was my first time applying, I wasn’t expecting many of the struggles of current data science recruiting. Seeing thousands of applicants for a single LinkedIn job posting, having a recruiter tell me I was exactly what they were looking for and then not move me to the next round, and receiving technical assessments that tested me on skills that were irrelevant to the job description were all very frustrating. To overcome this, I think it was important for me to remember that applying to jobs is a two-way street. Just as companies are considering whether they want me, I should be considering whether I want them. When I think about it that way, I realize I may not want to work for a company that tests me on irrelevant skills, never responds to my job application, etc.

Another big challenge was preparing for technical interviews over a long period of time. I think a big help is to get a buddy to recruit with so you can hold each other accountable for doing practice problems. I also think things like LeetCode SQL 50 are helpful because they give you an end goal to get to. But recruiting can also be exhausting, so making sure I gave myself breaks and didn’t burn out was really important.

How did you deal with rejection and motivate yourself during the recruitment process?

I dealt with rejection mainly by reminding myself of a few things.

My self-worth is not tied to my career or ability to get a job.
The job market and the job application process are somewhat random and that’s not in my control.
Even the most intelligent, hardworking, talented people experience rejection when going through recruitment.
This is not the end of the world nor my career — I have so many years ahead of me and 10, 20 years down the line it’s not going to matter what my college internship was.
I may need to be rejected from this company in order to find the company that’s meant for me.

In terms of motivation, I think it’s good to not take things too seriously so that you don’t dread the application process. For a while I was able to trick myself into feeling like applying was a little game or a fun thing to check off my to-do list every day. If I got rejected, I’d be like “eh oh well… anyways let me apply to 5 more jobs.” Also, building recruitment into your routine can be helpful so that you turn it into a habit — you don’t think about being motivated to brush your teeth every day, you just do it.

What did you learn from your internship? How would you describe the overall experience?

On the technical skills side, I was able to work with new tools such as AWS/Amazon S3 and the Atlassian ecosystem (Jira, Bitbucket, Jenkins, Confluence) as well as develop skills from class like Python, Git, best coding practices, and data manipulation.

On the soft skills side, I learned how to tell stories with data based on the stakeholders, ask the right questions to do my work effectively, adapt to working with a variety of teams by adjusting levels of technicality, and make connections with other interns and full-time employees. I also learned a lot about how the healthcare industry works in general, especially the ins and outs of shifts, schedules, and the patient experience for chemotherapy and other types of infusion.

My overall experience was great — I loved the laidback yet fast-paced startup culture that allowed me to take on interesting projects and really mold my experience to make it what I wanted. I also really loved doing work that was really impactful!


Getting to Know Bailey Farren, Berkeley Alumna and CEO of Perimeter

Data Science Society at Berkeley — Fri, 06 Oct 2023 02:22:17 GMT

By: Christina Kim, Madhuri Suresh, Medha Iyer

Perimeter, founded by UC Berkeley alumna Bailey Farren, is a mapping platform designed to enhance information sharing among public safety agencies and residents during disasters. Inspired by her family’s experiences as first responders, she created Perimeter to bridge the gap between public safety agencies and communities. Despite initial challenges as a female founder in the male-dominated tech and public safety sectors, Bailey persevered, raising awareness about the urgent need to address climate change-related disasters. In addition to being a CEO, she made the Forbes 30 under 30 list and is currently Miss San Francisco. Discover her journey, brimming with inspiration, hurdles, and triumphs, as she passionately built her company from the ground up.

What is Perimeter? What was your motivation behind starting Perimeter?

Perimeter is a mapping platform that lets public safety agencies and residents share information with each other about natural disasters. Originally, I started the company after being exposed to wildfires in my hometown, the first fire being the Tubbs fire in 2017. My family and I had no information on what to do, where to go, or how to get there.

At the time, I thought that first responders must have access to this information. I thought what they needed was a solution to connect that information to residents, but I realized that they don’t even have that information sometimes. This is why Perimeter was started.

How has growing up with first responders shaped you into who you are today and would you say they influenced the creation of Perimeter?

My conversations about the problems Perimeter faced started at home. My dad was a firefighter, so growing up in a first responder environment made it easier to walk into these conversations.

My father would always find it so surprising that we could track a burrito to our door, yet he wasn’t able to be tracked in his field. “Why aren’t you being tracked in the field when I can track a burrito to my door?” I began asking him as many questions as I could and thankfully didn’t experience much resistance from the solution perspective because we developed Perimeter hand-in-hand with first responders and their input.

Did you have any trouble getting people to recognize this specific problem that Perimeter is tackling?

Five years ago, investors would tell me that the Camp Fire in Paradise and other natural disasters were flukes. However, climate change is not only increasing the frequency but also the severity of these types of disasters. As one of the first founders focusing on public safety, we are speaking to a current problem and future problem. It’s taken a year, but people are starting to understand the severity of climate change now.

What was it like first getting your company started? Did you face any barriers as a woman?

Less than 2% of venture capital goes to female founders. Being a female founder poses its own unique challenges. Tech and Public Safety is a very male-dominated field, so as a 22-year-old woman just graduating college, a lot of people didn’t think I would stick it out. A lot of people thought I would quit.

How was your Berkeley experience and in what ways has Berkeley given you the skills to do what you do?

I was homeschooled my whole life leading up to Cal. Berkeley really felt like heaven, and there was so much diversity in terms of background and thoughts. I was exposed to so many perspectives and interests that made my world feel so exciting and colorful in ways that I just never really experienced growing up in a small homeschooled community in Santa Rosa. It just totally opened my world.

Did you always know you would get into tech/entrepreneurship?

I always wanted to be an entrepreneur. My next-door neighbor and I started a baking company when I was 12. We went door to door, and one day we even made $30! From a young age, I was always tinkering with something business-wise growing up, but I never would’ve imagined I would go into tech until I took my first SCET class at Berkeley. I realized that even if I didn’t have a technical background, I could still work with a team that was diverse and had strong technical skills. As long as I could develop leadership skills and communicate with the rest of my team, I could still play a big role in developing my product.

As a UC Berkeley alumna, do you have any advice for current students also looking to start their own company?

When you’re sitting in class, whether you’re aware of this or not, you could be sitting next to someone who could be a great co-founder or investor. I had a professor be our first investor who led our second round of funding as well. Be on the lookout for and engage with amazing students, professors, GSIs, and people who could eventually play a role in the next chapter of your life. One of our first employees was my GSI. Everybody is looking for a great idea, and I think if you’re always on the hunt for things that can be improved, or challenges that you or your friends and family face, that is a place that tends to yield a lot of good ideas.

What made you want to run for Miss San Francisco?

I attended Miss California and met so many unique women, all so focused on developing solutions. I think it’s very easy to get exposed to problems and tell ourselves the story that there is nothing we can do about them. Being at Miss California, I met so many women thinking about the same problems but also thinking about solutions. I knew this would bring me face to face with things that would challenge me. I wanted to prove to others and to myself that women could step into both the entrepreneurship and pageantry worlds and make an impact.

Do you have any tips for being confident and improving stage presence?

One of the most basic things that everyone could benefit from is posture. It is really easy to want to make myself small when I am not feeling confident. What we learned is how to roll our shoulders back, stand up straight, and not shrink.

I would say that overall, the most impactful thing I learned was to demonstrate confidence. When you are first learning to fundraise as an entrepreneur, everyone tells you how important it is to come off as confident, but what wasn’t clear was how exactly to do that. What I learned is you have to find what makes you feel confident. To me, it meant being so focused and certain about what I was working on and why Perimeter needed to exist. I am totally unshakable in what my values are because I have gotten to the bottom of what we are doing and why we are doing it. These values ensure I’m never thrown off.

In the pageant world, even something as simple as walking brings confidence. Some people just have that incredibly energizing stage presence. Competing in California, I was surprised that no one had taught me how to walk and how to stand, especially when walking into a board room. Being able to display confidence physically contributes to how people see you, and I’m grateful to have learned the importance of stage presence.

How was it when you first found out you made it to Forbes 30 under 30?

Initially, I was shocked because it came out of nowhere. Forbes doesn’t tell you you are on the list until the list is released. One day, my cofounder sent me a screenshot of the Slack we were added to, asking me, “Does this mean we’re on the list?” Sure enough, we were there. Forbes 30 under 30. I was on it for enterprise technology, not social impact, so I was excited and surprised. It definitely had a big impact on us, helped us fundraise, and opened doors I didn’t even know it would.

What is next for you and for Perimeter?

Beyond covering fires, floods, and hurricanes and preparing for other potential disasters, we are really focused on implementing the platform in regions that experience a very diverse set of emergencies and natural disasters. Right now we are deployed in California and Nevada, but our goal next year is to expand across the US.

This is primarily what I’ll be working on, but I am also looking towards building a community and support system of female founders interested in tech and entrepreneurship. I am excited about events this year focusing on empowering women in the Bay Area.

Thank you for reading this interview with Bailey Farren! We hope you found it encouraging and inspiring to stay curious, be resilient, and believe in your ability to create positive change.

Getting to Know Bailey Farren, Berkeley Alumna and CEO of Perimeter was originally published in Data Science Society at Berkeley on Medium, where people are continuing the conversation by highlighting and responding to this story.

Building Bridges: Identifying Gaps Between Data-driven Predictions and Decision-Making

Data Science Society at Berkeley — Thu, 28 Sep 2023 04:31:27 GMT

By: Sandya Wijaya

The recent explosion of “big data” in science, industry, and government has placed data at the forefront of innovation and problem-solving. In development economics, data is used to predict economic well-being at a granular level with mobile data and satellite imagery. In medicine, data is used to prioritize patients for medical interventions based on their predicted risk of complications and treatment plans. In public health, data is used to allocate fire and health inspectors in cities based on the predicted probability of a violation being detected upon inspection.

However, going from making a prediction to making a decision is not an easy jump. Bridging this gap requires understanding the assumptions that underlie a prediction. This article will outline the 3 main gaps that one must assess when harnessing data science for decision-making.

1) COMPLEXITY

The first gap to assess is complexity — predictions don’t always tell the whole story, and domain knowledge is integral to any data-powered problem.

Employing supervised machine learning (SML) techniques without understanding underlying assumptions can significantly compromise the validity and usefulness of conclusions.

For example, if a hypothetical medical model indicates medicine A has a positive effect on disease B’s treatment, only experienced clinicians can weigh the benefits against the side effects to determine whether such treatment is justified. It could be the case that though medicine A effectively treats B, the side effects will leave patients in worse health. Without including such domain expertise, a data scientist may go forward with the simple conclusion that medicine A should be used to treat disease B, and doing so can have fatal effects for patients worldwide.

There needs to be more attention paid to the limitations of pure prediction methods. Domain knowledge is integral to using data science for decision-making.

2) INFLUENCE

Another gap to assess is incentives and manipulability — resource allocation may influence the outcome you are predicting.

One example is New York City’s Firecast algorithm, which allocates fire inspectors according to the predicted probability of a violation being detected upon inspection. One input to such a prediction could be the age of a building: an old building with old wiring is more prone to fire, so it makes sense to send inspectors there. However, the owner of a new building can then anticipate a low probability of being inspected and may reduce their safety efforts, since they face little risk of being caught regardless of their actions.
Another example is the market pricing system (MPS) of British Columbia, which uses data from timber sold at auctions to predict the prices that would have been obtained if a tract harvested under a long-term lease had instead been sold via auction. However, a lease-holder can intentionally bid artificially low in auctions, knowing that this would influence the predicted prices and lower their costs of harvesting from long-term leases.

In such cases, it is important to go back to the drawing board and assess the manipulability of characteristics that we are inputting into our SML models, as well as data collection techniques.

3) RESPONSIVENESS

The last gap we will discuss is responsiveness — certain groups may be more sensitive to interventions than others, and some may cost more than others to influence.

Using the same example of allocating fire inspectors to buildings, in the case that two buildings both fail fire inspection, to whom do we allocate resources to fix this issue?

One building may be at higher risk of fire due to old wiring, but it is this condition that makes it difficult to replace the wiring. Meanwhile, the other building may have a lower predicted risk, but it is more feasible to make substantial improvements.
Another responsiveness factor to consider is cost. If violations entail fines, some firms may be more sensitive to the prospect of fines than others.

Answering such questions involves estimating the expected improvement in overall quality of units (e.g., food poisoning rates).
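To make the difference between "most at risk" and "most responsive" concrete, here is a minimal sketch in Python. Every name and number in it is hypothetical, invented purely for illustration; none of it comes from Firecast or from the source article. It ranks two buildings first by predicted risk and then by the estimated risk reduction per unit of cost.

```python
from dataclasses import dataclass


@dataclass
class Building:
    name: str
    predicted_risk: float   # predicted probability of a fire (hypothetical)
    risk_after_fix: float   # estimated risk if repairs are made (hypothetical)
    cost: float             # cost of the repairs, arbitrary units (hypothetical)

    @property
    def expected_reduction(self) -> float:
        # How much the intervention is expected to lower the risk.
        return self.predicted_risk - self.risk_after_fix

    @property
    def reduction_per_cost(self) -> float:
        # Expected risk reduction bought by one unit of spending.
        return self.expected_reduction / self.cost


buildings = [
    # High risk, but the old wiring makes repairs expensive and only partly effective.
    Building("old_wiring", predicted_risk=0.30, risk_after_fix=0.25, cost=10.0),
    # Lower risk, but substantial improvements are cheap and feasible.
    Building("newer_building", predicted_risk=0.15, risk_after_fix=0.05, cost=4.0),
]

# A pure prediction approach picks the building with the highest risk.
by_risk = max(buildings, key=lambda b: b.predicted_risk)

# A responsiveness-aware approach picks the building where spending helps most.
by_benefit = max(buildings, key=lambda b: b.reduction_per_cost)

print("Highest predicted risk:", by_risk.name)
print("Highest expected benefit per unit cost:", by_benefit.name)
```

With these made-up inputs the two rankings disagree, which is the point: the quantities that matter for the decision (the risk after intervention and the cost) are causal estimates that a prediction model alone does not supply.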

PHOTO: GREGORY REC/STAFF PHOTOGRAPHER

All in all, consistent and efficient estimation of causal effects can be achieved by modifying SML techniques. One needs to include lessons from decades of multidisciplinary research using empirical evidence to inform policy.

Key Points:

Numerous gaps exist between making a prediction and making a decision. Bridging these gaps requires understanding the assumptions that underlie a prediction.
Causal inference is based on assumptions that are not directly testable and thus require domain expertise to verify.
Data is not always objective; it is important for us data scientists to fully and carefully assess how incentives and manipulability play a role in data-driven policy.
Resource allocation goes beyond who is most at risk; it also asks the more complex causal inference question of expected benefits based on who is most responsive to recommendations.

Further Questions to Consider:

What are examples of mistakes that can be made if we allocate resources based purely on prediction problems?
Can you think of your own example of a prediction-resource allocation problem? Does it suffer from Athey’s concerns?
What are the ethical concerns that go along with using data analysis for resource allocation?
How much domain knowledge is needed in order to close the gap in complexity and successfully apply machine learning techniques toward solving policy issues?

Further Resources to Look Into:

Source Article https://www.science.org/doi/10.1126/science.aal4321
Susan Athey’s Bio https://gsb-faculty.stanford.edu/susan-athey/biography/
Machine learning for environmental monitoring (Hino et al.) https://www.nature.com/articles/s41893-018-0142-9/
[UC Berkeley students] ENERES131: Data, Environment, and Society — This class discusses the difference between prediction and causal inference, and the intersection of using data science for environmental issues.
The Importance of Domain Knowledge: https://blog.ml.cmu.edu/2020/08/31/1-domain-knowledge/

What can you do?

Get a free subscription to the above news platforms with your Berkeley account!
Check out Berkeley’s updates/news at…
Articles + Resources: guides.lib.berkeley.edu/CEO
STEM: news.berkeley.edu/category/research/technology_engineering/
DEI: nature.berkeley.edu/diversity-equity-and-inclusion/news

Building Bridges: Identifying Gaps Between Data-driven Predictions and Decision-Making was originally published in Data Science Society at Berkeley on Medium, where people are continuing the conversation by highlighting and responding to this story.

An Interview with a Past Decal student, mentor, and director

Data Science Society at Berkeley — Thu, 07 Sep 2023 15:39:09 GMT

By: Lillian Jiang

Manas Khatore entered UC Berkeley in Fall 2021 as an intended Data Science major. Without any prior experience in the field, he enrolled in the DSS Decal (Introduction to Real-World Data Science) during his first semester. He later became an Academic Development mentor for the next two semesters and took up the role of the director of the Decal in Spring 2023. Manas is now a senior advisor in DSS and a tutor for Data 100. Read about his experience taking, teaching, and directing the Decal!

Past Decal students

Why did you apply?

I wanted to join a club community and was drawn to the welcoming atmosphere from DSS. However, I felt like I was not ready to join a committee because I had no prior data science experience. The Decal seemed like a great way to still be able to interact with people from the data science community and make friends.

I was also taking Data8 (Introduction to Data Science) and really enjoyed the class. The Decal seemed like a perfect opportunity to apply the fundamentals I learned in Data8 on a project I’m interested in.

How did you manage taking the Decal with your schedule?

The Decal was a 3–4 hour a week time commitment for me. The weekly check-ins that my group had with my DSS mentor were also a great way to keep us on track. Even if we all had a busy week, we could count on our weekly check-in to work on the project.

The Decal was also something I looked forward to. I really enjoyed the topics and meeting my Decal group. One of my close friends today is someone I met in my Decal group! The Decal never felt like a chore for me.

What was your favorite lecture?

My favorite lecture was on Linear Regression and K-means clustering. I had always been interested in Machine Learning, but it felt like a very intimidating topic. The lecture was well-taught and introduced the topic in an approachable way that ignited my interest in ML. After taking that lecture, I was excited about ways I could apply it to my own project and to explore it further!

What was your project about?

My group was all pretty interested in environmental justice in climate. We actually decided on our project based on a scene from the movie Parasite, directed by Bong Joon-ho, where a flood devastates the poor Kim family who lives at a lower elevation but doesn’t affect the wealthy Park family who lives at a high elevation.

From that scene, we were inspired to explore the relationship between income, elevation, and flood risk. We found that there was not as high of a correlation between flood risk and elevation as we thought. Factors like quality of environment and drainage systems were more correlated with flood risk. However, there was still a slightly significant negative correlation between income and flood risk.

What was it like working in a group for the data science project?

I really enjoyed working with my group for the data science project! I would look forward to our weekly check-ins, where we would work on the project with our mentor and sometimes get boba or FreshRoll after.

Since we were working in a group, it was easy to split up the work across the data science lifecycle according to our strengths. For example, I worked on the statistical testing part of the project while other members would work on data visualization.

We would also schedule times beyond our weekly check-in to work together over a call or in person. It was great to be able to bounce ideas off each other and ask questions.

What did you enjoy most?

Since I didn’t have any data science experience coming into college, I really enjoyed how the Decal gave me a lot of confidence to incorporate my data science skills to a real-world project.

I also really enjoyed seeing the other groups’ projects at the Decal project symposium. It was cool to see all the different topics and projects everyone worked on.

How did you support your students as a mentor?

As a mentor there is a fine line between supporting students and doing their project for them. I did my best to support the students by starting the conversation between them and asking guiding questions. It’s important to give students resources so that they can come to conclusions by themselves.

When helping students come up with their project idea, I knew that it was important that everyone was interested in their topic or else they would not be as motivated to work on their project, so I asked questions like “What sort of topics are you interested in?” and “What are your hobbies?”

What do you look for in a Decal applicant? What is a successful Decal student like?

A successful Decal student is one who is willing to put in the effort and learn. You don’t need to and shouldn’t know everything about data science, but you should come in wanting to learn and apply data science.

You should also be willing to talk to people and make it a social effort.

Thank you for reading this interview with Manas Khatore! We hope you found it helpful and encourage you to apply to our Decal.

An Interview with a Past Decal student, mentor, and director was originally published in Data Science Society at Berkeley on Medium, where people are continuing the conversation by highlighting and responding to this story.

5th Annual UC Berkeley Data Science Forum: Tech for Social Good

Data Science Society at Berkeley — Tue, 18 Apr 2023 04:11:53 GMT

By: Sandya Wijaya

This past weekend, data science clubs Data Science Society, Big Data at Berkeley, and SAAS teamed up to spearhead the 5th Annual UC Berkeley Data Science Forum. The UC Berkeley Data Science Forum is the largest data science-related event at UC Berkeley, and an all-day career-driven event on all things data. This year, over 400 people signed up for the event! This year’s theme was Tech for Social Good — the intersection of data and social impact. The goal of the event was to educate the UC Berkeley community on how to carry social good values in tech work and/or use tech to drive solutions to societal issues.

To show that this intersection comes in a spectrum, speakers came from a wide variety of different industries — NPO, big tech, startup, research, government, and so on. The event started off with 4 speaker presentations, then broke off to a lunch break before reconvening again for a speaker panel. Other than learning, the event provided other benefits such as resume drops, networking sessions, free food, and also gave away a raffle prize of a JBL speaker!

Credit: Sandya Wijaya

Summary of presentations and panel

Dave Thau, the Data and Technology Global Lead Scientist at WWF, gave a presentation on “Data, Technology, and Social Good.”

A recent survey of WWF staff revealed that there are currently 100 AI projects in progress. Many were machine learning on imagery for monitoring land cover and use. Some used natural language processing to monitor internet traffic of various sorts, including early warning and campaign sentiment. There were also some prediction projects: deforestation, habitats under climate change, fire spread, PAWS.
Dave dived into various case studies, but most significantly, he explained Forest Foresight (FF), an early warning system developed by WWF in partnership with other companies to predict and prevent deforestation. FF takes in satellite images of forests and feeds them into a trained machine learning model (XGBoost) to generate a map of forest cover at risk.
He also provided many open-source resources for students to take advantage of including wwfclimatecrowd.org, planetbaseddiets.panda.org/impacts-action-calculator, eatforum.org.

Alex Pompe, a Research Manager on Meta’s Data for Good team (dataforgood.facebook.com), presented a dataset built on the Social Connectedness Index.

Using a mathematical method that takes in friendship ties as well as total users in 2 different countries, this dataset was able to generate maps that indicated social network distributions all over the world.
For example, this is a generated map that shows social connectedness in SF to the world. As we see here, the darkest parts of the map lie around the area of California, meaning people in SF have the strongest ties to other people in California.

Credit: Data for Good at Meta
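As a rough illustration of the kind of calculation described above: Meta's public documentation for this dataset describes the index as the number of friendship ties between two places divided by the product of the number of users in each place, followed by a rescaling step. The sketch below assumes that form and leaves out the rescaling. The function name and all counts are made up for illustration and are not taken from the talk or from the dataset.

```python
def connectedness(friend_ties: int, users_a: int, users_b: int) -> float:
    """Relative likelihood of a friendship tie between two places.

    Assumed form: ties between A and B divided by (users in A * users in B).
    Dividing by the product controls for the fact that larger populations
    have more possible pairs of friends.
    """
    if users_a <= 0 or users_b <= 0:
        raise ValueError("user counts must be positive")
    return friend_ties / (users_a * users_b)


# Hypothetical counts: a nearby region versus a distant, larger one.
sf_to_nearby = connectedness(friend_ties=50_000, users_a=1_000, users_b=2_000)
sf_to_distant = connectedness(friend_ties=5_000, users_a=1_000, users_b=10_000)

# A higher value corresponds to a darker shade on a map like the one described.
print(sf_to_nearby > sf_to_distant)
```

Normalizing by both user counts is what lets places of very different sizes be compared on the same map.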

Malavika Kurup (Project Operations) and Akshay Juleemun (Marketing) from Delta Analytics presented about how Delta connects NPOs to data fellows who want to give back.

Delta Analytics is a Bay Area nonprofit connecting US-based data professionals that are looking for volunteer opportunities with domestic and international nonprofits. Their data fellowship program works to bridge the nonprofit skill gap by enabling nonprofits to accelerate their impact through leveraging skilled data professionals on project opportunities.
They dived into a case study of their past project with TeenSmart International (TSI), a not-for-profit organization based in Costa Rica that promotes healthy lifestyles and self-leadership in teens through free online services aiming to prevent and/or reduce six high-risk behaviors. In 2021, the Delta team built a dashboard to help TSI understand their customer base across platforms and their chatbot’s effectiveness at delivering teens to their website. In 2022, they continued to build models to predict some of the high-risk behavior using machine learning techniques and logistic regression. In 2023, the team is working on building the data pipeline to deploy the models on AWS and will build a dashboard to report out individual predictions.

Dan Hammer, a Managing Partner at Earthrise Media and Adjunct Professor at UC Berkeley, gave a presentation on data science and the environment.

He opened with a quote from a June 2022 discussion paper from TNFD: “There is no shortage of nature-related data but rather a lack of understanding about how data can be used to derive information that is decision-relevant to end users.”
Dan dived into various case studies such as the Global Forest Watch, Global Plastic Watch, and the Amazon Mining Watch.
Overall, he provided 4 main takeaways about environmental tech — 1) EVERYTHING IS CONNECTED, 2) NATURE IS HARD TO MEASURE, 3) NATURE IS HARDER TO VALUE, 4) NATURE IS PLACE-BASED.

The last event was a speaker panel

Panelists: Melis Akman (Research Scientist at Sound Agriculture), Shervin Bastami (Laboratory Operations Manager at Sound Agriculture), Yuyang Zhong (Program Manager at Coding It Forward), John Rademaker (CEO of homelessness NPO Ample Labs, Startup Coach for Social Impact).
Panelists came from a diverse background, from agriculture to civic tech to startups to basic needs.
Some questions included “What challenge is tech for social good currently making the most progress towards solving, and which is it farthest from solving?” and “How does the material you learnt in classes differ from what you are currently doing in industry and how did you fill those gaps?” and “How do you think recent advances such as ChatGPT have impacted and will impact our studies going forward?”

Credit: Marlon Fu

We thank every single one of our speakers and panelists for taking time out of their Sunday to share their industry insights on data science and social impact. In addition to shedding light on existing societal and environmental issues, the forum also facilitated discussion on future steps and tangible solutions to be executed by data enthusiasts: students, working professionals, and teachers alike.

Key Points:

To show that the tech for social impact intersection comes in a spectrum, speakers for the forum came from a wide variety of different industries — NPO, big tech, startup, research, government, and so on.
Each industry has different tradeoffs, and everyone can take advantage of the industry that best suits their needs & desires. For example, because initiatives by NPOs are wholly impact-motivated (instead of money-motivated), changes are usually more genuine, but this also means there may be less monetary compensation to their workers.
Tech for social impact and data ethics are important for everyone. As data scientists, we lead decision-making with our work, so it is vital that we have the societal knowledge needed to identify biases and ethical concerns.

Further Questions to Consider:

What factors do you consider when trying to pursue a career in tech for good? Which industry fulfills these factors best?
What societal issues do you care about and why? How are you giving back to this issue now and/or plan to in the future?

What can you do?

Learn more about the different tradeoffs!
Get a free subscription to the above news platforms with your Berkeley account!
Check out Berkeley’s updates/news at…
Articles + Resources: guides.lib.berkeley.edu/CEO
STEM: news.berkeley.edu/category/research/technology_engineering/
DEI: nature.berkeley.edu/diversity-equity-and-inclusion/news

A class guide and tips to the Data Science Major at Berkeley

Data Science Society at Berkeley — Tue, 11 Apr 2023 18:53:16 GMT

By: Sandya Wijaya, Lillian Jiang, Medha Iyer

Disclaimer: The data science curriculum and class structures may change in the future. This information is relevant as of Spring 2023, the time this article was written and published.

As class enrollment is coming up and new Cal bears have just been admitted (congrats baby bears!), Data Science Society at Berkeley wanted to curate a go-to document to guide students of all levels in their pursuit of majoring in Data Science at UC Berkeley. We compiled insight from our club members on their recommended classes, honest tips & tricks, and resources they found to be most helpful. Please keep in mind that not everyone will have the same experiences in these classes, but here’s ours!

We will go over tips & tricks to succeed in the required classes to declare — DATA C8, DATA 100, CS 61A, CS 61B, MATH 54. Then, we will go over possible class options for each of the upper div requirements — CS 188, CS 189, DATA 102, INFO 159, DATA 144, DATA 140, IND ENG 142, EECS 126, STAT 134, and more!

Photo: Jill Hodges

Overview

The Data Science major program is designed to provide integrative course experiences in the lower division and upper division, as well as the technical depth in computation and inference required for students to engage in data science upon graduation.

On a high-level overview: The Requirements include -

Tips & Tricks on the required classes to declare

DATA 8 & DATA 100

DATA 8 is known to be a great introduction to the foundations of data science and coding. A lot of non-tech majors take the class as part of their own major requirements, so you’ll see a lot of students from all disciplines like environmental majors, policy majors, and so on.
Because DATA 8 uses a Berkeley-specific library, some specific rules may be a bit of a step away from real-world data science in an effort to make it more beginner-friendly. However, you will be able to learn something more accurate to real-world data science in DATA 100, which teaches you the pandas library used in industry and the data science lifecycle.
For both of these classes, be sure to make the most of your discussions! They are key to building on the things you learn in lecture.
Look to Data Scholars Program for support as needed! Data Scholars is a close-knit program that provides mentorship, smaller class sizes, and increased support for underrepresented communities in the field.

CS 61A & CS 61B

Start projects with other people so that you know you’re on the right track and that you’re understanding what you have to do. The projects become hard when you realize that your structure was wrong.
Start early on projects! Do not underestimate how long debugging can take. Learning how to use the debugger can be a bit of a learning curve, but it’s very helpful in identifying errors in your code and implementation, which will save you lots of time in the long run.
Class office hours can get really full really quickly — especially on the week that the project is due — and a lot of students have expressed concerns about how queues are always unbelievably long (going up to 3 hours).

Here are some alternative places you can look for help and reviews that are usually significantly less full:

Computer Science Mentors (CSM) offers small group tutoring for the lower division CS classes. Each section has 4–6 students and meets weekly for one hour. “I’ve personally attended almost all my CSM sections for all my lower division CS classes and found it extremely useful!” — Lillian
HKN has drop-in tutoring sessions from 11am-5pm on weekdays for most/all CS and EE classes. The tutoring schedule on their website is very helpful for seeing which tutors are available for which classes. They also host review sessions before some CS exams, which are advertised in the EdStem for the classes that you take. Additionally, they host their tutoring sessions at Cory 290, which is usually close to other CS-related events such as class office hours. This lets you go back and forth; if one fails, another option is close by!
Center for Access to Engineering Excellence (CAEE) offers drop-in group tutoring for various lower and upper-division engineering courses in 227 and 240 Bechtel Engineering Center and you can see the full schedule on their website. Tutors are past students of the class, and the space provides snacks and hot drinks as well so you can stay for a long time. “CAEE carried me through CS 61B! What I would do during project weeks is go to Bechtel every time I would work on my projects so that I can easily go to a CAEE tutor whenever, then only go to the crowded class office hours when I have a question CAEE cannot answer. This saved me a LOT of time!”— Sandya
CS Scholars is close-knit first-year student support program that provides smaller class sizes and guidance for underrepresented communities in the field. They also offer exclusive seminars and speaker series with appearances and mentorship from people like Professor John DeNero! “I participated in CS Scholars during my freshman year at Berkeley! The small class sizes and kind cohort helped me acclimate to Berkeley’s academic culture and create long-lasting friendships within the CS community!” — Medha

Math 54

Check out the Student Learning Center! The SLC hosts drop-in tutoring sessions in the SLC Atrium at the César E. Chávez Student Center.
Consider taking the Math 54 adjunct course. This is a 1-unit P/NP course that offers additional review and support for Math 54.
Go to discussions. Besides homework questions, there aren’t many opportunities for you to test your knowledge and understanding of the math concepts. Discussion provides a great way for you to practice questions, identify any gaps in your knowledge, and meet other students.
If you want more conceptual understanding, 3Blue1Brown has a YouTube series, “Essence of linear algebra,” which uses animations to explain the geometric intuitions underlying linear algebra: https://www.youtube.com/watch?v=kjBOesZCoqc&list=PL0-GT3co4r2y2YErbmuJw2L5tW4Ew2O5B&ab_channel=3Blue1Brown

What upper divs are recommended?

Probability

DATA 140 (Probability for Data Science): Known among students to be one of the harder choices for this requirement, but also the most rewarding and the most relevant to a career in data science!
EECS 126 (Probability and Random Processes): Hardest choice for this requirement and usually the second most popular choice among students, especially for CS + DS double majors, as it fulfills a major requirement for both majors.
STAT 134 (Concepts of Probability): More statistics-focused; a lot of the people in the class are Statistics or Applied Math majors. Discussion will be your biggest help in this class!

Computational & Inferential Depth

DATA 144 (Data Mining and Analytics): Covers more of what you’ll use in industry: linear/logistic regression, random forests, k-means, neural networks, etc. It’s also known to be a fairly easy A!
CS 188 (Introduction to Artificial Intelligence): Interesting introduction to artificial intelligence principles and their applications to fields like data science and probability-based problem-solving, but it is not exactly necessary for the data industry. It has a fairly light workload for an upper-div (weekly 2-part guess-and-check and challenge problem sets). Projects and exams are very pattern recognition-oriented with a gamified approach: you optimize a Pacman game over the course of the semester.
INFO 159 (Natural Language Processing): Teaches language methods used for analyzing text in computational systems. It is a mix of coding and linguistics-related material, so it is pretty interdisciplinary! Some students say the HWs can be pretty ‘shallow’; you don’t get as much hands-on work as you’d hope.

Modeling, Learning, and Decision-Making

COMPSCI 189 (Introduction to Machine Learning): Hardest class for this requirement. You get to learn how ML models (the ones you learn in DATA 100 etc.) work under the hood, and understanding these inner workings will allow you to design better models/inputs! Expect a lot of rigorous math, so refresh your Linear Algebra concepts beforehand! Beyond the math, there’s also quite a bit of coding, since you will implement many of the models from scratch. If you’re trying to go into ML Engineering or explore similar topics in research/grad school, this class is probably better for you than Data 102.
DATA 102 (Data, Inference, and Decisions): Combination of statistics, probability, and basic reinforcement learning. It focuses more on Bayesian modeling, interpreting results, and setting up hypothesis tests and models. The labs and projects are more similar to what you would do as a data scientist than CS 189, and you also don’t need as much of a math background for this class. Our members say it’s possible to get by with Data 100 and half of Data 140 knowledge.
IND ENG 142 (Introduction to Machine Learning & Data Analytics): Known to be the easiest class for this requirement but the hardest class to enroll in. Doesn’t require as much math background, and the concepts are pretty similar to Data 100. Assignment-wise, there are only 5 homeworks, which can be a little long and are more application-based (focusing on how to apply specific algorithms and models rather than on the models themselves).

Human Contexts and Ethics

Students will often use this requirement as an opportunity to explore something new they can apply data science to! There are many options, such as urban design (CY PLAN 101: Introduction to Urban Data Analytics).
But if you want a class that is strictly data-related, DATA 104 is your best pick!

Key Points:

While fulfilling your requirements in each category, choose classes that interest you and allow you to explore your limits and challenge yourself.
Keep up with content, discussions, homework, and projects. Falling behind in technical classes can quickly snowball, and catching up only gets harder!
Leverage Berkeley’s resources and mentorship opportunities to navigate courses and difficult topics.

Further Questions to Consider:

Are you considering a double major? What other majors could complement your data science major based on the fields you are interested in pursuing?
How should you map out major declaration and requirements if you plan on studying abroad?
What applications do the above classes have to some of your current hobbies or passions? Are there certain classes that intrigue you more than others? How might this reflect in your major domain emphasis?

Further Resources to Look Into:

Berkeley Academic Guide: https://classes.berkeley.edu/
BerkeleyTime: https://berkeleytime.com/catalog
Computer Science Mentors: https://csmentors.berkeley.edu/#/
Student Learning Center: https://slc.berkeley.edu/programs
Berkeley Reddit (for more course enrollment questions/tips!): https://www.reddit.com/r/berkeley/
Data Science advisors are also available to help! Email them at ds-advising@berkeley.edu, or find out about our other advising services.
Computing, Data Science, and Society: Data Science Resources

What can you do?

Get a free subscription to the above news platforms with your Berkeley account!
Check out Berkeley’s updates/news at…
Articles + Resources: guides.lib.berkeley.edu/CEO
STEM: news.berkeley.edu/category/research/technology_engineering/
DEI: nature.berkeley.edu/diversity-equity-and-inclusion/news

A class guide and tips to the Data Science Major at Berkeley was originally published in Data Science Society at Berkeley on Medium, where people are continuing the conversation by highlighting and responding to this story.

Using Machine Learning to Predict Weather

Data Science Society at Berkeley — Tue, 04 Apr 2023 00:05:48 GMT

By: Sandya Wijaya

Researchers at Colorado State University have recently developed a machine learning model called CSU-MLP which uses deep learning algorithms and satellite data to predict severe weather events like tornadoes and hail up to eight days in advance.

The machine learning model was trained on approximately nine years of weather observations across the continental U.S. To enhance its performance, the researchers incorporated meteorological retrospective forecasts, which involve creating “re-forecasts” from past weather events. The scientists extracted environmental factors from these model forecasts and linked them to previous instances of severe weather to identify atmospheric patterns and make predictions.

As a result, the machine learning model is capable of generating real-time predictions with a lead time of four to eight days for these hazardous weather conditions based on current environmental factors like temperature and wind. This model can be used to forecast the probability of severe weather events during a given period, providing critical information to help communities prepare and mitigate the impacts of these extreme weather conditions.
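
To make the idea of "linking environmental factors to past severe weather" concrete, here is a minimal sketch in Python. It is not the CSU-MLP model: it swaps in a plain logistic regression trained by gradient descent, and the two features (`instability`, `wind_shear`), the synthetic data generator, and every function name are hypothetical, made up purely for illustration. The only point is to show how historical (forecast, outcome) pairs can be turned into a probability for new conditions.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_toy_reforecasts(n, seed=0):
    """Synthetic stand-in for (forecast features, observed outcome) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        instability = rng.uniform(0.0, 1.0)  # hypothetical scaled feature
        wind_shear = rng.uniform(0.0, 1.0)   # hypothetical scaled feature
        # Made-up rule: severe weather is more likely when both are high.
        p_true = sigmoid(6.0 * (instability + wind_shear) - 7.0)
        label = 1 if rng.random() < p_true else 0
        rows.append(((instability, wind_shear), label))
    return rows


def train_logistic(rows, epochs=300, lr=0.5):
    """Fit two weights and a bias with batch gradient descent on log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict_probability(model, features):
    """Return the estimated probability of a severe event for one forecast."""
    w, b = model
    return sigmoid(w[0] * features[0] + w[1] * features[1] + b)


model = train_logistic(make_toy_reforecasts(500))
calm = predict_probability(model, (0.1, 0.1))
stormy = predict_probability(model, (0.9, 0.9))
print(f"calm conditions: {calm:.2f}, stormy conditions: {stormy:.2f}")
```

The output is a probability rather than a yes/no answer, which matches how the article describes the forecasts: a likelihood of severe weather over a given period that forecasters can then act on.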

Research scientist Aaron Hill presents the CSU-MLP to forecasters at the Storm Prediction Center. Credit: Provided/Aaron Hill

The new model has several benefits, including providing early warnings for severe weather conditions, which can help individuals prepare, protect their property, and take necessary precautions. This feature can be especially valuable for people living in regions that are prone to extreme weather conditions, like tornadoes and hailstorms. Additionally, the model provides a new approach for researchers to study the formation of severe weather events and understand how weather patterns evolve and change over time.

Overall, the development of this machine learning model represents a significant advancement in predicting severe weather conditions, which can potentially save lives and prevent property damage. The model’s ability to make accurate predictions with longer lead times could be a valuable resource for meteorologists and researchers studying weather phenomena.

Key Points:

Researchers at Colorado State University have recently developed a novel machine learning model which has the ability to predict severe weather events with a high level of accuracy up to eight days in advance.
This development is beneficial because it provides earlier warnings that let people take protective action sooner, and it gives researchers new, more detailed insights into how severe weather forms.

Further Questions to Consider:

What datasets and techniques can be deployed to further refine this model to provide more precise predictions for severe weather conditions such as tornadoes and hail?
How can this model be integrated with existing weather prediction systems and networks, and what are the costs associated with implementing such a system?
How can this model adapt to the different weather conditions in different countries in order to allow this solution to be implemented worldwide?

Further Resources to Look Into:

Full article: https://scitechdaily.com/new-machine-learning-model-can-accurately-predict-events-like-tornadoes-and-hail-eight-days-in-advance/
Using Data Analytics for Weather Forecasting: https://www.nobledesktop.com/classes-near-me/blog/data-analytics-for-weather-forecasting
More Data x Weather articles: https://towardsdatascience.com/tagged/weather

What the Silicon Valley Bank collapse means for tech-science startups

Data Science Society at Berkeley — Tue, 21 Mar 2023 02:44:00 GMT

By: Vennila Annamalai, Sandya Wijaya

On March 10, Silicon Valley Bank (SVB) collapsed after it was revealed that it needed $2 billion to cover debts due to rising interest rates. This caused several large venture-capital firms to advise their clients to withdraw funds, leaving many technology start-ups, especially those focused on green energy and biotech, concerned about future investment opportunities. While the US government announced on March 12 that it would guarantee deposits with the bank, some start-up CEOs do not believe this solution is effective for securing long-term investment in small businesses. Ethan Cohen-Cole of Capture6 — a clean-technology start-up in Berkeley, California developing ways of capturing carbon dioxide from the air — anticipates that investors will become less interested in investing in small companies, which will ultimately impact small start-ups that work on climate solutions.

Credit: New York Post

Also on March 10, the Bank of England announced that SVB’s UK arm would go into liquidation, which threatened catastrophic losses for customers. However, a weekend of lobbying by tech leaders led to HSBC bank buying SVB UK for £1 ($1.20), rescuing its operations. Sebastian Weidt, CEO of Universal Quantum, had millions of pounds deposited with SVB UK and experienced a stressful weekend. Samira Ann Qassim, co-founder of Pink Salt Ventures, advises start-ups to hold accounts with different banks to avoid such situations. Aileen Ryan, CEO of Preoptima, plans to bank with SVB UK under its new ownership but intends to spread funds across multiple banks in the future.

Matt Lilley, President of Hult Business School in London, suggests that the collapse of SVB is a reflection of the wider financial environment’s challenges, citing rising interest rates as the main issue. He predicts that even without SVB, US decarbonization start-ups will continue to attract investment due to the US government’s Inflation Reduction Act. However, climate-tech entrepreneurs like Cohen-Cole are nervous about a potential retrenchment in lending, although Cohen-Cole himself is confident that other capital providers will quickly replace any SVB lending.

Key Points:

Silicon Valley Bank collapsed due to rising interest rates. This left many tech start-ups concerned about future investment opportunities.
Governments can provide financial support to businesses in times of crisis. The US government announced it would guarantee deposits with the bank.
Matt Lilley, President of Hult Business School in London, suggests that the collapse of SVB is a reflection of the wider financial environment’s challenges.

Further Questions to Consider:

How can businesses protect themselves from the risks associated with rising interest rates?
What measures can governments take to ensure the financial stability of businesses?
How can entrepreneurs prepare for potential financial instability?
What role can technology play in creating a more secure financial future?

Further Resources to Look Into:

Original article: https://www.nature.com/articles/d41586-023-00778-8
Bank closures tracking website: https://www.fdic.gov/bank/historical/bank/bfb2023.html
