Inspiration
Evan inspired me with a very common problem that I've had to deal with before: data harmonization. I felt that a solution would help not only projects at the Open Earth Foundation but also other projects, since data harmonization is common and painful!
What it does
The harmonize script takes the file in one folder, explains its schema to GPT along with some example rows, explains the schema the processed data should have, and then generates code that converts data from the raw schema into the processed schema.
It generates runnable Python code, runs it, and writes the output CSV into the current file.
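To make the flow above concrete, here is a minimal sketch of the idea, not the actual hackathon script. It assumes the raw data is CSV text held in memory rather than a file in a folder, and every name in it (`describe_raw_csv`, `build_prompt`, `harmonize`, the `convert` function the model is asked to write, and the `complete` callable that stands in for the GPT call) is hypothetical.

```python
import csv
import io
from typing import Callable, List


def describe_raw_csv(csv_text: str, n_examples: int = 3) -> str:
    """Summarize the raw schema: column names plus a few example rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, examples = rows[0], rows[1 : 1 + n_examples]
    lines = ["Raw columns: " + ", ".join(header), "Example rows:"]
    lines += [", ".join(row) for row in examples]
    return "\n".join(lines)


def build_prompt(csv_text: str, target_columns: List[str], n_examples: int = 3) -> str:
    """Combine the raw schema, example rows, and target schema into one prompt."""
    return (
        describe_raw_csv(csv_text, n_examples)
        + "\nTarget columns: " + ", ".join(target_columns)
        + "\nWrite a Python function `convert(rows)` that takes a list of dicts "
        "in the raw schema and returns a list of dicts in the target schema. "
        "Reply with code only."
    )


def harmonize(csv_text: str, target_columns: List[str],
              complete: Callable[[str], str]) -> str:
    """Ask the model for conversion code, run it, and return the processed CSV."""
    # `complete` is a placeholder for whatever sends the prompt to the model.
    code = complete(build_prompt(csv_text, target_columns))
    namespace: dict = {}
    exec(code, namespace)  # runs model-written code: only do this in a sandbox
    raw_rows = list(csv.DictReader(io.StringIO(csv_text)))
    converted = namespace["convert"](raw_rows)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=target_columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(converted)
    return out.getvalue()
```

Passing the model call in as `complete` keeps the sketch independent of any particular client library and lets the pipeline be tried with a canned response before spending API calls.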
How we built it
Here I just used plain GPT-3.5 from OpenAI with text prompting.
Challenges we ran into
We also wanted to feed the SQL schema into the prompt, but were worried that it would make the prompt too big. So we tried out LlamaIndex, but ran into package dependency problems that we weren't able to solve by the end of the hackathon.
Accomplishments that we're proud of
Just the fact that the simple MVP for the customer request actually works! :)
What we learned
I learned a general strategy for using embeddings in AI problems.
What's next for Data harmonizer
Hopefully we can add more to the MVP by incorporating the SQL schema, which should make it more accurate.
Built With
- gpt