Inspiration

Studies [1] show that users often exhibit a position bias when looking up information on search engines [2,3]. Research also indicates that when documents expressing a particular viewpoint are systematically ranked higher than others, users tend to adopt that more prominent viewpoint [4,5]. Together, these effects can keep important but under-represented views from ever appearing on the first page of search results, which can introduce bias and misinformation when users explore new and potentially sensitive topics. We aim to alleviate this disparity by using Google’s Gemini Pro LLM to seek information on the web across a range of meaningful perspectives and to consolidate the different views into a crisp, coherent, and source-backed summary.

What it does

fishinformation is a search tool that aims to provide a more equitable search experience by enriching its responses with the diverse yet pivotal perspectives, or schools of thought, that may surround the searched topic. By actively expanding your search query, fishinformation identifies different and potentially opposing views on the topic, gathers supporting information for each perspective from Google's vast index of websites, articles, and platform posts, and finally returns a carefully curated response, complete with sources for reference, that captures a representative snapshot of the Web's opinions on your question.

How we built it

The fishinformation pipeline augments a regular Google search by first diversifying the search step to capture varying perspectives on the topic, and then sharing an easy-to-read summary of those perspectives (illustrative code sketches of each step follow the list below):

  1. Diversifying the search
    1. We use the Gemini API to prompt the Gemini Pro model to generate alternate search queries from the user’s original query. The model is instructed to preserve the core topic of the search and list out different variants of the original query that favor specific perspectives on the topic.
    2. A context-specific selection of perspectives is generated by the Gemini model. For example: To understand the impact of certain foods on one’s overall and long-term health, Gemini suggests reviewing the discourse on the topic from the following perspectives: nutritional, environmental, global & cultural, and parental.
  2. Gathering information on perspectives
    1. The perspective-specific search queries are then used to retrieve links to relevant articles, documents, or blog posts using the Google Search API.
    2. The corresponding content is then crawled and added to the list of supporting material for each perspective.
  3. Consolidating a representative response
    1. The final response must be balanced, representative of the informed opinions on the internet, and still manage to be concise and easy to read.
    2. The relevant content from each article is summarized using an open-source summarizer.
    3. All summarized material pertaining to the different perspectives is then presented to the Gemini model, which is instructed to answer the user’s original question based on this diverse set of informed opinions and to generate a balanced, crisp, and easy-to-follow response for the user.
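A minimal sketch of step 1 (diversifying the search), assuming the `google-generativeai` Python client; the prompt wording and JSON shape are illustrative, not our exact production prompt:

```python
# Step 1 (sketch): ask Gemini for perspective-specific variants of the user's query.
# The prompt wording and JSON shape are illustrative assumptions.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-pro")

def diversify_query(user_query: str, n_perspectives: int = 4) -> dict[str, str]:
    prompt = (
        f"The user searched: '{user_query}'.\n"
        f"Identify {n_perspectives} distinct perspectives on this topic and, for each one, "
        "write a search query that preserves the core topic but favors that perspective. "
        'Respond only with JSON of the form {"perspective name": "search query", ...}.'
    )
    response = model.generate_content(prompt)
    # Real output may arrive wrapped in markdown code fences; a more defensive
    # parser is sketched under "Challenges we ran into" below.
    return json.loads(response.text.strip().strip("`").removeprefix("json"))
```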
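A sketch of step 2 (gathering information), assuming the Google Custom Search JSON API and BeautifulSoup for extracting page text; the API key, search-engine ID, and result limits are placeholders:

```python
# Step 2 (sketch): retrieve links for each perspective-specific query and crawl their text.
import requests
from bs4 import BeautifulSoup

SEARCH_KEY = "YOUR_SEARCH_API_KEY"   # placeholder
SEARCH_CX = "YOUR_SEARCH_ENGINE_ID"  # placeholder

def search_links(query: str, num: int = 5) -> list[str]:
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": SEARCH_KEY, "cx": SEARCH_CX, "q": query, "num": num},
        timeout=10,
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]

def fetch_text(url: str, max_chars: int = 5000) -> str:
    html = requests.get(url, timeout=10).text
    # Strip markup and keep a bounded amount of visible text per page.
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)[:max_chars]

def gather(perspective_queries: dict[str, str]) -> dict[str, list[str]]:
    # Map each perspective to the text of its top search results.
    return {
        name: [fetch_text(url) for url in search_links(query)]
        for name, query in perspective_queries.items()
    }
```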
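A sketch of step 3 (consolidating the response). The write-up only states that an open-source summarizer is used, so the Hugging Face `facebook/bart-large-cnn` model below is an assumption for illustration:

```python
# Step 3 (sketch): summarize each article, then ask Gemini for one balanced answer.
import google.generativeai as genai
from transformers import pipeline

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-pro")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")  # assumed model

def summarize(text: str) -> str:
    # Truncate input so it fits the summarizer's context window.
    return summarizer(text[:3000], max_length=120, min_length=30, do_sample=False)[0]["summary_text"]

def consolidate(user_query: str, material: dict[str, list[str]]) -> str:
    # material maps each perspective name to the crawled text of its supporting articles.
    notes = "\n\n".join(
        f"Perspective: {name}\n" + "\n".join(f"- {summarize(t)}" for t in texts)
        for name, texts in material.items()
    )
    prompt = (
        f"Question: {user_query}\n\n"
        f"Summaries of informed opinions on the web, grouped by perspective:\n{notes}\n\n"
        "Answer the question in a balanced, concise, easy-to-follow way that "
        "represents every perspective listed above."
    )
    return model.generate_content(prompt).text
```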

Challenges we ran into

  • Framing the problem: articulating the idea behind fishinformation precisely enough for Gemini to act on it was one of the core challenges we worked through as we developed with Gemini.

  • API Limits & Latency: As our tool accesses the Gemini and Google Search API, it was necessary to manage the number and nature of queries made per search. We iteratively reorganized our pipeline to ensure the number of API calls made was minimized, and the length of the requested content from the Gemini model was economically adjusted for efficient management of compute as well as latency.

  • I/O: Large language models can perform impressive reasoning and editing tasks on textual content. It is, however, non-trivial to specify strict formatting guidelines for extracting the relevant information from the generated text. After a few iterations of prompt engineering, the outputs from the Gemini model could be parsed reliably to fetch the relevant fields and text (see the parsing sketch after this list).
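One way to keep API calls and response lengths in check, sketched under the same client assumptions as the pipeline code above: request all perspectives in a single batched Gemini call and cap the output length. The generation settings shown are illustrative, not our exact configuration.

```python
# Sketch: one batched Gemini call for all perspectives, with a capped output length.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-pro")

def batched_call(prompt_covering_all_perspectives: str) -> str:
    response = model.generate_content(
        prompt_covering_all_perspectives,
        generation_config={
            "max_output_tokens": 512,  # shorter responses keep latency and cost down
            "temperature": 0.3,        # lower temperature gives more predictable output
        },
    )
    return response.text
```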
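A sketch of the kind of output-format constraint and defensive parsing that this challenge calls for; the JSON schema and prompt wording below are illustrative assumptions, not our exact prompt.

```python
# Sketch: constrain Gemini to a strict JSON format and parse it defensively.
import json

FORMAT_INSTRUCTIONS = (
    "Respond ONLY with valid JSON in exactly this shape, with no extra text:\n"
    '{"perspectives": [{"name": "...", "query": "..."}]}'
)

def parse_model_output(raw: str) -> dict:
    # Models often wrap JSON in markdown code fences; strip them before parsing.
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`").removeprefix("json").strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to an empty result rather than crashing the pipeline.
        return {"perspectives": []}
```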

Accomplishments that we're proud of

  • The road to an equitable, informative, and robust search tool is long: it requires both companies striving toward this goal and people from across the globe taking an active part in curating such tools to ensure fair access and a healthy impact on our community. We feel that, through this project, we have contributed a step in the right direction by leveraging today’s large language models to better identify and represent all viewpoints in important discourses on the internet.

What we learned

  • By investing ourselves in the background research, data, and pipeline of this project, we came to appreciate the impact of search results and their display order, and how minor changes in such a heavily used tool can significantly shift the quality and balance of the answers users receive.

  • As budding engineers and researchers, we found this project extremely enriching. We got to combine tools like summarizers, Gradio for the UI, and the Gemini and Google Search APIs into a single pipeline, and to implement solutions for a problem that needs careful yet urgent tending.

What's next for fishinformation

  • Fine-grained perspective search: The Google Search API casts a wide net of relevant articles for each query. Currently, we categorize these articles under perspectives generated by the Gemini model. Using clustering to discover close-knit groups of articles that represent perspectives organically could make our search even more representative (a sketch follows this list).

  • Quicker end-to-end pipeline: Retrieval and summarization currently take a long time. Straightforward optimizations can speed up the end-to-end process, and this is the next big step for fishinformation.

  • A more research-informed perspective binning: We want to incorporate academic work on inclusivity and viewpoint representation in a more formal way as we continue to engineer our prompts.
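A minimal sketch of the clustering idea above, assuming sentence-transformers embeddings and scikit-learn's KMeans; the embedding model and the number of clusters are illustrative assumptions:

```python
# Sketch: group retrieved articles into organic "perspective" clusters
# using sentence embeddings + k-means.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def cluster_articles(article_texts: list[str], n_clusters: int = 4) -> dict[int, list[str]]:
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = encoder.encode(article_texts)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    clusters: dict[int, list[str]] = {}
    for text, label in zip(article_texts, labels):
        clusters.setdefault(int(label), []).append(text)
    return clusters
```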

References

[1] Tim Draws, Nava Tintarev, Ujwal Gadiraju, Alessandro Bozzon, and Benjamin Timmermans. 2021. This Is Not What We Ordered: Exploring Why Biased Search Result Rankings Affect User Attitudes on Debated Topics. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 295–305. https://doi.org/10.1145/3404835.3462851

[2] Ahmed Allam, Peter Johannes Schulz, and Kent Nakamoto. 2014. The impact of search engine selection and sorting criteria on vaccination beliefs and attitudes: Two experiments manipulating Google output. Journal of Medical Internet Research 16, 4 (2014), e100. https://doi.org/10.2196/jmir.2642

[3] Robert Epstein and Ronald E. Robertson. 2015. The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences of the United States of America 112, 33 (2015), E4512–E4521. https://doi.org/10.1073/pnas.1419828112

[4] Frances A. Pogacar, Amira Ghenai, Mark D. Smucker, and Charles L.A. Clarke. 2017. The Positive and Negative Influence of Search Results on People’s Decisions about the Efficacy of Medical Treatments. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR ’17). Association for Computing Machinery, New York, NY, USA, 209–216. https://doi.org/10.1145/3121050.3121074

[5] Ryen W. White and Eric Horvitz. 2014. Belief dynamics in web search. Journal of the Association for Information Science and Technology 65, 11 (2014), 2165–2178. https://doi.org/10.1002/asi.23128

Built With

Gemini API, Google Search API, Gradio, open-source summarization tools
