illus.ai

Introducing our startup: illus.ai ~ Image editing made consistent, as easy as a conversation. No more inpainting or writing complex prompts yourself. From photos to illustrations, modify any image through a chatbot.

Inspiration

In today’s digital age, visual content is king, but creating and modifying it remains a cumbersome and technically demanding task, particularly when fine details or complex modifications are needed. Existing solutions like Stable Diffusion models offer some respite, but they falter when tasked with precise, efficient, and versatile image editing. For instance, when using models integrated with platforms like ChatGPT for image inpainting, users often find that the modifications are inconsistent, especially with background retention.

This led us to identify a significant gap in the market: the high cost and time investment required when outsourcing image editing to professional designers, and the technical barrier for everyday users needing advanced image modifications.

Problem: Inconsistent Editing by Existing AI. Current AI systems, including those used for inpainting in ChatGPT, often produce inconsistent results, failing to maintain background continuity or to apply nuanced changes effectively.

Solution: illus.ai leverages LEDITS++, an advanced model that ensures precise edits without altering the image's background, significantly improving consistency.

Problem: High Resource Consumption for Artists. Artists often have to redo entire sections of their work to adjust or visualize different outcomes, which can be resource-intensive.

Solution: With ControlNet, artists can input basic sketches and receive fully rendered image suggestions based on their prompts, streamlining the creative process.

Our Vision

We aim to revolutionize the design industry by replacing traditional design agencies with our AI-driven solutions, making high-quality design work accessible and affordable for everyone. Our technology enables users to perform complex image edits through simple conversational prompts, democratizing design capabilities without the need for expensive software or professional skills.

What it does

illus.ai allows any user to specify image changes conversationally. Our system intelligently determines the appropriate model for the task, whether that is adjusting a single element of an image or completely transforming its style, and executes the modifications without requiring the user to intervene in the design process.
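The routing step described above can be sketched as a simple dispatcher. This is an illustrative simplification, not the actual illus.ai routing logic; the request fields and model names are assumptions for the example.

```python
# Hypothetical sketch of routing a parsed user request to a backend model.
from dataclasses import dataclass


@dataclass
class EditRequest:
    wants_style_transfer: bool  # user asked for a whole-image style change
    has_sketch_input: bool      # user supplied a sketch to render


def pick_model(req: EditRequest) -> str:
    """Choose a backend model for the request (names illustrative)."""
    if req.has_sketch_input:
        return "controlnet"        # render full images from sketches
    if req.wants_style_transfer:
        return "stable-diffusion"  # whole-image transformation
    return "ledits++"              # localized, background-preserving edit
```

In practice the decision would come from the language model's interpretation of the conversation rather than boolean flags, but the dispatch pattern is the same.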

How we built it

Our development of illus.ai centered on building a robust, scalable, and responsive full-stack web application using Reflex, which lets us write both the frontend and backend entirely in Python. This architectural choice streamlines our development process, enabling us to implement real-time features and updates seamlessly across both client and server.

On the backend, we leveraged Flask as a lightweight framework to handle HTTP requests and manage interactions between our frontend and the deployed models on Intel servers. This setup ensures that our application remains efficient and responsive under various operational loads.

For the core functionality of image modification, we incorporated OpenAI's GPT-4o to interpret user inputs and generate precise editing instructions. GPT-4o's capability to distill complex verbal descriptions into actionable editing commands is pivotal. For instance, when a user describes wanting to transform a "tennis ball into a tomato," GPT-4o processes this input to output structured commands like "Remove: Tennis Ball, Add: Tomato." These commands are then parsed and served as directives to our image processing models.
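Once GPT-4o returns directives in the "Remove: X, Add: Y" form described above, they still need to be parsed before they can drive the image models. A minimal parser for that assumed format might look like this (the exact command format used internally is our assumption for the example):

```python
# Parse structured edit directives like "Remove: Tennis Ball, Add: Tomato"
# into a dict the image-processing stage can consume.
import re


def parse_edit_commands(reply: str) -> dict[str, list[str]]:
    """Extract Remove/Add directives from a model reply (format assumed)."""
    commands: dict[str, list[str]] = {"remove": [], "add": []}
    for verb, obj in re.findall(r"(Remove|Add):\s*([^,]+)", reply):
        commands[verb.lower()].append(obj.strip())
    return commands
```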

The image processing itself is powered by Hugging Face’s implementation of the Stable Diffusion model, enhanced by our custom modification, LEDITS++. LEDITS++ introduces a novel masking layer within the diffusion process, which meticulously preserves the original image's background while implementing the desired edits. This approach significantly enhances the fidelity and consistency of the edits compared to traditional methods.
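The background-preserving idea can be illustrated with a toy pixel-space composite. Note this is a deliberate simplification: in LEDITS++ the masking operates on latents inside the diffusion process, not on finished pixels.

```python
# Toy illustration of mask-based background preservation: keep original
# pixels outside the mask, take edited pixels inside it.
import numpy as np


def composite_edit(original, edited, mask):
    """Blend edited pixels into the original where mask == 1."""
    mask = mask.astype(original.dtype)
    return mask * edited + (1.0 - mask) * original


original = np.zeros((2, 2))                 # stand-in for the source image
edited = np.ones((2, 2))                    # stand-in for the raw edit
mask = np.array([[1.0, 0.0], [0.0, 0.0]])   # edit only the top-left pixel

result = composite_edit(original, edited, mask)
# Only the masked pixel changes; the rest of the "background" is untouched.
```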

To address the challenges posed by specialized image contexts, such as animated illustrations where typical models underperform, we integrated a fine-tuning mechanism using the LoRA adapter. This adapter fine-tunes the last layers of our model specifically for the task at hand without the need to retrain the entire network, allowing for high adaptability and maintaining model performance even in niche applications.
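The parameter savings behind LoRA come from expressing the weight update as a product of two low-rank matrices. A small NumPy sketch of the idea (dimensions and rank chosen for illustration):

```python
# LoRA in miniature: the frozen weight W gets a trainable low-rank
# update B @ A instead of being fine-tuned directly.
import numpy as np

d, r = 512, 8                       # layer width and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))     # frozen pretrained weight
B = np.zeros((d, r))                # LoRA factors: only these are trained;
A = rng.standard_normal((r, d))     # B is zero-initialized, so training
                                    # starts from the pretrained behavior
W_adapted = W + B @ A               # effective weight after adaptation

full_params = d * d                 # parameters if W were fine-tuned directly
lora_params = 2 * d * r             # parameters actually trained with LoRA
```

With these numbers the trainable parameter count drops from d*d to 2*d*r, a factor of 32, which is why the adapter can be fine-tuned for niche styles like animated illustrations without retraining the network.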

Through these technologies, illus.ai combines advanced machine learning techniques and modern web development practices to deliver a powerful tool for image editing that is both accessible to non-technical users and powerful enough to handle complex, professional-grade design tasks.

For character generation and more complex image transformations, we incorporated ControlNet, a state-of-the-art model that excels in creating detailed and contextually appropriate images from basic inputs, such as stick figures or sparse sketches. This technology allows users to visualize complex scenes and character designs rapidly, making it an invaluable tool for artists and designers looking to iterate quickly on visual concepts.

Intel Bare Metal Servers

To handle the computationally intensive tasks associated with running advanced diffusion models and AI systems, we deployed our application on Intel Bare Metal Servers. These servers provide the necessary hardware acceleration to ensure that our image processing tasks are performed efficiently and without delay. Bare metal infrastructure lets us maximize the performance of our AI models by leveraging the full capabilities of the underlying hardware, without the overhead associated with virtualized environments.

IDC JupyterLab from Intel

For development and testing, we utilized IDC JupyterLab provided by Intel, which offered a robust, scalable, and flexible environment for experimenting with new models and algorithms. This platform allowed our developers to access powerful computing resources on demand, streamline the iterative testing of our AI models, and refine our algorithms with high efficiency. JupyterLab’s integration into our workflow facilitated a smooth transition from development to production, ensuring that all components of illus.ai were thoroughly tested and optimized.

Challenges we ran into

Hardware Limitations: The high computational demands of diffusion models posed significant challenges, exacerbated by the limitations of typical laptop GPUs and some incompatibilities with Intel architecture.

Learning Curve with Reflex: As a new framework, Reflex presented a steep learning curve compared to more established frameworks like React, requiring us to pioneer many solutions independently.

Accomplishments that we're proud of

Robust Chat Application: We've built a sophisticated chat interface that understands and processes user inputs for image editing, simplifying complex AI interactions into user-friendly conversations.

Precision and Accessibility: Our application not only performs edits with high precision but also makes advanced design tools accessible to non-designers, significantly lowering the barrier to creating professional-grade visual content.

How far we've come! What began as a 1 a.m. Slack call grew into a four-person team scattered across the globe, working together to create something incredible. We were passionate and determined, and even though we faced some tough challenges, we kept going. Our finished project is a testament to our hard work and creativity. We were lucky to work with such talented people; it was an amazing experience, and we can't wait to do it again.

What we learned

Our journey with illus.ai has been immensely educational, from mastering the intricacies of Reflex and stable diffusion models to optimizing our application for diverse hardware. We’ve gained insights into effective team communication and project management, especially in a high-stakes, innovation-driven environment.

What's next for us

As we prepare for our launch on Product Hunt, we are focused on refining our models based on community feedback and expanding our backend capabilities to support continuous operation. Future enhancements will include:

Advanced Instruction Parsing: Improving our AI’s ability to interpret complex editing instructions from conversational input.

Expanded Model Support: Incorporating a broader range of models to cater to specialized design needs like anime or architectural visualization.

Integration and Acquisition: Aiming for integration into major design platforms and potentially being acquired by industry leaders like Adobe.

By addressing these areas, illus.ai plans to cement its place as a pivotal tool in the design industry, transforming how professionals and casual users alike create and modify images.
