Stories by Ruben Melkonian on Medium

LLM Deprecation and Migration Strategy: How to Adapt to Rising AI Prices

Ruben Melkonian — Thu, 30 Apr 2026 13:00:45 GMT

Practical frameworks for navigating LLM retirement and minimizing engineering overhead

Model retirement is a structural reality of the AI market, not a rare operational event. OpenAI (GPT), Anthropic (Claude), Google (Gemini), and other LLM providers frequently deprecate specific API versions in favor of newer models.

The biggest mistake is treating these models as permanent commodities. In a production-ready system, prompt logic and output stability are often optimized for the unique behaviors of a specific model.

Therefore, when a provider decides to retire a model, it forces an unexpected migration, even if your system is performing flawlessly and generating revenue. Losing a foundational model triggers a mandatory cycle of regression testing and recalibration for a new one. Building for reliability requires an explicit strategy, because the impact of these forced migrations is multidimensional:

Financial — direct price increases compound at scale. Even modest percentage shifts materially affect high-volume systems.
Operational — migrations consume engineering capacity that could otherwise drive product growth.
Technical — nondeterministic outputs make regression validation complex, especially for generation-heavy systems.
Strategic — heavy dependence on a single large model increases vendor lock-in and reduces negotiation leverage.

In this article, we will go through the effective practices from a personal experience that can help you survive a provider’s retirement cycle while maintaining system stability.

Why LLM Deprecation Is a “Hot” Topic for Engineers

From a technical perspective, the frustration is simple: systems work until they are forced to change.

In a traditional AI solution, when you train a model and deploy it to production, it doesn’t end here. You constantly monitor it, track quality metrics, and retrain only if user behavior shifts or performance degrades. There is no reason to replace a stable, revenue-generating model.

When you train your own model, performance degradation often occurs due to data drift. For example, a custom computer vision model might fail because of the adjusted camera angle or resolution. The model didn’t degrade over time for no reason; the real-world data distribution shifted. You control the retraining process. But with API providers, the dynamic is entirely different.

When you rely on API-based LLM providers, the decision is no longer yours. Even if a model fully satisfies your requirements, the provider may deprecate it for internal reasons — cost, infrastructure load, strategic positioning. They remove the model not because it degraded for you, but because it no longer makes sense for them commercially.

You can have a healthy production environment where everything behaves predictably and still be forced to migrate. That’s why this topic generates tension inside engineering teams: the problem comes from outside the system.

These external pressures display as three primary categories of business and technical risk that every team must account for.

Business and Technical Problems Caused by Model Deprecation

1. Direct Cost Increases

Even a 5–10% price increase becomes significant at scale. A chatbot handling 10,000–20,000 requests per day amplifies every marginal change. When the difference reaches 50% or more, it reshapes the product’s entire cost model.

Providers sometimes recommend a “closest replacement,” but that model may sit in a higher pricing tier. The market has shifted from competing on price and speed to competing on quality, with noticeably higher costs.

Market dynamics have also shifted. Earlier competition emphasized speed and aggressive pricing. Today, providers compete primarily on quality improvements — often introducing newer models at noticeably higher price points. In some generational transitions, price differences can approach 2x. For instance, generational leaps in models like Gemini can nearly double pricing. The harsh reality of vendor dependency is simple: you just have to accept it, or change the vendor. However, changing vendors isn’t the only way to protect your margins. You can also implement a cost-saving architecture to minimize expenses without sacrificing quality.

2. Migration Costs & Efforts

Changing an endpoint is trivial — you just change the URL. Adapting behavior is not.

The majority of time in migration is spent not on switching APIs but on:

Prompt adaptation
Regression testing
Structural validation of outputs
Fixing edge cases

In large systems, subtle changes in output structure may go unnoticed initially. The system technically works, but formatting differences or minor behavioral shifts may break downstream logic.

From personal experience, roughly 80% of the migration effort is spent on testing and refining prompts based on quality checks. And unlike classical software systems, LLMs are nondeterministic. You cannot rely on binary input-output comparison. Two valid answers may differ in wording, but both are acceptable, which complicates automation.

Figure 1. Migration Effort Breakdown. Image by Author

3. Business Risk

Even with provider guidance, a new model always behaves differently. Migration introduces uncertainty:

Response structure may shift
Tone or reasoning patterns may change
Edge-case handling may differ

These risks may be inherent to the current market, but they don’t have to be a disaster. The following three pillars form the foundation of a migration-ready strategy.

Testing and Migration Strategy

A structured approach reduces risk but does not eliminate it.

1. Maintain a Regression Dataset

Every production system should have a stable set of evaluation examples:

For chatbots: question-expected-answer pairs
For classification: labeled samples
For structured outputs: format validation cases

Every model update should be validated against this dataset. If quality does not degrade, migration becomes safer.

Classification tasks are relatively easy to validate because you can simply compare predicted labels (e.g., classes 1–5). Generation tasks are significantly harder because outputs are variable by design.

Since standard unit testing (e.g., assert output == expected) fails on generative text, engineering teams must implement specialized evaluation pipelines:

LLM-as-a-Judge: Use a larger, highly capable model to grade the output of the new model against a strict set of criteria. You can ask the judge model: “Does Candidate B contain all the factual information present in Baseline A, without adding hallucinations? Answer Yes/No.”
Semantic Similarity Scoring: Convert the old model’s expected output and the new model’s actual output into vector embeddings. If the cosine similarity score is high, the new generation is semantically acceptable.
Deterministic Guardrails: Evaluate the structure, not the free-form text. Use code-based checks to ensure the model outputs valid JSON or includes mandatory keywords.

2. Design Model-Agnostic Prompts

One practical recommendation: avoid overfitting prompts to a specific model. It occurs when developers lean into the specific quirks of one model (such as Claude’s heavy reliance on XML tags vs. GPT’s preference for Markdown).

For instance: if you are building with a new Gemini model, don’t test it alone. Run a suite of 100 test cases simultaneously across Gemini, GPT, and Claude models, using identical regression tests and adjusting your prompts to minimize differences between the models. Generate a summary table of passes and fails across all models. By doing so, you drastically future-proof your system and reduce future migration time.

The goal is not to fully abstract away differences, but to ensure approximate behavioral stability across providers. This reduces future switching costs.

3. Decompose Complex Tasks

Instead of solving everything in a single large model call, break tasks into smaller steps:

Retrieval ➡ Filtering ➡ Summarization ➡ Translation

Figure 2. Task Decomposition Pipeline. Image by Author

LLM APIs charge for tokens, not the number of calls. Splitting tasks does not significantly change total token usage but allows you to use simpler and cheaper models for subtasks. If you take a heavy task, like finding relevant articles, filtering them, summarizing, and translating — break it into four separate API requests, your total token count remains roughly the same. However, since you can route the simpler filtering and translating steps to much smaller models, your overall cost per token drops significantly.

Your benefits? Lower costs, greater flexibility, easier replacement if one model is deprecated, and access to open-source or self-hosted, cost-effective alternatives, like Llama, Mistral, etc., as well as:

Self-Hosting: You gain absolute permanence by hosting open-weights models on your own infrastructure.
Specialized Hardware APIs: Alternatively, you can use companies, like Groq, that build custom Language Processing Units (LPUs) — silicon chips designed specifically to accelerate language model inference. This allows you to access open-source models via API at blistering speeds (400+ tokens per second) and at a fraction of the cost of flagship proprietary models.

If your architecture depends on one costly “super-model,” you’re basically locked in. If tasks are decomposed, the number of replacement options expands. However, once your architecture is modular, the challenge shifts to selection: how do you identify which model is the right fit for your specific subtasks?

Evaluating Alternatives and Comparing Providers

Public dashboards, like Artificial Analysis, compare model speed, reasoning ability, and pricing. These benchmarks are directionally useful: a top-ranked model will generally outperform a low-ranked one.

You may rely on it as a source that helps to identify which models fall into the same performance tier as the one you are currently using.

However, differences between neighboring models are often marginal and task-specific. If two models have close benchmark scores (e.g., 48 vs 47), the public rankings don’t matter that much. Real-world performance will depend entirely on your specific use case. Benchmarks use neutral tasks; your workload may behave differently. Often, the most effective strategy to cut AI operational costs is to select models proportionate to the task, using smaller, fine-tuned models for domain-specific tasks rather than defaulting to expensive flagship LLMs.

Your strategy is the following:

Use rankings to shortlist candidates.
Always test models on your own regression dataset before integration.

At the same time, actively track model lifecycle announcements. Providers typically publish retirement timelines in advance. To use this information effectively, you should integrate these timelines into a broader strategy for observing market trends and internal quality metrics.

https://medium.com/media/e8a6cd8e74b130db5ad4ea8ab248bf4c/href

Automated Market Monitoring and Switching

Unfortunately, there is no magic pill. What you can do systematically:

Monitor provider lifecycle pages and deprecation timelines.

Be aware that deprecation usually happens in distinct phases: first, new users are blocked from accessing the old endpoint; next, existing users are given a grace period of a few months; finally, it is fully shut down. You are informed in advance, but you still must migrate.

Continuously track quality metrics in production.

Establish comprehensive monitoring, traceability, and observability. Whether your application runs in real-time or processes batches asynchronously, you must log everything, track intermediate outputs (especially in decomposed workflows), and collect quality metrics over time. When quality drops, you have a baseline to investigate. Usually, degradation happens because user behavior changes or the input data shifts — a data drift. By strictly logging production metrics, you can determine whether a drop in quality is due to your users changing their behavior or to your API provider stealthily updating the model behind the scenes.

Maintain regression tests that can be run against multiple models.

Build your CI/CD pipeline so that your automated regression tests continuously route production samples to your fallback models. This ensures that as your prompts naturally evolve over time, your fallback models remain fully compatible, and migrating remains as simple as flipping a configuration switch.

Periodically benchmark comparable models in the same price segment.

Set up a quarterly routine to evaluate new open-source or API models strictly within your current cost-per-token limit. Because providers often push expensive flagship upgrades during deprecation events, maintaining a shortlist of tested replacements is your best strategy to avoid forced budget increases.

Migration becomes stressful when delayed. When multiple models require replacement simultaneously, the workload multiplies quickly. Proactive evaluation and staggered migration reduce pressure and operational risk. Delaying these updates can be devastating. In real-world enterprise systems running upwards of 15 models simultaneously, a forced deprecation event can easily consume 1 to 1.5 human months of purely reactive, non-feature engineering work.

Model Deprecation Is a Structural Reality, Not an Exception

There is no single technical trick that eliminates this risk. What works is discipline in system design:

Assume replacement is inevitable. Build architectures that tolerate switching.
Continuously monitor quality metrics. Degradation detection should be standard practice.
Maintain robust regression datasets. Especially for generative systems.
Decompose complex workflows. Smaller subtasks widen your model choices and reduce cost pressure.
Benchmark alternatives proactively. Don’t wait for deprecation announcements.

The most resilient teams treat model providers as modular infrastructure layers rather than permanent dependencies. In the AI market, long-term stability does not come from choosing the “best” model today. It comes from designing systems that remain stable when that model disappears tomorrow.

LLM Deprecation and Migration Strategy: How to Adapt to Rising AI Prices was originally published in AI Advances on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Hidden Operational Costs of GenAI Products: Lifecycle, Risk, and Long-Term Maintenance

Ruben Melkonian — Fri, 10 Apr 2026 23:01:02 GMT

An analysis of the economics behind Generative AI products, exploring why initial cost estimates fall short and how businesses can plan for sustainable GenAI operations.

The question “How can a text conversation cost this much?” is now being asked more frequently by CTOs and finance teams as AI initiatives move from controlled pilots into full-scale production environments.

The misalignment stems from the fact that GenAI solutions behave differently from known application models, which users expect to have predictable, flat operating costs. Each interaction consumes compute power and storage resources, not to mention the human effort involved. What’s more, to keep the model in good shape, aspects such as quality assurance, regular data training updates, security strengthening, model monitoring, and maintenance are a must and come at a price.

What looks like a neat chatbot UI is, in fact, the tip of an operational iceberg, with not-so-obvious cost drivers beneath the surface. Teams budget for model access and infrastructure, but rarely anticipate how usage, data change, and quality requirements affect expenses in the long term.

To clarify the total GenAI development cost and help companies avoid financial surprises, we examine what happens across the product lifecycle once the GenAI system reaches production.

Illusion of Simplicity in GenAI Products

From the end user perspective, a GenAI solution looks disarmingly simple. What they see is a text box and a response that appears seconds later. Roughly said, it’s a conversation that feels professional, though.

Behind the scenes, that interaction triggers a chain of operations that looks nothing like a standard request-response flow. A single user prompt can involve using API calls, embedding generation, vector database searches, and dynamic context assembly before the model even begins inference.

A similarly multi-step flow continues after a response is generated. It has to pass through validation layers, safety filters, logging systems, and monitoring pipelines. To top it all, each step runs on separate services, often across different regions, and each consumes compute and network resources. The trick is that one action to the user is, in reality, dozens of coordinated operations happening in milliseconds.

This is where many of the hidden costs of AI originate. Not from one expensive component, but from the cumulative effect of many small processes.

Now, the most intriguing part — the economic model. It’s fundamentally new compared to any known enterprise software pricing. What is typically used in the IT services market is charging for provisioned infrastructure, licensing fees, and routine infrastructure. These costs are largely predictable and easy to govern and control.

GenAI systems, in addition to the aforementioned expenses, take into account the usage itself. Therefore, every interaction incurs a variable cost tied to prompt length and response complexity.

This is exactly why GenAI cost models appear sound in early estimates but collapse under real-world scale.

Predictable Cost of AI Organizations Can Plan For

When organizations evaluate generative AI ROI, they consider quantifiable expenses listed in budget proposals and procurement documents. They are understandable yet may still contain peculiarities that catch unprepared companies off guard once the technology is rolled out.

Infrastructure and Compute Resources

When you submit a query, neural networks containing billions of parameters process that input token by token. It’s very computationally expensive. That’s one of the main reasons every surveyed organization by IBM decided to abandon or table at least one GenAI project.

Since standard servers alone are insufficient, modern language models use specialized hardware, namely GPUs (Graphics Processing Units) or TPUs (Tensor Processing Units), to be able to perform mathematical operations in parallel. Even when teams access models through cloud APIs, they directly pay for these massive server farms running continuously in provider data centers.

We can split the cost of AI infrastructure into two categories: model training and inference.

API usage and Model Serving Fees

The most visible line item in generative AI pricing budgets is API charges. Some providers like OpenAI and Anthropic price their services in tokens — units of text processed during input and output. Organizations discover that per-token costs accumulate overwhelmingly quickly and generally exceed initial projections.

Other providers offer tiered pricing that rewards higher usage, as seen in Google’s Gemini offerings. While organizations processing millions of tokens monthly might pay 40-60% less per token than low-volume users, they rarely offset the exponential growth in usage, particularly for consumer-facing applications.

API expenses calculation equation

Storage and Database Systems

GenAI apps also need substantial storage and corresponding infrastructure:

vector databases to store embedded knowledge as mathematical representations;
traditional relational databases to manage user data and conversation history;
big ERP systems to sync operational records and enterprise workflows;
caching mechanisms to improve performance;
backup systems to ensure reliability and disaster recovery.

Vector databases deserve special attention because they charge for query operations on top of common storage usage.

One more interesting nuance is RAG (retrieval-augmented generation) architectures, which companies are eagerly implementing to ground AI responses in internal data in a secure and more efficient way. The crux is that the RAG solutions store every document twice. Once as original text for reference and retrieval, and once as mathematical vectors for semantic search.

Find more on RAG benefits in our recent article From ChatGPT Prompting to Corporate GenAI Solution.

Initial Development and Integration

When organizations opt for generative AI development services, they expect natural development costs coming from:

Specialized talent: ML engineers, full-stack developers, and prompt engineers at a bare minimum.
System development: Depending on the engagement model, it might be a fixed price per project, hourly billing, or a managed delivery model.
Integration with existing systems: Data pipeline construction to extract the necessary information from enterprise systems, proper authentication for security purposes, workflow integration, and API management.
Testing and validation: QA resources, user acceptance testing, security levels, performance testing.
Deployment: Production rollout, monitoring setup, and handover to internal teams, plus employee training if required.

Beneath the Interface: Hidden Costs of Using AI

While organizations carefully budget for infrastructure and API fees, the largest portion of hidden GenAI development costs surfaces only after deployment. The unpleasant part is that these operational expenses exceed visible costs by 200-300%, disrupting ROI projections and planning.

Continuous Data Pipeline

Before your GenAI app delivers the answer, data undergoes multiple transformations. Building and operating that complex data pipeline involves:

Extracting content from dozens of enterprise sources
Normalizing formats
Cleaning inconsistencies and duplications
Breaking documents into usable chunks
Generating embeddings
Indexing those vectors in a retrieval system

But developing a pipeline is hardly the finish line. It’s just the first step in AI product lifecycle management.

Knowledge bases quickly become stale because your company widens its product catalogs or updates its policies. So, all these changes must be synchronized with the model by reprocessing content and regenerating embeddings. Neglect this work and misleading outputs won’t be long in coming.

Enterprise data pipeline loop for GenAI

Returning to the topic of spending, to sustain such data workflows, you have to invest in engineering time, specialized tools, and human-in-the-loop workflows for quality control.

Monitoring and Observability Infrastructure

GenAI solutions demand much deeper visibility than basic uptime and error rates, as is the case with traditional software. It’s important to track every single API call to understand system behavior. And we not only speak of measuring whether it succeeded, but also how many tokens it consumed, how quickly it responded, and the essential output quality.

Then there is also a need to keep a close eye on AI-specific metrics, which is difficult without specialized tools. Those metrics include, but are not limited to:

Hallucination rates
Prompt effectiveness metrics
Cost per interaction
Context window utilization
Model performance drift

High-volume applications, like the AI financial analysis we built, generate terabytes of data each month. Companies have to build dedicated infrastructure separate from production systems to store and analyze such a colossal volume of data.

Quality Assurance and Testing Systems

GenAI outputs are non-deterministic, which means the same prompt can produce different responses. Investment in automated testing is unavoidable for high-usage deployments that require running thousands of test cases and comparing outputs against expected response characteristics.

These tools also automate sample human evaluations and systematic regression testing. However, human oversight remains crucial for handling edge cases and validating consistency. Academic studies on AI QA highlight the same finding: AI implementation costs scale with model complexity and use cases.

What is important to realize is that quality assurance is needed continuously throughout production, not only in the pre-launch phase. Every prompt modification or model update should be tested for regression to prevent a sudden drop in quality. You’ll also want to avoid harmful, biased, or inappropriate responses, so validating safety regularly should be a part of day-to-day operations.

Model Maintenance and Performance Management

AI performance degrades after a while by default, a phenomenon known as model drift. That’s because too many variables, like user behavior, content domains, and others that AI considers when giving a response, evolve or change. So, being able to detect performance drops as early as possible is advantageous yet demanding, calling for user feedback collection and analysis, and comparing benchmarking against baseline performance.

There are several practices teams actively use to combat hallucinations and drift. AI performance monitoring for catching deviations from the norm and repeated model retraining are the keys to success. Careful version control in this context is paramount for controlled recovery in case of unexpected behavior.

Security and Treat Protection

GenAI is unlike any known technology, and no wonder it introduces novel security challenges. Systems should defend against prompt-injection attacks that manipulate AI behavior, jailbreak attempts that coax inappropriate responses, and data-extraction exploits targeting unauthorized access to the company’s training data. Needless to say, common threats remain in play.

Strong security infrastructure, including input filters, output validators, pattern detection, and security system monitoring, is what companies need to shield production models. The arms race between attackers discovering unpatched vulnerabilities and cybersecurity pros building protections never ceases, assuming ongoing investment in security. Particularly, companies must budget for specialized AI security tools, regular penetration testing, and security-focused engineering time.

Content Safety and Moderation

As a rule, every AI output passes through many safety checks before it reaches users. Content moderation systems screen responses for harmful language, bias, personal identifiable information, and hallucinations.

Applied at scale, moderation adds up to latency and compute usage, accompanied by operational overhead that grows linearly with traffic. Automated filters do the initial screening, but it’s human moderators who sample and review flagged output to polish policies and determine new failure patterns. Some projects, like AI copilot for greenhouse operations, demonstrate that effective domain-specific safety requires custom moderation rules in addition to generic filters.

Compliance and Governance Operations

Regulatory frameworks — GDPR in Europe, CCPA in California, AI-specific laws that are gradually taking shape across the globe mandate that organizations embed transparency into systems at the engineering level. The company should be able to trace the data used for generating the response, who accessed it, which model version produced the output, and under what conditions.

Therefore, we see audit trail systems widely used for these purposes. Meanwhile, access control systems allow companies to enforce who can view which information, and retention policies automatically archive or delete data in accordance with regulations. Compliance also entails human processes, such as legal team reviews and AI policy development by governance committees.

Expert Human Teams

Behind every properly working production GenAI system operates a team of diverse specialists:

ML engineers to maintain models
MLOps engineers to manage deployments
Data engineers to sustain pipelines
Data Scientists with AI/LLM expertise to optimize interactions and defend against misuse

Nowadays, having an in-house ML team is a luxury. Given that AI talent demand surpasses supply across key roles by a 3.2:1 ratio, companies are forced to pay a premium to scarce specialists. For comparison, AI professionals generally earn 67% more than traditional IT roles.

And that’s not all. Team members spend time researching innovative techniques and assessing new models to stay ahead of tech advancements. Thus, the cost of AI expertise stretches to employee learning and experimentation in addition to base salaries.

Technology Evolution and Adaptation

Nobody will deny that AI technology, more precisely GenAI as we know it, progresses at unprecedented speed. New models appear literally monthly, usually bringing in improved capabilities or better performance characteristics. As a result, businesses face constant pressure to reassess their technology stack. It consumes noticeable resources to test new models against existing ones and benchmark performance differences.

Adopting a newer model is never a simple swap. Migration projects demand refactoring code and retraining custom components, along with hundreds of hours spent on compatibility testing. Even when migration promises long-term efficiency gains, short-term spending increases. Organizations striving to remain competitive should include these expenses in their overall GenAI cost strategy.

Scaling Complexity and Optimization

Unfortunately, usage growth drives exponential increases in costs associated with introduced operational complications. Let’s understand why this is so. Monitoring becomes more challenging because new users exhibit different usage patterns, which require additional analysis. More users also mean more edge cases, as the system will encounter unusual combinations of inputs or languages. Finally, companies will need to perform additional tests to cover new user scenarios.

So, how to optimize the cost of generative AI for growing user demand? The Quantum team suggests combining several tactics for maximum effect.

Using model routing lets businesses direct simple queries to cheaper models and preserve more robust ones for compute-intensive tasks. Prompt optimization, which may encompass removing outdated system instructions and condensing retrieved context, helps companies consume fewer tokens per request while getting answers of the same quality. Caching is another effective method aimed at storing the most frequently accessed responses to avoid redundant API calls.

Architecting Sustainable GenAI: Economics and Long-Term Strategy

As it turns out, keeping GenAI applications economically viable in production is not easy. Even well-funded pilots can become financial liabilities. Organizations that succeed are the ones that design for cost discipline as deliberately as they design for accuracy and latency.

Sustainable AI starts with acknowledging that API bills represent only a fraction of the total AI implementation cost. As we have discussed, spending accumulates through every system layer, from data pipelines to quality assurance. If organizations account for only direct model inference expenses, they have to deal with exponentially rising costs only after scaling, when course correction becomes several times more expensive.

This is the case for AI FinOps. Using FinOps practices in GenAI projects enables teams to attribute costs to specific use cases, user segments, or business outcomes, rather than treating AI spend as shared overhead. According to the FinOps Foundation, organizations with a mature cost attribution framework are much more likely to keep cloud and AI spending within forecasted limits.

Cost-effective GenAI is as much an architectural challenge as a financial one. For example, by opting for model-agnostic designs, teams gain the ability to switch providers if one of them increases pricing or changes compliance requirements, performance, etc. Multi-provider strategies have low lock-in risk and give companies leverage when negotiating usage-based pricing. Engineering modular architectures will let companies easily replace safety filters, models, or other components without re-platforming the entire system.

The Bottom Line on the Hidden Investment in AI Excellence

That simple text box masking extraordinary complexity tells only part of the story. Behind every seamless conversation stands a myriad of processes and operational responsibilities. This is why the question “How much does artificial intelligence cost?” doesn’t have a straightforward answer.

The success of GenAI products is not defined by model sophistication alone. It’s sustained by thoughtful architecture, investment in people and processes, and relentless operational excellence, managing this complexity. The narrative positioning AI as simple or inexpensive crumbles rapidly when confronted with reality. Each hidden cost of AI revealed reflects the necessities organizations discover only through experience.

Yet these complications shouldn’t discourage investment. GenAI delivers capabilities and business value that were unthinkable just a few years ago. The complexity and cost of AI mirror the real power required to make that possible at scale.

The Hidden Operational Costs of GenAI Products: Lifecycle, Risk, and Long-Term Maintenance was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

2026 Strategic Guide to Satellite Imagery Providers: Which Data Suits Your ML Project?

Ruben Melkonian — Tue, 07 Apr 2026 19:37:17 GMT

A guide to selecting satellite imagery providers and data specs that match resolution and delivery method to your project.

Continue reading on ILLUMINATION »

How to Accurately Estimate a Data Science Project: A Step-by-Step Framework

Ruben Melkonian — Mon, 08 Dec 2025 23:09:17 GMT

Learn why most data science projects fail, how to structure realistic data science project estimation, and improve performance metrics.

Photo by Patrick Perkins on Unsplash

When clients ask how long a data science project will take, they often expect the kind of answer you’d get for a traditional software task. But data science isn’t about building a known feature — it’s about exploring what’s possible, learning from data, and translating that into something the business can act on.

The irony is that while nearly every company today believes they need AI or advanced analytics to stay competitive, most data science initiatives fall far short of expectations or fail outright. There’s no single source for an exact failure rate, but industry signals are hard to ignore.

Back in 2016, Gartner analyst Nick Heudecker estimated that as many as 85% of data science projects fail. Another source claims only 13% reach production.

Despite rapidly evolving tools, platforms, and modeling techniques, the core reasons behind failure remain remarkably consistent: leadership relying more on gut instinct than insights, teams lacking a clear business case, and organizations pushing forward with data initiatives without first building a culture of evidence-based decision-making.

This article outlines a structured framework for estimating data science projects and demonstrates how to apply it to ensure realistic scoping and timelines. To help you apply this approach, we’ve also included a downloadable estimation file that can help structure and scope DS projects with realistic effort ranges.

The Core Difference: Accuracy Over Delivery

Software development estimation is driven by functionality. You know what you’re building-a checkout page, a dashboard, a mobile app- and the effort is measured in features.

In contrast, data science development services are measured in terms of model performance, often using data science performance metrics like accuracy, precision, recall, or F1-score. While accuracy is a common requirement in many systems (such as GPS devices, sensor platforms, or analytical software), in data science it carries a different challenge: you often can’t predict what level of accuracy is achievable until you explore the data.

You can’t promise a 95% F1-score on day one because it depends on factors you may not fully control at the start: data quality, feature availability, noise levels, hidden biases, or the fundamental learnability of the task. In software engineering, requirements like accuracy can often be engineered through design and calibration. In data science, they’re discovered through experimentation.

That changes everything. You’re no longer estimating how long it takes to implement a solution — you’re estimating how long it will take to find one. And that journey includes missteps, restarts, and iterations.

Why Iterations Work (and Waterfall Doesn’t)

We estimate DS projects as a series of iterations. Each iteration has a defined goal — build a baseline model, improve performance, test generalization — and is scoped and priced independently.

This lets clients control how far they want to go. After each iteration, they get measurable results. If those results are good enough, they can stop. If they need improvement, they can fund another iteration with clearer insight into expected gains.

Our iterations follow the CRISP-DM framework. It’s not a buzzword– it’s genuinely the backbone of our work. Every cycle includes understanding the problem and data, preparing inputs, building and evaluating a model, and considering deployment. This loop keeps the work grounded and allows us to structure estimates in a way that aligns with real progress.

Source: Image by the author.

Estimating the Whole Flow

The most overlooked truth in estimation in data science is this: modeling is only 30–40% of the effort.

It begins with requirements — understanding what the client truly wants to achieve, and how the output of the model will be used in their product or workflow. That means aligning with data science KPIs and thinking about where and how predictions will be consumed.

Then comes data collection. Often underestimated, this phase can swallow weeks of work. Open-source datasets may look promising but turn out to be noisy, incomplete, or misaligned with your real-world task. Even when the data is available, cleaning it, converting it, and storing it in usable form is a job of its own.

After that, we move into the iterations themselves: modeling, evaluation, and refinement. This is where the typical excitement lies, but without the previous two stages done well, no model can succeed.

Finally, and most often overlooked, we transition research code into production. This means setting up environments, packaging models into APIs, writing tests, logging outputs, and ensuring the solution integrates smoothly into the customer’s systems. It’s not glamorous work, but it’s essential. In fact, the most common reason DS projects fail to create business value is because this step is rushed or skipped.

Cross-industry standard process for data mining (CRISP-DM). Source: Image by the author.

A Test Example: Satellite Biomass Estimation

To demonstrate how part of our estimation framework works in practice, let’s walk through a test case we use internally to explain our structure. The task is to estimate data science project above-ground biomass using satellite imagery. This example isn’t based on a real client, but it closely reflects the types of projects we deliver.

It starts with dataset research: identifying open satellite sources, checking resolution, and downloading samples. The estimate for this phase is around 20 m/hours.

The first modeling iteration is designed to build a simple baseline model. Its goal is not to maximize accuracy, but to validate whether the data has predictive value. This includes preprocessing, model training, and evaluation — typically around 50 m/hours.

In the second cycle, we aim to refine the model by exploring new architectures, enhancing features, and improving performance. Another 50 m/hours.

The final stage involves preparing the solution for deployment: setting up Docker, building pipelines, and generating documentation. This takes about 20 m/hours.

The total estimate comes to roughly 150 m/hours. But more important than the number is the structure: this example shows how we factor in requirements, iterations, and production readiness from the beginning — something that’s often missing from traditional approaches.

You can explore the full Excel estimation file here. Source: Table by the author.

Avoiding the Most Common Pitfalls in DS Project Estimation

From experience, we’ve seen a few patterns emerge in failed DS project planning. The most frequent mistake is treating model training as the core deliverable. In reality, training is one-third of the total effort — maybe less. Requirements analysis, data handling, and production work are just as important.

Another issue is underestimating the number of iterations needed. A proof of concept might work, but refining it to meet business targets often takes multiple attempts. Planning for just one iteration and hoping for the best is risky and unrealistic.

Finally, there’s the integration gap. A model that performs well in a Jupyter notebook doesn’t create value unless it’s integrated, monitored, and used. That step needs to be scoped and budgeted just like any other.

Final Thoughts

Estimating a data science project isn’t about predicting the future. It’s about managing uncertainty in a structured way. By breaking the work into clear, time-boxed iterations, grounding your process in CRISP-DM, and not ignoring the messy yet essential parts, such as integration and deployment, you can build realistic and actionable project plans.

How to Accurately Estimate a Data Science Project: A Step-by-Step Framework was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Difference Between Generative AI vs Traditional AI

Ruben Melkonian — Tue, 02 Dec 2025 13:32:13 GMT

Go beyond the hype to see how AI truly works. This guide distinguishes between traditional AI and Generative AI, showing why the smartest businesses are building a future by balancing both.

AI has become one of the defining technologies of the 21st century. It is powering everything: from systems for fraud detection to personalized recommendations on Netflix. However, in the last 3 years, generative AI has taken the spotlight, offering not just simple analysis but creativity at scale.

According to McKinsey’s report, approximately 79% of the executives have already tested generative AI in their operational business processes. Meanwhile, Gartner assumes that over 80% of companies will use GenAI APIs by 2026. Such a rapid shift raises the question: how is generative AI different from the traditional AI that has dominated the market for the last 20 years?

Distinction matters here as each one has its strengths: traditional AI excels in predictive analysis and optimization, while generative AI creates original content. Companies put their investments or even reputation at risk if they don’t apply these technologies wisely. Thus, in this article, we are exploring the evolution of AI, comparing traditional AI vs generative AI use cases, discussing strengths and limitations, as well as risks of each.

The Evolution of Artificial Intelligence: From Rules to Generation

During the 1990s and early 2000s, the field of AI saw the rise of machine learning. This time, engineers trained algorithms using large datasets instead of following human-written rules. Such a breakthrough resulted in credit scoring tools, supply chain optimization, and spam filters, marking it as the era of traditional AI.

The 2010s brought innovations in deep learning and speech recognition, which paved the way for real-time translation and autonomous vehicles. Generative Adversarial Networks (GANs) introduced by Ian Goodfellow in 2014 pioneered the idea of having two models compete: a generator that produces data and a discriminator that evaluates it.

Such a discovery enabled the synthesis of highly realistic data. GANs became the foundation for such applications as image generation, style transfer, and super-resolution.

2017 was another breakthrough year, which introduced the Transformer architecture. Being a self-attention model, it allows models to process entire data sequences in parallel, making training faster. Today, Transformer technology is the backbone of LLMs and other multimodal systems.

By 2020, generative AI had emerged as the next leap, with diffusion models being in the spotlight of more advanced image and audio generation. It was exactly the time when many IT companies noticed a potential in this innovative technology. Quantum has been working with AI since 2015, which means we are the leaders of scaling AI.

As opposed to other companies, which have adjusted their brand voice and the vector of their business strategy in response to the growing interest in AI-based solutions, our expertise comes from over a decade of dedicated work and hands-on knowledge of all aspects of AI models.

Unlike previous predictive models that classify or predict, generative AI uses advanced architectures (transformers, GANs, diffusion models) to create. Systems like ChatGPT (OpenAI) and Gemini (Google DeepMind) represent this shift. This transition mirrors the broader shift in the paradigm of AI research: from answering “what is this?” to “what can I create?” (Stanford AI Index Report, 2023).

What is Traditional AI?

Traditional AI is an AI model that focuses on approaches based on analysis, classification, prediction, and optimization. It does not generate new outputs, working within the boundaries of the existing structured inputs.

Traditional AI Systems: Key methods

Tools and Examples

Non-generative AI has laid a solid foundation for multiple business processes that happen seamlessly and remain unnoticed by a regular customer. With over 100 commercial projects to its name, Quantum is at the forefront of successfully integrating traditional AI into various spheres. The company’s work on fraud detection with AI illustrates traditional AI’s strengths.

By analyzing transaction histories with anomaly detection algorithms, financial institutions were able to reduce the number of suspicious transactions without introducing customer friction.

Retail is another sphere where global leaders like Walmart and Amazon benefit from AI usage. Predictive AI helps them anticipate shopping trends to stock on time and minimize waste. Despite its obvious accuracy and prediction, this optimization is limited to the existing information as opposed to generating non-existent data in case of GenAI.

Healthcare is another field that benefited greatly from traditional AI. Healthcare is tightly connected to the analysis of multiple data, which is quite time-consuming and not always cost-efficient. Predictive models are among the best tools that can help detect chronic diseases earlier, as was demonstrated by our advanced blood cancer classification tool.

Other applications of traditional AI include:

Energy: predictive maintenance for turbines and grids.
Logistics: optimization of routes to cut fuel costs.

Such solutions based on traditional AI are not only reducing the burden on hospital resources but also improving diagnostic accuracy. Popular libraries such as TensorFlow, scikit-learn, and PyTorch support such multimodal tasks, making traditional AI cost-efficient and accessible. When the task is prediction-focused, structured, and accuracy-driven, traditional AI is usually the best fit.

What is Generative AI?

Definition and Core Principles

Generative AI focuses on generating new data, which resembles human creative solutions by recognizing specific deep structures within datasets and reorganizing them into original output. Apart from simply producing texts and images from scratch, generative AI has redefined industry processes. For example, GitHub Copilot enhances software development by suggesting code lines, saving engineers precious hours of work. Generative AI thrives in the environment of unstructured data, serving as a catalyst for innovation.

Key Technologies

Large Language Models (LLMs): GPT-4, Gemini, Claude for text and code generation.
Generative Adversarial Networks (GANs): producing images and videos.
Diffusion Models: powering tools like DALL·E 3 and Stable Diffusion for realistic imagery.

How it Works

There are 3 main steps involved in how GenAI works:

Tools and Examples

Generative AI has a breakthrough potential in business. According to PwC, active implementation of GenAI can add approximately $15,7 trillion to the world’s economy. As a result, at Quantum, we see an increase in generative AI usage in business operations. Working in the field of data science and AI since 2015, we actively engage with multiple businesses and consult about potential advanced solutions for their niche. For example, in our blog, we recently wrote about hyper-personalized services that enable the generation of extra revenue and attract new clients.

E-commerce is another niche that maximizes the usage of generative AI for business scaling. AI-supported optimization of online shops is expected to generate a measurable positive impact, as we demonstrated in our solution for e-commerce retailing. Apart from significantly increased automation of routine tasks, engagement rates grew proportionally.

Having introduced a multilingual AI agent, we expanded market reach and ensured 24/7 support is available for all clients. Non-stop sales alongside solid customer support around the clock increased revenue and improved customer satisfaction rates.

Owing to deep learning techniques, at Quantum we were able to deliver GenAI solutions in various spheres, including healthcare and pharmacy, with ultimate benefit for each party:

for patients — improved therapy compliance and efficacy;
for physicians — a more accurate decision-making process;
for healthcare providers — reduction of treatment cost through decreased number of hospitalizations as well as emergency treatments.

Meanwhile, the ability to generate new content helps to automate the creation of marketing copy and product descriptions, freeing human teams to focus on other higher-end tasks. Working beyond the rigid limitations of traditional AI, the creative side of GenAI makes it more and more embedded into operational processes.

Key Differences Between AI and GenAI

Key differences of generative AI vs traditional AI can be summarized according to several key categories: goal, types of data, outputs, cost, and complexity.

Analysis of Differences

Goal: traditional AI improves efficiency and is good at specific scenarios; generative AI sparks innovation and is able to generate new content.
Data: predictive AI depends on structured records; GenAI thrives on multimodal data.
Outputs: one predicts; the other generates.
Costs: open-source predictive models are relatively cheap and do not require a team of tech specialists to support them; training GPT-like systems costs more, as it requires an AI provider.
Complexity: businesses may prefer traditional AI for credibility, while GenAI requires new governance frameworks.

Use Cases: AI in Action

Finance

Both Traditional AI and Generative AI have earned their spot in the finance sector, serving as a reliable assistant.

Top current use cases of AI in fintech

Traditional AI in financial services underpins trading algorithms and identifies anomalies in structured datasets while protecting sensitive data. Above that, machine learning has revolutionized the sphere by automating routine tasks.

Generative AI may create personalized financial reports and summaries, allowing banks as well as individual entrepreneurs to correct their business strategy and offer personalized financial solutions for businesses. Above that, LLMs have become a cornerstone of document processing in finance, serving as a time- and cost-saving tool.

Gen AI in Healthcare

The exploration of machine learning in healthcare demonstrated that traditional AI can significantly enhance efficacy, reduce costs, and time consumption by healthcare specialists. Predictive analysis performed by a traditional AI is especially beneficial for doctors or healthcare analysts who are involved in research. Meanwhile, GenAI helps those healthcare professionals who deal with end clients:

consultants,
receptionists,
nurses.

Taking into account our expertise and experience in offering AI-based solutions in healthcare, we have proved through our multiple projects: predictive AI improves chronic disease diagnosis, improves prognosis, and risk stratification assessment.

Manufacturing

Manufacturing is another high-risk domain that has been experimenting with introducing AI technologies into its processes. From basically replacing human labor to automation of repetitive processes, AI technologies have earned their place in the industry. Safety is one of the paradigms that was redefined with the help of AI.

Computer vision systems with OCR became especially beneficial here, as they increase manufacturing efficiency. Our quality control solution demonstrated 99,9% accuracy in anomaly detection due to real-time video processing based on embedded AI.

Owing to solid analytical capabilities, AI-powered solutions can monitor equipment’s condition and predict its maintenance needs. This is a textbook example of traditional AI excelling at structured forecasting.

Customer Support

One of the best ways to assess the growing trend in GenAI usage is to see the number of GenAI-powered e-commerce sales agents and chatbots. One of the projects our expert team at Quantum delivered was a generative AI chatbot that transformed customer service operations. Unlike traditional scripted bots, it offered empathetic interactions and cut response times while boosting customer satisfaction metrics.

Together, these few cases prove that regardless of the niche or sphere of usage, the real competitive edge lies in combining these two models — prediction and creation.

Strengths and Limitations: Traditional vs Generative AI

Both AI models have their strengths and limitations, which are used to determine the most appropriate tool for specific purposes.

Traditional AI

Strengths: proven reliability, explainability, and relatively low cost. Works well in scenarios it was trained on.
Limitations: requires structured datasets, less efficient for tasks requiring a creative approach.

Generative AI

Strengths: availability of multimodal outputs, creativity, and adaptability in various areas.
Limitations: ethical considerations, relatively high computational costs, and risk of hallucinations when the data becomes too specific.

Economic efficiency is another essential question lying in the long-term perspective of AI usage. Traditional AI often relies on open-source libraries, which are objectively cost-effective. On the contrary, generative AI requires a provider to pay for usage, which significantly increases costs.

As OpenAI CEO reported, training GPT-like AI models may require 1000,000,000$, which increases potential expenditures for businesses looking to implement GenAI-based solutions. Identifying the processes and industries that will benefit from either model’s solutions is one of the keys to both cost- and money-efficient decisions in business.

Risks and Ethical Considerations

Both AI frameworks raise serious challenges:

Privacy: risk of sensitive datasets leaks during training.
Bias: both models (predictive and generative) may strengthen societal biases.
Copyright: generative AI can potentially reproduce copyrighted work, which raises serious legal disputes.

Working with AI for nearly a decade, we realize the complexity of ethical considerations of using both traditional AI and generative AI. Our groundwork and research on ethical AI frameworks emphasize the need for auditable pipelines, bias detection tools, and transparent reporting. For example, one approach is training on synthetic datasets generated under strict governance, reducing exposure to personal data while maintaining performance.

Privacy

The risk of sensitive data leakage has never been more critical than during AI training. Such exposure is not limited to a model’s certain development phase — deployment itself can also possess potential risks. The complex structure of modern generative AI systems, including the vast utilization of vast datasets, makes it difficult to trace the origin of the information. As a result, we have a conflict with current regulations (for example, GDPR).

Various AI tools have been actively introduced into many spheres, including finance and healthcare, where data vulnerability is at its highest. Not only the lack of a solid regulatory basis and procedures, which enable firm control over data transfer and its protection, but also the lack of technological tools to support all of the aforementioned, are in the spotlight of debates.

Several solutions, such as differentiated privacy and federated learning, are currently being investigated as potential countermeasures. They are expected to minimize data leakage, allowing new AI models to train on decentralized datasets.

Bias

Both traditional and generative AI may strengthen biases. One major reason underlying this is that they are trained on historical data that often reflects existing human prejudices and systemic discrimination. Consequently, it can lead to discrimination-based decisions in multiple areas, including:

applications for loans,
hiring processes,
criminal justice.

In case of non-generative AI, this can result in inaccurate risk assessment for certain demographic groups or misalignment. The cause for this is believed to be rooted in inappropriate training: wrong answers and negative behaviors during training produce misalignment in various spheres. For GenAI, bias can be detected in stereotypical or even potentially harmful content generation. Addressing bias requires not just careful data selection but also re-training the model by means of correct algorithms and specific positive reinforcement of “correct” behavior.

Copyright

Generative AI can potentially reproduce copyrighted work, which raises serious legal concerns. It is especially frustrating for content creators, artists, and publishers, who insist that their intellectual property is being repurposed without permission or financial compensation. The concept of “fair use” is being actively discussed in courts worldwide as the legal system attempts to keep pace with technological capability to generate content from millions of source items. In addition to that, there are legal questions concerning the ownership of the output itself: who owns the rights to a piece of content generated by an AI — the developer of the model, the user who prompted it, or is it un-copyrightable?

The Difference Between Generative AI vs Traditional AI was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

LLMs: Everything you need to know

Ruben Melkonian — Tue, 02 Dec 2025 08:23:10 GMT

An overview of Large Language Models (LLMs), focusing on their types, differences, capabilities, and applications.

What are the Large Language Models?

Large language models (LLMs) are powerful AI models designed to understand and generate human-like text. They use massive amounts of data to learn patterns in language, allowing them to answer questions, generate content, and perform various language-related tasks.

How powerful are LLMs

Large language models work in a few key steps. Initially, they learn from tons of text to understand grammar, context, and general knowledge. Text gets segmented into smaller units called tokens, and neural networks process these tokens to transform input into meaningful output. Afterward, the model undergoes fine-tuning for specific tasks, like answering questions or generating text, and then, during inference, produces output, like generating text or answering questions.

Handling large amounts of data: talking numbers

Language models vary in size and capabilities. For example, GPT-3 was trained on a vast 570 GB of text data. In contrast, the previous generation’s state-of-the-art BERT model had a training dataset of only 16 GB of text.

To give you some perspective, a 1 GB text file can hold roughly 178 million words. Just think about the immense number of words used to train even the smallest one from LLMs! This way, you could input an entire book into these models. Yes, it might be costly, but it’s definitely feasible!

Adaptability to different domains

Large language models possess remarkable versatility and can work with data from any field without delving into domain-specific details. However, to truly excel in a specific topic, they require fine-tuning tailored to that particular domain. This adaptability empowers them to comprehend and produce relevant content across various fields, including finance, healthcare, or even your favorite hobby.

Which tasks do LLMs solve?

LLMs are highly adaptable and can tackle various NLP tasks. These include text generation, translation, text summarization, classification, sentiment analysis, conversational AI, and chatbots. Let’s take a closer look at some of these tasks:

Named Entity Recognition & Relation Extraction

LLMs can identify specific entities in textual data, such as names, places, or organizations, such as the name “Mike” and the institution “MIT,” and establish connections based on their relationships, like “Mike is studying at MIT.” It is essential for organizing and categorizing information effectively, especially if you need to process massive documents.

Sentiment & Intent analytics

Large language models excel in analyzing language to determine the intent behind a user’s query or message and whether it expresses a positive, negative, or neutral sentiment. This capability is crucial for businesses to gauge brand loyalty and make informed decisions.

AI Agents & Virtual Assistants

LLM-powered agents are invaluable for customer success. A deep understanding of language and domain data enables them to engage in natural conversations and intuitive user interactions. As an example, check our LLM-based financial advisory chatbot for instant investment advice.

ChatGPT and GPT-based models

First things first, ChatGPT isn’t a standalone model; it’s a composite system comprising GPT-3.5 Turbo and GPT-4 models.

GPT-3.5 Turbo is a powerful language model known for its impressive ability to understand and generate human-like text. However, some businesses, especially those requiring advanced language understanding, might fail from GPT-3.5 and require GPT-4 integration.

GPT-4 is relevant for industries such as legal services for in-depth case analysis or complex scientific research, where its enhanced capabilities in understanding context and generating more accurate and contextually relevant text can be valuable.

However, there are concerns about data privacy since sensitive data passes through OpenAI servers, and there are no alternative options for keeping it in-house. These considerations are crucial when leveraging these advanced language models for specific applications.

Open-source large language models

Custom open-source LLMs may not match the scale of GPT-4, but they offer the flexibility of fine-tuning for various tasks, which GPT-based models handle as well. However, there is one thing: open-source LLMs are way easier to fine-tune on specific-domain tasks.

Let’s say you need a chatbot therapeutist that can make an initial diagnosis or direct users to specialized doctors based on the same symptoms. In such case, a pre-trained open-source LLM fine-tuned on specific domain data will definitely excel where GPT-4 falls short.

Most open-source LLMs are available on Hugging Face, accessible via the Transformers library. Since this is a rapidly evolving field, the top open-source LLM models change almost monthly.

On the day of writing this article, the Open LLM Leaderboard appeared as follows:

Open LLM Leaderboard

The competitive advantage of open-source LLMs lies in data privacy, as they host the model within their infrastructure. This means you can access the source code, understand the model’s operation, and verify its behavior, which can be crucial for ensuring ethical and unbiased outputs. So, choosing a suitable model for specific data needs requires significant consideration.

Large Language Models deployment

When it comes to deployment, for OpenAI models, the process is straightforward and transparent-you can purchase an API and integrate it into your application.

Deploying open-source LLMs can be costly, especially when handling substantial data loads. It also requires the use of less obvious optimization methods to guarantee speedy performance. However, at Quantum, we’ve implemented several effective strategies in our projects that helped optimize the model operations while managing the costs without compromising the quality.

Working with open-source language models may present challenges, but their capabilities are really worth it. Embracing the potential of open-source language models can result in text analysis and generation solutions that are more accurate, context-aware, and efficient.

Prompt engineering for accurate outputs

Prompt engineering is carefully crafting and refining the prompts or instructions that serve as helpful add-ons transmitted in model requests to guide responses. These prompts specify what to answer, how to respond, which words to use or avoid, the recommended answer length, and the chatbot’s role.

This feature allows us to configure the model as needed, obtain the proper answer, and seamlessly switch between tasks.

You can learn more about prompt strategies for different purposes in our article: “How to build question-answering system using LLM.”

Prompt engineering strategies for LLMs. Source

Looking Ahead of LLM’s Future

The latest advancements in LLM technology will likely begin a new era of automation and efficiency across various industries. We can anticipate LLMs being employed in healthcare, finance, and education solutions to streamline processes and provide valuable insights. However, it comes with the responsibility of ensuring ethical utilization and a solid commitment to protecting user data.

As LLMs keep improving, it’s vital to establish strong data protection and clear AI usage rules. Also, ongoing research should tackle biases and promote fairness in AI to avoid unintended issues and ensure these innovations benefit society.

Originally published at quantumobile.com

AI Cost Reduction Outlook: How to Cut Operational Expenses Smartly

Ruben Melkonian — Sat, 29 Nov 2025 18:02:19 GMT

Achieve significant cost savings with a four-layer AI framework that moves beyond simple automation to optimize workflows and generate predictive forecasts.

How much is your organization losing every quarter by ignoring AI cost reduction potential? According to a recent McKinsey study [1], companies that have already applied gen AI across most critical operations see cost reduction by over 20%.

While 93% of C-level executives plan to use AI to slash expenses in the coming 18 months [2], the gap between intention and execution continues to widen — costing businesses millions in missed opportunities.

Through our work with over 50 businesses seeking to streamline spending and improve performance, we’ve discovered that AI cost efficiency is incomparably successful. Our AI consultants, time and again, observed that companies implementing artificial intelligence technology generally achieve 10x higher ROI than those still dependent on manual workflows.

Having helped organizations adopt AI and machine learning solutions for quite some time, we at Quantum decided to pull back the curtain on how AI reduces costs and explain the practical steps CEOs and CMOs can take to prove AI ROI and start cutting overheads that quietly bleed budgets every day.

How Does AI Reduce Costs: Four Layers of AI Cost Impact

Manual task automation and productivity gains — that’s usually where the AI conversation starts. And ends. As McKinsey’s survey found [3], just 1% of leaders say they’ve reached a stage where AI is fully integrated and actually drives outcomes.

While basic task automation delivers immediate savings, it represents only one of the four layers of AI’s cost optimization potential. Each of them helps companies cut expenses and increase margins, but on different levels. Let us show you how.

Source: Image by the author.

1. Surface Layer: Direct Automation Savings

The surface layer represents what most organizations, to be exact 75% of the surveyed [4], discuss in boardrooms: direct task replacement and immediate labor savings. It’s the “low-hanging fruit” of AI implementation, but also a vital first step for companies that, for some reason, haven’t embraced gen AI solutions yet.

Here, we talk about the digitalization of simple, routine tasks like email management or document scanning, where one human action is replaced with a digital one, and saves X hours. Companies investing at least 20% of their IT budget in automation achieved an average of 22% in cost savings [5], primarily by automating repetitive tasks and minimizing manual errors.

One particularly noticeable case is the seemingly simple automation of multilingual customer support for one of our clients. A Canadian provider of AI-powered customer experience platforms poured millions yearly on large teams handling repetitive queries in rare and mixed languages.

Now, 70% of the inquiries are addressed by a natural language processing (NLP) chatbot platform powered by Google BERT [6]. Thanks to this solution, the company cut 20–25% in customer service expenses within just one year, a six-figure operational spend.

2. Process Layer: Workflow Optimization and Error Reduction

This is the layer where AI reconfigures the entire workflow or process that may involve multiple tasks and systems. It’s where the real cost leverage begins.

Instead of simply automating a handful of tasks, generative AI at this level helps companies get rid of duplicated work and bottlenecks, collapse multi-step approvals, and remove latency from critical processes. And it does all of this with little to no human touch. See the difference?

From our experience, HR and financial process automations top the demand list because they offer clear ROI, standardized workflows, and regulatory compliance benefits that appeal to executives looking for measurable wins and ways to reduce operational costs. Talking about quantifiable outcomes, here’s one of our clients’ stories about an automated document parsing solution [7].

A Sweden-based financial services company needed to completely overhaul its invoice processing workflow, which was full of manual bottlenecks from dealing with thousands of documents daily.

So here’s what was done:

Source: Image by the author.

3. Intelligence Layer: Predictive Insights and Proactive Cost Prevention

Business intelligence solutions powered by generative AI technologies allow companies to spot issues and not-yet-visible cost drivers before they impact the bottom line. That’s what the intelligence layer is about and where the cost of implementing AI begins to pay off in measurable ways.

Realizing the huge strategic and financial upside, 28% of CFOs are already using artificial intelligence to automate forecasting, and another 39% plan to adopt it within the year [8]. The use cases are diverse and usually come with a million-dollar perspective of cost avoidance and efficiency gains.

For example, one oil and gas exploration company uses a machine learning model to predict the location of oil reservoirs [9]. The solution segments data by geographical and seasonal factors and applies feature selection techniques to achieve the desired 70% accuracy in site predictions.

And now, the most interesting part — the actual advantages of cost-effective AI.

The client saves $10–40 million per exploration cycle through more data-driven and precisely targeted site selection. On top of the direct savings from fewer unsuccessful drilling operations, the intelligence-layer solution allowed the company to refine its exploration strategy and identify previously overlooked high-potential areas. They expanded their explorable territory by 300% while decreasing risk exposure by 70%.

4. Ecosystem Layer: Network Effects and Compound Improvements

Traditional cost reduction follows predictable patterns: cut 10% from operations, save 10% overall. Ecosystem-layer AI breaks this linear relationship. Organizations operating at this level create a snowball effect where improvements multiply faster as more partners join their
network.

The cost of implementing artificial intelligence at the ecosystem level becomes insignificant compared to the loss of staying out.

To put it in context, let’s take a medicine service provider and see the mechanics of the network effect.

Let’s look at how the AI-powered decision support system (DSS) helps medical practitioners fight COVID-19 and the associated challenges better [10]. Each of the 50+ hospitals and regional health authorities contributes data to a shared AI-based DSS solution that improves treatment recommendations, protocol compliance, and EMR synchronization for all facilities.

If we go into detail, they collect patient data, e.g., examination results, vital signs, and treatment history, which is automatically synchronized with the central hospital information system. So when one of the hospitals encounters a similar case or pattern, the DSS gives the optimal recommendations that adhere to expert-defined rules.

Source: Table by the author.

The most compelling aspect is that even if competitors invest in the same technology, they won’t be able to replicate the network effect, as they lack the data network that powers optimization.

AI Investment Payoff: Costs, Returns, and Insights

“How much does AI cost?” and “What is the approximate cost of implementing artificial intelligence in business?” These are the two most frequently asked questions Quantum AI consultants have to answer repeatedly. But evaluating the AI development and implementation cost in isolation will miss the full picture, where technology investment evolves into a million-dollar value creation engine.

Having analyzed 50+ enterprise AI projects, here’s a simple table comparing the average development expenses versus potential financial returns.

Source: Table by the author.

Let’s be upfront that the AI development cost may seem pretty high, especially when it comes to custom predictive engine development or data integration pipelines.

But organizations following systematic implementation achieve returns that justify initial expenditure within 18–24 months and create sustainable competitive advantages worth multiples of original investments.

AI Cost Reduction Through a Technical Lens

AI model deployment is only one piece of the puzzle. To see the tangible cost benefits from artificial intelligence, business leaders also need to establish the right technical architecture that optimizes performance while lowering operational expenses.

Our AI experts have outlined four critical technical components that help you ensure your AI initiatives deliver maximum ROI without draining resources.

Source: Image by the author.

Scalable Data Architectures

Data infrastructure is a major, yet frequently overlooked, financial burden in AI projects. Easy-to-scale architecture designed from the start helps companies avoid the expensive “rebuild and migrate” problem as data volumes grow.

The best long-term strategy would be to adopt an elastic infrastructure, no matter whether you choose cloud-native solutions, hybrid data lakes, or distributed pipelines.

AI Model Selection and Tuning

An advanced ML model isn’t always the right fit for every case. Many enterprises initially opt for large-scale deep learning solutions, which often demand huge computing power and specialized hardware.

Yet, what we recommend is to select models proportionate to the task. For example, you don’t need a neural network to classify structured tabular data when classical logistic regression or gradient boosting are more than sufficient to deliver accurate results.

Another practical way to save costs on development is fine-tuning pre-trained models, rather than building one from scratch. Told in confidence, fine-tuned smaller models often outperform large general-purpose models at 10x lower inference costs for domain-specific tasks.

AI Integration into Workflows

A company can realize the real value of artificial intelligence only when it integrates seamlessly into existing business operations. Nothing surprising here. However, many companies approach AI projects as a standalone add-on, and as a result, they struggle with inefficiencies, duplicate tools, and extra spending on middleware.

So, by designing AI around current workflows, organizations reduce friction, decrease redundant manual handoffs, and cut unnecessary licensing expenses.

Continuous Learning with MLOps

Unlike traditional software that often follows a “build and deploy” cycle, AI systems need constant refinement to continue producing relevant outputs.

MLOps (machine learning operations) frameworks help you put this task on automation rails by taking care of model retraining, drift monitoring, and versioning with minimal human intervention. Such an automation dramatically lowers ongoing maintenance costs.

How to Get Started: AI Implementation Roadmap

McKinsey’s recent research reveals a crucial insight: establishing a clearly defined roadmap to drive generative AI adoption has one of the biggest impacts on EBIT [11]. In other words, the fastest way to turn artificial intelligence into a profit engine is to plan its adoption in value-driven phases.

Here are the steps to follow to help businesses adopt AI technology painlessly and with maximum ROI and effectiveness.

Step 1. Define High-Impact Goals

Identify the key pain points where gen AI can save your team time or reduce expenses. To do that, ask yourself these questions:

Where does human decision-making repeatedly slow down the process?
Which processes generate costs that rarely appear on the balance sheet?
Where do delays directly affect customer satisfaction, retention, or revenue?

What’s important here is to start with 1–2 processes most subject to costly bottlenecks and not try to revamp the entire operation. Once you pinpoint them, set clear objectives to better track ROI and put capital into initiatives that deliver the most value.

Step 2. Map Processes & Data Flows

To separate out repetitive and time-draining tasks, you should understand and document how existing business processes move across departments. Create as detailed process maps as possible, highlighting bottlenecks and redundancies.

Organizations that thoroughly map their processes before AI rollout achieve 35% better results because they understand exactly where automation will have maximum impact.

You should also trace data flows, in particular, where information gets delayed, duplicated, or lost between systems. Identify data sources, quality levels, and access points that will feed your AI systems. It’s a must if you want to integrate technology with existing workflows properly.

Step 3. Select Optimal AI Solutions and Architecture

That’s one of the trickiest and decisive tasks requiring special expertise. When choosing the right solution, look for technologies that can bring quick automation wins and scale when your project evolves. You can combine several AI technologies in one solution to get better results and multiply the cost savings.

Step 4. Pilot & Measure

The best practice is to deploy AI solutions in controlled phases. So, start with small pilot programs in low-risk, yet high-impact areas to demonstrate the immediate value. Then, continue with proof-of-concept implementations lasting 30–60 days, followed by department-wide rollouts.

To understand whether AI really works for your business, you should assess the effectiveness of your project. Depending on your goals, it could be lower error rates, saved hours for your team, faster workflows, etc.

Step 5. Scale & Integrate

When you are certain your AI project is stable and proven to be reliable in real-world scenarios, expand it into connected processes and organization-scale workflows. Key points here are establishing data pipelines and governance standards to ensure the quality of data for AI. Now, your model is ready to be integrated with your business apps to enable automation.

Step 6. Monitor & Optimize

For your AI model to stay relevant and perform as expected, it needs updates and refinements over time. In this regard, introducing MLOps practices would be the best choice as it automates most of the core maintenance activities. It’s much easier to prevent performance degradation, as you will be alerted as soon as anomalies are detected.

Final Thoughts on AI Cost Reduction Potential

The fact is, gen AI generates returns unmatched by other technology investments. With artificial intelligence, large enterprises as well as small businesses can gain transformational results that drive lasting competitive advantages on top of direct cost savings.

To sum up, here are several undeniable pieces of evidence of AI cost efficiency:

Artificial intelligence delivers immediate and measurable impact

While most organizations experiment with AI, expert-guided implementations consistently reduce operational costs by 20–35% within the first year through automation of high-volume processes and elimination of manual errors.

AI cuts errors and inefficiencies

Modern ML systems show incredible 95%+ accuracy in data processing, quality control, and decision-making tasks. So, you can rely on them to reduce costly human errors that typically account for 15–20% of operational expenses.

AI optimizes resources and increases ROI

ML algorithms let you make the most out of your current equipment, assets, and workforce capacity. For example, smart scheduling and routing software can streamline processes to gain 3–8x return on AI investments.

AI enables scalable growth without cost increases

Organizations can expand operations, serve more customers, and process higher volumes while maintaining or reducing per-unit costs through intelligent automation.

AI Cost Reduction Outlook: How to Cut Operational Expenses Smartly was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

State-of-the-art models for guided depth super-resolution (GDSR)

Ruben Melkonian — Thu, 27 Feb 2025 04:35:40 GMT

Greetings, computer vision enthusiast! In this article, you will learn about Guided depth super-resolution, a trendy topic in modern CV. The article covers both the basics and cutting-edge approaches evaluated on a real-world dataset. If you are an experienced developer already familiar with the topic, feel free to skip the first section and jump straight to the hardcore. Otherwise, we invite you to begin your journey in understanding GDSR.

What is Guided depth super-resolution (GDSR)?

Let's begin with a simpler concept — image super-resolution. Sounds familiar, right? Indeed, this is about increasing the spatial resolution of an image so that a blurry picture becomes clear. Technically speaking, the process involves generating high-frequency details, which basically means smoothing out the rough texture, like in the image below.

Image 1. Super-resolution of an image of a cat

The second important concept is a depth image. This image contains information about the distance from a certain point of view, in most cases, a camera. In particular, it might be represented as a 4th channel (so-called RGB-D image). In the depth image, each pixel corresponds to the distance from the captured object to the camera in meters (or other units). Capturing depth information is usually achieved by employing one of the 2 sensors: either Time-of-flight (ToF) or Light Detection And Ranging (LiDAR) sensors. Their working principles are similar: both send a ray of light and measure the time it takes for the ray to bounce back. You can read more about these here.

Now, it is not hard to guess what depth super-resolution (DSR) is about. True enough, the goal of the task is to increase the spatial resolution of a depth image. The problem arises rather often since both ToF and LiDAR cameras are pretty expensive, especially in high resolution. Depth information is essential in numerous contexts, for example, in 3D scene reconstruction.

The last thing remains. Namely, what is the "guided" part all about? Here, it actually gets interesting. Quite conveniently, ToF cameras are often equipped with other sensors that capture information about color in high resolution. Thus, in addition to the depth image, we usually have a companion RGB image, which can be utilized to guide the super-resolution process. It is noteworthy that adding the RGB guidance also aids the mathematical framework since DSR is an ill-posed problem per se — it is possible to reconstruct multiple high-resolution depth maps from a low-resolution depth image without a color guide.

With that said, let's formally define GDSR: GDSR aims to reconstruct a high-resolution depth map from a low-resolution observation with the help of a paired high-resolution color image. The image below illustrates the task:

Image 2. GDSR: given a color image and a low-resolution depth, we reconstruct a high-resolution depth map

Traditional approaches

To provide a baseline for more sophisticated solutions, it is useful to look into more simple ones. The most basic super-resolution method is bilinear interpolation, the default in OpenCV. Let's first look at the problem from a different angle to understand how it works. For a given image I with shape m x n, the super-resolved image ISR is like a copy of I, only with higher pixel density, or, in other words, a more fine-grained pixel grid. Thus, we want to estimate the values of a 2D function, given its values in a limited number of points. This is a common mathematical problem called 2D function interpolation. There are multiple solutions to this problem; the simplest one is the above-mentioned bilinear interpolation. It takes the closest pixels to the one we have to estimate and averages their values. As simple as it is, it may work frustratingly well, providing a nice balance between precision and speed. If one is willing to trade off some speed for a higher accuracy, one should consider bicubic interpolation. This approach estimates the value for a pixel using a cubic function of its neighbors. As a result, we get a much smoother image than using bilinear interpolation:

Image 3. Example of bicubic and bilinear interpolation

Another advantage is improved detail preservation.

The main drawback of bicubic interpolation in comparison with its bilinear counterpart is its potential for artifacts, as a cubic function works on a larger scale and can sometimes result in unreasonably high or low values.

As efficient as these solutions might be in specific scenarios, neither bilinear nor bicubic interpolation scales well with the increase of the resolution factor. Indeed, they might be efficient for increasing the resolution by 2–4 times, but if you want to super-resolve the image further, say by 8 or even 16 times, they will fail miserably since they do not use the RGB guidance, and thus cannot infer the correct information beyond a certain threshold. In such a case, you would need more sophisticated solutions, described in the upcoming sections.

Evaluation dataset

Before diving into modern solutions, let's look at the dataset that was used to evaluate them. The data was borrowed from the 3rd Autonomous Greenhouse Challenge. It contains 834 RGB-D image pairs of lettuce plants, where the "D" stands for the depth image. Each image comes in full HD resolution (1080x1920 pixels). For evaluation purposes, the depth images were first downsampled by a factor of 4 and then super-resolved back to full HD for comparison with the ground truth.

The dataset consists of 2 subdatasets. The first is a close-on view of a single lettuce plant, while the second is a birds-eye view encompassing multiple plants in a single image:

Image 4. Pictures from the 3rd Autonomous Greenhouse Challenge demonstrating 2 underlying data distributions

SOTA solutions

Quite predictably, the most accurate approaches involve deep learning.

Structure Guided Network via Gradient-Frequency Awareness

It is a frequent pattern in deep learning when a powerful idea is transferred from one field to another. The transformer architecture is a good example — it rose from solving NLP tasks to aiding numerous computer vision problems, some of which we will see later in this article. Something similar happens here, only on a smaller scale. Indeed, the idea of transferring structure during super-resolution comes from a related but slightly different task: single image super-resolution (SISR). The reason why structure is so important is easier to understand once you have seen it in a specific example:

Image 5. Comparison of 5 different super-resolution methods. Illustrates the efficiency of structure-guided approach (f)

The picture shows 6 high-resolution images: the ground truth (top-left) and the results of 5 super-resolution algorithms. It is easy to see that while all algorithms successfully capture the content, only one (bottom-right) does a nice job of preserving the structure — the other 4 algorithms produced results that are either too smooth or too noisy. But how do we capture the structure? In the original SGNet paper, the concept of "structure" is divided into 2 principal components: the gradient and the frequency of the image. The former represents the change in intensity (or color) between neighboring pixels, while the latter refers to the rate of these changes over larger spatial areas, helping to separate overall image structure into fine and coarse patterns across the entire image. SGNet is trained in a residual fashion: first, the low-resolution depth image is upsampled to the desired dimensions using simple bicubic interpolation; then, a neural network is applied to add gradient and frequency features using the color guide. Also, the loss function is modified to punish the network for missing structure information. For a more detailed explanation, please refer to the original paper.

Model's performance

This model performs exceptionally well on the above-described dataset. After fine-tuning, it works almost twice as good as bicubic upsampling.

Table 1. Comparison of SGNet model’s metrics with baseline algorithm on lettuce dataset

Visually, the result is almost indistinguishable from the ground truth:

Image 6. Comparison of the ground truth depth map against the predicted depth map

DepthAnything + SegmentAnything

Foundation models, like DinoV2 or SegmentAnything, have gained significant popularity in the computer vision field in recent years. These models have been trained on large and diverse datasets to prepare for every imaginable scenario. Although a foundation GDSR model is yet to be trained, other foundation models could be utilized to perform a 0-shot GDSR. In particular, we will use 2 foundation models:

DepthAnything (DAny). This model solves the problem of monocular depth estimation. Namely, it estimates the depth map given a color image. Clearly, the task is significantly more complex than GDSR, so naturally, a straightforward application of DAny would yield inferior results. Besides, there is another caveat: DAny outputs relative depth. In other words, its outputs do not reflect any human-readable units, such as inches or centimeters. Instead, they can only be interpreted relative to one another. For example, if one pixel is at distance 1, and another pixel is at distance 2, then it means that the former is twice as close to the viewpoint as the latter. But the exact distance is not known. The reason for this queer output format is hidden in simple geometry. When projecting a 3D world into a 2D plane (which is basically what a camera does when it takes a photo), the depth information is permanently lost. This is the cost you pay for projection. Thus, retrieving the depth using only the color image is impossible. Of course, the problem becomes entirely solvable with the addition of supplementary information. For example, knowing the distance from as few as 2 pixels is enough to restore the metric depth (depth in absolute units, like meters) for the whole image.
SegmentAnything (SAM). This model, introduced by Meta AI, excels at 0-shot semantic segmentation. It was trained on a massive dataset with 11M images and over 1.1B masks, more than 99% percent of which were generated fully automatically.

To combine these elements into a working solution, we offer the following algorithm, described in the workflow diagram below:

Image 7. Algorithm diagram.

Let's break it down step by step.

Input: high-resolution color image ImHR and a low-resolution depth map DLR, highlighted with a red border.
Apply DAny to the color image to obtain a high-resolution depth estimate: DEHR:= DAny(ImHR). Downsample it to match the size of the ground truth: DELR:= bilinear_donwsample(DEHR).
Learn the linear map f from DELR to the ground truth DLR. This is a linear regression problem, and it can be solved using the ordinary least squares (OLS) method.
Apply map f to the high-resolution estimate to obtain the high-resolution metric depth prediction: DHR:= f(DEHR) (highlighted in green).

The algorithm above summarizes only the top part of the diagram above (excluding the lower branch); however, it can be used as an independent algorithm itself. To enhance its performance, we propose the following modification. We hypothesize that the relative depth is more consistent between the pixels of a single object than between pixels from different objects. Thus, we are willing to learn a separate map fobj for every object in the image (and a separate map fbg for the background).

This leads us to a slightly modified algorithm with the same steps 1–2 but different further steps:

3. Apply SegmentAnything to the high-resolution color map to obtain a high-resolution segmentation map: SHR:= SAM(ImHR).

4. Downsample the segmentation map to match the size of the ground truth depth: SLR:= bilinear_donwsample(SHR).

5. Learn a different linear map fobj, mapping the low-resolution relative depth (DELR) to the low-resolution ground truth depth DLR for each object in the image detected on the previous step (background may be considered as an object as well). One way to do this is to use the OLS method.

VI. Finally, the metric depth prediction is obtained by applying the learned maps to the high-resolution depth estimate: DHRobj:= fobj(DEHRobj).

The main advantage of this algorithm compared with SGNet is that it does not require model training or fine-tuning. Note that the linear regression model learned in the 3rd step of the algorithm is learned during inference for each individual input sample; its weights are not shared between different inputs — rather, they are learned separately. This does slightly negatively affect inference time, but it implies that this approach can be applied without prior training.

Now, let's look at the performance of the proposed method.

Table 2. Foundational model’s metrics on the lettuce dataset, compared with the baseline

As we can see, the approach does not yield great results. It outperforms bicubic only on one dataset split; on the other 2, it performs worse. The reason for its inferior performance likely lies in the efficiency of DepthAnything. Despite being SOTA in-depth estimation, it cannot compete with other alternatives for GDSR, which is inherently a simpler problem, as it uses a low-resolution depth map (which DAny does not). On the bright side, however, the rapid development of depth estimation models may lead us to the point when this method might show competitive performance in the field of GDSR as well by substituting DAny with a more advanced alternative.

Code hacks

Before we jump to conclusions, we will use this opportunity to share 2 implementation details that would allow us to run SGNet on a reasonable-sized GPU. Indeed, SGNet is a rather large model, and it takes around 10.77 gigabytes of GPU memory for training on 256x256 images with a batch size of 1, according to the original paper. Thus, it makes sense to apply two tricks that could save a lot of memory.

Train on slices

The first idea is to perform training and inference on image slices rather than full-sized images, which could be very large, like in our dataset. The code for sliced training can be found on SGNet’s Github page: the authors extract random crops from the image and the depth map of size 256x256 (the crop size of the depth map is 256 / s, where s is the scale factor). As for the inference, we share a code snippet that we used to test the model.

Although the snippet may appear large at first glance, the idea behind it is rather simple. A sliding window of a fixed size (your desired inference size) is moved along the image. For each slice that it defines, model inference is performed; namely, we obtain depth predictions for a given image patch. To combine the results for different patches, we have to count the overlap between them. This is done by averaging the depth values for a pixel from all patches that contain this pixel. Note that it is desirable for the window to move with some overlap since, otherwise, prediction artifacts may be noticeable along the window edges.

Mixed precision

Another common memory-saving trick is to use mixed precision. PyTorch allows to do it very simply, using the library’s context manager:

with torch.cuda.amp.autocast(enabled=True):

There is one notable caveat, however. The thing is, PyTorch (at least at the time of the writing) does not support mixed precision for Complex numbers that are used in SGNet’s implementation. The solution to this problem is simple: mixed precision should be turned off for the Complex number operations. An example implementation can be found in this code snippet.

Conclusions

There are two main takeaways from this research. The first one is that the most efficient way to perform GDSR is to fine-tune a neural network. If you are looking for high accuracy, brace up for a serious data collection process and network training. SGNet is an excellent example of such a network, as it achieves SOTA performance in the field, which is backed by our experiments.

The second takeaway is that the foundation models have not yet developed so well that they can be applied to such complex tasks as GDSR on the fly. Even the combination of such powerful models as SegmentAnything and DepthAnything does not perform well enough to be useful in practice. But there may yet be light at the end of the tunnel! New foundation models in depth estimation are being developed even as this article is being written: StableDiffusion-based Marigold; Depth Pro, which predicts absolute (!) depth; DepthAnything2 — the second generation of DepthAnything… We firmly believe that this approach could bear fruitful results in the near future.

Visit us at DataDrivenInvestor.com

Subscribe to DDIntel here.

Join our creator ecosystem here.

DDI Official Telegram Channel: https://t.me/+tafUp6ecEys4YjQ1

State-of-the-art models for guided depth super-resolution (GDSR) was originally published in DataDrivenInvestor on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to build question answering system using LLM

Ruben Melkonian — Wed, 13 Sep 2023 08:55:53 GMT

Introduction to Large Language Models

In recent years, Language Models (LMs) have significantly impacted natural language processing (NLP), revolutionizing how we interact with computers and pushing the boundaries of what machines can understand and generate. One such groundbreaking development is the advent of Large Language Models (LLMs), which have opened up a realm of possibilities previously unimaginable. LLMs, powered by advanced algorithms and trained on vast amounts of data, can understand, generate, and manipulate human language with astonishing accuracy and creativity.

This article aims to delve into the exciting world of LLMs, highlighting a brief overview of their architecture and application techniques.

Understanding LLMs

LLMs are state-of-the-art artificial intelligence systems that have the remarkable ability to understand and generate human language. At their core, LLMs are built upon advanced deep learning architectures, such as transformer models, which have revolutionized the field of NLP. These models consist of multiple layers of self-attention mechanisms and feed-forward neural networks, allowing them to capture complex language patterns, dependencies, and contextual information.

A typical transformer model consists of four main steps in processing input data.

First, the model performs word embedding to convert words into high-dimensional vector representations. Then, the data is passed through multiple transformer layers. Within these layers, the self-attention mechanism is crucial in understanding the relationships between words in a sequence. Finally, after processing through the transformer layers, the model generates text by predicting the most likely next word or token in the sequence based on the learned context.

In the transformer encoder-decoder architecture, the encoder processes the input data, applying word embedding and multiple transformer layers to capture the contextual information. The decoder then takes the encoded representation and generates text by predicting the next token based on the learned context. (source)

The concept of self-attention enables the model to focus on different parts of the input text to understand the relationships between words and their significance within the context. It works by assigning weights to different words in a given input sequence based on their relevance and importance in the context. Each word in the sequence attends to all other words, calculating a weighted representation of their contributions to its representation. These weights are determined by computing the dot product of the word’s embedding and the embeddings of the other words, followed by applying a softmax function to obtain normalized attention scores.

By attending to relevant words and phrases, LLMs can generate coherent and contextually appropriate responses that follow a given text input with an instruction to an LLM. This text instruction is called a prompt.

As we are encoding the word “it” in the encoder, part of the attention mechanism was focusing on “The Animal”, and baked a part of its representation into the encoding of “it”. (source)

LLMs have greatly enhanced language understanding capabilities. They can accurately comprehend and interpret text inputs by capturing intricate language patterns and contextual cues. Models adapt their pre-trained knowledge to specific tasks through transfer learning, reducing training time. LLMs also possess zero-shot capabilities, generating responses without explicit training. This has profound implications for various NLP tasks such as sentiment analysis, information retrieval, and question answering, where they have proven instrumental in providing accurate and contextually relevant responses.

LLMs have also revolutionized human-machine interactions by enabling more natural and intuitive communication. Through chatbots and virtual assistants powered by LLMs, users can engage in conversations that feel remarkably close to conversing with a human. This has improved customer service experiences, streamlined information retrieval, and facilitated personalized user interactions.

The release of GPT-3 (Generative Pre-trained Transformer 3) marked significant milestones in developing and deploying large-scale language models. Released by OpenAI in June 2020, the model captured widespread attention due to its unprecedented size and capabilities. With 175 billion parameters, GPT-3 demonstrated remarkable language generation and understanding abilities, leading to human-like text responses.

However, proprietary models like GPT-3 have garnered significant attention and demonstrated impressive capabilities, and they do come with several disadvantages. One primary concern is the cost structure associated with proprietary models. GPT-3 can be expensive as it operates on a token-based pricing model, where users are charged for every 1,000 tokens processed.

Another drawback is latency issues that may arise as the models rely on remote servers, introducing processing delays unsuitable for real-time applications. Additionally, a lack of flexibility comes with proprietary models. Their underlying architectures and parameters are typically inaccessible for modification, limiting the ability to fine-tune the models according to specific use cases or domains.

In response to the centralization of LLM power, the open-source community has actively worked on developing alternative models that promote transparency, accessibility, and community-driven development. One notable example is the LLaMA, a family of LLMs in four sizes: 7, 13, 33, and 65 billion parameters. LLaMa is not an instruction-following LLM like ChatGPT, but the idea behind the smaller size of LLaMA is that smaller models pre-trained on more tokens are easier to retrain and fine-tune for specific tasks and use cases. Although the LLaMA was released under “a noncommercial license focused on research use cases,” models with a commercial use license were released quickly.

Falcon is the first fully open-source large language model, and it has outranked all the open-source models released so far, including LLaMA, StableLM, MPT, and more. It has been developed by the Technology Innovation Institute (TII), UAE. So far, the TII has released two Falcon models trained on 40B and 7B parameters.

Techniques of using LLMs

When it comes to inference LLMs, several techniques have emerged to guide the generation of desired outputs. The four major ones are:

Zero-shot prompting
Few-shot prompting
Fine-tuning
Embedding

A brief summary of Large Language Model approaches.

Zero-shot Prompting

Zero-shot prompting means providing a prompt that is not part of the training data to the model, but the model can generate a result that you desire. This capability stems from their extensive pre-training on massive amounts of diverse text data, which equips them with a broad understanding of human language. During pre-training, LLMs learn to capture patterns, contextual relationships, and semantic representations from the data they are trained on. As a result, when presented with a question, LLMs can leverage their acquired knowledge to generate responses that align with the query.

Utilizing zero-shot prompting is an excellent initial step to swiftly assess the capabilities of LLMs and obtain responses without the need for specialized training. This approach only requires the construction of a prompt, making it a straightforward way to evaluate the LLM’s performance and explore its language understanding abilities. However, it’s essential to note that zero-shot prompting may not always yield accurate or desired results. In such cases, few-shot prompting can be a more practical approach.

Let’s demonstrate a use case of zero-shot prompting for named entity recognition (NER). To perform NER using zero-shot prompting, we construct a prompt that specifies the task as well as the entity types to be identified.

Prompt: Identify the entities in the following text and tag them as PERSON, ORGANIZATION, LOCATION, or PRODUCT.

Text: John Smith is a software engineer at Google. He lives in Mountain View, California.

The LLM, equipped with its pre-trained knowledge, comprehends the prompt and applies its understanding to identify the named entities in the text. In this example, the LLM would recognize “John Smith” as a PERSON, “Google” as an ORGANIZATION, and “Mountain View” as a LOCATION. By employing zero-shot prompting, LLMs can effectively perform named entity recognition tasks without requiring specific training on labeled data for each entity type.

Few-shot Prompting

Few-shot prompting presents a set of high-quality demonstrations of the target task, each consisting of both input and desired output. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted. Therefore, few-shot prompting often leads to better performance than zero-shot. However, it comes at the cost of more token consumption and may hit the context length limit when input and output text is long.

This technique proves exceptionally advantageous in several situations where fine-tuning or extensive training may not be feasible or efficient. Firstly, when dealing with limited labeled data for a specific task, few-shot prompting becomes valuable. By providing a small set of relevant examples, the model can quickly learn and generalize from this limited data, enabling it to perform well on the given task. Secondly, few-shot prompting is helpful for rapid prototyping and experimentation. It allows developers to iteratively test and refine models without requiring time-consuming fine-tuning processes.

Few-shot prompting demonstrates a wide range of use cases, including enhancing Natural Language Understanding (NLU) tasks like sentiment analysis, entity recognition, and relationship extraction. It improves question-answering systems by generating accurate responses through demonstrations of correct answers. In text summarization, it aids in generating concise and informative summaries. For conversational AI applications, it guides models to produce context-aware and coherent responses, while in data extraction and formatting tasks, it helps extract and organize information into structured formats.

To illustrate the concept of few-shot prompting, let’s consider an example in which we attempt to classify customer feedback as positive or negative. We provide a model with 3 examples of positive/negative feedback, then show it a new piece of feedback that has yet to be classified.

Prompt:

The product is fantastic, delivering excellent quality and exceeding my expectations in every way: positive

I’m disappointed with the poor customer service and lack of responsiveness to inquiries and concerns: negative

I absolutely love this! It’s user-friendly, durable, and provides exceptional value for the price: positive

Unfortunately, the product did not live up to its claims and fell short in terms of performance and durability:

The model sees that the first 3 examples were classified as either positive or negative and uses this information to classify the new example as negative. We can observe that the model has somehow learned how to perform the task by providing it with just 3 examples (i.e., 3-shot). For more complex tasks, we can experiment with increasing the number of demonstrations (e.g., 5-shot, 10-shot, etc.).

Fine-tuning

In some instances, few-shot prompting may not effectively address a specific use case or deliver the desired results. In such situations, fine-tuning becomes a more suitable option to tailor LLM to the targeted application.

Fine-tuning involves adjusting the parameters of a pre-trained model to improve its performance on a particular task. By supplying the model with a curated dataset of relevant examples, fine-tuning allows the LLM to generate more accurate and context-specific responses. This process is especially beneficial for tasks that demand a deeper understanding of domain-specific terminology, jargon, or unique context that may not be sufficiently captured through few-shot prompting, for instance, customer service chatbots.

It is worth mentioning that fine-tuning LLMs presents its own set of challenges. For example, to fine-tune a 65 billion parameters model, we need more than 780 Gb of GPU memory. It is equivalent to ten A100 80 Gb GPUs. In other words, you would need cloud computing to fine-tune your models.

To overcome the issue of memory usage during fine-tuning, Dettmers et al. presented QLoRA: Efficient Fine-tuning of Quantized LLMs. QLoRA employs an efficient approach that enables fine-tuning a 65B parameter model on a single 48GB GPU.

QLoRA utilizes a technique called Low-Rank Adapters (LoRA) which adds a tiny amount of trainable parameters, i.e., adapters, for each layer of the LLM and freezes all the original parameters. We only have to update the adapter weights for fine-tuning, significantly reducing the memory footprint.

The output activations original (frozen) pre-trained weights (left) are augmented by a low-rank adapter comprised of weight matrics A and B (right).

Next QLoRa goes three steps further by introducing: 4-bit quantization, double quantization, and the exploitation of nVidia unified memory for paging.

In a few words, each one of these steps works as follows:

4-bit NormalFloat quantization: This is a method that improves upon quantile quantization. It ensures an equal number of values in each quantization bin. This avoids computational issues and errors for outlier values.
Double quantization: The authors of QLoRa define it as follows: “the process of quantizing the quantization constants for additional memory savings.”
Paging with unified memory: It relies on the NVIDIA Unified Memory feature and automatically handles page-to-page transfers between the CPU and GPU. It ensures error-free GPU processing, especially in situations where the GPU may run out of memory.

Comparison between standard, LoRa, and QLoRa models for fine-tuning an LLM

These steps drastically reduce the memory requirements for fine-tuning while performing almost on par with standard fine-tuning.

Besides all optimized fine-tuning techniques, it’s important to note that the fine-tuning process focuses on teaching the model new tasks or patterns rather than new information. It means there are better solutions than fine-tuning for tasks that require storing and retrieving additional up-to-date knowledge, such as question-answering (QA).

Embeddings

The problem of demand for up-to-date information, which was not presented in training data, can be solved using semantic embeddings.

Semantic embeddings are high-dimensional numerical vector representations of text that capture the semantic meaning of words or phrases. By comparing and analyzing these vectors, similarities and differences between textual elements can be discerned.

Leveraging semantic embeddings for search enables the quick and efficient retrieval of relevant information, particularly within large datasets. Semantic search boasts several advantages over fine-tuning, such as faster search speeds, reduced computational costs, and preventing confabulation or fact fabrication. Owing to these benefits, semantic search is often favored when the objective is to access specific knowledge within a model.

To enhance an LLM with embeddings, the first step involves obtaining a collection of relevant documents containing the necessary information for the task. Subsequently, these texts are divided into coherent smaller chunks, and their embeddings are computed using a specialized model. Proprietary models like OpenAI’s text-embedding-ada-002 and open-source options like instructor-xl can be employed. These embeddings are stored in dedicated vector stores, enabling efficient search and retrieval operations.

Once the necessary preparations are completed, the next stage involves inference. Let’s consider a question-answering task where Wikipedia pages serve as the source of information. Initially, the question is embedded using the same model used for generating the embeddings from external knowledge sources. Subsequently, the top-K similar text chunks are retrieved using the resulting query vector and then are provided as input context along with the question to the LLM.

Information retrieval system

By utilizing this additional context, the enhanced LLM demonstrates its capability to answer questions based on information that may not have been present in its training data or is private. This ability to leverage external knowledge and context allows the LLM to handle a broader range of queries and address information gaps. It is a powerful tool for tasks that require accessing and generating responses based on external or private information.

Conclusion

Large Language Models have emerged as powerful tools in modern NLP, revolutionizing the way we approach and solve complex language-related problems. These models offer out-of-the-box solutions for various tasks, providing quick and accurate responses with their inherent language understanding capabilities. Furthermore, LLMs have the flexibility to tackle more sophisticated and nuanced challenges through fine-tuning and leveraging contextual embeddings. Their ability to generalize from limited examples, incorporate external knowledge, and adapt to specific domains makes them invaluable in diverse applications.

How to Speed up Drone Navigation System Development with Simulator

Ruben Melkonian — Fri, 14 Jul 2023 00:02:12 GMT

The Challenge: the call for drone autonomy

Drones have become increasingly significant across multiple industries, from agriculture to delivery services. One critical aspect of drone operations is their ability to fly autonomously, using AI for optical navigation; yet it demands vast data for training and testing to ensure that the AI system can adapt to diverse real-life scenarios.

To overcome these obstacles, we have embraced a simulator environment for both data collection and system training in various scenarios. This approach eliminates the need for flight permissions and offers full control over weather and lighting conditions, leading to quicker and more cost-effective development.

However, this method has also brought about new challenges that demand further research and adjustments. In this article, we will delve deeper into these emerging obstacles and examine potential solutions.

And that’s how we did it

For our task, we experimented with various simulators and ultimately chose AirSim. This open-source, cross-platform simulator not only provides a highly realistic representation of drone behavior in both physical and visual aspects but also offers crucial APIs for data retrieval and UAV control.

AirSim boasts a broad range of pre-existing drone models, faithfully replicating their equipped cameras, depth sensors, GPS modules, and other hardware configurations, enabling us to gather diverse and authentic imagery to train our algorithm effectively.

Another significant advantage of this tool is the extensive customization options available for the environment. By carefully fine-tuning weather and lighting conditions, we can recreate a wide array of natural scenarios within a finite timeframe, which would be otherwise unattainable in real-world settings. Moreover, this approach eliminates the need for physical travel, expensive data collection, and most potential risks associated with real-world testing.

Simulations offer the flexibility to fly in diverse environments, weather conditions, and scenarios

A Quick Guide to Drone Navigation

Data: the cornerstone of success

Amassing extensive data plays a huge role in the development of drone autonomy. Using a simulation for this purpose offers some fundamental advantages:

It’s cost-effective compared to relying solely on physical drones. The expenses associated with building, maintaining, repairing, and replacing physical hardware can be significantly reduced or eliminated with simulators.
Simulations offer a controlled and safe environment to explore different scenarios without the risk of equipment damage or harm, making them highly beneficial for training. Users can practice flying drones and gain valuable experience without risks.
Physical drones face limitations regarding flight time, weather conditions, or restricted airspace. Simulations offer the flexibility to fly in diverse environments, weather conditions, and scenarios that might otherwise be inaccessible or unfeasible in the real world. It widens the scope of experimentation and expands the possibilities for data collection.
Drone simulations allow for thorough testing of various scenarios, facilitating the evaluation and improvement of drone performance. Simulating responses to different weather conditions, obstacles, or complex flight paths can be challenging or risky to replicate in the physical world. Simulations provide a controlled setting to conduct such tests.
Simulations gather extensive and precise flight parameters, including altitude, speed, acceleration, and control inputs, enabling more comprehensive research and development. This data is invaluable for analysis, performance evaluation, and enhancing drone designs and flight algorithms.

Artificial Intelligence: Training for Excellence

Drone navigation relies on visual data to perceive and navigate the surroundings effectively. By training AI models to recognize objects, landmarks, and environmental features, drones can navigate safely and avoid collisions. Training enables the creation of detailed maps, accurate real-time localization, and intelligent decision-making for efficient path planning and drone navigation during missions.

Model training also facilitates adaptation to changing conditions. By exposing the model to diverse lighting conditions, weather variations, and different terrains during training, the AI learns to generalize and perform robustly in unpredictable scenarios.

Finally, continuous algorithm training is a crucial task for improving and optimizing the optical navigation module. As the AI collects more data and receives feedback from real-world operations, the system should be regularly retrained to enhance performance, improve object recognition, and incorporate new features. This iterative training process ensures that the drone’s optical navigation continually evolves, maintaining high efficiency and effectiveness in various environments.

Test flights: fine-tuning drone performance

Drone flight simulator environments provide a virtual setting for training, evaluating, and fine-tuning drone performance, allowing for comprehensive testing and analysis.

The environment accurately models the physics and dynamics of the drone, including flight controls, propulsion systems, and aerodynamics, to ensure that the drone’s behavior closely resembles that of a physical drone, providing a realistic flight experience.

In the simulator, we can test different flight behaviors, such as takeoffs, landings, and aerial maneuvers under various conditions, to improve the system in a safe and controlled environment. We also can assess how the drone handles obstacles, wind gusts, and other challenging factors, identifying areas for further optimization.

AirSim provides extensive flight data, such as altitude, speed, trajectory, and sensor readings, that let us assess the drone’s performance, identify patterns or anomalies, and make data-driven decisions for enhancing flight algorithms and optimizing drone operations.

New challenges

The gap between virtual and real-world data

The limitations and lack of control over physical conditions make the real world a suboptimal data source. In contrast, a virtual environment provides greater flexibility to manipulate and execute numerous scenarios, including modifying weather and lighting conditions, experimenting with different terrain types, and introducing unexpected obstacles that need to be avoided.

While AirSim may not perfectly replicate the real world, there are strategies to address this challenge. Data augmentation techniques can be employed, incorporating satellite imagery and utilizing transfer learning to align simulator data with real-world geographic information. This involves comparing simulated sensor readings with accurate ground truth data derived from satellite imagery.

We can achieve more precise and robust training and testing by creating a virtual environment closely resembling real-world geography. Integrating satellite imagery into the simulation with further transfer of acquired knowledge back to the physical world results in significant cost and time savings, facilitating the development of AI optical navigation.

Other issues and how to deal with them

Sensor Fidelity. Discrepancies between simulated and real-world sensors can be addressed by adjusting sensor parameters in the simulator to closely match real-world characteristics. Additionally, you can try sensor calibration to align virtual outputs with physical sensor data.
Environment Differences. A simulator environment can be customized to closely resemble natural conditions: lighting, wind, and other dynamic elements. Data augmentation techniques will help introduce variability and simulate real-world scenarios, and integration of real-world data, such as satellite imagery or point cloud maps, into the simulator will add accuracy
Motion and Dynamics. Fine-tune the drone dynamics model in the simulator using real-world flight data to improve alignment with real-world drones’ flight dynamics and control responses. Employ advanced control algorithms that adapt to the differences between simulated and real-world dynamics. Leverage machine learning techniques to learn the mapping between the simulator and real-world drone behaviors.
Data Collection. Overcome challenges related to obtaining real-world drone data by collecting limited real-world data and augmenting it with simulated data to create a more diverse dataset. Utilize transfer learning techniques to leverage pre-trained models on similar tasks or domains to overcome limitations in real-world data availability.

Conclusion

Using a simulator proves to be a practical solution in developing and testing AI-based optical navigation systems. Simulators offer a controlled and repeatable environment, eliminating the costs and logistical challenges associated with real drone flights.

Simulators excel at creating highly realistic 3D models of the drone’s surroundings, enabling the AI system to undergo training and navigation exercises in a virtual world that closely mirrors reality. Through simulations, an array of scenarios can be generated, encompassing diverse weather conditions, lighting variations, and obstacles to empower the AI system to learn and adapt to various situations.

However, it is crucial to accept that simulators may not perfectly replicate the real-world environment, and there may be disparities between the data collected in a simulator and that obtained from a physical drone.

Drop us a line if you want to leverage the possibilities of simulators for drone navigation systems.

Subscribe to DDIntel Here.

Visit our website here: https://www.datadriveninvestor.com

Join our network here: https://datadriveninvestor.com/collaborate

How to Speed up Drone Navigation System Development with Simulator was originally published in DataDrivenInvestor on Medium, where people are continuing the conversation by highlighting and responding to this story.