Top LinkedIn Content on Mobile User Experience

Practical insights for better UX • Running “Measure UX” and “Design Patterns For AI” • Founder of SmashingMag • Speaker • Loves writing, checklists and running workshops on UX. 🍣

222,847 followers 1y

🤖 How To Design Better AI Experiences. With practical guidelines on how to add AI when it can help users, and avoid it when it doesn’t ↓ Many articles discuss AI capabilities, yet most of the time the issue is that these capabilities either feel like a patch for a broken experience, or they don't meet user needs at all. Good AI experiences start like every good digital product by understanding user needs first. 🚫 AI isn’t helpful if it doesn’t match existing user needs. 🤔 AI chatbots are slow, often expose underlying UX debt. ✅ First, we revisit key user journeys for key user segments. ✅ We examine slowdowns, pain points, repetition, errors. ✅ We track accuracy, failure rates, frustrations, drop-offs. ✅ We also study critical success moments that users rely on. ✅ Next, we ideate how AI features can support these needs. ↳ e.g. Estimate, Compare, Discover, Identify, Generate, Act. ✅ Bring data scientists, engineers, PMs to review/prioritize. 🤔 High accuracy > 90% is hard to achieve and rarely viable. ✅ Design input UX, output UX, refinement UX, failure UX. ✅ Add prompt presets/templates to speed up interaction. ✅ Embed new AI features into existing workflows/journeys. ✅ Pre-test if customers understand and use new features. ✅ Test accuracy + success rates for users (before/after). As designers, we often set unrealistic expectations of what AI can deliver. AI can’t magically resolve accumulated UX debt or fix broken information architecture. If anything, it visibly amplifies existing inconsistencies, fragile user flows and poor metadata. Many AI features that we envision simply can’t be built as they require near-perfect AI performance to be useful in real-world scenarios. AI can’t be as reliable as software usually should be, so most AI products don’t make it to the market. They solve the wrong problem, and do so unreliably. As a result, AI features often feel like a crutch for an utterly broken product. AI chatbots impose the burden of properly articulating intent and refining queries to end customers. And we often focus so much on AI that we almost intentionally avoid much-needed human review out of the loop. Good AI-products start by understanding user needs, and sparkling a bit of AI where it helps people — recover from errors, reduce repetition, avoid mistakes, auto-correct imported files, auto-fill data, find insights. AI features shouldn’t feel disconnected from the actual user flow. Perhaps the best AI in 2025 is “quiet” — without any sparkles or chatbots. It just sits behind a humble button or runs in the background, doing the tedious job that users had to slowly do in the past. It shines when it fixes actual problems that it has, not when it screams for attention that it doesn’t deserve. Useful resources: AI Design Patterns, by Emily Campbell https://www.shapeof.ai AI Product-Market-Fit Gap, by Arvind Narayanan, Sayash Kapoor https://lnkd.in/duEja695 [continues in comments ↓]

28 Comments

Aishwarya Srinivasan

615,339 followers 10mo

Here is why leaderboards can fool you (and what to do instead) 👇 Benchmarks are macro averages, and your application is a micro reality. A model that’s top-3 on MMLU or GSM-Plus might still bomb when asked to summarize legal contracts, extract SKUs from receipts, or answer domain-specific FAQs. That’s because: 👉 Benchmarks skew toward academic tasks and short-form inputs. Most prod systems run multi-turn, tool-calling, or retrieval workflows the benchmark never sees. 👉 Scores are single-shot snapshots. They don’t cover latency, cost, or robustness to adversarial prompts. 👉 The “average of many tasks” hides mode failures. A 2-point gain in translation might mask a 20-point drop in structured JSON extraction. In short, public leaderboards tell you which model is good in general, not which model is good for you . 𝗕𝘂𝗶𝗹𝗱 𝗲𝘃𝗮𝗹𝘀 𝘁𝗵𝗮𝘁 𝗺𝗶𝗿𝗿𝗼𝗿 𝘆𝗼𝘂𝗿 𝘀𝘁𝗮𝗰𝗸 1️⃣ Trace the user journey. Map the critical steps (retrieve, route, generate, format). 2️⃣ Define success per step. Example metrics: → Retrieval → document relevance (binary). → Generation → faithfulness (factual / hallucinated). → Function calls → tool-choice accuracy (correct / incorrect). 3️⃣ Craft a golden dataset. 20-100 edge-case examples that stress real parameters (long docs, unicode, tricky entities). 4️⃣ Pick a cheap, categorical judge. “Correct/Incorrect” beats 1-5 scores for clarity and stability 5️⃣ Automate in CI/CD and prod. Gate PRs on offline evals; stream online evals for drift detection. 6️⃣ Iterate relentlessly. False negatives become new test rows; evaluator templates get tightened; costs drop as you fine-tune a smaller judge. When you evaluate the system, not just the model, you’ll know exactly which upgrade, prompt tweak, or retrieval change pushes the real-world metric that matters: user success. How are you’re tailoring evals for your own LLM pipeline? Always up to swap notes on use-case-driven benchmarking Image Courtesy: Arize AI ---------- Share this with your network ♻️ Follow me (Aishwarya Srinivasan) for more AI insights and resources!

25 Comments

Kuldeep Singh Sidhu

Senior Data Scientist @ Walmart | BITS Pilani

15,293 followers 3mo

How Reliable Are Your Offline Recommender System Tests? New Research Reveals Critical Biases Offline evaluation remains the dominant approach for benchmarking recommender systems, but researchers from Universidade Federal de Minas Gerais and University of Gothenburg have exposed fundamental reliability issues in how we sample data for these evaluations. The core problem: users only interact with items they're shown (exposure bias), and evaluations typically use only a sampled subset of items rather than full catalogs (sampling bias). These compounding biases can severely distort which models appear to perform best. The Framework The research introduces a systematic evaluation across four dimensions: - Resolution: can the sampler distinguish between competing models? - Fidelity: does sampling preserve full evaluation rankings? - Robustness: do results remain stable under different exposure conditions? - Predictive power: do biased samples recover ground-truth preferences? Key Technical Findings Using the KuaiRec dataset with complete user-item preferences, the team simulated multiple exposure policies (uniform, popularity-biased, positivity-biased) at varying sparsity levels (0-95%), then tested nine sampling strategies including uniform random, popularity-weighted, positivity-weighted, and propensity-corrected approaches like WTD and Skew. The results challenge conventional wisdom. Larger sample sizes don't guarantee better evaluation-what matters is which- items get sampled. Under high sparsity (90-95%), many samplers produce excessive tie rates between models, losing discriminative power. Bias-aware strategies like WTD, WTDH, and Skew consistently outperformed naive baselines, maintaining stronger alignment with ground truth even under severe data constraints. Perhaps most striking: even the "Exposed" sampler (using all logged items) showed degradation under biased logging, while carefully designed smaller samples often proved more reliable. Practical Implications For practitioners: your choice of negative sampling strategy fundamentally impacts which models you'll select. The research suggests prioritizing methods that account for exposure patterns, particularly in sparse data regimes. The paper's code and complete experimental framework are publicly available, enabling teams to audit their own evaluation pipelines.

Magdalena Picariello

ROI from GenAI in 3-6 Months | ex-IBM, Lecturer

9,575 followers 9mo

In the last 90 days I spoke to 12 CXO. They all said one thing: GenAI doesn't deliver business value. The reason? It’s not because of model choice. Not because of bad prompts. But because they skip the most important part: LLM evaluation This is why evals matter. In one Datali project, testing took us from 60% to 92% accuracy. Not by luck and blind trying. But by building a rigorous, automated testing pipeline. Here’s the boring but harsh truth: You don’t write a perfect system prompt and test it. You write tests first and discover prompts that pass them. This what you get: 1// You gain crystal clear visibility - the perfect picture of what works and what doesn’t. You see how your system behaves across real-world inputs. You know where failures happen and why. You can plan risk mitigation strategies early 2// You iterate faster. Once you're testing thoroughly, you can run more experiments, track their results and revisit what worked best. Even months later. You catch problems early. You refine prompts, add data or fine-tune with confidence. You iterate faster from PoC → MVP → production, adjusting to user feedback without guesswork. 3// You build better products in less time. The better means here: Higher accuracy → less hallucination, better task handling. More stability → no surprises in production, fewer user complaints. 4// You reach the desired business impact: ROI, KPIs and cost savings. This is the combined result of previous actions. They drive your KPIs. If your system is accurate, stable and aligned to the user’s goals - that’s everything you need. Shorter development cycles = faster time to market Fewer bugs = lower support costs Focused iterations = less wasted dev time It’s priceless. But you can get it only with the right approach.

117 Comments

Jack R.

CX Designer at Rondesignlab, Co-Founder at Rondesignlab

12,294 followers 1mo

This image explains UX better than most presentations ever will. In the park, the city designed a clean, “correct” path. But people still walk across the grass. Why? Because users always choose the easiest and most convenient way, not the one we expect them to follow. That worn-out lawn is a perfect metaphor for digital products. When there is no qualified UX partner involved early, teams design based on assumptions. As a result, users bypass the funnel, take unnecessary steps, or leave the product completely. Every “trampled lawn” in a product is lost money. I once worked with a US-based e-commerce app. From the team’s point of view, the checkout flow followed all the best practices. It looked clean and logical. But users had to re-enter delivery details after adding items to the cart. No bugs. No crashes. Just friction. The result? Cart abandonment was over 70%. After we restructured the user flow, not by changing colors or buttons but by fixing the logic of user behavior, conversion increased without any extra marketing spend. We never tell our clients to ignore UX best practices. But the truth is, even best practices do not always work. UX should be based on real user needs, not on rules, trends, or textbooks. What works perfectly in one product can completely fail in another. UX is not about making things nice or modern. UX is about solving a user’s problem in the simplest possible way. If a product makes users think too much, adds extra steps, or goes against natural behavior, users will stop using it and they will not pay for it. Studies show that up to 88% of users do not return to a website or app after a poor user experience. Not because the product is bad, but because it is inconvenient. Good UX protects revenue. Good architecture saves budget. Great products do not fight human behavior, they work with it. If this path led where people actually needed to go, the grass would still be green.

40 Comments

Mohsen Rafiei, Ph.D.

UXR Lead (PUXLab)

11,445 followers 2mo

To me, UX is nothing more and nothing less than the psychology of people interacting with their environment, and If you remove the psychology layer from UX, you risk optimizing the wrong problems, and potentially losing millions of dollars in the process. Let me explain. I recently tried the Apple Vision Pro and returned it after a few days, not because of any shortcomings in usability or interaction design. At the feature level, the UX is impressive. The interactions are polished, the visuals are refined, and the system reflects rigorous design work, testing, and iteration. This is not a critique of UX competence. If anything, it highlights a different issue: even with world-class UX teams, it is possible to spend enormous effort optimizing the wrong layer of the problem. What this experience surfaced for me is that rigorous UX studies and experiments, while essential, are not enough on their own. Without correctly framing the underlying behavioral and psychological questions, teams risk perfecting low-level usability while missing the bigger picture of human behavior. In my case, the issue was not how well the product worked. It was understanding when and why I should use it instead of my phone or laptop. That question sits above interaction design. It is a behavioral question. People naturally compare new tools to what already fits their habits. When two products serve similar functions, users default to the one that requires less effort, less attention, and less cognitive commitment. That is not a design flaw. It is a psychological pattern. This is where deep psychological understanding matters. Without it, UX risks becoming a process of refinement rather than relevance. You can reduce weight, improve comfort, refine gestures, and still fail to change behavior if you are solving a problem users do not experience as distinct or urgent. Usability studies are excellent at answering whether something works. Psychology helps answer whether it should exist in a user’s daily life at all. This is why UX cannot be reduced to interfaces, screens, or experiments alone. UX is ultimately the psychology of people interacting with their environment. It requires understanding attention, habit formation, motivation, effort tolerance, and context. Without that foundation, even excellent UX teams can unintentionally optimize details while overlooking the behavioral reality users live in. The Apple Vision Pro is just one example of this broader pattern. Strong feature-level UX paired with weak behavioral framing leads to confusion, not adoption. When psychology informs strategy and strategy guides usability, UX stops being about polishing interactions and starts being about shaping meaningful behavior. P.S. This feels less like a failure of execution and more like a reminder that understanding human behavior must come before deciding what to optimize.

17 Comments

Anshul Mamgain🇮🇳

Founder & CEO at StarNext 🇮🇳

14,143 followers 1mo

Every real user interaction rewrites the script developers imagined. That’s why bugs appear even after thorough testing: Users behave unpredictably and break assumed input flows Test cases miss edge scenarios; coverage is never perfect Different devices, OS versions, and environments create new issues Developers design for ideal usage; reality is far from ideal Timing and concurrency problems surface only under real load Third-party APIs behave differently in production Real-world data is messy, unlike clean test data Automation tests what it’s told — not what users invent Manual QA often catches what CI/CD pipelines miss Users uncover bugs through creative, unexpected usage How to reduce this gap: Add real-world exploratory testing (QA or crowd testing) Track real user behavior with analytics and error monitoring Expand coverage using fuzz testing, load testing, and edge cases Test across multiple devices and environments Collect user feedback early via beta releases and feature flags Write defensive code that gracefully handles bad inputs Bottom line: Great software isn’t built only by writing better code — it’s built by respecting how unpredictably humans use it. Agree💯

4 Comments

Bahareh Jozranjbar, PhD

UX Researcher at PUX Lab | Human-AI Interaction Researcher at UALR

9,253 followers 2mo

Most conversations about AI alignment still focus on capability. Can the model answer correctly. Can it solve hard problems. Can it reason. That framing made sense when AI systems were tools used in narrow, supervised settings. It breaks down once these systems are deployed into messy, real environments where incentives conflict, supervision is partial, and pressure is constant. Behavioral evaluation of AI alignment starts from a different assumption. Alignment is not something you infer from benchmark scores or training techniques. It is something you observe in behavior. The core question is not what the model knows, but what it does when the situation is ambiguous, socially charged, or strategically tempting. This is why behavioral evaluation treats AI systems like black box agents rather than static software. Instead of testing them once on clean datasets, we probe them repeatedly under stress. We look at whether a model tells the truth or agrees with a confident but wrong user. Whether it holds its ground when challenged by fake authority. Whether it behaves differently when it believes it is being monitored. Whether harmful actions emerge even when no one explicitly asks for them. Whether it exploits loopholes in reward structures rather than following the intent of the task. What makes this approach important is that many alignment failures are not visible in standard evaluations. A model can pass safety benchmarks and still flatter users, fake alignment, reward hack, or shift behavior across contexts. These failures only appear when you test behavior under realistic conditions, not when you ask the model to recite policies or perform well defined tasks. In that sense, behavioral evaluation is closer to psychology than to traditional software testing. It is about measuring tendencies, tradeoffs, and decision patterns over time. Alignment is not a checkbox you pass once. It is an empirical property that must be continuously tested as models scale, gain autonomy, and interact with humans and other agents.

6 Comments

Sairam Sundaresan

AI Engineering Leader | Author of AI for the Rest of Us | I help engineers land AI roles and companies build valuable products

113,771 followers 5mo

99% accurate, 0% useful. Your AI passes all tests and silently fails in production Here's how metrics can hide the truth: 🔸 A fraud detection model shows 99% accuracy. In reality, fraud only happens in 1% of cases. The model just predicts "not fraud" every time. Looks great on paper, but catches zero fraud. 🔸 A recommendation system with a high hit rate. Shows users what they already clicked. Drives engagement, kills discovery and growth. 🔸 A chatbot with low perplexity. Sounds fluent. Still hallucinates facts because the metric only measures smoothness, not truth. AI teams are moving fast. But without fluency in model evaluation, leaders are stuck making blind calls, wasting budgets, and missing red flags. Here are 40 Must Know Eval Terms Every Leader Must Know: 📌 Core Classification Metrics 👉 Accuracy ↳ % predictions correct - looks nice but lies in imbalanced data. 👉 Precision ↳ Of the flagged cases, how many were truly positive (fewer false alarms). 👉 Recall (Sensitivity) ↳ Of actual positives, how many did we catch (fewer misses). 👉 F1 Score ↳ The sweet spot between precision and recall. 👉 Confusion Matrix ↳ A table of wins, misses, and who they happened to. 📌 Regression & Forecasting 👉 MAE ↳ Average error size - simple, understandable. 👉 RMSE ↳ Penalizes big mistakes more harshly. 👉 R² ↳ How well the model explains variation. 👉 MAPE ↳ Average % error - great for comparing models. 👉 Baseline Model ↳ A dumb model to beat - like guessing the average. 📌 Ranking & Recommendations 👉 Recall@k ↳ Are the right answers showing up in the top-k? 👉 Hit Rate ↳ Did at least one good suggestion make it through? 👉 NDCG ↳ Prioritizes getting the best stuff at the top. 👉 Coverage ↳ % of the catalog that's ever recommended. 👉 Diversity ↳ Encourages varied recommendations, not just repeats. 📌 LLMs & Text Generation 👉 BLEU / ROUGE ↳ How well output overlaps with the reference. 👉 BERTScore ↳ Measures meaning-match, not just words. 👉 Perplexity ↳ How predictable outputs are (lower is better). 👉 Human Evaluation ↳ People grading fluency, correctness, and relevance. 👉 Toxicity / Safety Rate ↳ % of outputs that cross safety lines. 📌 Retrieval-Augmented Generation (RAG) 👉 Retrieval Recall ↳ % of useful documents found. 👉 Retrieval Precision ↳ % of retrieved docs that actually helped. 👉 Groundedness ↳ Did the model stick to the facts? 👉 Hallucination Rate ↳ % of answers that make stuff up. 👉 Answer Relevance ↳ Does the response directly answer the user? In the AI era, metrics are your dashboard. Without them, you're driving blind. Start leading with confidence. Stop nodding at metrics that might be fooling you and your team. (Remaining terms continued in the comments) ♻️ Repost to help other leaders decode model evaluation. ➕ Follow me, Sairam, for AI engineering that makes sense. --- Want to go deeper? Gradient Ascent breaks it down weekly. 21k+ leaders from 154 countries already tune in.

78 Comments

Bryan Zmijewski

ZURB Founder & CEO. Helping 2,500+ teams make design work.

12,728 followers 1y

Look at what they do, not just what they say. User behavior is how users interact with and use software. It includes things like: → how people navigate the interface → which features people use most often → the order in which people perform tasks → how much time people spend on activities → how people react to prompts or feedback Product managers and designers must understand these behaviors. Analyzing user behavior can enhance the user experience, simplify processes, spot issues, and make the software more effective. Discovering the "why" behind user actions is the key to creating great software. In many of my sales discussions with teams, I notice that most rely too heavily on interviews to understand user problems. While interviews are a good starting point, they only cover half of the picture. What’s the benefit of going beyond interviews? → See actual user behavior, not just reported actions → Gain insights into unspoken needs in natural settings → Minimize behavior changes by observing discreetly → Capture genuine interactions for better data → Document detailed behaviors and interactions → Understand the full user journey and hidden pain points → Discover issues and opportunities users miss → Identify outside impacts on user behavior Most people don't think in a hyper-rational way—they're just trying to fit in. That's why when we built Helio, we included task-based activities to learn from users' actions and then provided follow-up questions about their thoughts and feelings. User behaviors aren't always rational. Several factors contribute to this: Cognitive Biases ↳ Users rely on mental shortcuts, often sticking to familiar but inefficient methods. Emotional Influence ↳ Emotions like stress or frustration can lead to hasty or illogical decisions. Habits and Routine ↳ Established habits may cause users to overlook better options or new features. Lack of Understanding ↳ Users may make choices based on limited knowledge, leading to seemingly irrational actions. Contextual Factors ↳ External factors like time pressure or distractions can impact user behavior. Social Influence ↳ Peer pressure or the desire to conform can also drive irrational choices. Observing user behavior, especially in large sample sizes, helps designers see how people naturally use products. This method gives a clearer and more accurate view of user behavior, uncovering hidden needs and issues that might not surface in interviews. #productdesign #productdiscovery #userresearch #uxresearch

8 Comments

LinkedIn respects your privacy

Mobile User Experience

Explore categories

Mobile User Experience

More in Mobile User Experience

More User Experience topics

Explore categories