Say you are building a news aggregator (like Google News). One of the biggest problems you'll face is de-duplicating articles across millions of documents. Naive O(n^2) pairwise comparison will crush you at scale. MinHash + LSH is how you actually solve it.

MinHash converts a large set into a small, fixed-size signature, such that the similarity between two signatures approximates the Jaccard similarity of the original sets. Jaccard similarity is simply the size of the set intersection divided by the size of the set union: a measure of how much two sets overlap. MinHash is a fast, probabilistic way to estimate "how alike are these two documents?" without comparing them word by word.

The first step is shingling, where you break each document into overlapping n-grams (say, 3-word sequences). You then run MinHash on that shingle set, which gives you a compact signature, typically 100-200 hash values, each one the minimum of a different hash function over the set. The key property is that, for any single hash function, the probability that two documents produce the same minimum hash value equals the Jaccard similarity of their shingle sets. The fraction of positions where two signatures agree is therefore an estimate of that similarity, and you never have to compare raw text.

But you still have the comparison problem. Even with compact signatures, comparing every pair is expensive. That's where LSH (Locality-Sensitive Hashing) comes in. You split each signature into b bands of r rows each, and hash each band into a bucket. Two documents with Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so pairs that are similar enough will very likely collide in some band, and only those candidate pairs get compared.

This approach can collapse billions of comparisons down to millions, and it is what systems like Google News and early web crawlers used to deduplicate content at scale. Several Google papers and engineering blogs from the early 2000s reference this exact approach. Pretty simple and neat.

As is almost always true at scale, you do not need a perfect similarity detection system. A fast, good-enough one is preferred, because cost is ultimately the forcing function.
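To make the pipeline concrete, here is a minimal Python sketch of the three steps described above: shingling, MinHash signatures, and LSH banding. It is an illustration under assumptions, not how any production system does it. All function names are hypothetical, the seeded BLAKE2b hash merely stands in for a family of independent hash functions, and the choice of 128 hashes split into 32 bands of 4 rows is arbitrary.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles in `text`."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _seeded_hash(seed, item):
    """Stable 64-bit hash; each seed stands in for a separate hash function (assumption)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """Keep the minimum hash value per hash function: a fixed-size signature."""
    return [min(_seeded_hash(seed, s) for s in shingle_set)
            for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=32, rows=4):
    """Bucket every band of every signature; docs sharing any bucket are candidates."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
        for band in range(bands):
            chunk = tuple(sig[band * rows:(band + 1) * rows])
            buckets[(band, chunk)].add(doc_id)
    pairs = set()
    for doc_ids in buckets.values():
        pairs.update(combinations(sorted(doc_ids), 2))
    return pairs


# Toy corpus: "a" and "b" differ only in the last word, "c" is unrelated.
docs = {
    "a": "the central bank raised interest rates by a quarter point on wednesday "
         "citing persistent inflation in housing and services across the country",
    "b": "the central bank raised interest rates by a quarter point on wednesday "
         "citing persistent inflation in housing and services across the nation",
    "c": "local team wins championship after dramatic overtime goal in front of record crowd",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}

# Only the candidate pairs get the more careful comparison.
for a, b in sorted(lsh_candidate_pairs(signatures)):
    est = estimate_similarity(signatures[a], signatures[b])
    exact = jaccard(shingles(docs[a]), shingles(docs[b]))
    print(f"{a}-{b}: estimated={est:.2f} exact={exact:.2f}")
```

The band and row counts are the tuning knob. More rows per band makes a collision require closer agreement, which cuts false candidates; more bands gives similar pairs more chances to collide, which cuts misses. A real deployment would also stream documents and store buckets in a shared index rather than in an in-memory dictionary.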
-
𝗗𝗮𝘁𝗮 𝗴𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲 𝗶𝘀 𝗼𝗻𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗺𝗼𝘀𝘁 𝗺𝗶𝘀𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗼𝗼𝗱 𝘁𝗼𝗽𝗶𝗰𝘀 𝗶𝗻 𝗲𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲. Because most people explain it from the inside out: policies, councils, standards, stewardship. But the business does not buy any of that.

The business buys outcomes:
→ trustworthy KPIs
→ vendor and partner data you can actually use
→ faster financial close
→ fewer reporting escalations
→ smoother M&A integration
→ AI you can deploy without creating risk debt

Most AI programs fail for boring reasons: nobody owns the data, quality is unknown, access is messy, accountability is missing.

𝗦𝗼 𝗹𝗲𝘁’𝘀 𝘀𝗶𝗺𝗽𝗹𝗶𝗳𝘆 𝗶𝘁. 𝗗𝗮𝘁𝗮 𝗴𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲 𝗶𝘀 𝗳𝗼𝘂𝗿 𝘁𝗵𝗶𝗻𝗴𝘀:
→ ownership
→ quality
→ access
→ accountability

𝗔𝗻𝗱 𝗶𝘁 𝗯𝗲𝗰𝗼𝗺𝗲𝘀 𝘃𝗲𝗿𝘆 𝗽𝗿𝗮𝗰𝘁𝗶𝗰𝗮𝗹 𝘄𝗵𝗲𝗻 𝘆𝗼𝘂 𝘁𝗵𝗶𝗻𝗸 𝗶𝗻 𝟰 𝗹𝗮𝘆𝗲𝗿𝘀:

1. Data Products (what the business consumes)
→ a named dataset with an owner and SLA
→ clear definitions + metric logic
→ documented inputs/outputs and intended use
→ discoverable in a catalog
→ versioned so changes don’t break reporting

2. Data Management (how products stay reliable)
→ quality rules + monitoring (freshness, completeness, accuracy)
→ lineage (where it came from, where it’s used)
→ master/reference data alignment
→ metadata management (business + technical)
→ access controls and retention rules

3. Data Governance (who decides, who is accountable)
→ data ownership model (domain owners, stewards)
→ decision rights: who can change KPI definitions, thresholds, and sources
→ issue management: triage, escalation paths, resolution SLAs
→ policy enforcement: what’s mandatory vs optional
→ risk and compliance alignment (auditability, approvals)

4. Data Operating Model (how you scale across the enterprise)
→ domain-based setup (data mesh or not, but clear domains)
→ operating cadence: weekly issue review, monthly KPI governance, quarterly standards
→ stewardship at scale (roles, capacity, incentives)
→ cross-domain decision-making for shared metrics
→ enablement: templates, playbooks, tooling support

If you want to start fast: Pick the 10 metrics that run the business. Assign an owner. Define decision rights + escalation. Then build the data products around them.

↓ 𝗜𝗳 𝘆𝗼𝘂 𝘄𝗮𝗻𝘁 𝘁𝗼 𝘀𝘁𝗮𝘆 𝗮𝗵𝗲𝗮𝗱 𝗮𝘀 𝗔𝗜 𝗿𝗲𝘀𝗵𝗮𝗽𝗲𝘀 𝘄𝗼𝗿𝗸 𝗮𝗻𝗱 𝗯𝘂𝘀𝗶𝗻𝗲𝘀𝘀, 𝘆𝗼𝘂 𝘄𝗶𝗹𝗹 𝗴𝗲𝘁 𝗮 𝗹𝗼𝘁 𝗼𝗳 𝘃𝗮𝗹𝘂𝗲 𝗳𝗿𝗼𝗺 𝗺𝘆 𝗳𝗿𝗲𝗲 𝗻𝗲𝘄𝘀𝗹𝗲𝘁𝘁𝗲𝗿: https://lnkd.in/dbf74Y9E
-
You’re not burned out; you’re just taking breaks the wrong way. Here’s how to fix it, based on science. Want to perform better? Take better breaks. Breaks today are where sleep was 15 years ago: underrated and misunderstood. But how you take a break matters. Most people think more work = more productivity. But research shows that strategic breaks are the real key to staying sharp. The problem? Most of us take breaks that don’t actually help. Scrolling alone at your desk? Not it. Here’s how to take a break that actually works: Move, don’t sit – Walk, stretch, or get outside instead of staying glued to your chair. Movement resets your brain. Go outside, not inside – Fresh air and sunlight restore energy and boost creativity. Be social, not solo – Breaks are more effective when taken with someone else. Fully unplug – Leave your phone. No work talk. No emails. No scrolling. Just a real reset. Try this: Take a 10-minute walk outside with a colleague. Talk about anything but work. Leave your phone at your desk. Watch how much better you feel and perform. Breaks aren’t a luxury. They’re a performance tool. Treat them like it. Got a break routine that works for you? Drop it below, or send this to someone who needs a real break.
-
I get asked about tools (drugs) for focus all the time. Remember: You can train focus. It’s like a workout. Set a timer for two to three hours. Force yourself to work the entire time. Every time you skip to something else, add 10 minutes. One bathroom break allowed. Next time is easier. People hate this answer, but it’s the only nonpharmacologic way I know to build focus as a skill. Quit seeking perfect conditions, internally and externally. The mental friction means you’re getting better. Don’t forget that. Some people will call this masochistic, but honestly, that’s a weak excuse. Unless you love doing something, it’s going to be hard to focus. But there’s so much power in learning to do it anyway. This skill builds fast. Unfortunately, it also degrades fast. In the world of immense distraction we live in, it takes more and more effort to recover this skill. The payoff gets bigger and bigger, however. Most people are drifting into the noise. Don’t be one of them.
-
Atlassian has been fully distributed for almost five years. We don’t have all the answers, but we’ve learned a lot about how to keep teams thriving across time zones—and we’re applying those insights every day. ➡️ Asynchronous work: Async tools are at the core of how we operate. Confluence is our virtual hub where we share stories, celebrate new hires, and collaborate effortlessly. We also use Loom to share videos and give feedback on our own time—avoiding those dreaded “this could have been an email” moments. In fact, we’ve saved nearly half a million meetings using Loom! ➡️ Designing workdays: We’ve learned to structure workdays for focus, collaboration, and meetings (only when absolutely necessary). Teams work across no more than two time zones, ensuring at least four hours of overlap to get things done together. ➡️ Intentional connection: Data shows that real connection happens when teams meet regularly—not sporadically in an office. We provide Intentional Togetherness Gatherings (ITGs), curated experiences, and focused in-person time to collaborate. ➡️ Adapting for different needs: It’s not one-size-fits-all. For example, new hires and grads often benefit from more frequent in-person meetups, so we make sure to offer opportunities for them to connect early on. https://lnkd.in/g2sSbe3v
-
🔎 𝗟𝗼𝗼𝗸𝗶𝗻𝗴 𝗶𝗻𝘀𝗶𝗱𝗲 𝗮𝗻 𝗮𝗰𝘁𝘂𝗮𝗹 AMD 𝗰𝗵𝗶𝗽! 😲 Here's a bit of a Ryzen processor made on TSMC's 7-nanometer node. You can see the web of interconnects, the metal wires that connect the transistors (that bottom layer) on a chip to harness their computing power. The image was taken with a new 𝗽𝘁𝘆𝗰𝗵𝗼𝗴𝗿𝗮𝗽𝗵𝗶𝗰 𝗫-𝗿𝗮𝘆 𝗹𝗮𝗺𝗶𝗻𝗼𝗴𝗿𝗮𝗽𝗵𝘆 (𝗣𝘆𝗫𝗟) technique out of the PSI Paul Scherrer Institut, University of Southern California and ETH Zürich. The technique currently has 4 nanometer resolution and the scientists have a path to get to 1 nm resolution. The cool thing about this technology is its non-destructive imaging power to help find defects in chips. Today’s chips are so complicated that electrical tests alone can no longer pinpoint where a defect is: chipmakers use a mix of optical imaging and other methods to zero in on potential problem areas. They then image such areas with a slow but very high-resolution scanning electron microscope. Finally they might take a slice of a chip for further imaging with a transmission electron microscope (TEM). When they find the flaw, they can then go back and correct their design. But with PyXL, they have another tool to pinpoint defects without destroying the chip. ✨
-
Either you control it, or it will control you! Our bodies and minds have limits, and ignoring the need for rest can lead to significant consequences. When we push ourselves too hard without taking regular breaks, we risk burnout, decreased productivity, and health problems. This forced downtime often occurs at the worst possible moments, disrupting our personal and professional lives. So, please: Schedule Regular Breaks: Integrate short breaks into your daily routine. For example, use the Pomodoro Technique—work for 25 minutes, then take a 5-minute break. After four cycles, take a longer break of 15-30 minutes. Prioritise Sleep: Ensure you get 7-9 hours of sleep each night. Good sleep hygiene, such as a regular bedtime and limiting screen time before bed, can improve sleep quality. Take Vacations: Plan and take regular vacations to recharge. Even short getaways can significantly impact your mental and physical health. Listen to Your Body: Pay attention to signs of fatigue, stress, and burnout. If you feel overwhelmed, take a step back and rest, even if it's just for a few hours. Incorporate Wellness Activities: Engage in activities that promote relaxation and well-being, such as exercise, meditation, hobbies, or spending time in nature. Set Boundaries: Learn to say no and set boundaries to protect your time and energy. Avoid overcommitting and ensure you have time for rest and recovery. By proactively scheduling breaks and prioritising self-care, you can maintain your health, enhance productivity, and avoid inconvenient and disruptive forced breaks.
-
💎 Accessibility For Designers Checklist (PDF: https://lnkd.in/e9Z2G2kF), a practical set of cards on WCAG accessibility guidelines, covering accessible color, typography, animations, media, layout and development, to kick off accessibility conversations early on. Kindly put together by Geri Reid. WCAG for Designers Checklist, by Geri Reid Article: https://lnkd.in/ef8-Yy9E PDF: https://lnkd.in/e9Z2G2kF WCAG 2.2 Guidelines: https://lnkd.in/eYmzrNh7 Accessibility isn’t about compliance. It’s not about ticking off checkboxes. And it’s not about plugging in accessibility overlays or AI engines either. It’s about *designing* with a wide range of people in mind, from the very start, independent of their skills and preferences. In my experience, the most impactful way to embed accessibility in your work is to bring a handful of people with different needs into the design process and usability testing early. It also means making these test sessions accessible to the entire team, and showing the real impact of design and code on real people using a real product. Teams usually don’t get time to work on features which don’t have a clear business case. But no manager really wants to be seen publicly ignoring their prospective customers. Make accessibility visible to everyone on the team and try to make an argument about potential reach and potential income. Don’t ask for big commitments: embed accessibility in your work by default. Account for accessibility needs in your estimates. Create accessibility tickets and flag accessibility issues. Don’t mistake smiling and nodding for support: establish timelines, roles, specifics, objectives. And most importantly: measure the impact of your work by repeatedly conducting accessibility testing with real people. Build a strong before/after case to show the change that the team has enabled and contributed to, and celebrate small and big accessibility wins. It might not sound like much, but it can start changing the culture faster than you think.
Useful resources:
Giving A Damn About Accessibility, by Sheri Byrne-Haber (disabled) https://lnkd.in/eCeFutuJ
Accessibility For Designers: Where Do I Start?, by Stéphanie Walter https://lnkd.in/ecG5qASY
Web Accessibility In Plain Language (Free Book), by Charlie Triplett https://lnkd.in/e2AMAwyt
Building Accessibility Research Practices, by Maya Alvarado https://lnkd.in/eq_3zSPJ
How To Build A Strong Case For Accessibility,
↳ https://lnkd.in/ehGivAdY, by 🦞 Todd Libby
↳ https://lnkd.in/eC4jehMX, by Yichan Wang
#ux #accessibility
-
The silent productivity killer you've never heard of... Attention Residue (and 3 strategies to fight back):

The concept of "attention residue" was first identified by University of Washington business professor Dr. Sophie Leroy in 2009. The idea is quite simple: There is a cognitive cost to shifting your attention from one task to another. When our attention is shifted, there is a "residue" that remains in the brain and impairs our cognitive performance on the new task. Put differently, you may think your attention has fully shifted to the next task, but your brain has a lag: it thinks otherwise!

It's relatively easy to find examples of this effect in your own life:
• You get on a call but are still thinking about the prior call.
• An email pops up during a meeting and derails your focus.
• You check your phone during a lecture and can't refocus afterwards.

There are two key points worth noting here:
1. The research indicates it doesn't seem to matter whether the task switch is "macro" (i.e. moving from one major task to the next) or "micro" (i.e. pausing one major task for a quick check on some minor task).
2. The challenge is even more pronounced in a remote/hybrid world, where we're free to roam the internet, have our chat apps open, and check our phones all while appearing to be focused in a Zoom meeting.

With apologies to any self-proclaimed proficient multitaskers, the research is very clear: Every single time you call upon your brain to move away from one task and toward another, you are hurting its performance, and your work quality and efficiency suffer. Author Cal Newport puts it well: "If, like most, you rarely go more than 10–15 minutes without a just check, you have effectively put yourself in a persistent state of self-imposed cognitive handicap."

Here are three strategies to manage attention residue and fight back:
1. Focus Work Blocks: Block time on your calendar for sprints of focused energy. Set a timer for a 45-90 minute window, close everything except the task at hand, and focus on one thing. It works wonders.
2. Take a Breather: Whenever possible, create open windows of 5-15 minutes between higher value tasks. Schedule 25-minute calls. Block those windows on your calendar. During them, take a walk or close your eyes and breathe.
3. Batch Processing: You still have to reply to messages and emails. Pick a few windows during the day when you will deeply focus on the task of processing and replying to these. Your response quality will go up from this batching, and they won't bleed into the rest of your day.

Attention residue is a silent killer of your work quality and efficiency. Understanding it, and taking the steps to fight back, will have an immediate positive impact on your work and life.

If you enjoyed this or learned something, share it with others and follow me, Sahil Bloom, for more in the future! The beautiful visualization is by Roberto Ferraro.
-
Do you feel guilty about taking time off? I used to spend weekends, trips, and lunch breaks (!!) terrified that I was falling behind. I had to constantly fight the compulsion to get back to my inbox. Now I remind myself: Your mental health is the foundation for your ability to do great work. We often think of vacations or breaks as rewards we need to earn. This is backward thinking. Your wellbeing is what allows you to achieve your goals. A successful career depends on you having rested enough to be creative, show up for others, and make good decisions. It sounds obvious but it bears repeating: When you fail to take the time you need to recharge, you set yourself up to fail.