Inspiration
Hawker-ID was born from observing a critical gap at the intersection of urban challenges and emerging technology in Southeast Asian cities, especially in Indonesia. Street hawkers are the lifeblood of urban economies (estimated to represent over 10% of the workforce in developing nations), yet they operate in an increasingly complex regulatory environment where cities struggle to balance commercial vitality with public order and safety.

In the bustling residential streets of Indonesia, the air is filled with a unique symphony (not just of traffic, but of commerce). Every day, mobile street vendors (also known as hawkers) roam the neighborhoods, bringing beloved local delicacies right to our doorsteps: from the savory warmth of Bakso (meatball soup) and Cuanki, to snacks like Takoyaki and Cilok, traditional beverages like Bandrek and Bajigur, and simple comforts like Jagung Rebus (boiled corn). What about your city? Do you have similar unique sounds that define your street food culture?
In front of my own house alone, I witness at least 20 distinct hawkers passing by daily. However, amidst this vibrancy, there is a recurring inefficiency. I often see a scene that perfectly captures the problem: a young girl rushing out of her gate, shouting to call a hawker who has just passed by. The vendor, perhaps already 50 meters down the road, must stop, turn around, and push his heavy cart back to serve her.
For a larger video with audio: https://www.youtube.com/watch?v=XHEW_lxlZ_0
This "missed connection" is physically exhausting for the hawker and frustrating for the buyer. From this observation, Hawker-ID was born. It is an innovative solution designed to bridge this gap without changing human habits. We don't ask the hawker to install an app or buy a GPS tracker; instead, we empower the environment to "listen" for them.
Crucially, Hawker-ID is designed to scale into a community intelligence network. When widely distributed across a neighborhood, the system becomes predictive. If a device at the entrance of the street detects a specific vendor, it shares this intelligence with the community. Neighbors living further down the road receive a "pre-arrival" notification via a shared Telegram group or channel. This gives them valuable lead time to prepare their bowls and wallets before the hawker even reaches their gate, transforming individual edge sensors into a collaborative, smart-neighborhood grid.
Hawker-ID is an intelligent edge computing solution that detects the presence of street food vendors (hawkers) passing by a residential location using real-time acoustic monitoring and machine learning. The system leverages TinyML technology deployed on resource-constrained edge devices to classify vendor-specific acoustic signatures and deliver detection events instantly to a centralized server for user notification. By combining sophisticated audio classification with low-power processing, Hawker-ID demonstrates the practical application of embedded AI in everyday IoT scenarios, enabling residents to stay informed about food vendor availability without relying on cloud-based processing.
We recognized that modern machine learning and edge AI offered a fundamentally different approach: what if we could empower cities with intelligent detection systems that work non-intrusively, collect actionable data, and ultimately help integrate hawkers into formal urban frameworks?
Our inspiration came from four converging insights:
- The proven success of TinyML applications in resource-constrained environments
- The remarkable advances in Arm microcontroller capabilities, pairing CPUs like the Cortex-M55 with dedicated AI accelerators like the Ethos-U55
- The Arduino UNO Q: a simple, portable Linux server well suited to hosting a Telegram bot for user notifications (and to driving the UNO Q's built-in LED matrix)
- The urgent need for inclusive smart city solutions that respect informal economy workers while solving genuine urban management challenges

We wanted to build something practical, affordable, and deployable today (not a distant research prototype). Hawker-ID is our answer: a democratized, open-source framework that any city, municipality, developer, or even individual hawker can deploy to bring data-driven intelligence to street vending management, with an intuitive interface accessible through ubiquitous communication platforms like Telegram.
What it does
Hawker-ID is a real-time audio classification and smart visualization system designed to detect, identify, monitor, and celebrate hawker activities using embedded machine learning and personalized notifications. At its core, the system combines multiple intelligent components working in harmonious coordination.

The Grove Vision AI V2 module (powered by Himax's WiseEye2 HX6538 processor with ARM Cortex-M55 and Ethos-U55 neural accelerator) continuously listens for audio patterns characteristic of hawker activity: distinctive calls, vendor announcements, or vending sounds.

A wireless link (via ESP32) seamlessly transmits audio classification results to the Arduino UNO Q, a revolutionary Linux-capable embedded server that serves as the system's intelligence hub and control center. The Arduino UNO Q runs a full Debian Linux environment on its quad-core Qualcomm Dragonwing QRB2210 processor (four Arm Cortex-A53 cores), enabling sophisticated decision-making, data logging, and integration capabilities that would be impossible on traditional microcontrollers.
When a hawker is detected, the system runs an intelligent matching algorithm: it compares the detected hawker's acoustic signature against a user's favorite hawker profiles stored in the system database. If a favorite hawker is identified passing by, the Arduino UNO Q illuminates an elegant heart animation on its integrated LED matrix display (a real-time visual celebration that transforms hawker detection from utilitarian monitoring into a delightful user experience). Simultaneously, the @HawkerID_bot Telegram bot sends users immediate notifications with: detection timestamps, confidence scores, live camera feeds captured from the Grove Vision AI V2 module, acoustic profile details, and geolocation data if GPS is available.
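As a concrete illustration of the notification step, a detection alert with its camera frame can be pushed through a single call to the Telegram Bot API's sendPhoto method. The sketch below is a minimal version of that step; the token, chat ID, hawker name, and file path are placeholders, not our production values:

```python
# Minimal sketch of the notification step (token, chat_id, and paths are placeholders).
import requests

def notify_favorite(token: str, chat_id: str, hawker: str, confidence: float, image_path: str) -> None:
    """Send a detection alert with the captured camera frame via the Telegram Bot API."""
    caption = f"Favorite hawker detected: {hawker} (confidence {confidence:.2f})"
    url = f"https://api.telegram.org/bot{token}/sendPhoto"
    with open(image_path, "rb") as photo:
        requests.post(url, data={"chat_id": chat_id, "caption": caption},
                      files={"photo": photo}, timeout=10)

notify_favorite("123456:ABC-PLACEHOLDER", "987654321", "Bakso Pak Udin", 0.91, "capture.jpg")
```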
The Edge-AI operates with sub-second latency (50-200ms inference time), consumes minimal power (300-600mW), and requires absolutely no cloud dependency for core operation; all AI inference happens locally on the hardware, ensuring privacy and reliability. The system outputs real-time detection events with rich multimedia context, enabling city administrators, municipal officers, authorized retailers, or individual hawker enthusiasts to immediately understand vending activity patterns and connect with their favorite street vendors.
How we built it
Building Hawker-ID required a systematic, multi-layered approach blending hardware integration, machine learning engineering, and embedded software development.
Phase 1: Hardware Architecture involved selecting the optimal combination of components. We chose the Grove Vision AI V2 specifically because it integrates a PDM (Pulse Density Modulation) microphone directly on the module, paired with the XIAO ESP32-S3 for its unmatched power efficiency and WiFi capabilities. We hand-assembled development units, carefully mapping the CSI camera interface, UART communication pins (TX=GPIO6, RX=GPIO7), and alert I/O pins (LED=GPIO3, Buzzer=GPIO4).
Phase 2: Dataset Collection was perhaps the most geographically demanding phase. We collected over 20 minutes of diverse hawker sound recordings from multiple locations (morning markets, street vendors, mobile hawkers, various distances) and 15+ minutes of negative samples (traffic, people talking, urban ambient noise). All audio was preprocessed to 16kHz sample rate, 16-bit mono PCM format (the TinyML standard).
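To give a sense of that preprocessing step, here is a minimal sketch using pydub (which requires ffmpeg to be installed); the input and output filenames are illustrative:

```python
# Sketch: normalize a raw field recording to the TinyML standard (16 kHz, 16-bit, mono PCM).
from pydub import AudioSegment

clip = AudioSegment.from_file("raw_recording.m4a")
clip = clip.set_frame_rate(16000)   # resample to 16 kHz
clip = clip.set_channels(1)         # downmix to mono
clip = clip.set_sample_width(2)     # 16-bit samples
clip.export("hawker_001.wav", format="wav")
```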
Phase 3: Model Training leveraged Edge Impulse's cloud platform where we:
- Configured a 1000ms audio window with 500ms overlap (50% hop for smooth detection)
- Applied MFCC (Mel-Frequency Cepstral Coefficients) feature extraction with 13 coefficients (see the sketch after this list)
- Trained a Keras neural network for 100 epochs, using data augmentation for robustness

We achieved 92% training accuracy and 87% validation accuracy (strong indicators of real-world performance).
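To make the audio front end concrete, the sketch below reproduces the same windowing and 13-coefficient MFCC extraction offline with librosa. This is an equivalent computation for illustration, not the Edge Impulse implementation itself, and the filename is a placeholder:

```python
# Sketch: offline equivalent of the MFCC front end (13 coefficients, 1000 ms window, 500 ms hop).
import librosa

audio, sr = librosa.load("hawker_001.wav", sr=16000, mono=True)
win, hop = 16000, 8000  # 1000 ms window, 500 ms hop at 16 kHz

for start in range(0, len(audio) - win + 1, hop):
    window = audio[start:start + win]
    mfcc = librosa.feature.mfcc(y=window, sr=sr, n_mfcc=13)  # shape: (13, frames)
    features = mfcc.flatten()  # one feature vector per 1-second window
```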
Phase 4: Firmware Development involved crafting optimized TensorFlow Lite models using int8 quantization to reduce model size to just 200-400KB, small enough to fit comfortably in the Grove Vision AI V2's flash memory.
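A representative conversion step looks roughly like the sketch below: post-training full-integer (int8) quantization with the TFLite converter. The stand-in model and random calibration data are placeholders for our trained Keras network and real MFCC feature vectors:

```python
# Sketch: post-training full-int8 quantization (model and calibration data are placeholders).
import numpy as np
import tensorflow as tf

# Stand-in for the trained Keras classifier (3 classes: hawker, other_sounds, background).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(13 * 49,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

def representative_dataset():
    # Calibration samples; in practice, real MFCC feature vectors from the training set.
    for _ in range(100):
        yield [np.random.rand(1, 13 * 49).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

open("hawker_id_int8.tflite", "wb").write(converter.convert())
```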
Phase 5: Integration Code was the glue! We developed comprehensive Arduino firmware for the XIAO ESP32-S3 that handles: UART serial communication parsing Grove Vision AI output, WiFi connectivity with automatic reconnection logic, debouncing alert logic (preventing alert spam), NTP time synchronization for timestamped detections, and optional integrations with MQTT brokers or webhook services for cloud reporting (for future development).
Challenges we ran into
Challenge 1: Audio Data Collection Diversity: Early prototype testing showed the system failed catastrophically in new locations. We'd trained on morning market hawkers but couldn't recognize afternoon street vendors with different call patterns. Solution: We expanded data collection to 35+ minutes per class, systematically capturing temporal variation (same vendors at different times), spatial variation (different distances and microphones), and environmental variation (weather conditions affecting acoustics).
Challenge 2: False Positive Rate: Initial deployments triggered alerts on loud motorcycles, car horns, and construction noise (40% false positive rate, making the system unusable). We added a third "other_sounds" classification class, implemented 2-second debouncing logic, and added confidence threshold tuning (0.60-0.85 range).
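The resulting gate logic is conceptually simple; here is a minimal sketch, with the default threshold, class label, and debounce window as illustrative values:

```python
# Sketch of the debounce + confidence-threshold gate (defaults are illustrative).
import time

class AlertGate:
    def __init__(self, threshold: float = 0.70, debounce_s: float = 2.0):
        self.threshold = threshold      # tunable per deployment (0.60-0.85 in practice)
        self.debounce_s = debounce_s    # minimum gap between alerts
        self._last_alert = 0.0

    def should_alert(self, label: str, confidence: float) -> bool:
        now = time.monotonic()
        if label != "hawker" or confidence < self.threshold:
            return False                # ignore other_sounds / low-confidence hits
        if now - self._last_alert < self.debounce_s:
            return False                # suppress alert spam
        self._last_alert = now
        return True
```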
Challenge 3: UART Communication Complexity: Grove Vision AI V2 outputs variable-format inference results depending on the firmware version, with inconsistent floating-point precision and timestamp formatting. We built robust multi-format parsing with fallback behaviors, handling regex pattern variations and validating that all confidence scores fall within the proper 0.0-1.0 range.
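Conceptually, the parser tries a list of known line formats in order and ignores anything it cannot validate. The Python sketch below illustrates the approach; the regex patterns are simplified stand-ins for the firmware-specific formats we actually handle:

```python
# Sketch: tolerant multi-format parsing of inference lines (patterns are illustrative).
import re
from typing import Optional, Tuple

PATTERNS = [
    re.compile(r"label[:=]\s*(\w+)[,;]\s*conf(?:idence)?[:=]\s*([\d.]+)", re.I),
    re.compile(r"(\w+)\s+([\d.]+)\s*$"),  # bare "label score" fallback
]

def parse_inference(line: str) -> Optional[Tuple[str, float]]:
    for pattern in PATTERNS:
        match = pattern.search(line)
        if not match:
            continue
        label, raw = match.group(1), float(match.group(2))
        score = raw / 100.0 if raw > 1.0 else raw  # some firmware reports percentages
        if 0.0 <= score <= 1.0:                    # validate the range before trusting it
            return label, score
    return None  # unparseable line: ignore rather than crash
```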
Challenge 4: Arduino UNO Q Linux-MCU Bridge Communication: Establishing reliable bidirectional communication between the Linux processor (running the Telegram bot) and the STM32 microcontroller (controlling the LED matrix) required learning Arduino's Bridge RPC library. Initial attempts caused deadlocks when the bot sent too many rapid requests. Solution: We implemented a message queue system with rate limiting (max 10 commands/second) and timeouts for stuck requests.
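The fix boils down to a bounded producer/consumer queue between the bot and the bridge. Below is a sketch of the pattern, where bridge_send is a stand-in for the actual Bridge RPC call:

```python
# Sketch: rate-limited command queue toward the MCU (bridge_send stands in for Bridge RPC).
import queue
import threading
import time

commands: "queue.Queue[str]" = queue.Queue(maxsize=100)

def bridge_send(cmd: str) -> None:
    print("-> MCU:", cmd)  # placeholder for the real Linux-to-MCU bridge call

def bridge_worker(max_per_second: int = 10) -> None:
    interval = 1.0 / max_per_second
    while True:
        try:
            cmd = commands.get(timeout=5.0)  # timeout so a stalled queue can't deadlock us
        except queue.Empty:
            continue
        bridge_send(cmd)
        time.sleep(interval)  # enforce the 10-commands-per-second ceiling

threading.Thread(target=bridge_worker, daemon=True).start()
commands.put("led:heart")  # bot side just enqueues; the worker paces delivery
time.sleep(1.0)            # keep the demo alive long enough to drain the queue
```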
Challenge 5: Telegram Bot Image Transmission: Sending live camera images from Grove Vision AI V2 through the Telegram bot proved challenging due to size constraints and processing delays. The initial approach tried to send full 640×480 images, overwhelming the bandwidth. We implemented automatic image compression (reducing to 320×240 and JPEG 70% quality), thumbnail generation, and asynchronous transmission so the bot remained responsive even during image uploads.
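The compression itself is only a few lines with Pillow; here is a sketch with illustrative filenames:

```python
# Sketch: downscale and recompress a captured frame before Telegram upload.
from PIL import Image

frame = Image.open("capture_640x480.jpg")
frame = frame.resize((320, 240))                  # quarter the pixel count
frame.save("capture_tx.jpg", "JPEG", quality=70)  # ~70% JPEG quality
frame.thumbnail((96, 96))                         # in-place thumbnail for previews
frame.save("capture_thumb.jpg", "JPEG", quality=70)
```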
Challenge 6: Mini UPS Integration and Power Management: The system consumed 300-600mW continuous, allowing only 3-5 hours on battery. We designed a sophisticated power management system that monitors UPS voltage levels, gracefully prioritizes services (dropping non-critical features first), and implements a 12-hour minimum uptime mode that disables video capture and reduces audio sampling.
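The service-prioritization logic reduces to a voltage-driven tier table. The sketch below illustrates the idea; the voltage cutoffs and service names are illustrative, not the exact values we ship:

```python
# Sketch: graceful degradation driven by UPS voltage (cutoffs/services are illustrative).
TIERS = [
    (3.9, {"audio", "video", "telegram", "led"}),   # healthy battery: everything on
    (3.7, {"audio", "telegram", "led"}),            # drop video capture first
    (3.5, {"audio", "telegram"}),                   # then the LED animations
    (0.0, {"audio"}),                               # last resort: detection only
]

def active_services(ups_voltage: float) -> set:
    for cutoff, services in TIERS:
        if ups_voltage >= cutoff:
            return services
    return set()

assert "video" not in active_services(3.6)  # video is shed before core detection
```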
Challenge 7: Favorite Hawker Profile Matching: Implementing reliable acoustic matching to identify specific hawkers from their unique call signatures was non-trivial. Different hawkers have similar calling patterns, yet the same hawker varies their call. We solved this by implementing a multi-dimensional matching algorithm that combines: (1) acoustic signature similarity (MFCC distance), (2) temporal patterns (time of day, day of week), (3) GPS proximity (if available), and (4) user-defined confidence threshold per favorite hawker (allowing some favorites to require high confidence, others lower).
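A simplified version of the scoring is sketched below; the weights, the distance metric, and the profile fields are illustrative stand-ins for our tuned values:

```python
# Sketch: multi-signal favorite matching (weights and profile fields are illustrative).
import numpy as np

def matches_favorite(mfcc_vec: np.ndarray, detection_hour: int, profile: dict) -> bool:
    """Combine acoustic, temporal, and (optionally) GPS evidence against one favorite profile."""
    # (1) acoustic similarity: inverse of MFCC distance to the stored signature
    dist = np.linalg.norm(mfcc_vec - profile["signature"])
    acoustic = 1.0 / (1.0 + dist)
    # (2) temporal prior: does this hawker usually pass at this hour?
    temporal = 1.0 if detection_hour in profile["usual_hours"] else 0.3
    # (3) GPS proximity, neutral (0.5) when no fix is available
    proximity = profile.get("gps_proximity", 0.5)
    score = 0.6 * acoustic + 0.25 * temporal + 0.15 * proximity
    # (4) per-favorite threshold: some favorites demand higher confidence than others
    return score >= profile["threshold"]

profile = {"signature": np.zeros(13), "usual_hours": {16, 17}, "threshold": 0.55}
print(matches_favorite(np.zeros(13), 17, profile))  # -> True
```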
Challenge 8: LED Matrix Animation Smoothness with CPU Load: When the Telegram bot was processing images and the audio classifier was running simultaneously, LED animation would stutter. Solution: We moved LED animation to a dedicated real-time thread with pinned CPU affinity, preventing interruption from Python bot processes.
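On Linux, the pinning itself is a single call; here is a minimal sketch (the core index is illustrative):

```python
# Sketch: pin the LED animation loop to a dedicated core (core index is illustrative).
import os
import threading
import time

def led_animation_loop() -> None:
    os.sched_setaffinity(0, {3})  # pin the calling thread to CPU core 3 (pid 0 = this thread)
    while True:
        # ... draw the next animation frame on the LED matrix ...
        time.sleep(1 / 30)        # ~30 fps frame pacing

threading.Thread(target=led_animation_loop, daemon=True).start()
time.sleep(2.0)  # keep the demo alive; in the real system the bot process runs indefinitely
```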
Challenge 9: TinyML Model Size vs. Accuracy: Aggressive quantization reduced accuracy by 3-5%. We resolved this through careful quantization-aware training and reducing MFCC coefficients from 13 to 11, recovering crucial storage space while maintaining 87% validation accuracy.
Accomplishments that we're proud of
1. Achieved Production-Ready Accuracy with Privacy-First Design: We deployed an 87% accurate audio classification model in real-world conditions while ensuring all inference happens locally on devices (no audio ever leaves the system unless explicitly captured for transmission). This validates that privacy-respecting AI systems don't require compromising accuracy.
2. Successfully Bridged Linux and Microcontroller Worlds: The Arduino UNO Q architecture enabled us to run sophisticated Telegram bot logic on Linux while maintaining real-time LED animation responsiveness on the microcontroller; a technical achievement demonstrating how hybrid architectures unlock new possibilities. We're proud to have created the first (as far as we know) Hawker detection system integrating Telegram bots with edge microcontroller hardware.
3. Created an Emotionally Intelligent Interface: Rather than purely utilitarian alerts, we implemented a heart animation that triggers when favorite hawkers pass by. This transforms the user experience from surveillance-like monitoring into something joyful (users literally see their favorite hawker's heart beat on their local display). This design philosophy earned recognition in discussions about inclusive urban technology.
4. Demonstrated 12-Hour Continuous Operation on Mini UPS: We engineered sophisticated power management that sustains the complete system (Grove Vision AI V2, ESP32, Arduino UNO Q, LED matrix, Telegram bot) for 12 hours without external power; validating that edge AI systems can be resilient enough for critical urban infrastructure deployment in areas with unreliable electricity.
5. Built a Complete Open-Source Reference Implementation: We documented every component: hardware schematics with BOM, Arduino sketches with 250+ lines of detailed comments, Python bot source code, Edge Impulse exact configuration, troubleshooting guides, and LED animation code. Anyone with $300-400 can now deploy a fully functional hawker detection system.
6. Achieved Telegram Bot-Hardware Synchronization: We implemented seamless synchronization between Telegram user requests and local hardware actions. Users can send /favorite add commands via Telegram to teach the system new hawkers, and LED animations update in real-time; all changes persist to the Arduino UNO Q's local database. This demonstrates genuine bidirectional IoT control.
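For a sense of the wiring, here is a minimal sketch of that command path using the python-telegram-bot library (v20+ style); the token and the persistence step are placeholders:

```python
# Sketch: /favorite command wiring with python-telegram-bot v20+ (token/DB are placeholders).
from telegram import Update
from telegram.ext import Application, CommandHandler, ContextTypes

async def favorite(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    if context.args and context.args[0] == "add":
        name = " ".join(context.args[1:])
        # persist to the UNO Q's local database and refresh the LED state (placeholders)
        await update.message.reply_text(f"Added favorite hawker: {name}")

app = Application.builder().token("123456:ABC-PLACEHOLDER").build()
app.add_handler(CommandHandler("favorite", favorite))
app.run_polling()
```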
7. Implemented Multi-Format Image Capture and Transmission: The system captures images from Grove Vision AI V2 camera, automatically compresses them intelligently, and transmits through Telegram within 2 seconds of detection; proving that embedded edge devices can provide rich multimedia data without overwhelming bandwidth or creating privacy concerns through cloud storage.
8. Designed with Inclusive Smart City Principles: Rather than surveillance-focused design, we architected a system that could help formalize informal economies. By providing data on hawker patterns while respecting their privacy and even celebrating their presence (through the heart animation), Hawker-ID models how technology can integrate rather than alienate informal workers.
9. Created Reproducible ML Pipeline for Geographic Scaling: We documented every ML step so anyone can retrain models with audio from their specific city, creating location-specific models. This enabled the first version of our "Community Models" initiative where cities contribute training data.
What we learned
1. Audio Classification Requires Obsessive Attention to Data Diversity: We learned that collecting 10 minutes from one location is vastly inferior to 3-4 minutes each from 4+ diverse locations. Temporal, spatial, and environmental variation matter more than raw data volume.
2. TinyML on Edge Devices is Genuinely Production-Ready: We arrived skeptical that neural networks could run reliably on resource-constrained microcontrollers. We're now convinced: with careful quantization, efficient models, and thoughtful optimization, you can deploy 85%+ accurate systems in <500KB flash memory. This validates the entire TinyML movement.
3. Hardware-Software Integration is Often the Bottleneck: Machine learning training through Edge Impulse was straightforward. What consumed 40% of our development time: UART protocol debugging, power management, LED matrix timing constraints, and Arduino UNO Q Linux-MCU bridge communication. Better hardware documentation would save weeks of development.
4. User Experience Transforms Acceptance: We initially built a purely utilitarian alert system. When we added the heart animation for favorite hawkers, everything changed; testing showed users were not just accepting the system but wanting to engage with it. Emotional design matters, even for embedded systems.
5. Telegram Provides Extraordinary Value for IoT: We were skeptical that a consumer chat app could serve as an industrial-grade IoT interface. In practice, the Telegram Bot API proved remarkably robust, with excellent documentation, secure authentication, and sufficient rate limits for our needs. It's a legitimately underrated platform for edge device control.
6. Threshold Tuning Is Fundamentally Contextual: There is no universal "best" confidence threshold. Optimal values depend on deployment context: busy markets tolerate higher false positive rates, while quiet residential zones require <1% false positives. This taught us that production systems need user-configurable thresholds and historical logging to enable data-driven tuning per location.
7. Real-Time Threads and CPU Affinity Are Essential for Embedded Linux: When Python bot processes ran on the same cores as LED animations, performance suffered. Pinning animation threads to dedicated CPU cores solved this. This seemingly obvious lesson took us embarrassingly long to learn (it's not documented in most Arduino tutorials).
8. Privacy-First Design Doesn't Compromise Capability: By processing all audio locally and only transmitting Telegram notifications (not raw sensor data), we gained user trust while maintaining full system functionality. Privacy and capability aren't tradeoffs; they're aligned incentives.
9. The Informal Economy Needs Technology Partnership, Not Surveillance: Conversations with street hawkers revealed that they genuinely want better data about their own business (optimal locations, peak hours, customer patterns). This reframed our entire approach from "detection system" to "business intelligence tool for informal workers."
10. Hybrid Linux-Microcontroller Architectures Enable New Possibilities: We learned that the Arduino UNO Q's dual-architecture (Linux + microcontroller on one board) opens genuine new capabilities that neither architecture alone provides. The implications for edge AI systems are profound.
What's next for Hawker-ID
Phase 2A: Multi-Sensor Fusion: The next iteration will integrate accelerometer, gyroscope, and infrared thermal sensors to capture vending motion patterns (cart movements, product handling) and heat signatures (food preparation). Fusing acoustic and motion data should push accuracy above 93% while reducing false positives to <2%.
Phase 2B: Community Datasets: We're creating the "ImageNet of hawker sounds"; a collaborative crowdsourced dataset where cities worldwide contribute recorded audio, enabling location-specific models. Version 1 will support Bandung (Sundanese), Cimahi (Sundanese), Jakarta (Betawi), Tangerang (Betawi), Blitar (Javanese), Pangkalpinang (Orang Bangka). Version 2 will cover Singlish, Malay, Thai, Vietnamese, and Filipino hawker varieties with distinct acoustic characteristics.
Phase 3A: Mobile Companion App: Currently alerts reach users primarily via Telegram. We're developing native Android/iOS apps that visualize real-time hawker hotspot maps, log detection history with GPS coordinates, manage favorite hawker profiles locally, and generate compliance reports; all while maintaining our privacy-first principles (raw audio never transmitted).
Phase 3B: Hawker Profile Enrichment: Beyond acoustic identification, we're integrating crowdsourced photos, specialty foods, customer reviews, and location history. Users will build rich profiles of their favorite hawkers, and the system will remember these preferences across sessions and locations.
Phase 4A: LoRaWAN Mesh Deployment: We're designing city-scale deployment using LoRaWAN (long-range, low-power wireless) where detection nodes are distributed across entire cities, all reporting to a municipality dashboard. LoRaWAN's 5-10km range and battery life (5+ years) make this feasible for formal smart city integration.
Phase 4B: MQTT Integration for Smart City Platforms: Integration with existing platforms (Smartcity311, government IoT dashboards) so Hawker-ID becomes one sensor feeding larger urban intelligence systems for traffic optimization, zoning decisions, and economic policy.
Phase 5A: Vendor Empowerment Dashboard: Beyond city management, we're building tools for hawkers themselves; showing optimal vending locations, peak hours, competitive density, and customer flow patterns. Data empowerment for informal economy workers. Included: AI suggestions for product positioning based on detected competitor activity.
Phase 5B: Telegram Bot Advanced Features: Upcoming bot capabilities include: /hawker_stats showing detailed analytics, /nearby finding favorite hawkers within 1km, /alert_threshold allowing individual preference tuning, /history with daily activity logs, and /export_data for personal record-keeping. The bot will become a genuine personal hawker assistant.
Phase 6: Arduino UNO Q Power Optimization: We're targeting 24-hour continuous operation on the Mini UPS through firmware optimization: predictive audio sampling (only listen during peak hours), GPU acceleration for inference (the UNO Q has an integrated GPU), and intelligently batching Telegram transmissions.
Phase 7: Machine Learning Model Customization: Users will be able to fine-tune audio classification models through Telegram. Send samples of your favorite hawker's call via Telegram, and the system learns. This enables personalized acoustic profiles; one user's "favorite" pattern that another user wouldn't care about.
Phase 8: Global Expansion & Localization: We plan partnerships with municipalities across Southeast Asia, South Asia, Australia and Africa to deploy Hawker-ID with localized training data. Each region will have models trained on local hawking cultures and sounds, adapted to regional hardware availability and connectivity (some areas using LoRaWAN instead of WiFi).
Phase 9: Research Publications & Academic Impact: We're documenting methodology and results for submission to IoT, embedded systems, and smart cities conferences. We want to contribute to academic understanding of TinyML audio classification and inclusive urban technology design.
Phase 10: Formal Integration with Municipal Systems: Conversations with city planners revealed demand for Hawker-ID data in formal governance. Long-term goal: hawker registration systems where Hawker-ID detection feeds into formal permit tracking, enabling cities to support (rather than police) informal vendors.
Final Vision: Hawker-ID ultimately represents a proof-of-concept for hyperlocal, privacy-respecting, genuinely empowering edge AI systems that serve communities. If successful, this framework could be adapted for acoustic detection of equipment failures, environmental monitoring, urban wildlife tracking, emergency response coordination, or any domain where edge audio classification adds value. More fundamentally, we see Hawker-ID as demonstrating that technology can be built with informal economy communities rather than surveilling them; that smart cities can be smart *for everyone*, not just the affluent or connected. The heart animation beating on the Arduino UNO Q LED matrix whenever your favorite hawker passes by isn't just a feature; it's a philosophy: technology should celebrate humanity, not just surveil it.
Built With
- arduino-uno-q
- arm
- cortex-m55
- ethos-u55