Fix: support "max:N" and "filter:N" batch_size rules in DeepmdDataSystem (#4876)

njzjz merged 4 commits into deepmodeling:devel
Conversation
📝 Walkthrough

The batch size parsing logic in `DeepmdDataSystem` is extended with "max" and "filter" rules.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DeepmdDataSystem
    User->>DeepmdDataSystem: Initialize with batch_size rule ("max" or "filter")
    DeepmdDataSystem->>DeepmdDataSystem: Parse batch_size string
    alt "max" rule
        DeepmdDataSystem->>DeepmdDataSystem: For each system, compute batch_size = max(1, rule_value // natoms)
    else "filter" rule
        DeepmdDataSystem->>DeepmdDataSystem: Remove systems with natoms > rule_value
        DeepmdDataSystem->>DeepmdDataSystem: If no systems left, raise error
        DeepmdDataSystem->>DeepmdDataSystem: For each remaining system, compute batch_size = max(1, rule_value // natoms)
    end
    DeepmdDataSystem-->>User: Data system initialized with new batch size logic
```
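The per-system computation in both branches of the diagram can be sketched as a small helper (the function name here is hypothetical, not part of `data_system.py`):

```python
def batch_size_from_rule(rule_value: int, natoms_per_system: list[int]) -> list[int]:
    """Per-system batch size for the "max:N" rule: the largest batch size
    with batch_size * natoms <= rule_value, but never less than 1."""
    return [max(1, rule_value // natoms) for natoms in natoms_per_system]
```

For example, with `rule_value=100`, a 10-atom system gets batch size 10, while a 200-atom system is clamped to 1.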
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/utils/data_system.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
deepmd/utils/data_system.py
155-155: Yoda condition detected
Rewrite as `words[0] == "max"`
(SIM300)

168-168: Yoda condition detected
Rewrite as `words[0] == "filter"`
(SIM300)

182-182: No explicit `stacklevel` keyword argument found
Set `stacklevel=2`
(B028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test C++ (true)
- GitHub Check: Test C++ (false)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (2, 3.12)
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##            devel    #4876      +/-   ##
==========================================
- Coverage   84.34%   84.30%   -0.04%
==========================================
  Files         702      702
  Lines       68585    68621      +36
  Branches     3573     3572       -1
==========================================
+ Hits        57847    57851       +4
- Misses       9597     9631      +34
+ Partials     1141     1139       -2
```

☔ View full report in Codecov by Sentry.
# Fix: support "max:N" and "filter:N" batch_size rules in DeepmdDataSystem (deepmodeling#4876)

## Problem

- Using `batch_size: "max:..."` or `"filter:..."` in configs caused `RuntimeError: unknown batch_size rule max` during the PyTorch path (neighbor statistics).
- Docs mention these rules and PyTorch `DpLoaderSet` already supports them, so behavior was inconsistent across layers.

## Cause

- The common data layer `DeepmdDataSystem` only implemented `"auto"` and `"mixed"` for string `batch_size`, missing `"max"` and `"filter"`.
- PT training performs neighbor statistics via `DeepmdDataSystem` before real training, so it failed early when those rules were used.

## Fix

- Implement `"max:N"` and `"filter:N"` in `DeepmdDataSystem.__init__` to mirror `DpLoaderSet` semantics:
  - `max:N`: per-system `batch_size = max(1, N // natoms)` so that `batch_size * natoms <= N`.
  - `filter:N`: drop systems with `natoms > N` (warn if any removed; error if none left), then set per-system `batch_size` as in `max:N`.
- After filtering, update `self.data_systems`, `self.system_dirs`, and `self.nsystems` before computing other metadata.

## Impact

- Aligns the common layer behavior with PyTorch `DpLoaderSet` and with the documentation.
- Prevents PT neighbor-stat crashes with configs using `"max"`/`"filter"`.

## Compatibility

- No change to numeric `batch_size` or existing `"auto"`/`"auto:N"`/`"mixed:N"` rules.
- TF/PT/PD paths now accept the same `batch_size` rules consistently in the common layer.

## Files Changed

- `deepmd/utils/data_system.py`: add parsing branches for `"max:N"` and `"filter:N"` in `DeepmdDataSystem.__init__`.
```python
elif words[0] == "max":
    # Determine batch size so that batch_size * natoms <= rule, at least 1
    if len(words) != 2:
        raise RuntimeError("batch size must be specified for max systems")
    rule = int(words[1])
    bs = []
    for ii in self.data_systems:
        ni = ii.get_natoms()
        bsi = rule // ni
        if bsi == 0:
            bsi = 1
        bs.append(bsi)
    self.batch_size = bs
elif words[0] == "filter":
    # Remove systems with natoms > rule, then set batch size like "max:rule"
    if len(words) != 2:
        raise RuntimeError("batch size must be specified for filter systems")
    rule = int(words[1])
    filtered_data_systems = []
    filtered_system_dirs = []
    for sys_dir, data_sys in zip(self.system_dirs, self.data_systems):
        if data_sys.get_natoms() <= rule:
            filtered_data_systems.append(data_sys)
            filtered_system_dirs.append(sys_dir)
    if len(filtered_data_systems) == 0:
        raise RuntimeError(
            f"No system left after removing systems with more than {rule} atoms"
        )
    if len(filtered_data_systems) != len(self.data_systems):
        warnings.warn(
            f"Remove {len(self.data_systems) - len(filtered_data_systems)} "
            f"systems with more than {rule} atoms",
            stacklevel=2,
        )
    self.data_systems = filtered_data_systems
    self.system_dirs = filtered_system_dirs
    self.nsystems = len(self.data_systems)
    bs = []
    for ii in self.data_systems:
        ni = ii.get_natoms()
        bsi = rule // ni
        if bsi == 0:
            bsi = 1
        bs.append(bsi)
    self.batch_size = bs
```

## Summary by CodeRabbit

* **New Features**
  * Added support for "max" and "filter" batch size rules, allowing more flexible control over batch sizing and filtering of data systems based on atom counts.
* **Bug Fixes**
  * Improved error handling for incorrect batch size string formats and cases where no systems remain after filtering.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
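To make the `"filter:N"` semantics concrete, here is a standalone simulation of that branch (plain atom counts stand in for the real `DeepmdData` objects; this is a sketch, not the actual `DeepmdDataSystem` code):

```python
import warnings

def apply_filter_rule(system_dirs, natoms, rule):
    """Drop systems with natoms > rule, then assign each remaining system
    batch_size = max(1, rule // natoms), mirroring the "filter:N" branch."""
    kept = [(d, n) for d, n in zip(system_dirs, natoms) if n <= rule]
    if not kept:
        raise RuntimeError(
            f"No system left after removing systems with more than {rule} atoms"
        )
    if len(kept) != len(system_dirs):
        warnings.warn(
            f"Remove {len(system_dirs) - len(kept)} systems "
            f"with more than {rule} atoms",
            stacklevel=2,
        )
    dirs = [d for d, _ in kept]
    batch_sizes = [max(1, rule // n) for _, n in kept]
    return dirs, batch_sizes
```

With `rule=32`, for instance, an 8-atom system gets batch size 4, a 16-atom system gets 2, and a 40-atom system is dropped with a warning.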