jla-gardner/carbon-data

22.9 Million Carbon Atom Dataset

See the preprint "Synthetic Data Enable Experiments in Atomistic Machine Learning" (arXiv:2211.16443) for more details.

The Data

[Images: four example structures from the dataset]

The dataset contains 546 uncorrelated carbon trajectories at a variety of densities (ranging from 1.0 g cm⁻³ to 3.5 g cm⁻³) and temperatures.

As the examples above (rendered with Ovito) show, this dataset captures a wide variety of chemical environments and features, including carbon nanotubes, graphitic films, buckyball-esque clusters, cubic and hexagonal diamond, and tetrahedral amorphous carbon.

Each atomic environment has been labelled with a "local energy" (together with a force) by the C-GAP-17 potential, and these labels are included in the .extxyz files as per-atom quantities. They can be accessed using, for instance, the Atomic Simulation Environment package (ase):

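from ase.io import read

# index=":" reads every snapshot in the file as a list of Atoms objects
trajectory = read("results/density-1.0-T-2000.extxyz", index=":")
structure = trajectory[0]
# per-atom local energies as labelled by C-GAP-17
local_energies = structure.get_array("gap17_energy")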

The density, anneal temperature, trajectory id and timestamp of each structure are given as per-structure quantities in the header of each .extxyz entry:

density = structure.info["density"]          # in g cm^-3
temperature = structure.info["temperature"]  # in K
trajectory_id = structure.info["run_id"]     # integer
timestamp = structure.info["time"]           # in ps
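
As a rough illustration of combining the per-atom and per-structure quantities, the sketch below computes the mean local energy of each snapshot as a function of time. The helper names load_trajectory and mean_local_energy_vs_time are hypothetical and not part of this repository; only the "gap17_energy" array and the "time" key are taken from the description above.

```python
def load_trajectory(path):
    """Read every snapshot of an .extxyz file as a list of ase.Atoms objects."""
    # Imported lazily so the helper below can be used without ase installed.
    from ase.io import read

    return read(path, index=":")


def mean_local_energy_vs_time(trajectory):
    """Return (time in ps, mean per-atom local energy) pairs, sorted by time.

    Hypothetical helper: it only assumes each structure exposes
    .info["time"] and .get_array("gap17_energy"), as described above.
    """
    rows = []
    for structure in trajectory:
        energies = structure.get_array("gap17_energy")
        mean_energy = float(sum(energies)) / len(energies)
        rows.append((structure.info["time"], mean_energy))
    return sorted(rows)
```

With the dataset downloaded, this could be called as mean_local_energy_vs_time(load_trajectory("results/density-1.0-T-2000.extxyz")).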

Generation Procedure

[Figure: temperature profile of the melt-quench-anneal simulation]

Each trajectory was seeded with a structure from the ./generate_structure.py script, which generates random structures at a given density subject to a hard-sphere constraint.
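
To make the idea of a hard-sphere constraint concrete, here is a minimal sketch of one way such a seed structure could be built: the cubic box size is fixed by the target density, and atoms are placed at random, rejecting any trial position closer than a minimum distance to an existing atom. This is not the code in ./generate_structure.py. The function names, the cubic periodic cell, and the 1.0 Å default for min_dist are all assumptions made for illustration.

```python
import math
import random

CARBON_MASS_U = 12.011        # atomic mass of carbon, in u
U_IN_GRAMS = 1.66053907e-24   # 1 u in grams
CM3_IN_A3 = 1e24              # 1 cm^3 in cubic angstroms


def box_length(n_atoms, density):
    """Edge (in angstroms) of a cubic cell with n_atoms carbon atoms at density (g cm^-3)."""
    volume_cm3 = n_atoms * CARBON_MASS_U * U_IN_GRAMS / density
    return (volume_cm3 * CM3_IN_A3) ** (1 / 3)


def min_image_distance(a, b, length):
    """Distance between points a and b under cubic periodic boundary conditions."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        d = min(d, length - d)  # take the nearest periodic image
        total += d * d
    return math.sqrt(total)


def random_hard_sphere_structure(n_atoms=200, density=2.0, min_dist=1.0,
                                 seed=0, max_tries=100_000):
    """Place atoms uniformly at random, rejecting positions closer than min_dist.

    min_dist (in angstroms) is an assumed value, not one taken from the dataset.
    """
    rng = random.Random(seed)
    length = box_length(n_atoms, density)
    positions = []
    tries = 0
    while len(positions) < n_atoms:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all atoms; lower min_dist or density")
        trial = tuple(rng.uniform(0.0, length) for _ in range(3))
        if all(min_image_distance(trial, p, length) >= min_dist for p in positions):
            positions.append(trial)
    return length, positions
```

Simple rejection sampling like this is adequate at the low packing fractions involved here; a denser target or a larger min_dist would need a smarter placement scheme.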

The LAMMPS molecular dynamics package, together with the C-GAP-17 potential, was then used to perform a melt-quench-anneal simulation with the temperature profile depicted above. Snapshots were taken at 1 ps intervals, for a total of 210 snapshots per trajectory (546 trajectories × 210 snapshots × 200 atoms ≈ 22.9 million atomic environments).
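
The headline figure can be checked directly from the three numbers quoted above. The variable names below are illustrative only.

```python
n_trajectories = 546
snapshots_per_trajectory = 210
atoms_per_snapshot = 200

# Total number of structures (snapshots) and of labelled atomic environments.
n_structures = n_trajectories * snapshots_per_trajectory
n_environments = n_structures * atoms_per_snapshot

print(f"{n_structures:,} structures, {n_environments:,} atomic environments")
# prints "114,660 structures, 22,932,000 atomic environments"
```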

Citation

If you use this dataset in your research, please cite the following:

@misc{Gardner-22,
  title = {Synthetic Data Enable Experiments in Atomistic Machine Learning},
  author = {Gardner, John L. A. and Beaulieu, Zo{\'e} Faure and Deringer, Volker L.},
  year = {2022},
  number = {arXiv:2211.16443},
  eprint = {2211.16443},
  eprinttype = {arxiv},
  primaryclass = {physics},
  doi = {10.48550/arXiv.2211.16443},
  archiveprefix = {arXiv}
}