11 GB text file: need to count the total occurrences of "zinc" and split it into 8 files
The file format is shown at the bottom. I need to count the total occurrences of "zinc", then divide this number by 8. There are approximately 2 million molecule structures.
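A minimal sketch of the counting step, assuming that "zinc" means the record ID lines that start with `ZINC` (one per structure, as in the sample at the bottom). The function name `count_records` and its `prefix` parameter are made up for illustration. The file is streamed line by line, so the 11 GB never has to fit in memory.

```python
def count_records(path, prefix=b"ZINC"):
    """Count lines that begin with `prefix`, reading the file as a stream."""
    count = 0
    # Binary mode avoids decoding overhead and encoding surprises on a huge file.
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(prefix):
                count += 1
    return count
```

If a record could contain more than one line starting with `ZINC`, counting the `$$$$` terminator lines instead (`prefix=b"$$$$"`) may be the safer measure of how many structures there are.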
Then I need to read through the file and, after each eighth of the structures, start a new output file at a record boundary, like this:
$$$$
BEGIN A NEW FILE HERE!
ZINC00000023
-OEChem-02270520293D
The records in each chunk should be copied to its own new file.
ZINC00000018
-OEChem-02270520293D
30 30 0 0 0 0 0 0 0999 V2000
5.1382 -8.4614 1.2497 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5911 -7.4490 0.2381 C 0 0 0 0 0 0 0 0 0 0 0 0
4.8673 -7.9436 -1.1821 C 0 0 0 0 0 0 0 0 0 0 0 0
3.0904 -7.2130 0.5049 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4039 -6.1722 -0.3941 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8414 -4.7515 -0.0601 C 0 0 0 0 0 0 0 0 0 0 0 0
3.9831 -4.3251 -0.1578 O 0 0 0 0 0 0 0 0 0 0 0 0
1.7160 -4.0621 0.3664 N 0 0 0 0 0 0 0 0 0 0 0 0
0.5998 -4.8779 0.3179 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9802 -4.5316 0.7076 S 0 0 0 0 0 0 0 0 0 0 0 0
1.0013 -6.1288 -0.1406 N 0 0 0 0 0 0 0 0 0 0 0 0
1.7692 -2.6611 0.7833 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0444 -2.5979 2.2561 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2248 -2.0297 3.1483 C 0 0 0 0 0 0 0 0 0 0 0 0
0.3608 -6.9013 -0.2930 H 0 0 0 0 0 0 0 0 0 0 0 0
4.9863 -8.1080 2.2751 H 0 0 0 0 0 0 0 0 0 0 0 0
4.6428 -9.4331 1.1481 H 0 0 0 0 0 0 0 0 0 0 0 0
6.2134 -8.6128 1.1065 H 0 0 0 0 0 0 0 0 0 0 0 0
5.1392 -6.5124 0.3880 H 0 0 0 0 0 0 0 0 0 0 0 0
4.6303 -7.1765 -1.9250 H 0 0 0 0 0 0 0 0 0 0 0 0
5.9263 -8.1945 -1.3071 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2794 -8.8384 -1.4126 H 0 0 0 0 0 0 0 0 0 0 0 0
2.5597 -8.1695 0.4000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.9654 -6.9138 1.5551 H 0 0 0 0 0 0 0 0 0 0 0 0
2.6164 -6.3866 -1.4481 H 0 0 0 0 0 0 0 0 0 0 0 0
2.5830 -2.1607 0.2473 H 0 0 0 0 0 0 0 0 0 0 0 0
0.8303 -2.1629 0.5227 H 0 0 0 0 0 0 0 0 0 0 0 0
2.9779 -3.0298 2.6089 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4857 -2.0107 4.2011 H 0 0 0 0 0 0 0 0 0 0 0 0
0.2851 -1.5776 2.8502 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 16 1 0 0 0 0
1 17 1 0 0 0 0
1 18 1 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
2 19 1 0 0 0 0
3 20 1 0 0 0 0
3 21 1 0 0 0 0
3 22 1 0 0 0 0
4 5 1 0 0 0 0
4 23 1 0 0 0 0
4 24 1 0 0 0 0
5 11 1 0 0 0 0
5 6 1 0 0 0 0
5 25 1 0 0 0 0
6 7 2 0 0 0 0
6 8 1 0 0 0 0
8 9 1 0 0 0 0
8 12 1 0 0 0 0
9 10 2 0 0 0 0
9 11 1 0 0 0 0
11 15 1 0 0 0 0
12 13 1 0 0 0 0
12 26 1 0 0 0 0
12 27 1 0 0 0 0
13 14 2 0 0 0 0
13 28 1 0 0 0 0
14 29 1 0 0 0 0
14 30 1 0 0 0 0
M END
$$$$
ZINC00000023
-OEChem-02270520293D
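A rough sketch of the splitting step, under the assumption that every record ends with a `$$$$` line as in the sample above, and that the total record count is already known. The name `split_sdf`, the `out_pattern` parameter, and the output file names are hypothetical. Files are only switched immediately after a `$$$$` line, so no structure is cut in half.

```python
import math

def split_sdf(path, total, parts=8, out_pattern="part_{:02d}.sdf"):
    """Copy records from `path` into up to `parts` files, splitting only after '$$$$'."""
    per_file = max(1, math.ceil(total / parts))  # records per output file
    index = 0        # number of output files opened so far
    in_current = 0   # records written to the current output file
    out = None
    try:
        with open(path, "rb") as src:
            for line in src:
                if out is None:
                    # Open the next output file lazily, so no empty file is created.
                    index += 1
                    out = open(out_pattern.format(index), "wb")
                out.write(line)
                if line.strip() == b"$$$$":
                    in_current += 1
                    # Start a new file once this one is full; the last file takes the rest.
                    if in_current >= per_file and index < parts:
                        out.close()
                        out = None
                        in_current = 0
    finally:
        if out is not None:
            out.close()
    return index
```

With roughly 2 million structures and `parts=8`, each output file would hold about 250,000 records. The function returns the number of files it actually wrote.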
