In NumPy, dtype defines the type of data stored in an array and how much memory each value uses. It controls how raw memory bytes are interpreted, making NumPy operations fast and efficient. Understanding dtype helps manage performance and memory usage.
This example shows how to find the data type of a NumPy array.
import numpy as np
arr = np.array([1, 2, 3])
print(arr.dtype)
Output
int64
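Explanation: The arr.dtype attribute returns the data type of the array elements. It is int64 here because all values are integers and NumPy picked its default integer type, which is 64-bit on most platforms but can be int32 on some (for example, older NumPy versions on Windows).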
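To make the link between dtype and memory use concrete, here is a minimal sketch that stores the same three values with two explicitly chosen dtypes. The variable names small and large are only illustrative.

```python
import numpy as np

# The same values stored with two different integer dtypes.
small = np.array([1, 2, 3], dtype=np.int8)    # 1 byte per value
large = np.array([1, 2, 3], dtype=np.int64)   # 8 bytes per value

# itemsize is bytes per element; nbytes is bytes for the whole array.
print(small.itemsize, small.nbytes)  # 1 3
print(large.itemsize, large.nbytes)  # 8 24
```

Choosing the smallest dtype that safely holds the data can reduce memory use considerably for large arrays.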
Creating a dtype Object
A dtype object is an instance of the numpy.dtype class. You can create one by calling np.dtype().
import numpy as np
print(np.dtype(np.int16))
Output
int16
Explanation:
- np.int16 represents a 16-bit integer
- np.dtype() converts it into a dtype object
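np.dtype() also accepts string names and short type codes. The sketch below, with illustrative variable names, shows three spellings that should describe the same 16-bit integer type.

```python
import numpy as np

# Three ways to describe the same 16-bit signed integer type.
from_class = np.dtype(np.int16)   # from the NumPy scalar type
from_name = np.dtype('int16')     # from the full name
from_code = np.dtype('i2')        # from the short type code (2 bytes)

print(from_class == from_name == from_code)  # True
print(from_class.itemsize)                   # 2
```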
Understanding Byte Order and Size
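NumPy stores data as raw bytes in memory. Byte order determines how the bytes of each multi-byte value are arranged, and size determines how much memory each value uses. Stating the byte order explicitly in the dtype lets the same binary data be read correctly on systems with different native byte orders.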
import numpy as np
dt = np.dtype('>i4')
print("Byte order:", dt.byteorder)
print("Size:", dt.itemsize)
print("Data type:", dt.name)
Output
Byte order: >
Size: 4
Data type: int32
Explanation:
- > indicates big-endian byte order, meaning the most significant byte is stored first in memory
- i4 represents a 4-byte integer data type
- itemsize shows how many bytes are used to store one value, which is 4 bytes for int32
Together, this means the array stores 32-bit integers using big-endian byte order, with each value occupying 4 bytes in memory.
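The effect of byte order is easiest to see by looking at the raw bytes. The following sketch, with illustrative variable names, stores the value 1 in both byte orders and then deliberately misreads the big-endian bytes as little-endian.

```python
import numpy as np

value = np.array([1], dtype='>i4')   # big-endian 32-bit integer
print(value.tobytes())               # b'\x00\x00\x00\x01'

little = value.astype('<i4')         # same value, little-endian layout
print(little.tobytes())              # b'\x01\x00\x00\x00'
print(value[0] == little[0])         # True

# Reading the big-endian bytes with the wrong byte order changes the value.
misread = np.frombuffer(value.tobytes(), dtype='<i4')
print(misread[0])                    # 16777216
```

The numeric value is the same in both arrays; only the layout in memory differs. Problems arise only when bytes are interpreted with a byte order other than the one they were written in.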
Common Type Specifiers in NumPy
Some commonly used type codes are:
- i1, i2, i4, i8 -> signed integers (Can store both negative and positive numbers e.g., -5, 0, 10)
- u1, u2, u4, u8 -> unsigned integers (Can store only non-negative numbers e.g., 0, 5, 100)
- f4, f8 -> floating-point numbers (Store decimal values e.g., 3.14, -0.75)
- c8, c16 -> complex numbers (Store numbers with real and imaginary parts e.g., 2+3j)
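- S, U -> strings (S stores fixed-length byte strings e.g., b"NumPy", and U stores Unicode text e.g., "NumPy"; the older code a is a deprecated alias for S)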
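As a quick check of what these codes mean, the sketch below prints the full name, kind character, and size for a few of them, then compares the value ranges of signed and unsigned 8-bit integers. The selection of codes is only an example.

```python
import numpy as np

# Inspect a few type codes: full name, kind character, bytes per element.
for code in ['i1', 'u2', 'f4', 'c16', 'S5', 'U5']:
    dt = np.dtype(code)
    print(code, dt.name, dt.kind, dt.itemsize)

# Signed (i1) and unsigned (u1) 8-bit integers cover different ranges.
i1_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
u1_range = (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
print(i1_range)  # (-128, 127)
print(u1_range)  # (0, 255)
```

The number in a numeric code is the size in bytes, so f4 is a 32-bit float and c16 is a 128-bit complex number. For S and U the number is a character count, and each U character occupies 4 bytes.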
Difference Between type and dtype
In NumPy, type and dtype serve different purposes and often confuse beginners. The type describes what the object itself is (for example, a NumPy array), while dtype describes the kind of data stored inside the array.
import numpy as np
a = np.array([1])
print("type:", type(a))
print("dtype:", a.dtype)
Output
type: <class 'numpy.ndarray'>
dtype: int64
Explanation:
- type(a) shows that a is a NumPy array object
- a.dtype shows that the array stores 64-bit integers
- Every NumPy array has the type numpy.ndarray regardless of its contents, while its dtype describes the elements and can differ from one array to another
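One way to see the distinction is to convert an array to another dtype. In this sketch, which uses illustrative variable names, astype() produces a new array whose dtype changes while its type stays the same.

```python
import numpy as np

a = np.array([1, 2, 3])
b = a.astype(np.float32)   # new array with the values converted to float32

print(type(a) is type(b))  # True
print(b.dtype)             # float32
print(b)                   # [1. 2. 3.]
```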
Structured Arrays Using dtype
Sometimes a single array needs to store multiple types of data together, such as a name and numerical values. NumPy handles this using structured arrays, where each element behaves like a small record with named fields.
Defining a Structured dtype
In this example, we define a custom data type with two fields: a text field and a numeric field.
import numpy as np
dt = np.dtype([('name', np.str_, 16), ('scores', np.float64, (2,))])
print(dt['name'])
print(dt['scores'])
Output
<U16
('<f8', (2,))
Explanation:
- name is a string field that can store up to 16 characters
- scores stores two floating-point values as a small array
- Each field is defined using a field name and its data type
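A structured dtype can also be inspected as a whole. The sketch below rebuilds the same dtype with np.str_ and looks at its field names, total record size, and the byte offset of each field. It assumes the default unaligned layout, in which fields are packed without padding.

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('scores', np.float64, (2,))])

print(dt.names)      # ('name', 'scores')

# 16 characters * 4 bytes + 2 floats * 8 bytes = 80 bytes per record.
print(dt.itemsize)   # 80

# dt.fields maps each field name to a tuple starting with (dtype, byte offset).
name_offset = dt.fields['name'][1]
scores_offset = dt.fields['scores'][1]
print(name_offset, scores_offset)  # 0 64
```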
Creating a Structured Array
Here, we create an array where each row follows the structured dtype defined above.
import numpy as np
dt = np.dtype([('name', np.str_, 16), ('scores', np.float64, (2,))])
data = np.array([ ('Emma', (8.5, 7.0)), ('Lucas', (6.0, 7.5)) ], dtype=dt)
print(data[1])
print("Scores:", data[1]['scores'])
print("Names:", data['name'])
Output
('Lucas', [6. , 7.5])
Scores: [6.  7.5]
Names: ['Emma' 'Lucas']
Explanation:
- Each element of data contains both name and scores
- data[1] accesses the second record
- Fields are accessed using field names, such as data['name']
- Structured arrays behave like rows in a table with named columns
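Because each field behaves like a regular array, ordinary NumPy operations such as filtering and sorting apply to it. The following is a minimal sketch using the same two records; the threshold of 7.0 and the names means, high, and order are chosen only for illustration.

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('scores', np.float64, (2,))])
data = np.array([('Emma', (8.5, 7.0)), ('Lucas', (6.0, 7.5))], dtype=dt)

# data['scores'] has shape (2, 2): one row of two scores per record.
means = data['scores'].mean(axis=1)
print(means)                 # [7.75 6.75]

# Keep only the records whose mean score exceeds the threshold.
high = data[means > 7.0]
print(high['name'])          # ['Emma']

# Order the records by their first score, lowest first.
order = np.argsort(data['scores'][:, 0])
print(data['name'][order])   # ['Lucas' 'Emma']
```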