bigframes.pandas

The primary entry point for the BigQuery DataFrames (BigFrames) pandas-compatible API.

BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API powered by the BigQuery engine. The bigframes.pandas module implements a large subset of the pandas API, so you can perform large-scale data analysis with familiar pandas syntax while the computations run in BigQuery.

Key Features:

Example usage:

>>> import bigframes.pandas as bpd

Set session options, such as the project, before the first operation starts the session.

>>> bpd.options.bigquery.project = "your-project-id"

Load data from a BigQuery public dataset.

>>> df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")

Perform familiar pandas operations that execute in BigQuery.

>>> top_names = (
...     df.groupby("name")
...     .agg({"number": "sum"})
...     .sort_values("number", ascending=False)
...     .head(10)
... )

Bring the final, aggregated results back to local memory if needed.

>>> local_df = top_names.to_pandas()
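Because bigframes.pandas reuses pandas method names, the chain above can be tried locally with plain pandas on a small sample before it is run against the public table. This is a sketch under stated assumptions: it uses pandas, not BigQuery DataFrames, so nothing is sent to BigQuery, and the names and counts are invented for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the usa_names table (not real data).
sample = pd.DataFrame(
    {
        "name": ["Mary", "John", "Mary", "Anna", "John", "Mary"],
        "number": [10, 7, 5, 3, 4, 1],
    }
)

# Same method chain as the BigQuery DataFrames example, evaluated locally.
top_names = (
    sample.groupby("name")
    .agg({"number": "sum"})
    .sort_values("number", ascending=False)
    .head(2)
)
print(top_names)
```

With BigQuery DataFrames the same chain is executed in BigQuery, and only to_pandas() brings the small aggregated result into local memory.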

BigQuery DataFrames is designed for data scientists and analysts who want BigQuery's scale with the familiar pandas interface. It avoids the “data movement bottleneck” by keeping your data in BigQuery for processing and transferring results to local memory when you request them, for example with to_pandas().

Functions

clean_up_by_session_id(session_id[, ...])

Searches through BigQuery tables and routines and deletes the ones created during the session with the given session ID.

close_session()

Start a fresh session the next time a function requires a session.

col(col_name)

Generate a deferred object representing a column of a DataFrame.

concat(...)

Concatenate BigQuery DataFrames objects along a particular axis.

crosstab(index, columns[, values, rownames, ...])

Compute a simple cross tabulation of two (or more) factors.

cut(x, bins, *[, right, labels, session])

Bin values into discrete intervals.

deploy_remote_function(func, **kwargs)

Orchestrates the creation of a BigQuery remote function that deploys immediately.

deploy_udf(func, **kwargs)

Orchestrates the creation of a BigQuery UDF that deploys immediately.

from_glob_path(path, *[, connection, name])

Create a BigFrames DataFrame that contains a BigFrames ObjectRef column from a wildcard (glob) path.

get_default_session_id()

Gets the session ID that is used whenever a custom session has not been provided.

get_dummies(data[, prefix, prefix_sep, ...])

Convert categorical variable into dummy/indicator variables.

get_global_session()

Gets the global session.

merge(left, right[, how, on, left_on, ...])

Merge DataFrame objects with a database-style join.

qcut(x, q, *[, labels, duplicates])

Quantile-based discretization function.

read_arrow(pa_table)

Load a PyArrow Table to a BigQuery DataFrames DataFrame.

read_csv(filepath_or_buffer, *[, sep, ...])

Loads data from a comma-separated values (CSV) file into a DataFrame.

read_gbq(, columns, configuration, ...)

Loads a DataFrame from BigQuery.

read_gbq_function(function_name[, ...])

Loads a BigQuery function from BigQuery.

read_gbq_model(model_name)

Loads a BigQuery ML model from BigQuery.

read_gbq_object_table(object_table, *[, name])

Read an existing object table to create a BigFrames ObjectRef DataFrame.

read_gbq_query(, columns, configuration, ...)

Turn a SQL query into a DataFrame.

read_gbq_table(, columns, max_results, ...)

Turn a BigQuery table into a DataFrame.

read_json(path_or_buf, *[, orient, dtype, ...])

Convert a JSON string to a DataFrame object.

read_pandas(...)

Loads a BigQuery DataFrames DataFrame from a pandas DataFrame.

read_parquet(path, *[, engine, write_engine])

Load a Parquet object from the file path (local or Cloud Storage), returning a DataFrame.

read_pickle(filepath_or_buffer[, ...])

Load pickled BigFrames object (or any object) from file.

remote_function([input_types, output_type, ...])

Decorator to turn a user-defined function into a BigQuery remote function.

reset_session()

Start a fresh session the next time a function requires a session.

to_datetime(-> bigframes.series.Series)

This function converts a scalar, array-like or Series to a datetime object.

to_timedelta(arg[, unit, session])

Converts a scalar or Series to a timedelta object.

udf(*[, input_types, output_type, ...])

Decorator to turn a Python user-defined function (UDF) into a [BigQuery managed user-defined function](https://cloud.google.com/bigquery/docs/user-defined-functions-python).
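concat stacks objects along an axis, while merge performs a database-style join on key columns. The sketch below shows both with local pandas on two invented tables; bigframes.pandas.concat and bigframes.pandas.merge are intended to follow the same pandas semantics, but which parameters they accept, and the row order of their results, are assumptions to confirm on their reference pages.

```python
import pandas as pd

# Small, made-up tables for illustration only.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

# concat with axis=0 (the default) stacks rows; ignore_index=True renumbers 0..n-1.
stacked = pd.concat([left[["id"]], right[["id"]]], ignore_index=True)

# merge with how="inner" keeps only ids present in both tables (like SQL INNER JOIN).
inner = pd.merge(left, right, how="inner", on="id")

# how="left" keeps every row of `left`; rows without a match get a missing score.
left_join = pd.merge(left, right, how="left", on="id")
```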
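crosstab counts how often each combination of two factors occurs, and get_dummies expands a categorical column into indicator columns. This is a local pandas sketch on invented data, not a BigQuery DataFrames call; the bigframes.pandas versions aim to match it, though the output dtypes (for example, of the indicator columns) may differ.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state": ["CA", "CA", "NY", "NY", "NY"],
        "gender": ["F", "M", "F", "F", "M"],
    }
)

# crosstab: one row per distinct state, one column per distinct gender,
# each cell counting how often that pair occurs.
counts = pd.crosstab(df["state"], df["gender"])

# get_dummies: one indicator column per distinct value,
# named "<prefix><prefix_sep><value>"; other columns are kept as-is.
dummies = pd.get_dummies(df, columns=["gender"], prefix="g", prefix_sep="_")
```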
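cut and qcut both bin values, but cut uses bin edges you choose while qcut places the edges at quantiles so the bins hold similar counts. The sketch uses local pandas; the labels and data are hypothetical, and whether bigframes.pandas accepts each of these argument combinations is an assumption to check.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# cut: fixed bin edges. With right=True (the default) the bins are
# (0, 3], (3, 6] and (6, 9].
by_edges = pd.cut(values, bins=[0, 3, 6, 9], labels=["low", "mid", "high"])

# qcut: edges at the 25th, 50th and 75th percentiles, so each bin gets
# roughly the same number of values. labels=False returns bin numbers.
by_quartile = pd.qcut(values, q=4, labels=False)
```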
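to_datetime and to_timedelta convert scalars or Series into datetime and duration values. The sketch below shows the pandas behavior locally; the format and unit arguments are pandas parameters, and whether bigframes.pandas.to_datetime supports the same options is an assumption to verify.

```python
import pandas as pd

# to_datetime on a Series of strings; an explicit format avoids ambiguous parsing.
dates = pd.to_datetime(pd.Series(["2024-01-15", "2024-02-29"]), format="%Y-%m-%d")

# Scalars are converted as well.
stamp = pd.to_datetime("2024-01-15")

# to_timedelta: `unit` says how to interpret bare numbers (here, seconds).
durations = pd.to_timedelta(pd.Series([90, 3600]), unit="s")
```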

Classes

DataFrame([data, index, columns, dtype, ...])

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

DatetimeIndex([data, dtype, name, session])

Immutable sequence used for indexing and alignment with datetime-like values.

Index([data, dtype, name, session])

Immutable sequence used for indexing and alignment.

MultiIndex([data, dtype, name, session])

A multi-level, or hierarchical, index object for pandas objects.

NamedAgg(column, aggfunc)

Create a new instance of NamedAgg(column, aggfunc), a named aggregation that pairs an input column with an aggregation function.

Series([data, index, dtype, name, copy, session])

One-dimensional array with axis labels (including time series).
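NamedAgg is typically passed as keyword arguments to groupby().agg(), where each keyword names an output column computed as aggfunc applied to column within each group. This sketch uses pandas.NamedAgg on an invented frame; that BigQuery DataFrames accepts the same keyword form and these aggfunc strings is an assumption to check against the DataFrameGroupBy.agg reference.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "a"], "number": [1, 2, 3]})

# Each keyword becomes an output column: aggfunc(column) per group.
summary = df.groupby("name").agg(
    total=pd.NamedAgg(column="number", aggfunc="sum"),
    biggest=pd.NamedAgg(column="number", aggfunc="max"),
)
```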

Module Attributes

pandas.NA = <NA>
pandas.BooleanDtype = <class 'pandas.core.arrays.boolean.BooleanDtype'>
pandas.Float64Dtype = <class 'pandas.core.arrays.floating.Float64Dtype'>
pandas.Int64Dtype = <class 'pandas.core.arrays.integer.Int64Dtype'>
pandas.StringDtype = <class 'pandas.core.arrays.string_.StringDtype'>
pandas.ArrowDtype = <class 'pandas.core.dtypes.dtypes.ArrowDtype'>
pandas.options = <bigframes._config.global_options.Options object>
pandas.option_context = <class 'bigframes_vendored.pandas._config.config.option_context'>
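As their reprs show, NA and the dtype attributes above are pandas objects, so nullable-dtype code written for pandas refers to them the same way. The sketch below uses pandas directly to show how NA, Int64Dtype and BooleanDtype behave locally; it does not touch BigQuery, and the data are invented.

```python
import pandas as pd

# A nullable integer Series: the missing entry is pd.NA rather than NaN,
# and the dtype stays Int64 instead of being upcast to float64.
ints = pd.Series([1, None, 3], dtype=pd.Int64Dtype())

# The nullable boolean dtype also uses pd.NA for missing values.
flags = pd.Series([True, None, False], dtype=pd.BooleanDtype())

# Reductions skip missing values by default.
total = ints.sum()
```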