cuDF - GPU DataFrames

Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:

import cudf, io, requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')

tips_df = cudf.read_csv(StringIO(content))
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())

Output:

size
1    21.729201548727808
2    16.571919173482897
3    15.215685473711837
4    14.594900639351332
5    14.149548965142023
6    15.622920072028379
Name: tip_percentage, dtype: float64

For additional examples, browse our complete API documentation, or check out our more detailed notebooks.

Quick Start

Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.

Installation

CUDA/GPU requirements

  • CUDA 11.0+

  • NVIDIA driver 450.80.02+

  • Pascal architecture or better (Compute Capability >=6.0)

Conda

cuDF can be installed with conda (miniconda, or the full Anaconda distribution) from the rapidsai channel:

For cudf version == 21.06 :

# for CUDA 11.0
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=21.06 python=3.7 cudatoolkit=11.0

# or, for CUDA 11.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=21.06 python=3.7 cudatoolkit=11.2

For the nightly version of cudf :

# for CUDA 11.0
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=11.0

# or, for CUDA 11.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=11.2

Note: cuDF is supported only on Linux, and with Python versions 3.7 and later.

See the Get RAPIDS version picker for more OS and version info.

GitHub

https://github.com/rapidsai/cudf

Source: https://pythonawesome.com/a-gpu-dataframe-library-for-loading-and-otherwise-manipulating-data/