pandas & NumPy · AI / ML · Code with Animation

What are pandas and NumPy?

NumPy is the numerical array library that most of the Python ML stack is built on. pandas builds on it for tabular data: loading CSVs, cleaning columns, grouping, and joining. Together they are how you wrangle raw data into the clean arrays a model can learn from.
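
As a rough illustration of how the two libraries hand data back and forth, here is a minimal sketch. The array values and the column names (height, weight) are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: rows are samples, columns are features.
raw = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Vectorized NumPy arithmetic: each column's mean is subtracted
# from every row in one expression, with no explicit loop.
centered = raw - raw.mean(axis=0)

# Wrap the same values in a DataFrame to get labeled columns.
df = pd.DataFrame(centered, columns=["height", "weight"])

# Convert back to a plain NumPy array when a model needs one.
features = df.to_numpy()
```

The round trip is the common pattern: pandas for labeled, tabular cleanup, then to_numpy() at the point where a model expects a plain array.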

Why it matters

Real ML projects spend most of their time on data, not models. The unglamorous work of loading, cleaning, and reshaping is where projects succeed or fail. Working fluently in pandas and NumPy is a far larger part of the day-to-day job than designing novel architectures.

What to learn

NumPy arrays and vectorized operations
pandas Series and DataFrames
Reading and writing CSV, JSON, and Parquet
Selecting, filtering, and transforming columns
Handling missing data
Grouping and aggregating
Merging and joining datasets
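
The topics above can be sketched in one small pipeline. This is only an illustration under assumptions: the orders and customers tables, their column names, and the 1.1 tax multiplier are all hypothetical, and the CSV is read from an in-memory string so the snippet runs without any files.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file path.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,a,10.0\n"
    "2,b,\n"
    "3,a,30.0\n"
    "4,c,20.0\n"
)
orders = pd.read_csv(orders_csv)

# Handle missing data: fill the empty amount with the column median.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Transform: derive a new column from an existing one (assumed tax rate).
orders["amount_with_tax"] = orders["amount"] * 1.1

# Select and filter: a boolean mask keeps only the matching rows.
large = orders[orders["amount"] >= 20.0]

# Group and aggregate: total amount per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Merge: attach a second small table on the shared key.
customers = pd.DataFrame(
    {"customer_id": ["a", "b", "c"], "region": ["north", "south", "north"]}
)
report = totals.merge(customers, on="customer_id", how="left")

# Write: serialize the result back to CSV and JSON text.
csv_text = report.to_csv(index=False)
json_text = report.to_json(orient="records")
```

Parquet follows the same shape through to_parquet and pd.read_parquet, but it needs an extra engine such as pyarrow installed, so it is left out of this sketch.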

Common pitfall

Writing Python loops over the rows of a DataFrame instead of using vectorized operations. Row-by-row loops are often orders of magnitude slower on large tables, and they are harder to read. pandas and NumPy are built to operate on whole columns and arrays at once, so reach for vectorized methods and treat an explicit row loop as a sign that you probably missed a better way.
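
To make the contrast concrete, here is a small sketch that computes the same column both ways. The price and quantity columns, the random data, and the 250 threshold are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 rows of prices and quantities.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "price": rng.uniform(1, 100, size=1_000),
        "quantity": rng.integers(1, 10, size=1_000),
    }
)

# The pitfall: a Python-level loop that handles one row at a time.
loop_totals = []
for _, row in df.iterrows():
    loop_totals.append(row["price"] * row["quantity"])

# The vectorized equivalent: one expression over whole columns.
df["total"] = df["price"] * df["quantity"]

# Conditional logic can usually be vectorized too, here with np.where.
df["size"] = np.where(df["total"] > 250, "large", "small")
```

Both versions produce the same totals. The vectorized line is shorter and pushes the arithmetic into compiled code instead of the Python interpreter, which is where the speed difference comes from.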

Resources

Primary (free):

Practice

Load a real CSV dataset into pandas, handle its missing values, create a new column from existing ones without a loop, and compute a grouped summary. Join it with a second small table. Done when every transformation is vectorized rather than a row-by-row loop.

Outcomes

Load and save data across common formats.
Clean missing data and transform columns.
Group, aggregate, and join datasets.
Prefer vectorized operations over row loops.