New Sample Dataset: 13F SEC

4 min watch · January 2026 · VerbaGPT

The SEC Form 13F dataset is one of the most interesting publicly available financial datasets, and one of the messiest to work with. Every quarter, institutional investment managers that exercise investment discretion over at least $100 million in Section 13(f) securities are required to disclose their holdings. The result is a database of tens of millions of rows describing who owns what, and how much.

In this video, we walk through loading the 13F holdings data into VerbaGPT and using the Data Notes feature to automatically generate documentation for a dataset that comes with almost none.

What's in the Dataset

The holdings table alone contains 16+ million rows. Key fields include ACCESSION_NUMBER, CIK (the filer's SEC identifier), FILING_DATE, NAMEOFISSUER, CUSIP, and VALUE. The dataset sounds straightforward — but there's a critical catch.

The VALUE column changed units in 2023: pre-2023 values are in thousands of dollars; from 2023 onward, they are in actual dollars. VerbaGPT caught and documented this automatically.
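As a rough illustration of what that unit change means in practice, here is a minimal Python sketch that converts VALUE to whole dollars before any comparison. The function name `value_in_dollars`, the sample rows, and the January 1, 2023 cutoff are assumptions made for illustration. The cutoff simply mirrors the "from 2023 onward" wording above, so check the exact effective date against the SEC's documentation before relying on it.

```python
from datetime import date

# Assumption: cutoff mirrors the article's "from 2023 onward" wording.
# Verify the exact effective date against the SEC's documentation.
UNIT_CHANGE_DATE = date(2023, 1, 1)


def value_in_dollars(value: int, filing_date: date) -> int:
    """Return a 13F VALUE expressed in whole dollars.

    Filings before the cutoff report VALUE in thousands of dollars,
    so they are scaled up by 1,000. Later filings are already in dollars.
    """
    if filing_date < UNIT_CHANGE_DATE:
        return value * 1000
    return value


# Hypothetical rows using the field names mentioned above.
rows = [
    {"NAMEOFISSUER": "EXAMPLE CORP", "FILING_DATE": date(2022, 11, 14), "VALUE": 2500},
    {"NAMEOFISSUER": "EXAMPLE CORP", "FILING_DATE": date(2023, 5, 15), "VALUE": 2600000},
]

normalized = [value_in_dollars(r["VALUE"], r["FILING_DATE"]) for r in rows]
print(normalized)
```

Without the conversion, the second row would look roughly a thousand times larger than the first, even though the two positions are of similar size.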

This is exactly the kind of subtle, business-critical data nuance that gets missed when analysts dive into querying before understanding the data. It's also the kind of thing that lives in someone's head, not in any documentation — until now.

How Data Notes Work

Instead of querying the data immediately, VerbaGPT was first tasked with investigating the structure: examining the schema, sampling rows, cross-referencing the SEC's official documentation, and generating a coherent data dictionary. The model identified field patterns, inferred the dataset's structure, and flagged anomalies like the unit change.
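To make the idea of flagging an anomaly concrete, the sketch below shows one simple way a scale shift like the unit change could surface from sampled rows: compare the median VALUE per filing year and report any year-over-year jump of two or more orders of magnitude. This is a hypothetical heuristic, not a description of how VerbaGPT works internally. The function name `flag_scale_shifts`, the threshold of 100, and the sample numbers are all assumptions.

```python
from collections import defaultdict
from statistics import median


def flag_scale_shifts(rows, ratio_threshold=100.0):
    """Flag consecutive years whose median VALUE differs by a large factor.

    rows: iterable of (filing_year, value) pairs.
    Returns a list of (previous_year, year, ratio) tuples.
    """
    by_year = defaultdict(list)
    for year, value in rows:
        by_year[year].append(value)

    medians = {year: median(values) for year, values in by_year.items()}
    years = sorted(medians)

    flags = []
    for prev, cur in zip(years, years[1:]):
        if medians[prev] <= 0:
            continue  # avoid dividing by zero on empty or zero-valued years
        ratio = medians[cur] / medians[prev]
        if ratio >= ratio_threshold or ratio <= 1 / ratio_threshold:
            flags.append((prev, cur, ratio))
    return flags


# Made-up sample: (filing year, reported VALUE).
sample = [
    (2021, 1200), (2021, 800), (2021, 1500),
    (2022, 1100), (2022, 900), (2022, 1300),
    (2023, 1_000_000), (2023, 1_250_000), (2023, 950_000),
]

flags = flag_scale_shifts(sample)
for prev, cur, ratio in flags:
    print(f"Median VALUE changed by about {ratio:.0f}x between {prev} and {cur}")
```

A jump like this is only a prompt to go and read the official documentation. The heuristic cannot tell a reporting change from a genuine market move on its own.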

The result is a set of Data Notes — a contextual layer that describes the dataset in plain English. From that point on, every query VerbaGPT makes uses this context automatically. It knows what CIK means, what the VALUE column represents, and that the unit shift requires a conversion when comparing pre- and post-2023 data.

Key Takeaways

This video pairs directly with the written tutorial below, which goes deeper into the philosophy of iterative documentation and explains why waiting for "perfect" data docs is the wrong mental model.


Related

Tutorial

Stop Waiting for Perfect Data Documentation