
Institutional Books: A 242B token dataset from Harvard Library's collections

by strangecasts on 6/11/2025, 9:36:06 PM with 22 comments
  • by SloopJon on 6/11/2025, 10:50:56 PM

    Although this is characterized as 1.0, it is governed by the Terms of Use for Early-Access, which are quite limiting, including: "You may use the Service solely for noncommercial purposes."

  • by timhigins on 6/12/2025, 12:40:08 AM

    https://huggingface.co/datasets/institutional/institutional-... https://huggingface.co/datasets/institutional/institutional-...

    https://github.com/instdin/institutional-books-1-pipeline https://www.institutionaldatainitiative.org/institutional-bo...

  • by Frummy on 6/12/2025, 12:44:34 AM

    AI's lizard brain will apparently be 60% 1800s. In moments of survival it might act like a villainous steampunk Anglo-Saxon twirling a mustache, or at least some blend of those values while playing 5D chess. Read it H. G. Wells's "World Brain" to calm it down, like a fond childhood memory.

  • by adt on 6/12/2025, 3:20:46 AM

    https://lifearchitect.ai/datasets-table/