Show simple item record

dc.contributor.authorGan, Yulu
dc.contributor.authorGalanti, Tomer
dc.contributor.authorPoggio, Tomaso
dc.contributor.authorMalach, Eran
dc.date.accessioned2024-09-30T17:39:24Z
dc.date.available2024-09-30T17:39:24Z
dc.date.issued2024-09-27
dc.identifier.urihttps://hdl.handle.net/1721.1/157074
dc.description.abstractOriginally proposed for handling time series data, Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling. This paper delves into both the theoretical and practical applications of ARDTs in this new context. We theoretically demonstrate that ARDTs can compute complex functions, such as simulating automata, Turing machines, and sparse circuits, by leveraging "chain-of-thought" computations. Our analysis provides bounds on the size, depth, and computational efficiency of ARDTs, highlighting their surprising computational power. Empirically, we train ARDTs on simple language generation tasks, showing that they can learn to generate coherent and grammatically correct text on par with a smaller Transformer model. Additionally, we show that ARDTs can be used on top of transformer representations to solve complex reasoning tasks. This research reveals the unique computational abilities of ARDTs, aiming to broaden the architectural diversity in language model development.en_US
dc.description.sponsorshipThis material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.en_US
dc.publisherCenter for Brains, Minds and Machines (CBMM)en_US
dc.relation.ispartofseriesCBMM Memo;149
dc.titleOn the Power of Decision Trees in Auto-Regressive Language Modelingen_US
dc.typeArticleen_US
dc.typeTechnical Reporten_US
dc.typeWorking Paperen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record