Interesting work! So, theoretically, do you think Pythia's architecture could allow us to fine-tune it to an unlimited context window (if we have enough computing resources)?
Hey, I think there are limits imposed by the embedding size; models with a bigger embedding size potentially have more capacity to generalize to much longer contexts, though I have not come across any research on the matter as of now.
Is there an article part 2?
I am interested in your project. Please let me know if you decide to open-source your work someday.
Hey, thanks for your reply. I was working on part 2, but a lot of research has come out recently that kind of makes my work obsolete 🥲
Also, all the source code is available on my GitHub gists.
Thank you. I think MegaByte could be a promising direction to go. Where can I find your GitHub gists?
You can find all of my recent experiments here: https://gist.github.com/naxalpha
Thank you!