Interesting work! So, theoretically, do you think Pythia's architecture could allow us to fine-tune it to an unlimited context window (if we have enough computing resources)?
Hey, I think there are limits imposed by the embedding size; models with a bigger embedding size potentially have more capacity to generalize to much longer contexts, though I have not come across any research on the matter as of now.
Is there an article part 2?
I am interested in your project. Please let me know if you decide to open-source your work someday.
Hey, thanks for your reply. I was working on part 2, but a lot of research has come out recently that kind of makes my work obsolete 🥲
Also, all the source code is available on my GitHub gists.
Thank you. I think MegaByte could be a promising direction to go. Where can I find your GitHub gists?
You can find all of my recent experiments here: https://gist.github.com/naxalpha
Thank you!