
A GPT model would be modelled as an n-gram Markov model where n is the size of the context window: the "state" is the entire window of the last n tokens, not a single token. This view is slightly useful for getting crude bounds on the behaviour of GPT models in general, but it is not an efficient way to store a GPT model.
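
A minimal sketch of that framing in Python, where next_token_dist is a hypothetical stand-in for a real model's forward pass (the window size and vocabulary are toy values, not anything real):

    import random

    N = 4  # context window size (toy value; real models use thousands)

    def next_token_dist(window: tuple[str, ...]) -> dict[str, float]:
        # Hypothetical stand-in: in a real GPT this would be a forward
        # pass through the network; here we fake a uniform distribution.
        vocab = ["a", "b", "c"]
        return {tok: 1.0 / len(vocab) for tok in vocab}

    def step(window: tuple[str, ...]) -> tuple[str, ...]:
        # Markov transition: the next state depends only on the current
        # state, which here is the whole window of the last N tokens.
        dist = next_token_dist(window)
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        return (window + (token,))[-N:]  # slide the window forward

    state = ("a", "b", "a", "c")
    for _ in range(5):
        state = step(state)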




I'm not saying it's an n-gram Markov model, or that you should store one as a lookup table. Markov models are just a mathematical concept; they say nothing about storage, only that the state-transition probabilities are a pure function of the current state.
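
A minimal sketch of that distinction, with made-up weather states: both functions below define the same Markov chain, one stored as a table and one computed on the fly, so the Markov property says nothing about representation:

    def table_transition(state: str) -> dict[str, float]:
        # Explicit lookup table (the classic n-gram picture).
        table = {"sunny": {"sunny": 0.75, "rainy": 0.25},
                 "rainy": {"sunny": 0.5, "rainy": 0.5}}
        return table[state]

    def computed_transition(state: str) -> dict[str, float]:
        # The same distributions, produced by computation (think: a
        # neural network's forward pass) rather than by storage.
        p_sunny = 0.75 if state == "sunny" else 0.5
        return {"sunny": p_sunny, "rainy": 1.0 - p_sunny}

    assert table_transition("sunny") == computed_transition("sunny")
    assert table_transition("rainy") == computed_transition("rainy")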

You say the state can be anything, with no restrictions at all. Let me sell you a perfect predictor, then :) Its state is the next token.


