# llama : refactor llama_build_graph to reduce code duplication #3382
Comments
Something I am thinking we should consider in the scope of this issue is decoupling the graph construction from the input setup, so that the build functions do not depend on the allocator state. For example:

```cpp
// current llm_build_llama()
struct ggml_tensor * inp_tokens = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens);
ggml_allocr_alloc(lctx.alloc, inp_tokens);
if (!ggml_allocr_is_measure(lctx.alloc)) {
    memcpy(inp_tokens->data, batch.token, n_tokens*ggml_element_size(inp_tokens));
}
ggml_set_name(inp_tokens, "inp_tokens");

// ------

// new llm_build_llama()
struct ggml_tensor * inp_tokens = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens);
ggml_set_name(inp_tokens, "inp_tokens");

// new llm_setup_llama()
struct ggml_tensor * inp_tokens = ggml_get_tensor(ctx, "inp_tokens");
ggml_allocr_alloc(lctx.alloc, inp_tokens);
if (!ggml_allocr_is_measure(lctx.alloc)) {
    memcpy(inp_tokens->data, batch.token, n_tokens*ggml_element_size(inp_tokens));
}
```

Having build functions that do not rely on the state of the allocator would facilitate some things around estimating the required memory. (cc @slaren for thoughts)
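As a rough sketch of how the split could look from the caller's side (the function names `llm_build_llama`/`llm_setup_llama` are taken from the comment above; everything else is illustrative, not the current API):

```cpp
struct ggml_cgraph * gf = llm_build_llama(lctx, batch); // pure graph definition, no allocator state
ggml_allocr_alloc_graph(lctx.alloc, gf);                // measure or allocate the graph tensors
llm_setup_llama(lctx, batch);                           // copy the input data into the allocated tensors
```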
I think it would be good to pre-allocate all the input and output tensors in a different buffer. In this way, these tensors would always be allocated, and the calls to `ggml_allocr_alloc` and the `ggml_allocr_is_measure` checks would not be needed in the build functions. This was already in the first version of …
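A minimal sketch of that idea, assuming a dedicated `ggml_context` whose buffer permanently holds the input tensors (`n_ctx`, `n_tokens`, and the buffer sizing are illustrative):

```cpp
// create the input tensors once, in their own context/buffer
struct ggml_init_params iparams = {
    /*.mem_size   =*/ ggml_tensor_overhead()*8 + n_ctx*sizeof(int32_t),
    /*.mem_buffer =*/ NULL,  // let ggml allocate the buffer
    /*.no_alloc   =*/ false, // tensor data lives in this buffer for the lifetime of the context
};
struct ggml_context * ctx_input = ggml_init(iparams);

struct ggml_tensor * inp_tokens = ggml_new_tensor_1d(ctx_input, GGML_TYPE_I32, n_ctx);
ggml_set_name(inp_tokens, "inp_tokens");

// per decode: the tensor is always allocated, so no ggml_allocr_* calls are needed
memcpy(inp_tokens->data, batch.token, n_tokens*ggml_element_size(inp_tokens));
```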
I don't write much C++, but I'm happy to take a stab at this.
As support for new model architectures is added, we are starting to observe a lot of repeating patterns in the code that builds their compute graphs. We should find a way to refactor and reuse the repetitive code. We should also consider splitting the implementation into separate source files if necessary.
https://github.com/ggerganov/llama.cpp/blob/0e76a8992c8200237bbc6471a53fb8796b3872f7/llama.cpp#L3997-L4026
Open to ideas and suggestions
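One possible direction (a hedged sketch, not an API proposal): a small set of shared helpers could absorb the per-layer patterns that nearly every architecture repeats, such as the RMS-norm + scale step. The helper name and the field names in the usage comment are hypothetical:

```cpp
// hypothetical shared helper - this norm pattern is nearly identical across
// the supported architectures and could be defined once:
static struct ggml_tensor * llm_build_norm(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,
        struct ggml_tensor  * norm_w,
        float                 eps) {
    cur = ggml_rms_norm(ctx, cur, eps);  // normalize
    cur = ggml_mul(ctx, cur, norm_w);    // scale by the per-layer norm weights
    return cur;
}

// usage inside each architecture's build function:
// cur = llm_build_norm(ctx0, inpL, model.layers[il].attn_norm, norm_rms_eps);
```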