Like it or not, we live in the age of AI, and it can be both exciting and frustrating. On one hand, AI can help us unlock almost unlimited capabilities for the apps we build; where in the past tasks like image recognition or text classification were out of reach for the average developer, today customers almost assume we can pull things like that off in ridiculously short timeframes.
On the other hand, however, the landscape of AI development is probably best described as "hostile". Things move at incredible speeds, new models and approaches drop and then die out before we even have time to properly use them, lots of tutorials are actually disguised ways of selling us something, and lots of information is still gatekept by more seasoned developers, hidden behind buzzwords like "agentic AI", "RAG" and so on.
This can be particularly harsh if you're a JavaScript developer, especially working on frontend, like me, with Angular. "Do I need to learn Python to do AI?", "I'm not a backend developer, how can I build apps with AI?", and "How do I even get started, there's so much stuff out there!" are all questions I've heard from developers in my community. Well, to be honest, those are questions I've asked myself as well.
So, for the past couple of months I've been working with the Gemini API, building different apps and tools, and now it's time to dispel some myths for other Angular developers and help them begin their AI journey too. And so begins a series of articles on how to build AI-powered apps with Angular and Gemini.
This is going to be a ground-up tutorial, broken down into atomic topics that will help you get started without issue. The good news is, no prior knowledge is assumed! If you are an Angular developer who has no idea how people build apps on top of AI models, this is where you start!
About the article series
In this series, we will cover the following topics:
- Getting started with the Gemini API: Accessing the API, making requests, creating chats, and the most important configuration options.
- Using embeddings: Learn about how we can utilize LLMs for more than just generating text
- Building RAGs: Learning about retrieval-augmented generation and how to build RAG apps with Gemini and an Angular frontend
- Using multimodal capabilities: How to work with images, audio, and video in Gemini
- Building agentic AI apps: How to build apps that can reason, plan, and execute tasks on their own
- Slightly touching machine learning: How we can actually forgo LLMs entirely and build way more reliable AI tools tailored for very specific tasks
- A lot more!
Please do not assume this is going to be just a 7-article series; it is very possible I will break the topics down into much smaller chunks, so we could end up with a lot more articles than just 7!
So, let's start our journey into AI + Angular!
Getting started with Gemini API
Before we proceed, we must understand how exactly people build apps on top of LLMs. If you did not know that, and just assumed developers make API calls to OpenAI or Gemini or whatnot, well, you were entirely correct! (I swear I didn't generate this last phrase with AI :D).
However, it is even better than that, since Gemini (and other LLM providers, but we focus only on Gemini) offers a specialized SDK that makes it incredibly easy to work with the API. To get started, let's generate a new Angular app and install the Gemini SDK inside it:
npm i @google/genai
This will install the Google Gen AI SDK, which we will then use to make requests to the Gemini API in a way that is better than spamming fetch calls.
Now, before we proceed, we need to get over the first obstacle newcomers face, which, in my experience, often intimidates people into giving up: getting an API key and setting up billing. Many less experienced developers associate billing with spending money they cannot see, making them hesitant and afraid of sudden large charges.
However, I have great news: Google offers both a free tier and some pretty decent models that are very cheap! So, let's do the following steps:
- Go to the Google Cloud Console and log in with your Google account.
- Create a new project and give it a name that you will remember
- Then head over to Google AI Studio: https://studio.google.cloud.com/
- Find the "Get API key on the left sidebar" to navigate to this page
- Create a new API key associating the key with the project you created in step 2
- Copy the API key somewhere safe, we will need it in a moment
Now, since we have the API key, we might be tempted to just create an Angular service, create the API instance with our key, and start making requests. Please do not do this! Think about it for a moment: if we put the API key in our frontend code, anyone can open the devtools, find the key, and start making requests on our behalf, which is both a security and a monetary risk.
Instead, we are going to build a small backend that will handle the AI part for us, and an Angular service that will make requests to our backend. This way, our API key is safe, and we can also implement additional logic in our backend, like caching, rate limiting, and so on.
Don't be too hesitant here; we are not going to build a complex backend, but essentially a thin wrapper between our frontend and the Gemini API. We are going to use Express.js for this; if you're not familiar with it at all, keep reading the following sections; if you are, you can skip to the final code example at the end of the section and continue from there.
Building a small Express.js backend
Express.js is a minimal web framework that allows us to build backend apps with Node.js. To get started, we can install express (together with cors, which we will need so our Angular app can call the backend) in the same Angular project we just created:
npm install express cors
Then, we can create a new file called server.js in the root of our project, and add the following code:
const express = require('express'); // importing express
const app = express();
const cors = require('cors'); // to handle CORS

app.use(cors()); // enable CORS
app.use(express.json()); // to parse JSON bodies

app.get('/', (req, res) => {
  res.send('Hello!');
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
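To see it in action, start the server with Node (assuming the file is named server.js, as above):
node server.js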
Afterwards, we can open http://localhost:3000 in our browser, and we should see "Hello!" displayed.
Short recap: we created an express app and declared a route which, when visited via a browser or a direct API call like fetch, will return the text "Hello!". We also set up the app to listen on port 3000, which is where we can access it.
Now, we want to actually create an endpoint that will allow us to play with the Gemini API. Before we do that, we need to figure out where to put our API key, since, again, we cannot put it into the source code itself (what if we want to push it to Github, for example?). The best way to do this is via environment variables, which may already be familiar to you even if you have been working with Angular exclusively your entire life.
A popular way to do that in Node.js is via .env files and the dotenv package. So, let's install it:
npm install dotenv
Then, create a new file called .env in the root of your project (and add it to your .gitignore, so the key never ends up in version control), and add the following line:
GEMINI_API_KEY=your_api_key_here
Then, we will slightly modify our server.js file to load the environment variables from the .env file:
const express = require('express');
require('dotenv').config(); // load environment variables from .env file
// rest of the code stays the same for now
Now, to see this in action, let's modify our "Hello!" endpoint and make it actually generate some content via Gemini. First, we will import the GenAI SDK, create an instance of the API client, and then use it to generate some text:
const { GoogleGenAI } = require('@google/genai'); // import the SDK

const genAI = new GoogleGenAI({});

app.get('/', async (req, res) => {
  const response = await genAI.models.generateContent({
    model: 'gemini-1.5-pro', // specify the model to use
    contents: 'Give me a random greeting',
  });
  res.json(response); // return the response as JSON
});
Now, what we see here is simple enough to need barely any explanation: we create an instance of GoogleGenAI, and then, in our endpoint, we call the generateContent method, specifying the model we want to use (in this case, gemini-1.5-pro, which is a pretty capable model for most tasks) and the content we want to generate. The response from the API is then returned as JSON.
One interesting thing here is that we did not specify the API key anywhere in the code. This is because the GenAI SDK automatically picks up the API key from the environment variable GEMINI_API_KEY, which we set in our .env file. This is a very convenient feature, as it allows us to keep our API key out of the source code completely.
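If you prefer to be explicit (or your key lives under a different variable name), the constructor also accepts the key directly; a minimal sketch:
// Equivalent to the empty-options version above, just explicit about the key
const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });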
Now, let's head to http://localhost:3000 in the browser once more to examine the response. We might see something like this:
{
  "sdkHttpResponse": {
    "headers": {
      <A lot of headers here>
    }
  },
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Howdy!\n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.0416780412197113
    }
  ],
  "modelVersion": "gemini-1.5-pro-002",
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 3,
    "totalTokenCount": 8,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 5
      }
    ]
  }
}
There might be other fields in the response too, but we mainly care about these 3 top-level fields, so let's quickly explore them:
- sdkHttpResponse: This contains the raw HTTP response from the API, including headers and status code. This can become very useful if the model, for whatever reason, returns an error, and we want to either debug it or show a proper message to the user.
- candidates: This is the main star of the show, as it contains the actual generated content from the model. In this case, we asked for a random greeting, and the model responded with "Howdy!". The candidates array can contain multiple responses, which will become important when we start streaming responses instead of picking the finalized one.
- usageMetadata: This contains information about the token usage for the request, which can be useful for monitoring and optimizing costs. We will explore this a bit later.
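Most of the time we only need the generated text, so on the backend we could pull it out of the response before sending it to the client; a small, optional sketch:
// Drill into the response shape we saw above; optional chaining guards
// against the (rare) case of an empty candidates array.
const text = response.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
console.log(text); // e.g. "Howdy!"
For now, though, we will return the full response and let the frontend do the extraction.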
Now that we have our backend set up, we can create an Angular service that will make requests to our backend instead of directly to the Gemini API.
Creating an Angular service to interact with our backend
For now, we have only covered a very small portion of what we can do with the Gemini API, so let's take it a step further and define an endpoint that actually takes some input from the user and responds to it, instead of just generating a greeting:
app.post('/generate', async (req, res) => {
  const { prompt } = req.body; // get the prompt from the request body
  if (!prompt) {
    return res.status(400).json({ error: 'Prompt is required' });
  }
  try {
    const response = await genAI.models.generateContent({
      model: 'gemini-1.5-pro',
      contents: prompt,
    });
    res.json(response);
  } catch (error) {
    console.error('Error generating content:', error);
    res.status(500).json({ error: 'Failed to generate content' });
  }
});
While this is a bit more code than we had previously, it does not do anything complex: it simply takes a prompt from the request body and uses it to generate content via Gemini. If the prompt is missing, it returns a 400 error, and if there's any error during the generation, it returns a 500 error; pretty standard stuff.
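Before wiring up Angular, we can sanity-check the endpoint straight from the browser console (the prompt below is just an example):
// Quick manual test of the /generate endpoint
fetch('http://localhost:3000/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Say hi in three different languages' }),
})
  .then((res) => res.json())
  .then((data) => console.log(data.candidates?.[0]?.content?.parts?.[0]?.text));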
Now, we can go ahead and create an Angular service that will make requests to this endpoint.
export type GeminiResponse = {
  candidates: {
    content: {
      parts: {
        text: string;
      }[];
    };
  }[];
};

@Injectable({ providedIn: 'root' })
export class GenAIService {
  readonly #http = inject(HttpClient);

  generateContent(prompt: string) {
    return this.#http.post<GeminiResponse>('http://localhost:3000/generate', { prompt }).pipe(
      // map the response to just return the generated text
      map(
        (response) => response.candidates[0]?.content.parts[0].text || 'No response',
      ),
    );
  }
}
As we can see, the GeminiResponse type we created is quite intimidating on its own, even more so considering we omitted most of the fields, leaving behind only the part that actually contains the generated text. However, don't be scared by it; simply copy it and keep it around, since 90% of the time you will only care about the inner parts field that contains the actual data.
Now, we can set up a component to use this service and display the generated content:
import { FormsModule } from '@angular/forms';

@Component({
  imports: [FormsModule], // ngForm/ngModel in the template require this
  template: `
    <div class="container">
      <h2>AI Text Generator</h2>
      <form #textForm="ngForm" (ngSubmit)="generateResponse()" class="form">
        <div class="input-group">
          <label for="prompt">Enter your prompt:</label>
          <textarea
            id="prompt"
            name="prompt"
            [(ngModel)]="prompt"
            required
            placeholder="Type your prompt here..."
            rows="4"
            class="textarea">
          </textarea>
        </div>
        <button
          type="submit"
          [disabled]="!textForm.form.valid"
          class="submit-btn">
          Generate Response
        </button>
      </form>
      @let response = generatedResponse();
      <div class="response-section">
        <h3>Response:</h3>
        <div
          [class.response-box]="response.error === null"
          [class.error-box]="response.error !== null">
          {{ response.text }}
        </div>
      </div>
    </div>
  `,
})
export class GenerateTextComponent {
  readonly #genAI = inject(GenAIService);

  prompt = signal('');
  generatedResponse = signal<{ text: string; error: string | null }>({
    text: '',
    error: null,
  });

  generateResponse() {
    // I would be very very happy to do this via resources
    // but they do not yet support POST requests
    // P.S. read more about resources in my article: https://www.angularspace.com/meet-http-resource/
    this.#genAI.generateContent(this.prompt()).subscribe({
      next: (response) => this.generatedResponse.set({
        text: response, error: null,
      }),
      error: () => this.generatedResponse.set({
        text: '', error: 'Error generating text',
      }),
    });
  }
}
As we can see, on the frontend part this is reasonably simple, as we just invoke our service, make the HTTP request, store the data in a signal, and display it. Nothing too fancy, and nothing too complex.
At this point, we might be tempted to jump into more complex stuff, like chats, streaming responses and so on; however, I suggest we make a sideways move and explore some of the configuration options we have when making requests to Gemini, since this will help us a lot in the future.
Configuring Gemini API
Configuring models
Let's go back for a moment and remember that we selected a specific model to make requests to, namely gemini-1.5-pro. While this is a very capable model by itself, we might want to explore other models, since different tasks might require using more (or sometimes, surprisingly, less!) powerful models.
We can do this by creating an endpoint that specifically lists the available models:
app.get('/models', async (req, res) => {
  try {
    const response = await genAI.models.list();
    res.json(response);
  } catch (error) {
    console.error('Error listing models:', error);
    res.status(500).json({ error: 'Failed to list models' });
  }
});
Now, we can create a component with an httpResource that allows us to see the list of models:
@Component({
  template: `
    <div class="models-container">
      <h2>Available Models</h2>
      @if (modelsResource.isLoading()) {
        <div class="loading">Loading models...</div>
      }
      @if (modelsResource.error()) {
        <div class="error">Error loading models: {{ modelsResource.error() }}</div>
      }
      @if (modelsResource.value(); as models) {
        <ul class="models-list">
          @for (model of models.pageInternal; track model.name) {
            <li class="model-card">
              <h3>{{ model.displayName }}</h3>
              <p><strong>Name:</strong> {{ model.name }}</p>
            </li>
          }
        </ul>
      }
    </div>
  `,
})
export class ModelsListComponent {
  modelsResource = httpResource<{ pageInternal: { name: string; displayName: string }[] }>(
    () => 'http://localhost:3000/models'
  );
}
Now, if we open this component, we will see a big list (around 50 entries) of Gemini models, each tailored for different tasks. Some models are better at reasoning, useful for tasks involving complex instructions and logic, some are simply better at conversation and work faster, some are specialized for image or video generation, and some are embedding models (we will learn about those in later articles of this series).
Choosing a model is a challenging task and often requires some trial and error, but in general it comes down to balancing the following three factors:
- Cost of the model: more capable ones are usually more expensive
- Speed of generation: models that have reasoning capabilities are usually slower unless we disable the thinking mode, but can be better for solving tasks instead of just text generation
- The task at hand: very often we do not need the latest and shiniest model, just a simple one that can do the job decently, saving us money and the user's time
You can read way more about the models, their capabilities and pricing in the official documentation.
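If you want to experiment and compare models, one possible tweak (my own suggestion here, not something we will keep in the final code) is to let the caller pass the model name to our /generate endpoint, with a sensible default:
app.post('/generate', async (req, res) => {
  // Hypothetical variation: the client may send a "model" field along with the prompt
  const { prompt, model } = req.body;
  if (!prompt) {
    return res.status(400).json({ error: 'Prompt is required' });
  }
  try {
    const response = await genAI.models.generateContent({
      model: model || 'gemini-1.5-pro', // fall back to our default model
      contents: prompt,
    });
    res.json(response);
  } catch (error) {
    console.error('Error generating content:', error);
    res.status(500).json({ error: 'Failed to generate content' });
  }
});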
Now, let's take a look at what actually goes into the pricing of model usage.
How much will you spend
Let's go back to our very first example and take a look at the usageMetadata field in the response:
{
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 3,
    "totalTokenCount": 8,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 5
      }
    ]
  }
}
As we can see, the usageMetadata field contains information about the token usage for the request. If you're unaware of what "tokens" are in the context of LLMs, keep reading; if you already know, feel free to skip the next 3 paragraphs.
To understand what tokens are, we need to understand (not deeply, just at a very high level) how LLMs work. A large language model generates text by predicting the next "token" in a series of tokens. A token can be a word, a part of a word, a punctuation mark, or a special symbol.
To quickly and visually understand what tokens are, we can use the OpenAI tokenizer tool. For example, if we input the text "Hello, world!", we will see that it is broken down into 4 tokens: "Hello", ",", " world", and "!".
LLMs work by taking your text, breaking it down into tokens, and then predicting the next token based on the previous ones. This is how they generate coherent and relevant text. It is important to know that tokens are essentially fixed, so in different contexts, the same text will always be broken down into the same tokens.
Now that we understand what tokens are, we can see how Gemini API pricing works. The amount you will be charged for using a model is based on the number of tokens processed during your requests. This includes both the tokens in your input (the prompt you send to the model) and the tokens in the output (the text generated by the model). If we revisit the Gemini API models list page, we will see that each model has a different price for input and output tokens (with output tokens usually being more expensive).
While the prices might seem intimidating at first, it's important to note that those are prices for processing 1 million (!) tokens, which, for our local, learning-oriented app, is simply a grotesquely large amount. It is roughly equal to 750,000 words, well above the word count of the entire "Lord of the Rings" trilogy! So, for most learning and prototyping purposes, you will be spending just a few cents, if anything at all (and that is only if you go beyond the free tier limits).
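To make that concrete, here is a back-of-the-envelope sketch based on the usageMetadata fields we saw earlier (the prices below are made-up placeholders, not real Gemini prices; always check the official pricing page):
// Rough cost estimate from usageMetadata (illustrative prices only)
const INPUT_PRICE_PER_MILLION = 1.25; // hypothetical $ per 1M input tokens
const OUTPUT_PRICE_PER_MILLION = 5.0; // hypothetical $ per 1M output tokens

const usage = { promptTokenCount: 5, candidatesTokenCount: 3 }; // from our earlier response

const cost =
  (usage.promptTokenCount / 1_000_000) * INPUT_PRICE_PER_MILLION +
  (usage.candidatesTokenCount / 1_000_000) * OUTPUT_PRICE_PER_MILLION;

console.log(cost); // ~0.00002 dollars - effectively nothing for a single small request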
Now that we have gathered an understanding of how the API pricing works, let's finally explore, at a very high level, some parameters we can use to modify the responses we get from an LLM, before wrapping up the first part of this series.
LLM response configuration
Congratulations, you have arrived at the buzzword section of this article! Here, we will explore some of the most important options LLMs accept, which you have undoubtedly heard of, even if perhaps without fully comprehending them. These are temperature, topP, topK, and maxOutputTokens.
Here's a high level overview:
- temperature: This parameter controls the randomness of the model's output.
Previously, we explained that LLMs generate text by predicting the next token based on the previous ones. This was a bit of an oversimplification; LLMs don't just come out and say "the next token is cat!". Instead, they first generate probabilities for each possible token, so, for instance, they may say something like "there's a 30% chance the next token is cat, a 25% chance it's dog, a 15% chance it's fish, and so on". And usually, they do not simply pick the most probable token. Think about it: if they always chose the most probable token, they would become robotic and repetitive, which is usually not what we want.
Instead, they often pick tokens that might be less probable but still make sense in the context. This is where temperature comes into play; it controls how much randomness we want in the token selection process. A low temperature (e.g., 0.2) makes the model pick the most probable tokens more often, resulting in more predictable text, while a higher temperature (e.g., 0.8) makes the model pick less probable tokens more often, resulting in more creative text (see the toy sketch after this list).
Note: temperature is not an exact science; you might have heard somewhere that a temperature of 0 makes the model "deterministic" (oh boy, do AI folks love buzzwords), but in reality it is still not what we would mean by that word in the usual, literal sense, since a slight variation in the input prompt (like a missing comma) might still result in a vastly different LLM output. Temperature can be useful from time to time, depending on the task, but it is not a silver bullet for fixing your LLM's output.
- topK: This parameter sets a hard limit on the set of tokens considered for the next position; if temperature allowed us to pick less probable tokens out of all possible tokens, topK simply limits the set of tokens to choose from. So if we set the limit to, say, 3, the model will only consider the 3 most probable tokens when picking the next one. To be honest, most people do not bother with this, since it is quite a blunt tool: there is no way to know whether the top 3 (or 7, or 12) tokens are actually the relevant ones, or whether some lower-ranked token would be better.
topP
: This parameter is a bit more complex and more useful thantopK
. Instead of limiting the number of tokens to choose from, it limits the cumulative probability of the tokens to choose from. "Cumulative" here simply means "adding up the probabilities until we reach a certain threshold". For example, if we settopP
to 0.9, the model will consider the most probable tokens until their combined probability reaches 90%. This allows for a more dynamic selection of tokens, as the number of tokens considered can vary based on their probabilities. -
maxOutputTokens
: This parameter simply limits the maximum number of tokens the model can generate in its response. This is useful to prevent the model from generating excessively long responses, which can be costly and time-consuming. For example, if we setmaxOutputTokens
to 50, the model will stop generating text after producing 50 tokens.
Important: maxOutputTokens just bluntly cuts generation off at the limit; it does not make the model write shorter text. It is meant to be more of a safety net, ensuring your LLM does not go ballistic and cost you a fortune. For generating shorter content, you should look into writing prompts that guide the model to be more concise (again, not an exact science, but it usually works decently).
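To make temperature and topP a bit more tangible, here is a toy sketch of both ideas (purely illustrative; this is not how Gemini implements sampling internally, and all the numbers are made up):
// Toy illustration only: how temperature flattens or sharpens a probability
// distribution, and how topP (nucleus sampling) trims the candidate set.

// Dividing the raw scores by the temperature before normalizing:
// low values sharpen the distribution, high values flatten it.
function softmaxWithTemperature(scores, temperature) {
  const scaled = scores.map((s) => s / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const scores = [2.0, 1.0, 0.5]; // imaginary scores for "cat", "dog", "fish"
console.log(softmaxWithTemperature(scores, 0.2)); // ~[0.99, 0.007, 0.001] - very predictable
console.log(softmaxWithTemperature(scores, 1.0)); // ~[0.63, 0.23, 0.14] - more variety

// topP: keep adding the most probable tokens until their cumulative
// probability crosses the threshold, then sample only from that subset.
function nucleus(tokens, topP) {
  const sorted = [...tokens].sort((a, b) => b.p - a.p);
  const kept = [];
  let cumulative = 0;
  for (const token of sorted) {
    kept.push(token);
    cumulative += token.p;
    if (cumulative >= topP) break;
  }
  return kept;
}

console.log(nucleus(
  [{ t: 'cat', p: 0.3 }, { t: 'dog', p: 0.25 }, { t: 'fish', p: 0.15 }, { t: 'bird', p: 0.1 }],
  0.6,
)); // "cat", "dog" and "fish" survive (0.3 + 0.25 + 0.15 = 0.7 >= 0.6)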
All the parameters we mentioned are available in the Gemini API SDK, and we can now modify our text-generation endpoint to accept them as well:
const response = await genAI.models.generateContent({
  model: 'gemini-1.5-pro',
  contents: prompt,
  config: {
    topP: 0.5,
    temperature: 0.1,
    maxOutputTokens: 50,
  },
});
If we now retry the same messages we put into our Angular app's generate page, we might see more coherent and predictable responses, and, since maxOutputTokens is set to such a low value, some messages might be cut off.
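If we want to detect that truncation on the backend, the response tells us why generation stopped (we already saw finishReason: "STOP" in the earlier response dump); a minimal check could look like this:
// Warn when the model stopped because it hit the maxOutputTokens limit
const candidate = response.candidates?.[0];
if (candidate?.finishReason === 'MAX_TOKENS') {
  console.warn('Response was cut off by maxOutputTokens');
}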
Conclusion
Wow, I bet this was a lot of information. However, this article might also leave you feeling that we have just scratched the surface (which is true). So, let's recap:
- We learned how LLMs generate text, learned about tokens, and about configuration parameters like temperature, topP and so on
- We learned how to create a Google Cloud project, get a Gemini API key, and use the SDK to make text generation requests
- We learned about the diverse set of models we can utilize in the future for different tasks
- We did all of this while requiring barely more knowledge than any Angular developer already possesses
If this was exciting, wait until you hear about the next article! In the second one, we are going to:
- Learn how to stream responses for a better UX
- Learn how to create chats, and maintain context between messages
- Learn a bit of prompting to secure better responses from the model
- Touch on structured outputs which can help us solve more direct tasks than just text generation
I hope you enjoyed this article, and see you in the next one!
Small Promotion
My book, Modern Angular, is now in print! I spent a lot of time writing about every single new Angular feature from v12-v18, including enhanced dependency injection, RxJS interop, Signals, SSR, Zoneless, and way more.
If you work with a legacy project, I believe my book will be useful to you in catching up with everything new and exciting that our favorite framework has to offer. Check it out here: https://www.manning.com/books/modern-angular
P.S. There is one chapter in my book that helps you work with LLMs in the context of Angular apps; that chapter is already somewhat outdated, despite the book having been published just earlier this year (see how insanely fast-paced the AI landscape is?!). I hope you can forgive me ;)

