start from the load and work your way backwards, all the way to the source.
Agreed, this is the best method.
If you want 1W into 8Ω (for a sinewave) then some basic calculations show you will need peak values for voltage and current of:
Voltage: +/-4V across the load, and
Current: +/-0.5A into the load.
The +/- signs are there to remind us that we need both polarities, positive and negative, for a sinewave waveform (sound or music). Just note that, for a sinewave, 2W peak power is required to deliver 1W average power.
Since you have a 9V power supply (a battery?) that leaves only 0.5V remaining as "headroom" for the amplifier to work correctly in each polarity - and that is assuming that the power supply is split perfectly 50/50, ie: that both the biasing and the big dc-blocking capacitor (between the load and amplifier) are perfect, such that there is effectively +4.5V and -4.5V available to drive the power into the load. From experience, I know that the biasing and dc-blocking will not be perfect, and you will end up with less than +/-4.5V to drive the load.
So achieving 1W with low distortion from a 9V battery into an 8Ω load will be challenging. Let's set a more realistic target of 0.5W into an 8Ω load - you will probably prefer to hear music at 0.5W at low distortion, rather than 1W at high distortion. For this power level we will need peak values for voltage and current of:
Voltage: +/-2.83V across the load, and
Current: +/-0.35A into the load.
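The peak figures above follow from P(avg) = Vpeak^2 / (2R) for a sinewave. Here is a minimal Python sketch of that arithmetic; the function name `sine_peaks` is my own choice for illustration, not something from a library:

```python
import math

def sine_peaks(p_avg_w, r_load_ohm):
    """Return (peak volts, peak amps) for a sinewave delivering p_avg_w into r_load_ohm."""
    v_peak = math.sqrt(2.0 * p_avg_w * r_load_ohm)  # from P_avg = V_peak^2 / (2 R)
    i_peak = v_peak / r_load_ohm                    # Ohm's law at the peak
    return v_peak, i_peak

# The two targets discussed above: 1W and 0.5W into 8 ohms.
for p_avg in (1.0, 0.5):
    v_pk, i_pk = sine_peaks(p_avg, 8.0)
    print(f"{p_avg} W into 8 ohm: +/-{v_pk:.2f} V, +/-{i_pk:.2f} A")
```

Comparing the peak voltage with the roughly +/-4.5V available from a 9V supply shows at a glance how much headroom each target leaves.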
Design Procedure
Step 1: Power Dissipation in Output Stage
Assuming this power is being delivered by a push-pull output stage biased at the mid-point of the supply voltage (Vcc=9V), then each output transistor will suffer a power dissipation of about 0.25W average (0.5W peak). Looking at the datasheet for the transistors you used (2N3904, 2N3906), the thermal impedance from junction to ambient is 200C/W (no heatsink):

This means the internal temperature of the transistor (the "junction") will be at 50C above ambient (0.25W * 200C/W = 50C). For a BJT this is OK, so let's proceed.
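The temperature estimate is just dissipation multiplied by thermal resistance. A small sketch, assuming the 200C/W junction-to-ambient figure quoted above; the 25C ambient is my own assumption for illustration only:

```python
def junction_rise_c(p_diss_w, theta_ja_c_per_w=200.0):
    """Temperature rise of the junction above ambient, in degrees C."""
    return p_diss_w * theta_ja_c_per_w

ambient_c = 25.0                 # assumed ambient, not from the datasheet
rise_c = junction_rise_c(0.25)   # 0.25W average per output transistor
print(f"Rise above ambient: {rise_c:.0f} C")
print(f"Junction temperature at {ambient_c:.0f} C ambient: {ambient_c + rise_c:.0f} C")
```

The same function can be reused to check any other dissipation figure against the datasheet limit for the chosen transistor.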
Step 2: Voltage gain
Your signal source is a guitar pickup, having an output amplitude of ~300mV (let's assume that is peak, ie: 600mV peak-to-peak) with an internal impedance of 10kΩ. So as to not load our signal source too much, let's set our amplifier input impedance to be 10 times that of the source impedance, so 100kΩ (even this is too much of a load for real hi-fi, but for the purposes of this discussion let's proceed with this for now). The voltage gain we need will be:
Vout peak = 2.83V
Vin peak = 0.30V
Av (gain) = 2.83 / 0.3 = 9.4, so quite modest.
It is interesting to see the power gain we seek:
Input power = 300mV peak into 100kΩ = 0.9 micro-watts peak.
Output power = 1W peak into 8Ω.
Power gain ~ 1.11 million - yes, over one million!
Which brings me to the point I wanted to make: what we are really amplifying here is power rather than voltage.
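To make the comparison between voltage gain and power gain concrete, here is a rough sketch of the numbers above. The variable names are mine, and the decibel conversion is added only for convenience:

```python
import math

v_in_peak = 0.30    # V, guitar pickup (assumed peak, as above)
v_out_peak = 2.83   # V, peak across the load for 0.5W average
r_in = 100e3        # ohm, chosen amplifier input impedance
r_load = 8.0        # ohm, speaker

voltage_gain = v_out_peak / v_in_peak
p_in_peak = v_in_peak ** 2 / r_in        # peak power drawn from the source
p_out_peak = v_out_peak ** 2 / r_load    # peak power delivered to the load
power_gain = p_out_peak / p_in_peak

print(f"Voltage gain: {voltage_gain:.1f} ({20 * math.log10(voltage_gain):.1f} dB)")
print(f"Input power:  {p_in_peak * 1e6:.2f} uW peak")
print(f"Output power: {p_out_peak:.2f} W peak")
print(f"Power gain:   {power_gain:.3g} ({10 * math.log10(power_gain):.1f} dB)")
```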
Step 3: DC blocking capacitor
The lowest frequency for a guitar is about 70Hz. Let's calculate the dc-blocking capacitor required to have a cut-off frequency of 70Hz with a resistor of 8Ω:

The cutoff frequency is given by:
fc = 1 / (2π × R × C)
...from which we compute C as 284uF, so select an electrolytic capacitor of any value larger than this, say, 330uF 16V. At full power and low frequency, there will be an AC voltage across the capacitor since its "impedance" is not zero (from basic circuit theory, any component carrying current will have a voltage across it). We can say there is a voltage drop across this capacitor, which means the amplifier must supply both the load voltage and the voltage drop across the capacitor. To estimate this voltage drop, we can use the formula for the impedance of a capacitor:
Zc = 1 / (j × 2π × f × C)
Plugging in our values of 70Hz and 330uF we get:
Zc = -j6.9Ω (the minus sign indicates capacitive reactance) - yes, the magnitude is similar to the load resistance.
To deliver 0.5W to the 8Ω load at low frequency, the output voltage of the amplifier must swing higher than the load voltage to compensate for the voltage drop of the blocking capacitor. The amplifier has to push the required current (0.35A peak) through the series connection of the blocking capacitor (330uF) and the load (8Ω); using basic circuit theory for a complex load we can calculate this as follows:
Zload = 8Ω - j6.9Ω,
which has magnitude = Sqrt (8^2 + 6.9^2) = 10.56Ω
To calculate the voltage across this load we convert the load current peak to its RMS value, then multiply by this impedance:
V = 0.35A * 0.707 * 10.56Ω = 2.61V rms.
Now we convert this back to peak we get: 3.7V peak.
So, if we wanted to have a low distortion frequency response down to 70Hz, the output of the amplifier needs to swing 0.9V higher than the voltage at the load in order to overcome the voltage drop of the blocking capacitor. So you can hopefully see that we are getting closer to our hard limit of 4.5V peak that is available from our 9V battery.
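The whole of Step 3 can be reproduced in a few lines. This is a sketch under the same assumptions as the text (ideal capacitor, purely resistive 8Ω load); the helper names are mine. It multiplies the peak current by the impedance magnitude directly, which gives the same result as going via RMS and back:

```python
import math

def coupling_cap_f(f_cut_hz, r_load_ohm):
    """Capacitance giving a -3dB corner at f_cut_hz with r_load_ohm."""
    return 1.0 / (2.0 * math.pi * f_cut_hz * r_load_ohm)

def cap_reactance_ohm(f_hz, c_f):
    """Magnitude of the capacitor's reactance at f_hz."""
    return 1.0 / (2.0 * math.pi * f_hz * c_f)

r_load = 8.0
i_peak = 0.35                                  # A, peak load current for 0.5W

c_min = coupling_cap_f(70.0, r_load)           # minimum value for a 70Hz corner
x_c = cap_reactance_ohm(70.0, 330e-6)          # reactance of the chosen 330uF part
z_series = complex(r_load, -x_c)               # load in series with the capacitor
v_amp_peak = i_peak * abs(z_series)            # swing needed at the amplifier output
v_load_peak = i_peak * r_load                  # swing that actually reaches the load

print(f"Minimum C:          {c_min * 1e6:.0f} uF")
print(f"|Zc| at 70Hz:       {x_c:.1f} ohm")
print(f"|Z| series:         {abs(z_series):.2f} ohm")
print(f"Amplifier swing:    {v_amp_peak:.1f} V peak")
print(f"Extra swing needed: {v_amp_peak - v_load_peak:.1f} V")
```

Trying a larger capacitor (for example 470uF) in `cap_reactance_ohm` shows how quickly the extra swing shrinks.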
Step 4: Base Current Required to Drive the Output Push-Pull Stage
The load current is 350mA peak (for 0.5W average into 8Ω). Assuming the output transistors are as you have indicated (2N3904 & 2N3906) the datasheet advises that current gain falls sharply, for both types, as Ic increases over 20mA, refer charts below (note that the vertical scale (Hfe) is logarithmic).


The datasheet states that at Ic=100mA Hfe = 30 (refer below). The graph above tells us that Hfe drops from 0.3 (normalised) at Ic=100mA, to less than 0.1 (normalised) at Ic=200mA. So Hfe at 200mA will be one-third what it was at 100mA, ie: 10. And 300mA is off the chart - so it will be even lower again. This suggests that these transistors may not be suitable - we ideally want Hfe to remain over 50 for the entire load range up to full-rated power.

However, for the purposes of this discussion let's persist with these transistors, and let's assume that Hfe=10 at Ic=350mA, so at peak output current we expect the base current to be:
Ibase(peak) = 350mA/10 = 35mA.
To source this current from a load resistor in the collector of the voltage amplifier BJT, then the resistor value will be quite low, about:
1.0V / 35mA ≈ 29Ω.
With such a low resistance it will be difficult to design a C-E single BJT voltage amplifier with sufficient gain.
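The base-drive estimate in this step can be sketched as follows. The Hfe scaling uses the normalised values read off the graph as described above, and the 1.0V available across the collector resistor is the same working figure used in the text; the variable names are mine:

```python
i_load_peak = 0.350        # A, peak load current for 0.5W into 8 ohms

# Hfe estimate from the datasheet figures quoted above.
hfe_at_100ma = 30.0        # datasheet value at Ic = 100mA
norm_at_100ma = 0.3        # normalised gain read from the graph at 100mA
norm_at_200ma = 0.1        # normalised gain read from the graph at 200mA
hfe_at_200ma = hfe_at_100ma * norm_at_200ma / norm_at_100ma

hfe_assumed = 10.0         # working assumption at 350mA (optimistic, off the chart)
i_base_peak = i_load_peak / hfe_assumed

v_across_resistor = 1.0    # V, voltage left across the collector resistor
r_collector = v_across_resistor / i_base_peak

print(f"Hfe at 200mA:       {hfe_at_200ma:.0f}")
print(f"Peak base current:  {i_base_peak * 1e3:.0f} mA")
print(f"Collector resistor: {r_collector:.1f} ohm")
```

Raising `hfe_assumed` to 50 shows why a higher-gain output device (or an extra driver transistor) eases the load on the voltage amplifier stage.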
There are at least two solutions to this problem. One solution is to have an active load (constant current source) as the load of the C-E stage. The other solution is to apply a bootstrap. The circuit below shows the bootstrap method, but even this required an extra transistor (Q3) to ensure sufficient base drive to Q1 without excessive loading of the voltage amplifier stage (VAS).
0.5W Amplifier with Boot-Strap

Description:
Q1 & Q3 form a "Sziklai pair" and, together with Q2, form the push-pull output stage of the amplifier. D2 and D4 provide some bias for the output stage.
Q3 ensures there is sufficient base current to drive Q1. Q4 performs the same function for Q2 as Q3 does for Q1 (in addition to being the voltage amplifier). C6 provides the bootstrap function which dramatically reduces the swing of current in R34 as the output voltage swings, which greatly increases the voltage gain at signal frequencies of the voltage amplifier formed by Q4.
Q5 (emitter-follower) buffers the input signal, and together with the boot-strap of C5, R27 & R2, greatly increases the input impedance of the amplifier. DC bias of the output is set to ~4.5V by the bias network R32, R1, & R25, as well as R3 which provides the feedback at DC.
Overall voltage gain is set by R3, R5 & C3. At signal frequencies, the impedance of C3 goes to zero, so the voltage gain is set by the ratio:
Av = R3 / R5 = 1k /120 = 8.33.
Update 2023-11-13: Improved 0.5W Amplifier with Boot-Strap:

Adding one more BJT allows significant improvement in performance. Referring to the circuit above, one BJT was added to make the output stage fully symmetrical, ie: Sziklai pairs for both the top and bottom locations. For this application (very low supply rail voltage) the Sziklai pairs are preferred over Darlington, due to one Vbe drop rather than two from VAS output to the load. Bias via D2 & D4 remains unchanged.
For Q5, the voltage amplifier, this change removes the burden of supplying heavy base current, so the entire VAS can be re-designed to focus purely on its primary role: voltage amplification. To that end, two other changes of note:
- Separate DC and AC feedback paths. DC feedback is set by R8, C3, & R3, and sets the DC bias (at [Q1_Vout]) = 4.5V ie: the midpoint of VCC. AC feedback is provided by R9 acting into R3 & C3, and is taken directly from the load after the DC-block capacitor, C1. This allows the amplifier to compensate for voltage drop across C1, which extends the low-frequency response, and allows for a smaller value for C1. However, there is a limit to what can be achieved here, of course, primarily due to the limited output voltage swing available from a 9V supply.
- Bias network (R32, R1 & R25) changed. This was necessary to permit the AC feedback, and is quite simple: R3 & R8 had to be larger in value to permit a high value for R9 (low loading effect on output of Q6). So the bias voltage at the base of Q6 had to be moved down toward 0V.
Minor Changes:
- Added high-frequency gain roll-off, refer C8.
- C5 is larger, due to smaller values of R2 & R27 that were necessary due to changes in bias.
0.5W Amplifier with Active Load
Update 2023-Nov-13, 2200hrs
Here is the schematic of a design using an active load in the voltage amplifier stage (VAS).

Changes compared with the previous schematic (boot-strap):
- Boot-strap capacitor (C6) removed. This has been replaced with a new BJT configured as a constant current source of about 1mA: Q7, R7, D5, D6 & R10.
- Bias changed to "amplified diode": Q8, R5, R6. This allows fine control over bias voltage (change R5), which will be necessary in any practical implementation to control the quiescent bias current in Q3 & Q4.
- Input signal is passed through a band-pass filter, about 20Hz to 100kHz, formed by C6, R21, R20, R13, & C8. This is to limit the frequency range of the signal reaching the amplifier input at Q6.
- Bias current of Q6 set by R24=2k2 (was 1k). [Q1_Vout] bias voltage is set to 3 * Vbe above base voltage of Q6 (R24, R12, R11).
- V7 sets the bias voltage now - this is just for convenience; if this design were to be built then R18 must be adjusted to get the same voltage as V7.
- AC feedback network changed for a mid-band voltage gain of 9 (refer the "strange" value of R9), and some high-frequency roll-off (R19, C9). This gain was chosen so that clipping was avoided with input of 300mV peak, but power is 450mW, a little below the target value of 500mW.
- C1 was increased from 330uF to 470uF. This was to extend the low end of the frequency response.
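As a quick check on the gain choice in the list above, the expected mid-band load power for a given voltage gain can be estimated like this. It is an idealised sketch (no clipping, no loss in C1 or the output stage), and the function name is mine:

```python
def load_power_w(v_in_peak, voltage_gain, r_load_ohm):
    """Average sinewave power in the load for an ideal amplifier."""
    v_out_peak = v_in_peak * voltage_gain
    return v_out_peak ** 2 / (2.0 * r_load_ohm)

p_gain_9 = load_power_w(0.30, 9.0, 8.0)      # gain used in this design
p_gain_9p4 = load_power_w(0.30, 9.43, 8.0)   # gain needed for the 0.5W target

print(f"Gain 9.00: {p_gain_9 * 1e3:.0f} mW")
print(f"Gain 9.43: {p_gain_9p4 * 1e3:.0f} mW")
```

The ideal figure for a gain of 9 lands close to the roughly 450mW quoted, which suggests the simulated amplifier is delivering very nearly its set gain at mid-band.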
Performance
Waveforms are presented at three frequencies:
mid-band = 1kHz
Low = 75Hz
High = 10kHz
Output power is nominally 450mW into 8Ω at mid-band.
The average power dissipation in Q1 & Q2 is about 270mW in all cases, so these should run at an acceptable temperature without a heatsink.
Images below: Actual output voltage is shown compared with ideal output, compare [Vload] with [Vin * 9]. Also shown is [Q1_Vout - Vload]; this is the voltage across C1, which is significant at 75Hz. This demonstrates the benefit of the separate feedback paths, AC & DC, which compensate for this voltage drop.
F=1kHz. Pload = 453mW

F=75Hz. Pload = 475mW

F=10kHz. Pload = 420mW

1W Amplifier using Bridged Twin Amplifiers
Update 2023-Nov-25, 0800hrs
Here is a design that uses two identical amplifiers arranged as a bridged pair. The main benefit is that there is plenty of voltage swing available, so it can easily deliver 1W into 8ohms from a single 9V battery. In fact, it can deliver up to 4W to the load provided the output stage (and the battery!) can handle the extra current.
The other major benefit: the big, bulky (and perhaps sonically disagreeable) DC blocking capacitor between the amplifier and the load is eliminated.
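The 1W and 4W figures can be sanity-checked with a short calculation. In a bridge, the two outputs swing in anti-phase, so the load sees twice the swing of one amplifier. The sketch below assumes each amplifier can swing about +/-4V around mid-rail (4.5V less roughly 0.5V of headroom); that per-amplifier figure is my assumption, not a measured value:

```python
def bridged_load_power_w(v_peak_per_amp, r_load_ohm):
    """Average sinewave power in a load driven by two anti-phase amplifiers."""
    v_load_peak = 2.0 * v_peak_per_amp     # outputs move in opposite directions
    return v_load_peak ** 2 / (2.0 * r_load_ohm)

r_load = 8.0
for v_amp in (2.0, 4.0):
    p_load = bridged_load_power_w(v_amp, r_load)
    i_peak = 2.0 * v_amp / r_load          # peak current through load and output stages
    print(f"+/-{v_amp:.0f} V per amplifier: {p_load:.0f} W, {i_peak:.1f} A peak")
```

Note how the peak current doubles along with the voltage, which is why the output transistors and the battery become the limiting factors at the higher power level.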
Each amplifier is identical, and has the standard topology: a diff-pair input stage followed by VAS stage using an active load, followed by the Sziklai output stage voltage-follower as presented before. The diff pair simplifies the feedback network and makes it easier to configure two amplifiers in a bridge arrangement. The active load in the VAS stage was mentioned previously.
Image below: The overall schematic, note:
- The connection of the output of Amplifier 1 to the inverting input of Amplifier 2.
- The connection of the input of Amplifier 2 to 0V via a low-value resistor, rather than to the input signal.

Image below: Detail of one amplifier.

Image below: Frequency response, -3dB points at 20Hz and 120kHz:

Image below: Waveforms for 1kHz @1W into the load.
Note that each output transistor now dissipates 0.5W which may be too high (double the dissipation of the previous designs), so I suggest either putting transistors in parallel to share the load current (which will require resistors in the emitter and base circuits to force good current sharing), or using transistors of higher power rating, such as BD139, BD140.

8 Ω speaker at its full potential (there used to be cheap(ish) 18" drivers. (And giant enclosures, just after the extinction of dinosaurs.)) there's the potential of that 9 V battery, too: give everyone a hint about both. More important: What is the purpose of designing yet another discrete BJT guitar amplifier?