For the most common types (SAR and SR ADCs reading a 0-VREF or ±VREF range), it's simply a fraction of that range.
In other words, the output is an integer that represents the input as a fixed-point fraction of VREF.
This is most apparent when the result is left-justified, i.e. a 12-bit result is zero-padded on the right so that its value spans 0 to 0xfff0. Here "0x10000" would be full scale, but that is not representable in 16 bits, so we naturally have a number system where the full range from zero to (full scale - 1 LSB) is representable, identical to the ADC's reading.
We can treat this as a fraction by multiplying it by another number (representing some other unit in the digital domain; a voltage calibration factor, say) and shifting right by the appropriate number of bit positions to maintain a 16-bit result. Say our VREF is 5V and we want to represent that in 4.12 fixed point (so, integer part from 0 to 15, and a bit over 3 decimal digits of fraction; this would be an X.XXX reading). We'd multiply by (uint32_t)(5.0f * (1 << 12)) (replace 5.0f with the exact calibration factor) to get a 32-bit result*, then shift the result right by 16 to get the calibrated output.
*Recall that bit counts add under multiplication, so a 16x16 multiply gives a 32-bit result. Several languages, C among them, unfortunately obscure this: the result has the same size as the larger operand (after integer promotion). Thus, in C you need to cast one operand to a 32-bit type before multiplying.
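Here's a minimal sketch of that scaling in C, assuming a 12-bit left-justified reading and a nominal 5 V reference. The names `VREF_Q4_12` and `adc_to_volts_q4_12` are made up for illustration, and 5.0f stands in for whatever your real calibration factor is.

```c
#include <stdint.h>

/* Hypothetical calibration factor: VREF = 5.0 V in 4.12 fixed point (20480). */
#define VREF_Q4_12 ((uint32_t)(5.0f * (1 << 12)))

/* Convert a left-justified 12-bit reading (0..0xFFF0) to volts in 4.12. */
static uint16_t adc_to_volts_q4_12(uint16_t adc_left)
{
    /* Cast one operand first so the multiply is carried out in 32 bits. */
    uint32_t product = (uint32_t)adc_left * VREF_Q4_12;

    /* The reading is a fraction of 0x10000, so drop the low 16 bits. */
    return (uint16_t)(product >> 16);
}
```

With these assumptions, a half-scale reading of 0x8000 comes out as 10240, which is 2.5 in 4.12 fixed point (10240 / 4096).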
We might instead use the raw ADC value (right-justified), and we could treat that as e.g. a 4.12 fixed-point value directly. Or we could accumulate 16 successive samples (as a decimation or sliding-average filter, take your pick), which sums to the same range (16 x 0xfff = 0xfff0) but gets us a few more ENOB (effective number of bits) by filtering out some of the quantization noise (assuming some suitability conditions apply). This reduces the bandwidth, of course, so it assumes the sample rate exceeds what the signal bandwidth requires by at least the same factor. In any case, this gives us a 16-bit result. The statistical gain from N samples is log2(N)/2 bits, so 16 samples buy 2 bits: we expect a confident 14-bit result, with two extra bits "for flavor". (These bits can be helpful down the line for limiting accumulated rounding error.)
(Mind that filtering does not improve the systematic error of the ADC or of the overall system; it only reduces the noise floor due to sample-to-sample variation.)
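The accumulate-16 idea can be sketched like this, assuming the samples have already been collected into a buffer. `accumulate16` and `DECIMATE_N` are hypothetical names; a real driver would more likely add each sample as it arrives from the ADC interrupt.

```c
#include <stdint.h>

#define DECIMATE_N 16u

/* Sum 16 right-justified 12-bit samples (each 0..0x0FFF).
 * The largest possible sum is 16 * 0x0FFF = 0xFFF0, so it fits in 16 bits. */
static uint16_t accumulate16(const uint16_t samples[DECIMATE_N])
{
    uint16_t sum = 0;

    for (uint8_t i = 0; i < DECIMATE_N; i++) {
        /* Mask off any stray upper bits so the sum cannot wrap around. */
        sum = (uint16_t)(sum + (samples[i] & 0x0FFFu));
    }
    return sum;
}
```

The sum covers the same 0 to 0xfff0 range as a left-justified reading, so it can go through the same fraction-of-VREF scaling unchanged.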
One upside of carrying a purely ratiometric value through the whole calculation: anything that is itself ratiometric becomes trivial. If you have a bunch of analog status inputs (e.g. thermistors, potentiometers, Hall effect sensors, etc.) and you run them all from the ADC VREF (or a buffered copy thereof), their results are perfectly proportional, and VREF tolerance is irrelevant!
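As a small illustration of the ratiometric case, assume a potentiometer wired between VREF and ground with its wiper on the ADC input, read as a left-justified 16-bit value. The function name `pot_permille` is hypothetical; the point is only that no VREF value appears anywhere in the arithmetic.

```c
#include <stdint.h>

/* Wiper position in parts per thousand (0..999) from a left-justified
 * reading. The reading is already a fraction of VREF, so the reference
 * voltage and its tolerance never enter the calculation. */
static uint16_t pot_permille(uint16_t adc_left)
{
    return (uint16_t)(((uint32_t)adc_left * 1000u) >> 16);
}
```

A half-scale reading of 0x8000 gives 500, whatever VREF actually measures on the bench.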
But also, even if the values carry units, as long as you don't need to relate other units internally, you can apply the calibration factor at the very end. Now the gain of your entire signal chain, minus the output itself, is perfectly predictable: there is no saturation anywhere along the chain that can't be anticipated from the possible ADC readings alone. Potential saturation is independent of calibration factors!
One of the downsides of number formats like this is handling overflow, and implementing saturating arithmetic where applicable. You need to ensure all manipulations are done with adequate headroom, and/or with gain less than or equal to 1, so that overflow (wraparound) does not occur; or else detect overflow at every step of the calculation and saturate accordingly.
Intermediate 32-bit calculations are typical, even on 8-bit systems like AVR, largely because C doesn't have a native 24-bit type as such. (Yes, there are __uint24 and such types, but what I mean is, avr-gcc emits 32-bit operations regardless. You can construct 24-bit operations just fine in ASM, of course.) Saturating arithmetic is also a pain in C, as operations don't expose a carry result; it's slightly less painful in ASM, but far more tedious (as ASM is wont to be).
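For what it's worth, here is one way to sketch saturating 16-bit operations in portable C, using a 32-bit intermediate to make the carry visible. The helper names (`sat_add_u16`, `sat_sub_u16`, `mul_frac_u16`) are made up for this example, and this is just one approach; compiler builtins or hand-written ASM may well do better on a given target.

```c
#include <stdint.h>

/* Unsigned 16-bit add that clamps at 0xFFFF instead of wrapping. */
static uint16_t sat_add_u16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;   /* widen so the carry lands in bit 16 */
    return (sum > UINT16_MAX) ? UINT16_MAX : (uint16_t)sum;
}

/* Unsigned 16-bit subtract that clamps at 0 instead of wrapping. */
static uint16_t sat_sub_u16(uint16_t a, uint16_t b)
{
    return (a > b) ? (uint16_t)(a - b) : 0;
}

/* Multiply two 0.16 fractions. Both factors are below 1.0, so the
 * product is too, and no overflow check is needed. */
static uint16_t mul_frac_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 16);
}
```

The last helper is the "gain less than or equal to 1" case: since neither factor can reach 1.0, the result always fits, which is why purely fractional scaling steps are the cheap ones.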