However, your prices are high. I just checked, and a dsPIC 33FJ64GP802 in QFN package sells for $33.68 USD on microchipdirect for 1000 pieces. The price will be lower for real production volumes.
There are two signalling states on a CAN bus, recessive and dominant. Recessive is what happens when nothing is driving the bus. Both lines are pulled together by a total of 60 Ω. A normal canCAN bus as implemented by common chips like the MCP2551, should have 120 Ω terminators at both ends, hence a total of 60 Ω pulling the two differential lines together passively. The dominant state is when both lines are actively pulled apart, somewhere around 900mV from the recessive state if I remember right. Basically, this is like a open collector bus, except that it's implemented with a differential pair. The bus is in recessive state if CANH-CANL < 900mV and dominant when CANH-CANL > 900mV. The dominant state signals 0, and the recessive 1.
The CAN bus arbitration rules mean you have to be watching the bus partway thru every bit you are sending as a 1 to make sure someone else isn't sending a 0. This check is usually done about 2/3 of the way into the bit, and is the fundamental limitation on CAN bus length. The slower the bits rate, the more time there is for the worst case propagation from one end of the bus to the other, and therefore the longer the bus can be. This check must be done every bit where you think you own the bus and are sending a 01 bit.
Another problem is bit rate adjustment. All nodes on a bus must agree on the bit rate, more closely than with RS-232. To prevent small clock differences from accumulating into largesignificant errors, each node must be able to do a bit that is a little longer or shorter than its nominal. In hardware, this is implemented by running a clock somewhere around 9x to 20x faster than the bit rate. The cycles of this fast clock are called time quanta. There are ways to detect that the start of new bits is wandering with respect to where you think they should be. Hardware implementations then add or skip one time quanta in a bit to re-sync. There are other ways you could implement this as long as you can adjust to small diferences in phase between your expected bit times and actual measured bit times.
Either way, these mechanisms require various things be done at various times within a bit. This sort of timing will get very tricky in firmware, or will require the bus to be run very slowly. Let's say you implement ata time quanta system in firmware at 20 kbits/s. At the minimum of 9 time quanta per bit, that would require 180 kHz interrupt. That's certainly possible with something like a dsPIC 33F, but will eat up a significant fraction of the processor. At the max instruction rate of 40 MHz, you get 222 instruction cycles per interrupt. It shouldn't take that long to do all the checking, but probably 50-100 cycles, meaning 25-50% of the processor will be used for CAN and that it will need to preempt everything else that is running. That prevents many applications these processors often run, like pulse by pulse control of a switching power supply or motor driver. The 50-100 cycle latency on every other interrupt would be a complete show stopper for many of the things I've done with chips like this.