Thanks for the answers! I had to sort them by question number to really get the big picture of what you were all saying. I also considered replying to everyone individually, but that seemed counter-productive.
1) I expected that sort of complicated/not-so-sure answer. I figure topography will play an important role, as well as cultural unity. The 500-year pattern seems a good basis.
2) I think I mis-asked. I meant how many people would speak a proto-language, not how many would make a tribe. For instance, I guess PIE was spoken by more than 50 people. Actually, Count Iblis' "I'd expect there to be lots of tribes speaking very similar languages" and Boskobènet's "Linguistic diversity would then depend on how much contact these tribes have." are what I was expecting.
The reason I ask is that, for example, NW North America had, to the best of my knowledge, very high linguistic diversity in a rather small region, compared to central North America, where there seems to have been less diversity across a bigger region. That may be due to the topography of NW NA, which I'm not sure about, to be honest.
So, based on what I read in Diamond's book and some other knowledge, mountains make for high-diversity regions, due to the relative isolation of tribes, while plains make for less diverse ones.
3) I totally forgot about strata. I think they can add a lot of flavor to a family... Especially since the one I'll be focusing on in the next month is supposed to expand big time, eventually replacing or displacing other populations (and being replaced in some places).
The only issue is that it makes things even more complicated and intricate.
4) I guess it also depends on the extent of the species. I have a few different species in my conworld, although 4 of them are from the same family; they don't live in the same areas (for now), so knowing their full extent at the (arbitrary) birth of language might point to some clues...
... which leads me to a question I believe is quite difficult and not to be resolved in the near future: how could it be explained that there are so many different (and unrelated) language families? If we take the Out of Africa hypothesis, it would make sense that all languages are ultimately related, or at least existed in some form in Africa. Otherwise, I guess it would suggest that language developed independently everywhere else.
(Though perhaps it's more gradual and wave-y: humanoids spread OoA, eventually language develops in a few places but not fully yet, then movements of population spread the development of language among all humans.)
Basically, I have trouble reconciling the existence of many unrelated (keyword) language families with a single origin for a species. On the other hand, a multi-regional origin and a uniform development don't add up either.
That said, I'm not very knowledgeable in that area, so I'm probably saying completely worthless stuff here.
6) Yeah, a regular proto-language is what I'm going for. Simple enough to create and interesting to then develop, as I don't plan on using it for anything else.
And yeah, I could use pretty maps and tables to cover up stuff that's still in the making.