@user112: "Wouldn't they fit in the cache?": probably yes, but there's also bitstream data and audio decoding...
"But leave there a possibility to use VUs for IDCT, I'm looking into it.": sure
@user112: ...well, the idea doesn't work. It's ~25% slower, so the 1.7 stuff is the fastest one. So I'm giving up on it...
IDCT and VUs curiosity
First of all, thank you EEUG for your very impressive work on SMS! It is going to be some months before I am in any position to experiment with the SMS code, if at all (my two little daughters don't leave me much spare time, and my experience in assembler is just a little 68040 programming several years ago). But all the issues related to programming the Emotion Engine are certainly quite interesting, and I don't think the PS2 is going to die in November this year (I am not going to purchase any new console for some years). I installed an HDD in my PS2 two months ago, and soon after discovered that it is possible to program the PS2 without the Linux kit. So nowadays I am digging through several web sites to get acquainted with the possibilities. I hope to install the development environment and your sources soon (my first attempt failed). Nevertheless, since you are talking about some performance issues, I would like to know whether you have already given some of these ideas a thought.
Are you talking about the VUs for IDCT? Perhaps it is my ignorance, but isn't it possible to program VU0 as a coprocessor, interleaving its instructions with those of the CPU? Certainly the program would be much less legible, but it should improve performance, even with the need to convert between integer and floating point (there are FPU instructions for this purpose, aren't there?), since at first glance all the divisions seem avoidable. Sorry if this idea is stupid, I haven't read the source code yet.
Another idea, which I don't know is possible with the free development environment, would be to flow the data through the two VUs (both in VLIW mode) and back to the CPU (I don't know yet if that is possible) in order to perform the IDCT; meanwhile the CPU could be doing something else. The source code of your test example (this one I have already seen) seemed quite well structured for such an approach. At least _idctRowCondDC seems quite a good fit for the VUs' SIMD instructions. Regarding _idctSparseColPut, well, I will have to understand what it is doing first.
Finally, just to make myself clear, I am not asking you to do anything but share your thoughts.
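To make the integer/floating-point round trip a bit more concrete, here is a minimal sketch in plain C. It is not the SMS code and not VU code: idct_row_float and idct_row_cond_dc are hypothetical names, and the DC-only shortcut is only a guess at what a routine named like _idctRowCondDC might do. It converts 16-bit coefficients to float, runs a textbook 8-point row IDCT with no divisions, and rounds back to integers.

```c
/* cos(k*pi/16) for k = 0..8; all other angles follow by symmetry. */
static const float COS_Q[9] = {
    1.0f, 0.98078528f, 0.92387953f, 0.83146961f, 0.70710678f,
    0.55557023f, 0.38268343f, 0.19509032f, 0.0f
};

/* Returns cos(m*pi/16) for any non-negative integer m. */
static float cos_pi16(int m) {
    m &= 31;                                     /* period is 32 steps of pi/16 */
    if (m > 16) m = 32 - m;                      /* cos(2*pi - x) =  cos(x) */
    return (m > 8) ? -COS_Q[16 - m] : COS_Q[m];  /* cos(pi - x)   = -cos(x) */
}

/* Float -> integer conversion, rounding half away from zero. */
static short round_to_short(float v) {
    return (short)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Hypothetical helper: orthonormal 8-point inverse DCT of one row, in place.
   Only multiplies and adds; the scale factors are constants. */
void idct_row_float(short row[8]) {
    float out[8];
    for (int n = 0; n < 8; ++n) {
        float acc = 0.35355339f * (float)row[0];           /* sqrt(1/8) * DC */
        for (int k = 1; k < 8; ++k)
            acc += 0.5f * (float)row[k] * cos_pi16((2 * n + 1) * k);
        out[n] = acc;
    }
    for (int n = 0; n < 8; ++n)
        row[n] = round_to_short(out[n]);
}

/* Hypothetical helper: if every AC coefficient is zero, the row is flat,
   so fill it directly and skip the full transform. */
void idct_row_cond_dc(short row[8]) {
    int ac = 0;
    for (int k = 1; k < 8; ++k)
        ac |= row[k];
    if (ac == 0) {
        short flat = round_to_short(0.35355339f * (float)row[0]);
        for (int n = 0; n < 8; ++n)
            row[n] = flat;
        return;
    }
    idct_row_float(row);
}
```

The inner loop is eight independent multiply-accumulates per output sample, which is the kind of work that maps onto four-wide SIMD; whether it would actually beat the integer version on the EE is a separate question.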
@Serdna: ...no, I haven't tried the VUs yet (busy with MP3-related stuff). I've only tried to transfer and rearrange macroblocks into scratchpad RAM using the DMA controller while the main CPU computes the IDCT. Well, the IDCT computation was quite fast, so the CPU has to wait each time for the DMA controller to finish the data transfer. Actually, it would improve performance if the macroblock address were known in advance, so we could initiate the data transfer long before it is needed. But in the MPEG4 case there are 6 possible scenarios of macroblock motion (see the _MPEG_Motion function), so it's rather difficult to calculate that address in advance. We could try to use VU0 in micro mode to perform the IDCT (and maybe dequantization too), only for the motion compensation case, so the process would look like this: transfer the DCT coefficients from SPR to VU0 memory using the DMA controller and VIF0, launch the IDCT microprogram, and continue with motion compensation. In this case the CPU and VU0 operate in parallel, so this could improve performance. Another option would be to use VU1 to perform the YUV->RGB conversion, but that's only for possible mpeg1/2 decoding, because right now that conversion is performed by the IPU quite fast...
Edit: ...btw., my personal situation with time etc. is much like yours ...
Last edited by EEUG; 04-21-2006 at 07:27 AM.
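The "start the transfer long before it is needed" idea above is essentially double buffering. A minimal sketch in portable C follows, under loud assumptions: SimDma, sim_dma_start, sim_dma_wait, process_block and process_stream are all hypothetical names, the "DMA" is simulated with a deferred memcpy rather than the real PS2 DMA controller, and the per-block work is a placeholder sum instead of an IDCT. It only works because the address of block i+1 is known while block i is being processed, which is exactly what the MPEG4 motion cases make difficult.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_WORDS 64  /* one 8x8 block of 16-bit coefficients */

/* Stand-in for a DMA channel. On real hardware the copy would proceed in
   the background; here it is simply deferred until sim_dma_wait(). */
typedef struct {
    const short *src;
    short       *dst;
    int          busy;
} SimDma;

static void sim_dma_start(SimDma *d, const short *src, short *dst) {
    d->src  = src;
    d->dst  = dst;
    d->busy = 1;
}

static void sim_dma_wait(SimDma *d) {
    if (d->busy) {
        memcpy(d->dst, d->src, BLOCK_WORDS * sizeof(short));
        d->busy = 0;
    }
}

/* Placeholder for the per-block work (the IDCT in the post above). */
static long process_block(const short *blk) {
    long sum = 0;
    for (int i = 0; i < BLOCK_WORDS; ++i)
        sum += blk[i];
    return sum;
}

/* Processes `count` consecutive blocks, fetching block i+1 into the spare
   buffer while block i is being processed. */
long process_stream(const short *blocks, size_t count) {
    short  buf[2][BLOCK_WORDS];   /* plays the role of scratchpad RAM */
    SimDma dma = { NULL, NULL, 0 };
    long   total = 0;

    if (count == 0)
        return 0;

    sim_dma_start(&dma, blocks, buf[0]);          /* prime the first buffer */
    for (size_t i = 0; i < count; ++i) {
        sim_dma_wait(&dma);                       /* block i is in buf[i & 1] */
        if (i + 1 < count)                        /* next address known early */
            sim_dma_start(&dma, blocks + (i + 1) * BLOCK_WORDS,
                          buf[(i + 1) & 1]);
        total += process_block(buf[i & 1]);       /* overlaps with the fetch */
    }
    return total;
}
```

If the computation is much faster than the transfer, as reported above, the wait at the top of the loop still dominates; the overlap only pays off when the fetch can be issued early enough.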
@EEUG: Thanks! I will keep studying/searching and I will come back later.
I play my videos over the network using ps2client/radhost; unfortunately, the higher-quality ones tend to break up a lot.
I'd like to play a bit with the stream buffering logic. Can anyone point me in the right direction? There is quite a lot of code :-)
@jfassad: ...it's all inside FileContext.c (STIO_xxx routines for "standard" I/O and CDDA_xxx ones for CDDAFS disks). My personal opinion is that there are limitations in either the IOP itself or the network driver. But maybe I'm wrong...
Should be a network driver issue. The same videos play OK from my PS2 HDD.
Just did an svn checkout, and the compilation now fails with the following error:
make: *** No rule to make target `obj/SMS_Utils.o', needed by `bin/SMS.elf'. Stop.
There is no SMS_Utils in the source.
...I forgot to add it. Now it's there...