May 30, 2008
Bugs bugs and more bugs
It seemed there was no stop to it! Thanks to my friend Jano, who’s using the SMTP component with SSL support via the new sessions, I got a flood of bug reports after 0.6.1 was released. But it didn’t end there: the bugs he reported led to other bugs I found while testing the fixes!
All in all about 15 bugs got fixed, but let’s take it from the start.
First there was a problem with SMTP when SSL was turned on (either implicitly or via STARTTLS): sending bigger chunks of data, like mails with attachments, caused problems. It turned out that due to an oversight on my side (albeit the OpenSSL docs aren’t exemplary either), I misunderstood the behavior of SSL_write(). When SSL_write() fails on a non-blocking send with the SSL_ERROR_WANT_WRITE error, the next call to the function MUST pass the exact same buffer pointer. This is a cheap check done by OpenSSL to ensure that the data you’re trying to send doesn’t change in between the two calls. The problem is, I was using ansistrings which got filled up (and thus relocated) between the calls. This got promptly fixed by setting the SSL_CTX to ignore that check.
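To make the retry rule concrete, here is a toy model of it. It is written in Python only so it can be run anywhere, and it is NOT OpenSSL or lNet code: the class ToySslWriter, its fields and the demo() helper are all made up for illustration, and object identity stands in for the C buffer pointer. In real OpenSSL the switch that relaxes the pointer check is, as far as I know, the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode set through SSL_CTX_set_mode(); even then the retried buffer must not be shorter than the pending one, which the sketch models too.

```python
class ToySslWriter:
    """Toy stand-in for a non-blocking TLS writer (hypothetical, not OpenSSL)."""

    def __init__(self, accept_moving_buffer=False):
        self.accept_moving_buffer = accept_moving_buffer
        self._pending_id = None    # identity of the buffer from the unfinished write
        self._pending_len = None   # its length at the time of the first attempt
        self.socket_ready = False  # False = pretend the send buffer is full

    def write(self, buf):
        if self._pending_id is not None:
            moved = id(buf) != self._pending_id    # "pointer" changed
            shrunk = len(buf) < self._pending_len  # less data than promised
            if shrunk or (moved and not self.accept_moving_buffer):
                raise ValueError("bad write retry")
        if not self.socket_ready:
            # Remember what we were asked to send and ask the caller to retry.
            self._pending_id, self._pending_len = id(buf), len(buf)
            return "WANT_WRITE"
        self._pending_id = self._pending_len = None
        return len(buf)


def demo(accept_moving_buffer):
    """First write stalls, then the caller retries with a grown, relocated buffer."""
    writer = ToySslWriter(accept_moving_buffer)
    first = b"mail body"
    assert writer.write(first) == "WANT_WRITE"
    grown = first + b" plus attachment"  # new object, like a relocated ansistring
    writer.socket_ready = True
    try:
        return writer.write(grown)
    except ValueError as err:
        return str(err)
```

With the check left on, demo(False) ends in the "bad write retry" error; with it relaxed, demo(True) sends the whole grown buffer.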
Another bug got reported right after I fixed this part. It seems that I completely forgot to handle the TLConnection.Session property with regard to visual assignment in Lazarus via the Object Inspector (OI). Not only did I make it so that it crashes on certain changes, I also completely forgot to use the TComponent notification mechanism.
While fixing that on the spot, I found another, bigger one. My sessions were NOT shared at all! I made a HUGE unforgivable logical oversight which caused each TLSession (and descendants) to only work for the last component it was assigned to. In other words, the sessions couldn’t be shared properly.
While fixing THAT bug, I found out that my TLSocket.Creator property wasn’t delegated properly, resulting in wrong assignment. The idea behind .Creator is that you always know the topmost protocol component by it, so you can go down the chain. For example, TLFtp uses TLtcp, which makes the TLSockets. In your events you get aSocket: TLSocket as an argument, and if you ask it for its .Creator, you should get the FTP component.
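The delegation idea can be sketched in a few lines. This is a hypothetical Python model, not the actual lNet classes: ToyFtp, ToyTcp and ToySocket only borrow the roles of TLFtp, TLtcp and TLSocket, and the constructor arguments are assumptions made for the sketch.

```python
class ToySocket:
    """Plays the role of TLSocket: remembers the topmost component."""

    def __init__(self, creator):
        self.creator = creator


class ToyTcp:
    """Plays the role of TLtcp: makes sockets and passes the creator down."""

    def __init__(self, creator=None):
        # Used on its own, the TCP component is itself the topmost one.
        self.creator = creator if creator is not None else self

    def make_socket(self):
        # Delegate: hand the topmost component down, not this helper.
        return ToySocket(self.creator)


class ToyFtp:
    """Plays the role of TLFtp: owns a TCP component for its connection."""

    def __init__(self):
        self.control = ToyTcp(creator=self)
```

A socket made through ToyFtp().control reports the FTP object as its creator, while a socket made by a standalone ToyTcp reports that TCP object.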
So I fixed these three huge oversights, but the finish line was still far away!
I got another bug report from my friend about SMTP. He wanted to automate mail sending, more precisely to send an SMS (via SMTP) and a normal e-mail automatically. Since he needed it inside a visual application, he followed my visual SMTP example and automated the process. He found out that if SMTP.PipeLine was turned off (the default, which means “emulate pipelining on server”), his commands got “stuck” after the first 2.
This was another logical bug on my part. In SMTP and FTP, I have a state machine which takes care of the various states in these protocols. It is implemented as a stack: I push the status in, wait for the server response and then remove it. The problem was that I reported success or failure via OnSuccess/OnFailure BEFORE I removed the given status from the stack. If you sent another command inside the OnSuccess/OnFailure event handler, the command would get pushed on the stack, but when execution returned to me, it was removed again. This basically caused a de-sync, because now SMTP/FTP was waiting for a command not issued.
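The ordering problem is easy to reproduce in miniature. The following is a simplified Python sketch, not the real lNet state machine: ToyClient, its methods and the "MAIL"/"RCPT" labels are placeholders chosen for the example, and only the success path is modelled.

```python
class ToyClient:
    """Toy protocol client with a status stack (hypothetical, not lNet)."""

    def __init__(self, pop_before_notify, on_success):
        self.stack = []
        self.pop_before_notify = pop_before_notify
        self.on_success = on_success

    def send(self, command):
        # Each command sent pushes the status we now expect a reply for.
        self.stack.append(command)

    def server_replied_ok(self):
        if self.pop_before_notify:
            # Fixed order: remove the finished status, then tell the user.
            finished = self.stack.pop()
            self.on_success(self, finished)
        else:
            # Buggy order: tell the user first, pop afterwards.
            finished = self.stack[-1]
            self.on_success(self, finished)
            self.stack.pop()  # may remove a status the handler just pushed


def run(pop_before_notify):
    """Send MAIL, and send RCPT from inside the success handler."""

    def handler(client, finished):
        if finished == "MAIL":
            client.send("RCPT")

    client = ToyClient(pop_before_notify, handler)
    client.send("MAIL")
    client.server_replied_ok()
    return client.stack
```

With the buggy order, run(False) leaves the stack holding the already answered "MAIL" while the freshly sent "RCPT" has been thrown away; with the fixed order, run(True) leaves exactly the "RCPT" that is still waiting for its reply.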
There was one additional bug fixed during the marathon, not reported but noticed while I tested SSL on both the SMTP and HTTP examples. In HTTP I noticed that on Windows, an HTTPS download of a page failed: it just got stuck in the middle. I was quite sure it had to do with the Windows GUI eventer I use in the visual lNet, because it worked on Linux, and the console version worked on Windows too. Since I use WSAAsyncSelect for the visual eventer’s core functionality, I decided to properly study its documentation on MSDN and see what’s up. It hit me when I read about the “re-enabling” functions. You see, with WSAAsyncSelect, you tell Windows “watch this socket for event xxx”. When the given event happens (like “can read”), you get a Windows message. The problem is that after it’s reported, WSAAsyncSelect DOESN’T watch for the event anymore until you call a “re-enabling” function. With SSL_read this is a problem, because it doesn’t always call recv(); sometimes it just processes the rest of its internal buffers. This causes a de-sync of the event flip-flop. I fixed it with a little hack: since I want the event to always be reported if there’s anything in the buffer, I do a recv() with MSG_PEEK after each read event is reported. This ensures that WSAAsyncSelect watches for reads again without removing anything from the recv buffer.
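The flip-flop can be simulated without Windows at all. What follows is a toy Python model built on the behaviour described above, not real Winsock or OpenSSL code: ToyAsyncSocket, ToyTls and simulate() are invented names, records are plain strings, and recv_all(peek=True) stands in for recv() with MSG_PEEK.

```python
from collections import deque


class ToyAsyncSocket:
    """Toy socket with a one-shot read notification (hypothetical model)."""

    def __init__(self):
        self.kernel = deque()    # records waiting in the OS receive buffer
        self.armed = True        # will readable data post a message?
        self.messages = deque()  # pending "FD_READ" notifications

    def _maybe_post(self):
        if self.armed and self.kernel:
            self.messages.append("FD_READ")
            self.armed = False   # stays off until a re-enabling call

    def data_arrives(self, record):
        self.kernel.append(record)
        self._maybe_post()

    def recv_all(self, peek=False):
        data = list(self.kernel)
        if not peek:
            self.kernel.clear()
        self.armed = True        # recv() re-enables the notification
        self._maybe_post()
        return data


class ToyTls:
    """Toy TLS layer: only touches the socket when its own buffer is empty."""

    def __init__(self, sock):
        self.sock = sock
        self.internal = deque()

    def read(self):
        if not self.internal:
            self.internal.extend(self.sock.recv_all())
        return self.internal.popleft() if self.internal else None


def simulate(peek_after_read):
    sock = ToyAsyncSocket()
    tls = ToyTls(sock)
    received = []

    def pump():
        # Handle every pending notification, one TLS read per message.
        while sock.messages:
            sock.messages.popleft()
            record = tls.read()
            if record is not None:
                received.append(record)
            if peek_after_read:
                sock.recv_all(peek=True)  # the hack: re-arm without consuming

    sock.data_arrives("r1")
    sock.data_arrives("r2")
    pump()
    sock.data_arrives("r3")
    pump()
    return received
```

Without the peek, the second notification is answered from the TLS layer’s internal buffer, nothing re-arms the socket, and "r3" is never delivered; with the peek after each read event, all three records arrive.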
So... that’s about it. There were some minor example fixes along the way, but nothing to write about. I hope this doesn’t undermine lNet’s viability for you. I’d also like to thank my friend Jano Kolesár for all the testing on SMTP.