So I had the brief pleasure of looking at the Global-MYKO server while it was experiencing attacks.
Nothing new: people flooding sockets and the like. People have done this since forever; it's been at least 5 or 6 years since I last had to deal with it (since I've obviously moved over to my own server files and never had to worry about any of this again).
They, like virtually anyone using the official 1.298 files at this point, are using SOACS. Unfortunately, they couldn't get ahold of osmanx in a timely manner, so they resorted to contacting me (as a last resort).
Contacting me apparently didn't fly well with osmanx, as he's no longer supporting them because of this. Ouch.
Anyway, back to the real story here -- essentially, the way the server works is it allocates 1500 sessions and 1500 session IDs. The session IDs are stored in a list, which it takes from and restores to as sessions connect and disconnect.
In addition to the obvious denial of service from filling up the sockets, the server would start losing available session IDs (despite there being no sessions connected to use them).
Meaning that eventually (with the attacks, pretty quickly actually), it would stop accepting connections altogether because there are no session IDs left to pull from -- yet the connections themselves are already disconnected. Those session IDs are just lost to the world. This is just the beginning of the WTF-ery; it actually gets worse when looking into it, as I'll soon explain.
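To make the failure mode concrete, here's a minimal sketch of a session ID pool that loses IDs. SidPool, SimulateChurn and the restoreOnDisconnect flag are names I've made up for illustration; this is not the official class, just the same idea of a list of free IDs that gets taken from and restored to.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for the server's session ID pool (not the official class).
class SidPool {
public:
    explicit SidPool(int count) {
        for (int i = 0; i < count; ++i) m_free.push_back(i);
    }
    // Take the next free ID, or -1 if none are left.
    int Take() {
        if (m_free.empty()) return -1;
        int sid = m_free.front();
        m_free.pop_front();
        return sid;
    }
    // Hand an ID back so a later connection can reuse it.
    void Restore(int sid) { m_free.push_back(sid); }
    std::size_t Available() const { return m_free.size(); }
private:
    std::list<int> m_free;
};

// Simulates `connections` connect/disconnect cycles and returns how many were refused.
// When `restoreOnDisconnect` is false, the disconnect path "forgets" to return
// the ID, which is the leak described above.
inline std::size_t SimulateChurn(SidPool& pool, int connections, bool restoreOnDisconnect) {
    std::size_t refused = 0;
    for (int i = 0; i < connections; ++i) {
        int sid = pool.Take();
        if (sid < 0) { ++refused; continue; }
        // ... the session is used, then the client disconnects ...
        if (restoreOnDisconnect) pool.Restore(sid);
    }
    return refused;
}
```

With a pool of 3 IDs and a leaking disconnect path, the fourth connection onwards is refused even though nothing is connected any more.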
At this point in time you'd expect the server not to have any basic issues with handling sockets/sessions, but apparently this isn't the case.
So, in SOACS he reduces the rate of this problem by enforcing a whitelist (which is perfectly fine by itself, just not as a fix for this specific issue), but that doesn't address the problem so much as deter it. Additionally, he's clearly experienced the crashes from it: for every related pointer check, he's added checks to verify that the session doesn't point to a known bad debug fill value (e.g. 0xfeeefeee / 0xdddddddd for freed/deleted pointers, etc.). The 1.298 server files were built in debug mode, so these values will show up -- but they shouldn't be relied on. If we're at the point where the session pointer is broken, it means things have already gone horribly wrong, and it's not just these pointers that can break here.
Checking these merely masks the problem.
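For illustration, a check of that kind might look roughly like the sketch below. This is my own guess at the shape of it, not SOACS code; LooksLikeDebugFill and the constant names are hypothetical, and it assumes 32-bit pointer values as in the 1.298 files.

```cpp
#include <cassert>
#include <cstdint>

// The two debug fill patterns mentioned above, as 32-bit values.
constexpr std::uint32_t kFillFeee = 0xFEEEFEEEu;
constexpr std::uint32_t kFillDddd = 0xDDDDDDDDu;

// Hypothetical helper: true if a 32-bit pointer value matches one of the known patterns.
constexpr bool LooksLikeDebugFill(std::uint32_t ptrValue) {
    return ptrValue == kFillFeee || ptrValue == kFillDddd;
}
```

The weakness is visible straight away: a stale pointer whose memory has since been reused holds some ordinary-looking value, so it passes the check and gets dereferenced anyway.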
So what *is* the problem?
Let's take a look at how the official code works (the SOACS logic is slightly different, but it merely adds the various pointer checks -- so we don't care about that).
The server has 1 thread for accepting connections. This is called AcceptThread.
Then it has multiple threads for handling packets/disconnects. These are called ReceiveWorkerThread.
Finally it has multiple threads for handling AI connections. These are called ClientWorkerThread.
The latter is worth noting for later because it's very similar to ReceiveWorkerThread (for player connections); however, AI sessions do not use session IDs.
So let's start by looking at AcceptThread, i.e. when a new connection is handled.
In the client source, this is a fair bit cleaner/easier to understand, and for this example is identical, so let's look at that:
Code:
DWORD WINAPI AcceptThread(LPVOID lp)
{
CIOCPort* pIocport = (CIOCPort*) lp;
WSANETWORKEVENTS network_event;
DWORD wait_return;
int sid;
CIOCPSocket2* pSocket = NULL;
char logstr[1024];
memset( logstr, NULL, 1024 );
struct sockaddr_in addr;
int len;
while(1)
{
wait_return = WaitForSingleObject( pIocport->m_hListenEvent, INFINITE);
if(wait_return == WAIT_FAILED)
{
TRACE("Wait failed Error %d\n", GetLastError());
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Wait failed Error %d\r\n", GetLastError());
LogFileWrite( logstr );
return 1;
}
WSAEnumNetworkEvents( pIocport->m_ListenSocket, pIocport->m_hListenEvent, &network_event);
if(network_event.lNetworkEvents &FD_ACCEPT)
{
if(network_event.iErrorCode[FD_ACCEPT_BIT] == 0 )
{
sid = pIocport->GetNewSid();
if(sid < 0) {
TRACE("Accepting User Socket Fail - New Uid is -1\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Accepting User Socket Fail - New Uid is -1\r\n");
LogFileWrite( logstr );
goto loop_pass_accept;
}
pSocket = pIocport->GetIOCPSocket( sid );
if( !pSocket ) {
TRACE("Socket Array has Broken...\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Socket Array has Broken...\r\n");
LogFileWrite( logstr );
goto loop_pass_accept;
}
len = sizeof(addr);
if( !pSocket->Accept( pIocport->m_ListenSocket, (struct sockaddr *)&addr, &len ) ) {
TRACE("Accept Fail %d\n", sid);
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Accept Fail %d\r\n", sid);
LogFileWrite( logstr );
pIocport->RidIOCPSocket( sid, pSocket );
pIocport->PutOldSid( sid );
goto loop_pass_accept;
}
pSocket->InitSocket( pIocport );
if( !pIocport->Associate( pSocket, pIocport->m_hServerIOCPort ) ) {
TRACE("Socket Associate Fail\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Socket Associate Fail\r\n");
LogFileWrite( logstr );
pSocket->CloseProcess();
pIocport->RidIOCPSocket( sid, pSocket );
pIocport->PutOldSid( sid );
goto loop_pass_accept;
}
pSocket->Receive();
}
loop_pass_accept:
continue;
}
}
return 1;
}
Stepping through it from the time a connection is detected:
Code:
sid = pIocport->GetNewSid();
if(sid < 0) {
TRACE("Accepting User Socket Fail - New Uid is -1\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Accepting User Socket Fail - New Uid is -1\r\n");
LogFileWrite( logstr );
goto loop_pass_accept;
}
The first thing it does when it detects a new connection is call CIOCPort::GetNewSid().
The intention of this is to retrieve the next available session ID to use for this session and remove it from the pool.
Note: This happens before we've even accepted the connection. This is important to note because SOACS' internal whitelist hooks the accept() call. We have not accepted the connection yet, but we have allocated an ID for it.
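The ordering is easier to see in a stripped-down model. This is a sketch under my own assumptions: AcceptModel, OnConnect and allowPeer are made-up names, and allowPeer just stands in for whatever a hooked accept() decides.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>

// Hypothetical model of the accept path: the ID is reserved first, and only
// then does the (hooked) accept decide whether to let the peer in.
struct AcceptModel {
    std::list<int> sidList;                             // free session IDs
    std::function<bool(const std::string&)> allowPeer;  // stand-in for a whitelist hook

    // Returns the session ID on success, or -1 if there is no ID or the peer is rejected.
    int OnConnect(const std::string& peer) {
        if (sidList.empty()) return -1;
        int sid = sidList.front();   // 1. reserve an ID before accepting
        sidList.pop_front();
        if (!allowPeer(peer)) {      // 2. accept (and the whitelist) only runs now
            sidList.push_back(sid);  // 3. the failure path has to hand the ID back
            return -1;
        }
        return sid;
    }
};
```

Even a rejected peer touches the pool: its ID is taken from the front and put back at the end, so every blocked connection still depends on the failure path cleaning up correctly.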
We'll get into this more later, but for now CIOCPort::GetNewSid() does this:
Code:
signed int __thiscall CIOCPort::GetNewSid(CIOCPort *this)
{
signed int result; // eax
signed int ret; // esi
if ( !this->m_SidList.empty() )
{
ret = m_SidList.front();
m_SidList.pop_front(); // std::list<int,std::allocator<int>>::pop_front(&this->m_SidList);
result = ret;
}
else
{
CIOCPort::RefreshSidList(this);
result = -1;
}
return result;
}
Or cleaned up some more, it'd look like this in the original source (what's important to note is that in the source we have, RefreshSidList() does not exist):
Code:
int CIOCPort::GetNewSid()
{
if (m_SidList.empty())
{
RefreshSidList();
return -1;
}
int ret = m_SidList.front();
m_SidList.pop_front();
return ret;
}
So, if the list is empty, it calls RefreshSidList(), the purpose of which is to try and fix desync issues (yes, rather than fix the problem, they wrote this to try and deal with it when it happened) by reclaiming session IDs for unused sessions. There's a whole level of WTF involved here because it's mgame, so obviously they never even got this much right -- and in fact it makes the problem *worse*. But I'll get into that after.
After it calls RefreshSidList() it will return -1 to indicate it did not find a session ID to use.
If the list contains session IDs, we'll store the first session ID in the list, remove this from the pool and then return it to AcceptThread.
Back to AcceptThread:
Code:
if(sid < 0) {
TRACE("Accepting User Socket Fail - New Uid is -1\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Accepting User Socket Fail - New Uid is -1\r\n");
LogFileWrite( logstr );
goto loop_pass_accept;
}
So if it fails and didn't find a session ID, we'll log that and then proceed to waiting for & handling the next connection event.
When we do correctly get assigned a session ID however, our next step is associating that with a session (note: I'll likely interchangeably use socket and session to refer to the same thing - the allocated session which the server will assign the socket to).
Code:
pSocket = pIocport->GetIOCPSocket( sid );
if( !pSocket ) {
TRACE("Socket Array has Broken...\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Socket Array has Broken...\r\n");
LogFileWrite( logstr );
goto loop_pass_accept;
}
CIOCPort::GetIOCPSocket() is also (almost) identical in the client source.
I've tweaked the source below to match it -- the only change here is a compiler optimisation with it storing m_SockArrayInActive[index] and checking against the stored value, instead of checking m_SockArrayInActive[index] both times.
I note this because it's actually a very important change in the event of race conditions, which I'll get to in a bit.
Code:
CIOCPSocket2* CIOCPort::GetIOCPSocket(int index)
{
if( index > m_SocketArraySize ) {
TRACE("InActiveSocket Array Overflow[%d]\n", index );
return NULL;
}
CIOCPSocket2 *pIOCPSock = (CIOCPSocket2 *)m_SockArrayInActive[index];
if ( !pIOCPSock ) {
TRACE("InActiveSocket Array Invalid[%d]\n", index );
return NULL;
}
m_SockArray[index] = pIOCPSock;
m_SockArrayInActive[index] = NULL;
pIOCPSock->SetSocketID( index );
return pIOCPSock;
}
In other words, it will first check if the session ID is even valid.
If it's valid, it'll pull the session from its inactive pool. It's important to note (again, for later) that rather than an actual pool, each session is mapped 1:1 with a session ID, so session ID 0 will always point to the first session in the array.
It'll then move the session into its corresponding slot in the active pool, attach the session ID to the session (pretty pointless considering it's never going to be mapped to a different session but oh well), and finally return that session to AcceptThread.
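Here's a small sketch of that active/inactive layout with the 1:1 mapping between slot and session ID. Session, SessionTable, Acquire and Release are hypothetical names of mine; unlike the official RidIOCPSocket(), Release() here looks the session up by index rather than taking a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Session { int id = -1; };  // hypothetical stand-in for CIOCPSocket2

class SessionTable {
public:
    explicit SessionTable(std::size_t n)
        : m_storage(n), m_active(n, nullptr), m_inactive(n, nullptr) {
        // Slot i always refers to session i; only its array changes.
        for (std::size_t i = 0; i < n; ++i) m_inactive[i] = &m_storage[i];
    }
    SessionTable(const SessionTable&) = delete;
    SessionTable& operator=(const SessionTable&) = delete;

    // Move slot `index` from inactive to active; nullptr if out of range or already active.
    Session* Acquire(int index) {
        if (!InRange(index)) return nullptr;
        Session* s = m_inactive[index];  // read once, then work on the local copy
        if (!s) return nullptr;
        m_active[index] = s;
        m_inactive[index] = nullptr;
        s->id = index;
        return s;
    }
    // Move slot `index` back from active to inactive (no-op if it is not active).
    void Release(int index) {
        if (!InRange(index)) return;
        Session* s = m_active[index];
        if (!s) return;
        m_active[index] = nullptr;
        m_inactive[index] = s;
    }
    bool IsActive(int index) const { return InRange(index) && m_active[index] != nullptr; }

private:
    bool InRange(int index) const {
        return index >= 0 && static_cast<std::size_t>(index) < m_storage.size();
    }
    std::vector<Session> m_storage;
    std::vector<Session*> m_active;
    std::vector<Session*> m_inactive;
};
```

Acquiring slot 2 twice in a row fails the second time, which is the "session ID already in use" case: the ID was handed out again while its session was still sitting in the active array.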
Code:
if( !pSocket ) {
TRACE("Socket Array has Broken...\n");
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Socket Array has Broken...\r\n");
LogFileWrite( logstr );
goto loop_pass_accept;
}
If no session was found, it'll log this and continue on.
Note that it won't return the session ID to the pool at this point.
While it "should", it'll only fail if the session is in use (session ID already in use, which actually CAN happen) or the session ID is just wrong (should never happen). In these cases, there's not actually much point in restoring it, as it would just keep handing out this broken session and failing to accept those connections until the end of time. So this is actually preemptively handling these potential (actually occurring) issues by sheer luck, really. Again, I'll get more into these issues in a bit.
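To show why dropping the ID is the lesser evil here, this sketch compares the two policies. CountLookupFailures and its parameters are made up for the example; `stuck` holds IDs whose session is wrongly still marked active, so the lookup for them always fails.

```cpp
#include <cassert>
#include <list>
#include <set>

// Counts how many of `attempts` connection attempts fail at the lookup step.
// `requeueOnFailure` picks between the official behaviour (false: the ID is
// dropped) and the "tidier" alternative (true: the ID goes back on the list).
inline int CountLookupFailures(std::list<int> sidList, const std::set<int>& stuck,
                               int attempts, bool requeueOnFailure) {
    int failures = 0;
    for (int i = 0; i < attempts && !sidList.empty(); ++i) {
        int sid = sidList.front();
        sidList.pop_front();
        if (stuck.count(sid)) {
            ++failures;                                    // lookup failed for this ID
            if (requeueOnFailure) sidList.push_back(sid);  // it will come around again
            continue;
        }
        sidList.push_back(sid);  // healthy session: connects, then disconnects cleanly
    }
    return failures;
}
```

With IDs {0, 1, 2}, ID 1 stuck and 9 attempts, dropping the ID costs one failed connection, while requeueing it costs one failure every time the list cycles back around to it.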
So we're assigned a session ID, and we have a session to go with it. Now we can accept the connection.
Code:
len = sizeof(addr);
if( !pSocket->Accept( pIocport->m_ListenSocket, (struct sockaddr *)&addr, &len ) ) {
TRACE("Accept Fail %d\n", sid);
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Accept Fail %d\r\n", sid);
LogFileWrite( logstr );
pIocport->RidIOCPSocket( sid, pSocket );
pIocport->PutOldSid( sid );
goto loop_pass_accept;
}
At this point we call CIOCPSocket2::Accept(), which looks like this:
Code:
BOOL CIOCPSocket2::Accept( SOCKET listensocket, struct sockaddr* addr, int* len )
{
m_Socket = accept( listensocket, addr, len);
if( m_Socket == INVALID_SOCKET) {
int err = WSAGetLastError();
TRACE("Socket Accepting Fail - %d\n", err);
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Socket Accepting Fail - %d\r\n", err);
LogFileWrite( logstr );
return FALSE;
}
struct linger lingerOpt;
lingerOpt.l_onoff = 1;
lingerOpt.l_linger = 0;
setsockopt(m_Socket, SOL_SOCKET, SO_LINGER, (char *)&lingerOpt, sizeof(lingerOpt));
return TRUE;
}
So this calls accept() (note: the SOACS whitelist applies here) to accept the connection. If it failed, it'll log & return false.
If it succeeded, it'll set the linger timeout (more info https://stackoverflow.com/questions/...n-its-required) and return true.
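As a quick illustration of that linger setting: with l_onoff = 1 and l_linger = 0, closing the socket discards any unsent data and resets the connection instead of closing it gracefully. The sketch below uses the POSIX header so it stands alone (the server itself uses Winsock, where the option has the same name); MakeAbortiveLinger and ApplyAbortiveLinger are hypothetical helper names.

```cpp
#include <cassert>
#include <sys/socket.h>  // POSIX; the Winsock equivalent lives in <winsock2.h>

// Builds the same linger setting Accept() uses: enabled, with a zero timeout.
inline struct linger MakeAbortiveLinger() {
    struct linger opt{};
    opt.l_onoff = 1;   // enable SO_LINGER
    opt.l_linger = 0;  // zero seconds: close discards unsent data and resets the connection
    return opt;
}

// Applies the setting to an already-accepted socket descriptor.
// Returns true on success, false if setsockopt() reports an error.
inline bool ApplyAbortiveLinger(int fd) {
    const struct linger opt = MakeAbortiveLinger();
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &opt, sizeof(opt)) == 0;
}
```

Unlike the quoted Accept(), this version reports whether setsockopt() succeeded; the official code ignores the return value.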
Here's where things start getting fun. If we failed to accept the connection (e.g. because SOACS blocked it):
Code:
if( !pSocket->Accept( pIocport->m_ListenSocket, (struct sockaddr *)&addr, &len ) ) {
TRACE("Accept Fail %d\n", sid);
char logstr[1024]; memset( logstr, NULL, 1024 );
sprintf( logstr, "Accept Fail %d\r\n", sid);
LogFileWrite( logstr );
pIocport->RidIOCPSocket( sid, pSocket );
pIocport->PutOldSid( sid );
goto loop_pass_accept;
}
It will log this, call CIOCPort::RidIOCPSocket() (restore the session back to the inactive session pool) and CIOCPort::PutOldSid() (restore the session ID back to the pool), then wait for the next new connection event.
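Notice that the two cleanup calls are repeated by hand on each failure path. Purely as an illustration (this is not how the official code does it), a scope guard is one way to tie the cleanup to every early exit. Rollback and TryAccept are hypothetical names, and the counter stands in for the RidIOCPSocket() + PutOldSid() pair.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs `rollback` on destruction unless Commit() was called.
class Rollback {
public:
    explicit Rollback(std::function<void()> rollback) : m_rollback(std::move(rollback)) {}
    ~Rollback() { if (m_rollback) m_rollback(); }
    void Commit() { m_rollback = nullptr; }
    Rollback(const Rollback&) = delete;
    Rollback& operator=(const Rollback&) = delete;
private:
    std::function<void()> m_rollback;
};

// Models one accept attempt. `released` counts how many times the cleanup fired.
inline bool TryAccept(bool acceptOk, bool associateOk, int& released) {
    Rollback guard([&released] { ++released; });  // stands in for the two cleanup calls
    if (!acceptOk) return false;     // cleanup runs automatically
    if (!associateOk) return false;  // cleanup runs automatically
    guard.Commit();                  // success: keep the session and its ID
    return true;
}
```

The point of the design is that a newly added failure path can't skip the cleanup by accident, because the cleanup is attached to leaving the scope rather than to each individual return.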
CIOCPort::RidIOCPSocket():
Code:
void CIOCPort::RidIOCPSocket(int index, CIOCPSocket2 *pSock)
{
if( index < 0 || (pSock->GetSockType() == TYPE_ACCEPT && index >= m_SocketArraySize) || (pSock->GetSockType() == TYPE_CONNECT && index >= m_ClientSockSize) ) {
TRACE("Invalid Sock index - RidIOCPSocket\n");
return;
}
if( pSock->GetSockType() == TYPE_ACCEPT ) {
m_SockArray[index] = NULL;
m_SockArrayInActive[index] = pSock;
}
else if( pSock->GetSockType() == TYPE_CONNECT ){
m_ClientSockArray[index] = NULL;
}
}
(We only care about TYPE_ACCEPT sockets -- these are client sockets. TYPE_CONNECT sockets are AI server sockets, as Ebenezer is making a connection to the AI server as opposed to accepting a connection from the client.)
CIOCPort::PutOldSid():
Code:
void CIOCPort::PutOldSid(int sid)
{
if( sid < 0 || sid > m_SocketArraySize ) {
TRACE("recycle sid invalid value : %d\n", sid);
return;
}
list<int>::iterator Iter;
Iter = find( m_SidList.begin(), m_SidList.end(), sid );
if( Iter != m_SidList.end() )
return;
m_SidList.push_back(sid);
}
This checks that the session ID is valid and isn't already in the list. If both hold, it'll add it back to the end of the list.
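The same behaviour as a standalone sketch, so the duplicate check can be poked at in isolation. PutBack and capacity are names I've made up, and note one deliberate difference: this version rejects sid == capacity with >=, which is stricter than the > in the quoted code.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Returns `sid` to the free list unless it is out of range or already present.
// Returns true if the list changed. Valid IDs are 0 .. capacity-1.
inline bool PutBack(std::list<int>& freeList, int sid, int capacity) {
    if (sid < 0 || sid >= capacity) return false;  // stricter than the quoted `>` check
    if (std::find(freeList.begin(), freeList.end(), sid) != freeList.end())
        return false;                              // already free: don't add a duplicate
    freeList.push_back(sid);                       // recycled IDs go to the back
    return true;
}
```

Returning the same ID twice is harmless here because of the duplicate check, at the cost of a linear scan of the list on every return.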