How do we fix slow journal access abused by fake traffic on OJS? 👻

Case study: How we fixed slow OJS access

In this case study, one of our customers, the journal at monthlyreviewarchives.org, had a problem with their OJS platform. Every day, at certain hours, the site became very slow, and sometimes it could not be accessed at all. One of the factors was the thousands of submissions stored in OJS and its archives, which made the site's archive page very slow to load.

They came to us to consult about the archive page taking a long time to finish loading and about users sometimes being unable to access monthlyreviewarchives.org at all. When we tested it, we found that the site was sometimes reachable for only a matter of minutes. This happened on the archive pages of their journals and on the front page of the site, both of which list many archives containing many articles.

This issue is crucial for authors. They can never know whether their submission has been included in the journal, because they can never load the front page.
Fortunately, at openjournaltheme we have a diverse range of expertise, so we can approach a problem from several directions.

Our team includes server specialists, coding specialists, and OJS specialists. This article explains what our research found about the problems experienced by monthlyreviewarchives.org.

About monthlyreviewarchives.org

The Monthly Review, established in 1949, is an independent socialist magazine published monthly in New York City (US). The publication is the longest continuously published socialist magazine in the United States. Monthly Review Press, an allied endeavor, was launched in 1951.

Monthly Review spoke for critical but spirited socialism, independent of any political organization. In an era of Cold War repression, the magazine published pioneering analyses of political economy, imperialism, and Third World struggles, drawing on the rich legacy of Marxist thought without being bound to any narrow view or party line. 

The journal is unique in its field because it covers third-world issues and other left-wing economic discussions, and it has a huge collection of articles. Currently, the journal holds 5,313 articles and 813 archives. First released in 1949, it is a veteran of journal publishing.

Analyzing the problem

First, we analyzed which factors could make the site so slow, running several tests to narrow them down. The site often showed a 500 error or a "connection closed" message, and Dr. Jamil, the site's manager, reported that the site suffered outages 5-6 times per day, mostly at night for some reason.

Many other websites were hosted on the same server, and this was one of the reasons the Monthly Review archive site was very slow.

Activity to solve the problem

Separate the journal from other websites

Before starting the analysis, we suggested that our client use dedicated hosting for the journal, separate from their other websites. The existing server was shared with several sites, such as the main page of the site, which runs on WordPress, and other platforms. From our observation, that server was optimized for general-purpose use, such as WordPress or other common applications.

Since OJS has its own specific requirements, our first suggestion was to move it to a dedicated environment, and the client agreed to migrate to our hosting, which is built specifically to optimize the OJS system. The separation is also important because it lets us freely apply advanced configuration on the server without disturbing the other sites.

The cover images on the archive page are too large, and there are too many archives

Previously, the archive page of the monthlyreviewarchives.org journal was very slow. Apart from the server being slow to process client requests, a closer look showed that the covers on each archive loaded slowly because the cover files were very large.

The issue covers had good resolution, but their file size was a problem.
From our observation, the uploaded issue covers had an average file size of more than 1 MB.

The journal manager uploaded issue covers that look great at high resolution. Unfortunately, OJS displays them at only 200 x 300 while still loading the original file of more than 1 MB. OJS does not currently generate thumbnails for images uploaded by users, so the full-size file is downloaded and only scaled down for display. So be aware of the size of the images you upload to your OJS, since it affects the load speed of your OJS.

So when a visitor opens the current issue or the archive page, loading that single page may download about 20 MB in total. Wow. This is a waste of bandwidth for both the client and the server.

With cover images this large, every other user who accesses the same page at the same time is held up too, so load times become, let's say, unpredictable.

So the images needed to be resized, and we recommended that our client upload covers as JPEG/JPG files, since for this kind of image they are more efficient in file size than PNG.

After the cover images were resized, we activated static caching on the server, and the journal got the significant improvement we had expected.
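
As a rough illustration of the arithmetic behind the cover problem, the sketch below estimates how much data one archive page pulls in for its covers. Only the "more than 1 MB per cover" and "about 20 MB per page" figures come from our observation. The count of 20 covers per page and the 40 KB size of a resized cover are assumptions made up for the example.

```python
def page_weight_mb(cover_count: int, avg_cover_kb: float) -> float:
    """Return the total size of all covers on one page, in megabytes."""
    return cover_count * avg_cover_kb / 1024


# Assumed values, for illustration only.
COVERS_PER_PAGE = 20       # hypothetical number of issues listed on one page
ORIGINAL_COVER_KB = 1024   # roughly 1 MB per original cover
RESIZED_COVER_KB = 40      # hypothetical size of a small JPEG cover

before = page_weight_mb(COVERS_PER_PAGE, ORIGINAL_COVER_KB)
after = page_weight_mb(COVERS_PER_PAGE, RESIZED_COVER_KB)
print(f"Before resizing: {before:.1f} MB per page view")
print(f"After resizing:  {after:.2f} MB per page view")
```

Under these assumptions the same page drops from about 20 MB to well under 1 MB, and that saving repeats for every visitor who opens it.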

Lack of server resources

Slow access is a serious problem for a journal site. The monthlyreviewarchives.org site had this problem at certain hours, and many of its visitors complained about it. When we checked, it happened because the CPU on the monthlyreviewarchives.org server could not handle the client requests.

The server also hosted many other active sites, which appears to have contributed to the slow access to the monthlyreviewarchives.org journal.

After choosing the server location closest to the journal's visitors, based on their geo IP, we needed to compare the performance of the new server with the previous one.
The following is the capacity of the previous server, tested with ab (ApacheBench), the benchmarking tool from Apache:

From the results of this test, we came to the following conclusion:
the previous server could handle 2,945 requests in 400 seconds with a concurrency of 100 users.

After the migration, we saw a huge increase in the number of requests the server could handle.

After we optimized the new server and implemented a caching system, the server was able to handle 10,000 requests in 33 seconds.
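
To make the two benchmark results easier to compare, the small sketch below converts them into requests per second. It uses only the figures reported in this article. The function name is ours, chosen for illustration.

```python
def requests_per_second(total_requests: int, seconds: float) -> float:
    """Average throughput over one benchmark run."""
    return total_requests / seconds


previous = requests_per_second(2945, 400)    # previous server
migrated = requests_per_second(10_000, 33)   # new server with caching enabled

print(f"Previous server: {previous:.1f} requests/second")
print(f"New server:      {migrated:.1f} requests/second")
print(f"Improvement:     about {migrated / previous:.0f}x")
```

Expressed this way, the previous server handled roughly 7 requests per second and the new one roughly 300.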

Implement the caching system at the server level

The caching system in our hosting architecture is provided by default and is easy to activate. This feature is highly useful for improving access speed and reducing resource usage: when a second or later user requests the same page within the time range configured by the user of our system, no duplicate query is executed:

The caching feature can easily be turned on by the user.

We implement two kinds of caching on our hosting service, dynamic caching and static caching, which serve different purposes. Dynamic caching stores the result of a PHP request and its MySQL data retrieval as a static file for a set period of time, so users load that static file instead of hitting PHP and MySQL again.

Static caching covers static files such as images or CSS files: the server tells the browser to keep the data on the client side for a cache duration configured on the server.

In our earlier simulation on a real case, this feature greatly increased the access speed of OJS.
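
The idea behind dynamic caching can be sketched in a few lines. This is not the implementation used on our servers: it is a simplified in-memory version, where `TTLPageCache`, `generate_page`, and the 5-minute window are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLPageCache:
    """Keep generated pages for a fixed number of seconds (time-to-live)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, generate: Callable[[str], str]) -> str:
        now = self.clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]      # hit: no PHP or MySQL work needed
        page = generate(url)      # miss or expired: do the expensive work once
        self._store[url] = (now, page)
        return page


generated: List[str] = []  # records every time the expensive path runs


def generate_page(url: str) -> str:
    """Hypothetical stand-in for the PHP + MySQL work behind an OJS page."""
    generated.append(url)
    return f"<html>content of {url}</html>"


cache = TTLPageCache(ttl_seconds=300)   # assumed 5-minute cache window
for _ in range(100):                    # 100 visitors request the same page
    cache.get("/issue/archive", generate_page)
print(len(generated))                   # how many times the page was really generated
```

In this sketch, a hundred requests for the same page inside the cache window trigger the expensive generation step only once, which is the saving dynamic caching aims for.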
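
Static caching usually comes down to the server sending a `Cache-Control` response header with static files. The sketch below shows one way such a rule could be expressed. The file types and durations in `STATIC_MAX_AGE` are assumptions for illustration, not the values configured on our hosting.

```python
from pathlib import PurePosixPath

# Hypothetical cache durations in seconds; real values depend on the site.
STATIC_MAX_AGE = {
    ".jpg": 30 * 24 * 3600,   # images: 30 days
    ".jpeg": 30 * 24 * 3600,
    ".png": 30 * 24 * 3600,
    ".css": 7 * 24 * 3600,    # stylesheets: 7 days
    ".js": 7 * 24 * 3600,
}


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for the requested path."""
    suffix = PurePosixPath(path).suffix.lower()
    max_age = STATIC_MAX_AGE.get(suffix)
    if max_age is None:
        # Not a known static file type: ask the browser to revalidate.
        return "no-cache"
    return f"public, max-age={max_age}"


print(cache_control_for("/public/journals/1/cover_issue_10.jpg"))
print(cache_control_for("/index.php/journal/issue/archive"))
```

With a header like this, a returning visitor's browser reuses the cover images it already has instead of downloading them again.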

Read more here: How we managed to increase the speed of OJS by more than 300%

We benchmarked the site using an external third-party service, GTmetrix, to see how the site performs, and this is the result:

The result from the previous server:

After migrating to our hosted service:

This is the comparison between the previous server and our server:

Server          | Concurrency | Requests | Time taken (seconds) | GTmetrix grade / performance / finish load time
Previous server | 100         | 2,945    | 400                  | B / 88% / 1.7 s
OJT server      | 100         | 10,000   | 33                   | A / 99% / 804 ms

It seemed like a big win for our team: knowing that it worked, we could sit tight and inform our customer that his site was now running well. Until a few days later… 👻

For a couple of days, we were very happy to see the site running well in our server environment. After 4 days on our server, we updated our client that the problem with the site had been resolved.

However, some days later, in the middle of the night, we received a notification from the system that CPU usage had been above 100% for more than 4 hours. The client did not notice, because the journal site could still be accessed, but in the backend the server was struggling to serve its visitors.

We tried to access the site, and although the server did load the page, it took 2 to 3 minutes to load fully. Honestly, we were confused: we had optimized the site with a caching mechanism, yet it did not seem to work, and the system load stayed constantly high for 12 hours.

We guessed that a lot of visitor traffic was hitting the site, but since this was only a rough guess, we were not sure it was true. To check our assumption, we asked the client's permission to share their visitor analytics data.

We knew the client used Google Analytics to gather information about visitors, so we decided to analyze the Google Analytics report. Here is the report from Google Analytics at that time:

In Google Analytics, there was no spike in user access over 28 days, so the number of visitors and requests looked generally normal.

This was weird, because the server showed a high load, and the load was coming from the mysqld and PHP-FPM processes that handle the journal site. This meant the system load was not caused by some background tool in the operating system.

At that point we were surprised and confused about what was causing the issue.

After a deeper analysis and some discussion within our internal team, we realized that Google Analytics is not an accurate tool for measuring the overall traffic of a site. It only counts visits in which its tracking script runs in the browser, so many automated requests never appear in its reports. This is also confirmed by the official information from Google Analytics here. So we set the Google Analytics data aside for this anomaly.

That meant we needed data that showed the overall traffic. One of our team members suggested that the site's access log would be more accurate for counting all of the site's visitors. So we used that file as our base data before deciding on any further improvements.

Using the access log, here is what we found:

Based on the data in the report, the site received up to 63,000 hits per day from only 770 visitors on 16 July 2022, which means each visitor made more than 80 requests that day on average… that's ridiculous, because in the Google Analytics report the site averages fewer than 150 users per day. As you can see in the screenshot above, the access log reported more than a thousand visitors a day on average.

We concluded that the site was receiving a lot of attack traffic, DDoS attempts, and bot access. Let's continue to the next section.
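
For readers who want to run a similar check on their own access log, here is a minimal sketch of the counting step. It assumes the log uses the common/combined log format and treats each unique IP address as one visitor, which is only an approximation. The sample lines are made up, using documentation IP addresses, and are not data from the client's log.

```python
import re
from collections import Counter

# Start of a common/combined log line: IP, identity, user, [day:time zone]
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):')


def summarize(log_lines):
    """Return {day: (hits, unique_ips)} for the given access log lines."""
    hits = Counter()
    visitors = {}
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        day = match.group("day")
        hits[day] += 1
        visitors.setdefault(day, set()).add(match.group("ip"))
    return {day: (hits[day], len(visitors[day])) for day in hits}


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [16/Jul/2022:01:02:03 +0000] "GET /index.php/archive HTTP/1.1" 200 5120',
    '203.0.113.5 - - [16/Jul/2022:01:02:04 +0000] "GET /index.php/archive/2 HTTP/1.1" 200 5120',
    '198.51.100.7 - - [16/Jul/2022:01:05:00 +0000] "GET / HTTP/1.1" 200 1024',
]
for day, (hit_count, unique_ips) in summarize(sample).items():
    print(day, hit_count, unique_ips, round(hit_count / unique_ips, 1))

# The figures reported in this article: 63,000 hits from 770 visitors in one day.
print(round(63_000 / 770, 1))
```

A high ratio of hits to unique addresses, like the roughly 82 requests per visitor computed from our figures, is the kind of signal that made us suspect automated traffic.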

The site was attacked by thousands of visitor bots

We found that visitor bots appeared to be causing the monthlyreviewarchives.org server to overuse its CPU, and this was one of the reasons the site was so slow.

What is bot traffic?

Bot traffic is any non-human traffic that accesses a site. The targets of bots vary, ranging from news sites and journal sites to other publicly accessible resources.

There are actually two kinds of bots: good bots and bad bots. Here is some info about the difference between them:

image source: https://www.publift.com/blog/everything-to-know-about-bot-traffic

Although we have handled a hundred journal clients, it was new to us that journals can be victims of DDoS attacks by bad bots. So server optimization is not enough. We needed to re-engineer our infrastructure by improving its security mechanisms.

Back to the journal: the bad bots accessing our client's site were causing the spike in system load.

The server ran at excessive CPU usage for more than 12 hours, and the usage rarely decreased.
As the days went by, the CPU usage kept increasing.

For an OJS system, bad bots are a serious threat with several effects:

  • Sudden spikes and abnormal increases in page views
  • Higher bandwidth usage
  • Lower conversion rates, meaning the journal loses quality submissions from credible authors
  • No one can read the articles because the site never finishes loading, which decreases the articles' potential citations
  • Poor website performance
  • Increased strain on data centers and higher server costs
  • Bots can scrape data and articles, which drains the server's resources

Knowing these disadvantages, it is really important that the hosting used by a journal is designed to prevent such problems.

Improving our hosting service with bot protection

Although we had armed our server infrastructure with a security mechanism that protects the OJS system from abuse attempts, it was not enough. The cloud infrastructure provider also states that it protects its service against DDoS, but in this case that did not help. We needed to come up with a more advanced solution.

We learned that the IP addresses and access patterns of the bots were actually similar. Based on this, we implemented a tool that adds a security layer so that our server can block access from bots matching the bot visitor criteria seen on the Monthly Review site.

To do this, the bot blocker has to be set up at the server level, not at the application level, so the web server can check that a request comes from a legitimate user before passing it on to OJS or running any MySQL query. Unfortunately, this cannot be achieved by developing a plugin, so we cannot share our solution with the community as a plugin; blocking at the application level would not be effective.

The system analyzes requests against a list of IP addresses known to belong to bad bots, and the list is updated regularly so that new bots arriving from new IP addresses cannot make unnecessary requests to our client's site.
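
The core of an IP-based block can be illustrated with a short sketch. This is not our production tool, which runs at the web server level. It only shows the matching logic, using Python's standard `ipaddress` module. The `BotBlocklist` class and the address ranges are hypothetical, and the ranges are documentation addresses rather than real bot networks.

```python
import ipaddress
from typing import Iterable


class BotBlocklist:
    """Check client IPs against a list of known bad-bot network ranges."""

    def __init__(self, cidr_ranges: Iterable[str]):
        self.update(cidr_ranges)

    def update(self, cidr_ranges: Iterable[str]) -> None:
        """Replace the list, for example after a scheduled refresh."""
        self.networks = [ipaddress.ip_network(cidr) for cidr in cidr_ranges]

    def is_blocked(self, client_ip: str) -> bool:
        address = ipaddress.ip_address(client_ip)
        return any(address in network for network in self.networks)


# Hypothetical ranges, for illustration only.
blocklist = BotBlocklist(["203.0.113.0/24", "198.51.100.23/32"])

print(blocklist.is_blocked("203.0.113.77"))   # inside the first range
print(blocklist.is_blocked("192.0.2.10"))     # not in any listed range

# A regular update swaps in the newest list of bad-bot ranges.
blocklist.update(["203.0.113.0/24", "198.51.100.23/32", "192.0.2.0/24"])
print(blocklist.is_blocked("192.0.2.10"))     # now covered by the new range
```

Because the check needs nothing but the client address, it can run before any PHP or MySQL work starts, which is why this kind of filtering belongs in front of the application.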

The Result

Since migrating the server and optimizing the OJS system at Openjournaltheme, using our smart caching system and improved security measures on our servers, including bot-blocking mechanisms, we have made the monthlyreviewarchives.org website faster to access. We believe this will have a very positive impact on their visitors, with easier access to the site and a growing number of authors and visitors.

The server became significantly faster and more responsive for every user:

The system load of the server is at an acceptable level and is very low on average, which means the server's resources are used efficiently.
The system load dropped significantly after we implemented protection against fake traffic from bots.

By using the Openjournaltheme server, with its dynamic and static caching system, strong security provisions, and periodic backups, monthlyreviewarchives.org benefits greatly from the hosting migration to Openjournaltheme, without any impact on the publishing system's submission operations. Openjournaltheme helps keep their system highly accessible by implementing caching and security across all pages of the monthlyreviewarchives.org journal.

Now we can see the bright smile of our client, knowing that the problem he had suffered from is solved 😊

Read more about OJS hosting services on openjournaltheme

Why does the load speed matter for your journal?

  1. Google rewards sites that load quickly, which has an impact on the SEO of your journal site
  2. Authors can find their articles more easily because the archive loads faster
  3. Visitors are more comfortable looking for articles because pages load faster
  4. Indexing services may exclude a journal from their index if the site cannot be reached because of performance issues

Our service

The services we provide are always oriented to the needs of our users. In this case, we took several approaches, covering both server optimization and security enhancement for OJS, as described above. This effort is our contribution to meeting each client's specific needs, to spreading knowledge more widely, and to helping them focus on their research activity rather than being held back by unnecessary technical issues.

Need an expert service for your journal? Contact us here.


Openjournaltheme.com was started in 2016 by a passionate team focused on providing affordable OJS, OMP, OPS, DSpace, and EPrints products and services. Our mission is to help publishers focus on their research content rather than being tied up by technical OJS issues.

Under the legal company name :
Inovasi Informatik Sinergi Inc.
