Extend the Algolia Search Bundle to Multiple Databases

Cédric Kui · 4 min read

Building a search engine from scratch can be tricky and time-consuming. That is why, when I wanted to implement one in my Symfony app, I chose to use Algolia through its SaaS model.

Thanks to their Symfony bundle, I managed to map my Person and Company entities from my main database to their indexes with a couple of console commands. From then on, my users could search the people and companies registered in my online directory.

Here come the problems

Then I wanted to index news articles. Unfortunately, my articles were stored in a second database, which was not handled by the Algolia bundle. I decided to create my own Symfony command by using Algolia’s API Client for PHP. In it, I queried all published articles in my table and indexed them one by one with the API client.

However, as I had tens of thousands of articles stored in my table, my server crashed when I tried to index them all in one go. So I added logs to keep track of the already indexed articles and an argument to the command to specify where to restart.

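// Assumes version 1 of Algolia's PHP API client, aliased to match the code below.
// ArticleRepository is imported from the application's own namespace.
use AlgoliaSearch\Client as AlgoliaClient;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class IndexArticleCommand extends Command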
{
    private $articleRepository;

    private $algoliaApplicationId;

    private $algoliaApiKey;

    public function __construct(ArticleRepository $articleRepository, $algoliaApplicationId, $algoliaApiKey)
    {
        // Symfony commands must call the parent constructor to be initialized correctly
        parent::__construct();

        $this->articleRepository = $articleRepository;
        $this->algoliaApplicationId = $algoliaApplicationId;
        $this->algoliaApiKey = $algoliaApiKey;
    }

    protected function configure()
    {
        $this
            ->setName('algolia:article:index')
            ->setDescription('Index all published articles in Algolia')
            ->addArgument('indexName', InputArgument::REQUIRED, 'What is the name of your index?')
            ->addArgument('startId', InputArgument::OPTIONAL, 'What content do you want to start indexing at? If not set, you start at 0.')
        ;
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $indexName = $input->getArgument('indexName');
        // Console arguments arrive as strings; default to 0 and cast to an integer ID
        $startId = (int) ($input->getArgument('startId') ?? 0);

        $algoliaClient = new AlgoliaClient(
            $this->algoliaApplicationId,
            $this->algoliaApiKey
        );
        $algoliaIndex = $algoliaClient->initIndex($indexName);

        $publishedArticles = $this->articleRepository->getAllPublishedArticles($startId);

        foreach ($publishedArticles as $article) {
            $algoliaIndex->addObject(
                [
                    'title' => $article->getTitle(),
                    'body' => $article->getBody(),
                    'publishedAt' => $article->getPublishedAt(),
                    'image' => $article->getCoverImage(),
                    'objectID' => $article->getId()
                ]
            );

            // Log every article as it is indexed, so an interrupted run can be resumed from the last ID
            $output->writeln('Article #' . $article->getId() . ' indexed');
        }

        // Exit code 0 signals success to the console
        return 0;
    }
}
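
Sending one request per article is slow when there are tens of thousands of them. As a possible alternative to the loop above, the sketch below groups records into fixed-size batches and reports the last ID sent, which could serve as the next startId. The names inBatches and indexInBatches are hypothetical, and the $send callback stands in for a batch call such as the API client's saveObjects(); treat it as an illustration under those assumptions rather than a drop-in replacement.

```php
<?php

/**
 * Yield the items of any iterable in arrays of at most $size elements.
 * Only one batch is held in memory at a time.
 */
function inBatches(iterable $items, int $size): \Generator
{
    if ($size < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the final, possibly smaller, batch
    if ($batch !== []) {
        yield $batch;
    }
}

/**
 * Pass records to $send one batch at a time.
 * Returns the objectID of the last record sent, or null if there were none.
 * $send is a placeholder (assumption) for a real batch call to the search API.
 */
function indexInBatches(iterable $records, int $size, callable $send)
{
    $lastId = null;
    foreach (inBatches($records, $size) as $batch) {
        $send($batch);
        $lastId = end($batch)['objectID'];
    }

    return $lastId;
}
```

In the command, $send could be a closure that forwards each batch to $algoliaIndex, and the returned ID could be written to the log after every run.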

What now?

So the articles were indexed and I could see them in my Algolia dashboard. However, the Article index could not update itself automatically, because my Article entity and my Article index were not mapped. For instance, when I updated a company from my app's administration page, the Company index on Algolia was updated as well, but nothing of the sort happened for the Article index.

To make things even worse, news articles were written and updated by journalists on their Ruby on Rails app that I had no control over at all. That is why I had to be creative. Luckily enough, a lot of information was stored in the Article table such as the last modification date. Thanks to that, I could write a new Symfony command that queried all articles that were modified in the last minute and updated the Article index accordingly.

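// Assumes version 1 of Algolia's PHP API client, aliased to match the code below.
// Article and ArticleRepository are imported from the application's own namespace.
use AlgoliaSearch\Client as AlgoliaClient;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class UpdateArticleIndexCommand extends Command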
{
    private $articleRepository;

    private $algoliaApplicationId;

    private $algoliaApiKey;

    public function __construct(ArticleRepository $articleRepository, $algoliaApplicationId, $algoliaApiKey)
    {
        // Symfony commands must call the parent constructor to be initialized correctly
        parent::__construct();

        $this->articleRepository = $articleRepository;
        $this->algoliaApplicationId = $algoliaApplicationId;
        $this->algoliaApiKey = $algoliaApiKey;
    }

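    protected function configure()
    {
        $this
            ->setName('algolia:article:update')
            ->setDescription('Update recently modified articles in Algolia')
            // execute() reads this argument, so it has to be declared here
            ->addArgument('indexName', InputArgument::REQUIRED, 'What is the name of your index?')
        ;
    }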

    // Build the record sent to Algolia; objectID ties it to the article's database ID
    private function formatArticleInArray(Article $article)
    {
        return [
            'title' => $article->getTitle(),
            'body' => $article->getBody(),
            'publishedAt' => $article->getPublishedAt(),
            'image' => $article->getCoverImage(),
            'objectID' => $article->getId()
        ];
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $indexName = $input->getArgument('indexName');

        $algoliaClient = new AlgoliaClient(
            $this->algoliaApplicationId,
            $this->algoliaApiKey
        );
        $algoliaIndex = $algoliaClient->initIndex($indexName);

        // Parentheses are required to call a method on a freshly created object before PHP 8.4
        $oneMinuteAgo = (new \DateTime())->modify('-1 minute');

        $lastModifiedArticles = $this->articleRepository->getModifiedArticlesSince($oneMinuteAgo);

        foreach ($lastModifiedArticles as $article) {
            if ($article->isPublished()) {
                // saveObject() creates the record when the objectID is new and replaces it
                // otherwise, so no existence check is needed
                $algoliaIndex->saveObject($this->formatArticleInArray($article));
                $output->writeln('Article #' . $article->getId() . ' saved');
            } else {
                // The article is no longer published: remove it from the index
                $algoliaIndex->deleteObject($article->getId());
                $output->writeln('Article #' . $article->getId() . ' deleted');
            }
        }

        // Exit code 0 signals success to the console
        return 0;
    }
}
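
A fixed one-minute window skips articles whenever a run starts late or fails. One possible refinement, sketched below, is to remember when the previous run started and query from that moment instead. The function names, the timestamp file and the overlap parameter are all assumptions made for illustration; the command above does none of this.

```php
<?php

/**
 * Read the start time of the previous run from $path.
 * Returns null when the file is missing or does not hold a valid timestamp.
 */
function loadLastRun(string $path): ?\DateTimeImmutable
{
    if (!is_file($path)) {
        return null;
    }

    $stored = \DateTimeImmutable::createFromFormat(
        \DATE_ATOM,
        trim((string) file_get_contents($path))
    );

    return $stored ?: null;
}

/** Persist the start time of the current run (second precision). */
function storeLastRun(string $path, \DateTimeImmutable $startedAt): void
{
    file_put_contents($path, $startedAt->format(\DATE_ATOM));
}

/**
 * Compute the moment from which modified articles should be fetched.
 * Falls back to "one minute ago" on the first run, then subtracts a small
 * overlap (hypothetical safety margin) to absorb clock differences.
 */
function windowStart(
    ?\DateTimeImmutable $lastRun,
    \DateTimeImmutable $now,
    int $overlapSeconds = 5
): \DateTimeImmutable {
    $base = $lastRun ?? $now->sub(new \DateInterval('PT1M'));

    return $base->sub(new \DateInterval('PT' . max(0, $overlapSeconds) . 'S'));
}
```

In the command, the result of windowStart() would take the place of $oneMinuteAgo, and storeLastRun() would be called once the loop has finished. Saving or deleting the same article twice leaves the index in the same state, so re-reading a few seconds of overlap does no harm.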

Conclusion

Algolia is a very powerful tool to quickly implement a search engine even though it may have some limits. However, those limits can be overcome thanks to their API client, as tedious as my solution may seem.