Pulling Data with the Pull (Push) Connector

Jacek Artymiak | 5th July 2013

The Pull Connector is the latest addition to our growing family of Push connectors. It takes its name from the mechanism used to deliver the interactions you filter for: you pull data from our platform instead of us pushing it to you.

The name might seem out of place for a Push connector, but the classification makes sense: the Pull Connector uses the same robust Push subsystem that powers the other DataSift Push connectors.

We designed it specifically for clients who are firewalled from the public internet and prefer to keep and process data in-house. The Pull Connector provides the following benefits:

  • Firewalls and network security policies are no longer an issue.
    With Pull, there is no need to set up public endpoints. It simplifies firewall and network management on your side.
    For example, you no longer need to ask your operations team to loosen up the firewall rules to enable connections from DataSift to a host that will receive data. They will not have to give up a precious public IP address or think of ways of redirecting traffic to a shared IP address.
    Also, a change of the IP address of the host receiving data does not require a call to /push/update.
  • Data collection and processing at your own pace.
    The Pull Connector uses the Push data queuing subsystem. Your data is stored for an hour in a Push queue, giving you freedom to collect it as often as you want (up to twice per second per Push subscription ID) and to request as much of it as you want, in batches of up to 20MB.
  • You can retrieve data again, if necessary.
    The queue cursor mechanism lets you go back in time by up to an hour and retrieve data from the queue again if it gets lost on your side. That window should give you plenty of time to handle technical problems.
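
To make the limits above concrete, here is a minimal client-side sketch in Python. It only encodes the numbers quoted in this post (two requests per second per Push subscription ID, batches of up to 20MB, one hour of retention). The names PullPacer, clamp_batch_size and cursor_still_replayable are hypothetical helpers, not part of any DataSift library, and the exact byte value used for 20MB is an assumption.

```python
import time

# Limits quoted in the post. The byte value for "20MB" is an assumption.
MAX_BATCH_BYTES = 20 * 1024 * 1024
MIN_INTERVAL_SECONDS = 0.5      # at most two requests per second
RETENTION_SECONDS = 60 * 60     # data stays in the Push queue for an hour


class PullPacer:
    """Hypothetical helper that spaces out requests per subscription ID."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_call = {}  # subscription ID -> time of the last request

    def wait_turn(self, subscription_id):
        """Sleep if needed so requests stay at least 0.5 s apart.

        Returns the number of seconds spent sleeping.
        """
        waited = 0.0
        last = self._last_call.get(subscription_id)
        if last is not None:
            remaining = MIN_INTERVAL_SECONDS - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call[subscription_id] = self._clock()
        return waited


def clamp_batch_size(requested_bytes):
    """Keep a requested batch size within the 20MB ceiling."""
    return max(1, min(requested_bytes, MAX_BATCH_BYTES))


def cursor_still_replayable(fetched_at, now):
    """True if data fetched at `fetched_at` should still be in the queue.

    Both arguments are timestamps in seconds from the same clock.
    """
    return (now - fetched_at) < RETENTION_SECONDS
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real delays. In production you would call wait_turn() right before each request for a given subscription ID.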

When you combine the robust foundations of the Push subsystem, the freedom to collect data at your own pace, and the ease of setting up a data collection and processing system without having to make changes to your organization's network and security setup, the Pull Connector becomes a very attractive solution.

And we saved the best for last: even though the Pull Connector introduces a new endpoint, /pull, for data collection, we implemented it using the same REST API you are already familiar with. You set it up just like any other Push connector and then call /pull to get your data.
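
As a rough illustration of what a call to /pull could look like from the client side, the Python sketch below only builds the request URL and does not send anything. The query parameter names (id, size, cursor), the build_pull_url helper and the example.com base URL are assumptions made for this example. This post does not spell them out, so check the API documentation for the real names and for authentication details.

```python
from urllib.parse import urlencode


def build_pull_url(base_url, subscription_id, size=None, cursor=None):
    """Build a URL for a /pull request (hypothetical helper).

    The parameter names "id", "size" and "cursor" are assumptions,
    not confirmed by this post.
    """
    params = {"id": subscription_id}
    if size is not None:
        params["size"] = size      # requested batch size in bytes
    if cursor is not None:
        params["cursor"] = cursor  # position in the queue to read from
    return f"{base_url.rstrip('/')}/pull?{urlencode(params)}"


# Placeholder base URL and subscription ID, for illustration only.
url = build_pull_url("https://api.example.com/v1", "abc123", size=1024)
print(url)
```

Passing a cursor value you saved from an earlier response is how you would ask for the same stretch of the queue again, under the assumptions stated above.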
