DataWeave’s Telecom Recharge Plans API
Several months ago we released our Telecom recharge plans API. It soon turned out to be one of our more popular APIs, with some of the leading online recharge portals using it extensively. (So, the next time you recharge your phone, remember us :))
In this post, we’ll talk in detail about the genesis of this API and the problem it is solving.
Before that, and since we are in the business of building data products, some data points.
As you can see, most mobile phones in India are prepaid. That is to say, there is a huge prepaid mobile recharge market. Just how big is this market?
The above infographic is based on a recent report by Avendus [pdf]. Let’s focus on the online prepaid recharge market. Some facts:
- There are around 11 companies that provide an online prepaid recharge service. Here’s the list: mobikwik, rechargeitnow, paytm, freecharge, justrechargeit, easymobilerecharge, indiamobilerecharge, rechargeguru, onestoprecharge, ezrecharge, anytimerecharge
- RechargeItNow seems to be the biggest player. As of August 2013, they claimed annual transactions worth INR 6 billion, with over 100,000 recharges per day across India.
- PayTM, Freecharge, and Mobikwik seem to be the other big players. Freecharge claimed recharge volumes of 40000/day in June 2012 (~ INR 2 billion worth of transactions), and they have been growing steadily.
- Telcos offer a commission of approximately 3% to third party recharge portals. So, it means there is an opportunity worth about 4 bn as of today.
- Despite the Internet penetration in India being around 11%, only about 1% of mobile prepaid recharges happen online. This goes to show the huge opportunity that lies untapped!
- It also goes to show why there are so many players entering this space. It’s only going to get more crowded.
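The arithmetic behind a couple of these figures can be checked in a few lines of Python. This is a back-of-the-envelope sketch: the 3% rate and the transaction volumes are the ones quoted above, while the function names and the assumption of a 365-day year at constant daily volume are ours.

```python
COMMISSION_RATE = 0.03  # approximate telco commission quoted above


def annual_commission(annual_transaction_value_inr, rate=COMMISSION_RATE):
    """Commission a portal would earn on a year's worth of recharges."""
    return annual_transaction_value_inr * rate


def average_recharge_value(annual_transaction_value_inr, recharges_per_day,
                           days_per_year=365):
    """Implied average recharge size, assuming a constant daily volume."""
    return annual_transaction_value_inr / (recharges_per_day * days_per_year)


# RechargeItNow: INR 6 billion in claimed annual transactions
rechargeitnow_commission = annual_commission(6e9)

# Freecharge: roughly INR 2 billion a year at 40,000 recharges per day
freecharge_avg = average_recharge_value(2e9, 40_000)

print(f"RechargeItNow commission: INR {rechargeitnow_commission:,.0f}")
print(f"Freecharge average recharge: INR {freecharge_avg:.0f}")
```

At a 3% commission, INR 6 billion of transactions works out to about INR 180 million a year, and Freecharge's numbers imply an average recharge of roughly INR 137.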
What does all this have to do with DataWeave? Let’s talk about the scale of the “data problem” that we are dealing with here. Here are some numbers that give a sense of it.
There are 13 cellular service providers in India. Here’s the list: Aircel Cellular Ltd, Aircel Limited, Bharti Airtel, BSNL, Dishnet Wireless, IDEA (operates as Idea ABTL & Spice in different states), Loop Mobile, MTNL, Reliable Internet, Reliance Telecom, Uninor, Videocon, and Vodafone. There are 22 circles in India. (Not every service provider has operations in every circle.)
Find below the number of recharge plans we have in our database for various operators.
In fact, you can see that between the last week and today, we have added about 300 new plans (including plans for a new operator).
The number of plans varies across operators. Vodafone, for instance, gives its users a huge number of options.
The plans vary based on factors such as: denomination, recharge value, recharge talktime, recharge validity, plan type (voice/data), and of course, circle as well as the operator.
For a third party recharge service provider, the following are daily pain points:
- plans become invalid on a regular basis
- new plans are added on a regular basis
- the features associated with a plan change (e.g., a ‘xx mins free talk time’ plan becomes ‘unlimited validity’ or something else)
We see that tens of plans become invalid (and new ones are introduced) every day. All third party recharge portals lose a significant amount of money on a daily basis because they might not have information about all the plans, and they might be displaying invalid plans.
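The three kinds of daily change listed above can be pictured as a diff between two daily snapshots of plans. The sketch below is illustrative only: the `Plan` fields follow the factors mentioned earlier, but the class, the choice of (operator, circle, denomination) as a plan's identity, and the sample plans are all assumptions of ours, not the API's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    operator: str
    circle: str
    denomination: int   # price of the recharge, in INR
    talktime: float     # talktime credited, in INR
    validity: str       # e.g. "30 days" or "unlimited"
    plan_type: str      # "voice" or "data"

    @property
    def key(self):
        """Assumed identity of a plan: operator, circle, and denomination."""
        return (self.operator, self.circle, self.denomination)


def diff_plans(yesterday, today):
    """Return (added, removed, changed) between two daily snapshots."""
    old = {p.key: p for p in yesterday}
    new = {p.key: p for p in today}
    added = [new[k] for k in sorted(new.keys() - old.keys())]
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    changed = [(old[k], new[k])
               for k in sorted(old.keys() & new.keys())
               if old[k] != new[k]]
    return added, removed, changed


# Made-up plans, for illustration only
yesterday = [
    Plan("Vodafone", "Karnataka", 50, 40.0, "30 days", "voice"),
    Plan("Vodafone", "Karnataka", 100, 100.0, "30 days", "voice"),
    Plan("BSNL", "Kerala", 25, 20.0, "10 days", "voice"),
]
today = [
    Plan("Vodafone", "Karnataka", 50, 40.0, "unlimited", "voice"),
    Plan("Vodafone", "Karnataka", 100, 100.0, "30 days", "voice"),
    Plan("BSNL", "Kerala", 98, 0.0, "28 days", "data"),
]

added, removed, changed = diff_plans(yesterday, today)
print(len(added), "added,", len(removed), "removed,", len(changed), "changed")
```

A portal that runs a comparison like this every day can retire the removed plans, list the added ones, and update the descriptions of the changed ones.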
DataWeave’s Telecom Recharge Plans API solves this problem. This is how you use the API.
Sample API Request
Sample API Output
We aggregate plans from the various cellular service providers across all circles in India on a daily basis. One of our customers once mentioned that earlier they used to aggregate this data manually, and it used to take them about a month to do this. With our API, we have reduced the refresh cycle to one day.
In addition, now that this process is automated, they can be confident that the data they present to their customers is almost always complete as well as accurate.
DataWeave helps businesses make data-driven decisions by providing relevant actionable data. The company aggregates and organizes data from the web, such that businesses can access millions of data points through APIs, dashboards, and visualizations.
An onions-to-onions analysis
Earlier this year in India the price of onions shot up to record levels. Onion prices continue to rise as analysts speculate on the reasons for the same.
We at DataWeave decided to do a little digging of our own by putting our platform to work. We aggregate commodities data published by a number of sources, including agmarknet, rsamb, msamb, and damb, on a daily basis. This data is also served through our Commodity Prices API.
Following are some insights we found:
The above curve shows the price of onions per quintal (100 kg) starting from April 1st (the beginning of the financial year). As we can observe, onion prices per quintal at mandis across India have been rising steadily since mid-May. However, we see a marked spike around mid-August. That’s when the panic began!
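One simple way to pick out a spike like this from a daily price series is to compare each day's price with the average of the days just before it. The sketch below is one possible approach rather than the method used for the chart; the window, the threshold, and the sample prices are made up for illustration and are not actual mandi data.

```python
def find_spikes(prices, window=7, threshold=0.25):
    """Return indices where a price exceeds the trailing mean by `threshold`.

    prices: daily prices (INR per quintal), oldest first.
    window: number of preceding days used as the baseline.
    threshold: fractional jump over the baseline that counts as a spike.
    """
    spikes = []
    for i in range(window, len(prices)):
        trailing = prices[i - window:i]
        baseline = sum(trailing) / window
        if prices[i] > baseline * (1 + threshold):
            spikes.append(i)
    return spikes


# Made-up illustrative series, not actual mandi data
sample_prices = [1500, 1520, 1540, 1560, 1580, 1600, 1620, 1640, 2400, 2500]
spike_days = find_spikes(sample_prices)
print(spike_days)  # [8, 9]
```

A steady climb stays under the threshold, while the sudden jump on the last two days is flagged.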
Above you can see the ‘interest over time’ for the search term ‘onion price’ on Google Trends, and it hits a spike on, well, August 14th!
Having learnt the trend in the increase of prices per quintal, we set out to superimpose the supply data on this graph to understand the possible effects of supply on the price increase. The results were startling, to say the least.
The price rise can be clearly noticed right after the supply dwindled to record lows. Now whether this shortage in supply is a result of a crop failure or due to hoarding is a matter for speculation. Or is it?
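To put a number on how closely price tracks supply, one option is to align the two series by date and compute their correlation. The following is a minimal sketch under our own assumptions: the helper names are hypothetical, the date labels and values are invented, and a correlation coefficient is our choice of measure, not something taken from the charts above.

```python
from math import sqrt


def align_by_date(price_by_date, supply_by_date):
    """Keep only dates present in both series, in sorted order."""
    dates = sorted(price_by_date.keys() & supply_by_date.keys())
    return ([price_by_date[d] for d in dates],
            [supply_by_date[d] for d in dates])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up values: price in INR per quintal, supply in arbitrary units
price = {"day-01": 2000, "day-02": 2200, "day-03": 2600, "day-04": 3000}
supply = {"day-01": 900, "day-02": 800, "day-03": 600, "day-05": 500}

prices, supplies = align_by_date(price, supply)
r = pearson(prices, supplies)
print(f"correlation between price and supply: {r:.2f}")
```

A value close to -1 means prices rise as supply falls. It says nothing about why supply fell, which is exactly the open question here.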
While we did notice a huge interest in onion prices online, just how huge is it compared to, say, ‘petrol price’ or maybe ‘gold price’? Take a look.
Well then ..
DataWeave provides actionable data by aggregating, parsing, organizing and visualizing millions of data points from the Web. We serve this data through our data APIs. We build data products on top of the DataWeave platform.
Conquering the Data Mountain API by API
Let’s revisit our raison d’être: DataWeave is a platform on which we do large-scale data aggregation and serve this data in forms that are easily consumable. The nature of the data that we deal with is that: (1) it is publicly available on the web, (2) it is factual (to the extent possible), and (3) it has high utility (decided by a number of factors that we discuss below).
The primary access channel for our data is the Data API. Other access channels such as visualizations, reports, dashboards, and alerting systems are built on top of our data APIs. Data products such as PriceWeave are built by combining multiple APIs and packaging them with reporting and analytics modules.
Even as the platform is capable of aggregating any kind of data on the web, we need to prioritize the data that we aggregate, and the data products that we build. There are a lot of factors that help us in deciding what kinds of data we must aggregate and the APIs we must provide on DataWeave. Some of these factors are:
1. Business Case: A strong business use-case for the API. There has to be an inherent pain point that the data set solves. Be it the Telecom Tariffs API or the Price Intelligence API, each solves a bunch of pain points for distinct customer segments.
2. Scale of Impact: There has to be a large enough volume of potential consumers facing the pain points that this data API would solve. Consider the volume of the target consumers for the Commodity Prices API, for instance.
3. Sustained Data Need: Data that a consumer needs frequently and/or on a long term basis has greater utility than data that is needed infrequently. We look at weather and prices all the time. Census figures, not so much.
4. Assured Data Quality: Our consumers need to be able to trust the data we serve: data has to be complete as well as correct. Therefore, we need to ensure that there exist reliable public sources on the Web that contain the data points required to create the API.
Once these factors are accounted for, the process of creating the APIs begins. One question that we are often asked is the following: how easy or difficult is it to create data APIs? That again depends on many factors. There are many dimensions to the data we are dealing with that help us in deciding the level of difficulty. Below we briefly touch upon some of those:
1. Structure: Textual data on the Web can be structured/semi-structured/unstructured. Extracting relevant data points from semi-structured and unstructured data without the existence of a data model can be extremely tricky. The process of recognizing the underlying pattern, automating the data extraction process, and monitoring accuracy of extracted data becomes difficult when dealing with unstructured data at scale.
2. Temporality: Data can be static or temporal in nature. Aggregating static data sets is a one-time effort. Scenarios where data changes frequently or new data points are being generated pose challenges related to scalability and data consistency. For example, the India Local Mandi Prices API gets updated on a day-to-day basis with new data being added. When aggregating data that is temporal, monitoring changes to data sources and data accuracy becomes a challenge. One needs to have systems in place that ensure data is aggregated frequently and also monitored for accuracy.
3. Completeness: At one end of the spectrum we have existing data sets that are publicly downloadable. At the other end, we have data points spread across sources. To create data sets over these data points, they need to be aggregated and curated before they can be used. These data sources publish data in their own formats and update them at different intervals. As always, “the whole is larger than the sum of its parts”; these individual data points, when aggregated and presented together, have many more use cases than the individual data points themselves.
4. Representations: Data on the Web exists in various formats including (if we are particularly unlucky!) non-standard/proprietary ones. Data exists in HTML, XML, XLS, PDFs, docs, and many more. Extracting data from these different formats and presenting them through standard representations comes with its own challenges.
5. Complexity: Data sets in which data points are independent of each other are fairly simple to reason about. On the other hand, consider network data sets such as social data, maps, and transportation networks. The complexity arises from the relationships that can exist between data points within and across data sets. The extent of pre-processing required to analyse these relationships is huge even on a small scale.
6. (Pre/Post) Processing: There is a lot of pre-processing involved in making raw crawled data presentable and accessible through a data API. This involves cleaning, normalization, and representing data in standard forms. Once we have the data API, there are a number of ways in which this data can be processed to create new and interesting APIs.
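As a small illustration of the cleaning and normalization step, here is a sketch of turning raw scraped strings into standard forms. The input formats and the function names are hypothetical, chosen only to show the idea; real sources need many more rules than this.

```python
import re


def parse_price(raw):
    """Turn a scraped price string such as 'Rs. 1,200/-' into 1200.0."""
    cleaned = raw.replace(",", "")  # drop thousands separators
    match = re.search(r"\d+(?:\.\d+)?", cleaned)
    if match is None:
        raise ValueError(f"no numeric price found in {raw!r}")
    return float(match.group())


def normalize_name(raw):
    """Collapse stray whitespace and apply a consistent title case."""
    return " ".join(raw.split()).title()


print(parse_price("Rs. 1,200/-"))            # 1200.0
print(parse_price("INR 399.50"))             # 399.5
print(normalize_name("  bharti   AIRTEL "))  # Bharti Airtel
```

Once every source's prices and names are reduced to the same representation, records from different sources can be compared and merged.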
So that, at a high level, is the way we work at DataWeave. Our vision is that of curating and providing access to all of the world’s public data. We are progressing towards this vision one API at a time.
Amazon forays into India with aggressive discounts on books
With Amazon launching its India operations, the heat is turned up in the eCommerce space. Amazon had been testing waters through Junglee, so this was quite an expected event. The customers seem to be welcoming the overlords of eCommerce, perhaps eCommerce stores too, albeit less enthusiastically. There is a sudden burst of activity across the bigger players.
We, at DataWeave, collect data across a large number of product categories, including books, on a daily basis. So, a few days ago we decided to do some analysis once Amazon got into business with books.
We collected data for the top 5000 books from across 10 different eCommerce stores. We looked at analyzing things like: who is giving more discounts, what is the discount “sweet spot” for each major player, what are the popular categories, etc. We noticed some interesting things.
Discount Distributions across Stores
The below visualization shows the discount distribution for the top 5000 books across 6 eCommerce portals that sell books in India. We have divided the discounts being offered into 8 buckets. Along with the discount distribution, the visualization also depicts the “sweet spots” each of the online book sellers wants to be in with respect to the discounts being offered on books.
The first graphic above shows discount distribution when we don’t consider shipping charges. The second one shows the same when we add shipping charges to the selling price of each book and compute the effective discount.
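The effective discount described here can be written down in a few lines. This is a sketch of the calculation as we have described it, not our production code: the function names are ours, the book's list and selling prices are made up, and the shipping rules are the ones quoted elsewhere in this post for BookAdda (Rs. 60 on orders below Rs. 399) and Infibeam (INR 30 on orders below INR 250).

```python
def discount_pct(list_price, amount_paid):
    """Discount as a percentage of the list price."""
    return 100.0 * (list_price - amount_paid) / list_price


def shipping_charge(selling_price, min_free_order, fee):
    """Shipping is free once the order reaches the store's minimum value."""
    return 0.0 if selling_price >= min_free_order else fee


def effective_discount_pct(list_price, selling_price, min_free_order, fee):
    """Discount after adding the shipping charge to the selling price."""
    total = selling_price + shipping_charge(selling_price, min_free_order, fee)
    return discount_pct(list_price, total)


# Hypothetical book: list price Rs. 300, selling price Rs. 240
nominal = discount_pct(300, 240)
bookadda_style = effective_discount_pct(300, 240, min_free_order=399, fee=60)
infibeam_style = effective_discount_pct(300, 240, min_free_order=250, fee=30)

print(nominal, bookadda_style, infibeam_style)  # 20.0 0.0 10.0
```

The same 20% sticker discount shrinks to 10% under the second set of rules and disappears entirely under the first, for a single-book order.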
Some interesting analysis that can be derived out of these visualizations is:
- One thing that comes out clearly is that shipping charges make a huge impact. When we don’t consider shipping charges, it appears that Flipkart is going aggressive with discounts, with a significant portion of the donut covered by the 20-40% range.
- But add shipping charges to this, and the donuts (in the second graphic) look very different. This is because Amazon does not charge shipping charges as of now. You will in fact notice that there are a lot of books on Flipkart for which you won’t get any discount once you pay the shipping charges.
- Amazon prices a considerable chunk of its books in the 30-40% discount band (second graphic). None of the other sellers have this many books in the 30-40% band. Looks like they are going quite aggressive on customer acquisition by topping off the heavy discounts with free shipping (for now).
- Very few books are being offered at no discount by Amazon. But other sellers like Flipkart, BookAdda, and Crossword offer a considerable number of books at no discount. In fact, a significant portion of these books fall below their minimum order value for free shipping. So, there is a good chance of the shipping charges offsetting the discounts. Unless you are buying a bunch of books at once, you might end up paying more than the price of the books!
- Sweet spots of these sellers when it comes to discounts being offered are:
  - Amazon: between 10-40% discount range, spread uniformly.
  - Flipkart: between 0-30% discount range, spread uniformly.
  - HomeShop18: between 0-30% discount range, but more concentrated in the 10-20% range.
  - BookAdda: about half of their books are offered at no discount. And they have a steep shipping charge (Rs. 60 for orders below Rs. 399). Good for them, not so much for the customers.
  - Infibeam: between 10-30% discount range, with more concentration in the 20-30% range.
  - Crossword: between 0-10% discount range, spread uniformly.
The above visualizations show the following metric: for how many of the top 5000 books is each store the cheapest. Again, if we don’t consider shipping charges, Flipkart still leads the pack. But Amazon again scores high once we include shipping charges. It also does not help the stores charging for shipping that the prices of most of the popular books fall below the stores’ minimum order value for free shipping. Notice how Infibeam fares better when shipping charges are considered than before. This is because Infibeam’s minimum order value (INR 250) as well as its shipping charges (INR 30) are well below those of Flipkart and the others.
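The "cheapest store" metric itself is easy to state in code. Below is a minimal sketch: the function name, the tie-handling rule (every store that matches the lowest price gets credit), and the store names and prices are all our own illustrative assumptions.

```python
from collections import Counter


def cheapest_store_counts(prices_by_book):
    """Count, per store, the books for which it offers the lowest price.

    prices_by_book maps a book title to {store: price}, where the price
    may or may not include shipping. Ties credit every store involved.
    """
    counts = Counter()
    for store_prices in prices_by_book.values():
        if not store_prices:
            continue  # book not listed anywhere
        lowest = min(store_prices.values())
        for store, price in store_prices.items():
            if price == lowest:
                counts[store] += 1
    return counts


# Made-up prices for three books across three hypothetical stores
sample = {
    "Book A": {"Store X": 210, "Store Y": 200, "Store Z": 230},
    "Book B": {"Store X": 150, "Store Y": 180},
    "Book C": {"Store X": 300, "Store Z": 300},
}
counts = cheapest_store_counts(sample)
print(dict(counts))
```

Running it once on selling prices and once on selling prices plus shipping gives the two views compared above.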
Flipkart is probably still selling books at a rapid rate (http://www.facebook.com/photo.php?v=988881816109), but if you are buying a bestseller or a new release, you are better off doing so on either Amazon or HomeShop18. About 70% of Flipkart’s popular books (across categories) are priced below their minimum order value for free shipping. The figures are similar for BookAdda and Crossword.
Amazon and HomeShop18 (along with eBay) do not charge for shipping as of now for orders of any value. Of course, free shipping does apply if you buy multiple books at a time on Flipkart and other stores. But until Amazon and others introduce shipping charges, we are likely to see a dip in Flipkart and Infibeam’s book sales.
DataWeave, AppWeave, PriceWeave
Since the last blog, a bunch of things have kept us busy: DataWeave’s new website, some new data APIs (for instance, http://www.dataweave.in/apis/usage/21/Telecom-Data-API), and, of course, the revamped PriceWeave with a lot of new features.
A special shout out to Shubhodeep and Abhishek for bringing our new website up and about. And a special shout out to Vikranth and Murthy for the new PriceWeave. Don’t forget to visit our team page and put faces to our names.
DataWeave is in the business of aggregating unstructured and noisy data from a large number of sources on the web and presenting it to users in readily usable forms. On top of our datasets, we build APIs and visualizations; we build dashboards and alerting systems; and we build reporting and analytics.
Recently, we also launched AppWeave, where we showcase mobile apps developed on top of DataWeave’s APIs. We want developers to use our API (it’s completely free for non-commercial usage!) and build interesting and useful apps powered by data. Get your API key today and build away! And please let us know (email@example.com) once you build your app—we would like to publicize it and showcase it on AppWeave.
DataWeave is the larger umbrella under which all our activities happen. It is our playground: a lab of sorts where we experiment with lots of ideas, play with humongous amounts of data, and build things. The objective is to choose the ideas that make the most sense and the features that are most useful, package these together, and build a complete product. So, from time to time, a product graduates from DataWeave.
PriceWeave is our first major product built in this manner. It is targeted at the retail vertical. PriceWeave provides competitive intelligence for eRetailers and brands. It comes with features such as: pricing opportunities, assortment intelligence, gaps in the product catalog, promotions and offers, new launches, and custom reports and analytics.
Please send us your feedback (firstname.lastname@example.org)