<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Wikipedia on Saleem Ansari</title>
    <link>/tags/wikipedia/</link>
    <description>Recent content in Wikipedia on Saleem Ansari</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <copyright>(c) 2024 Saleem Ansari</copyright>
    <lastBuildDate>Mon, 03 Feb 2014 00:00:00 +0000</lastBuildDate>
    <atom:link href="/tags/wikipedia/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A simple Scala parser to parse 44GB Wikipedia XML Dump</title>
      <link>/2014/02/03/a-simple-scala-parser-to-parse-44gb-wikipedia-xml-dump/</link>
      <pubDate>Mon, 03 Feb 2014 00:00:00 +0000</pubDate>
      <guid>/2014/02/03/a-simple-scala-parser-to-parse-44gb-wikipedia-xml-dump/</guid>
      <description>I had to parse a Wikipedia XML dump (a 44GB XML file when uncompressed). The XML dump is available here, and I have also created a smaller sample file for running this code: sample wiki.xml file.&#xA;Below is an event-based XML parser using Scala&amp;rsquo;s XMLEventReader:&#xA;package xml import scala.io.Source import scala.xml.pull._ import scala.collection.mutable.ArrayBuffer import java.io.File import java.io.FileOutputStream import scala.xml.XML object wikipedia extends App { val xmlFile = args(0) val outputLocation = new File(args(1)) val xml = new XMLEventReader(Source.</description>
    </item>
  </channel>
</rss>
